Articles by Wenchen Fan - Databricks Blog


Now on Databricks: A Technical Preview of Databricks Runtime 7 Including a Preview of Apache Spark 3.0

May 13, 2020 by Yin Huai, Wenchen Fan and Xiao Li in Platform Blog

Introducing Databricks Runtime 7.0 Beta. We’re excited to announce that the Apache Spark™ 3.0.0-preview2 release is available on Databricks as part of...

How to Work with Avro, Kafka, and Schema Registry in Databricks

February 15, 2019 by Wenchen Fan and Michael Armbrust in Solutions

In the previous blog post, we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can...

Apache Avro as a Built-in Data Source in Apache Spark 2.4

November 30, 2018 by Gengliang Wang, Wenchen Fan and Michael Armbrust in Solutions

Apache Avro is a popular data serialization format. It is widely used in the Apache Spark and Apache...

Introducing Apache Spark 2.4

November 8, 2018 by Wenchen Fan, Xiao Li and Reynold Xin in Engineering Blog

UPDATED: 11/19/2018. We are excited to announce the availability of Apache Spark 2.4 on Databricks as part of the Databricks Runtime 5.0...

Learn about Apache Spark’s Memory Model and Spark’s State in the Cloud

September 19, 2017 by Wenchen Fan and Nicolas Poggi in Company Blog

Since Apache Spark 1.6, as part of the Project Tungsten, we started an ongoing effort to substantially improve the memory and CPU...

Cost Based Optimizer in Apache Spark 2.2

August 31, 2017 by Ron Hu, Zhenhua Wang, Wenchen Fan and Sameer Agarwal in Engineering Blog

This is a joint engineering effort between Databricks’ Apache Spark engineering team (Sameer Agarwal and Wenchen Fan) and Huawei’s engineering team (Ron Hu...

Scalable Partition Handling for Cloud-Native Architecture in Apache Spark 2.1

December 15, 2016 by Eric Liang, Michael Allman and Wenchen Fan in Engineering Blog

Apache Spark 2.1 is just around the corner: the community is going through the voting process for the release candidates. This blog post discusses...

Introducing Apache Spark Datasets

January 4, 2016 by Michael Armbrust, Wenchen Fan, Reynold Xin and Matei Zaharia in Engineering Blog

Developers have always loved Apache Spark for providing APIs that are simple yet powerful, a combination of traits that makes complex analysis possible...