Skip to main content
<
Page 2

Now on Databricks: A Technical Preview of Databricks Runtime 7 Including a Preview of Apache Spark 3.0

May 13, 2020 by Yin Huai, Wenchen Fan and Xiao Li in
Introducing Databricks Runtime 7.0 Beta We’re excited to announce that the Apache Spark TM 3.0.0-preview2 release is available on Databricks as part of...

How to Work with Avro, Kafka, and Schema Registry in Databricks

February 15, 2019 by Wenchen Fan and Michael Armbrust in
In the previous blog post , we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can...

Apache Avro as a Built-in Data Source in Apache Spark 2.4

Try this notebook in Databricks Apache Avro is a popular data serialization format. It is widely used in the Apache Spark and Apache...

Introducing Apache Spark 2.4

November 8, 2018 by Wenchen Fan, Xiao Li and Reynold Xin in
UPDATED: 11/19/2018 We are excited to announce the availability of Apache Spark 2.4 on Databricks as part of the Databricks Runtime 5.0...

Learn about Apache Spark’s Memory Model and Spark’s State in the Cloud

September 19, 2017 by Wenchen Fan and Nicolas Poggi in
Since Apache Spark 1.6, as part of the Project Tungsten , we started an ongoing effort to substantially improve the memory and CPU...

Cost Based Optimizer in Apache Spark 2.2

This is a joint engineering effort between Databricks’ Apache Spark engineering team (Sameer Agarwal and Wenchen Fan) and Huawei’s engineering team (Ron Hu...

Scalable Partition Handling for Cloud-Native Architecture in Apache Spark 2.1

Apache Spark 2.1 is just around the corner: the community is going through voting process for the release candidates. This blog post discusses...

Introducing Apache Spark Datasets

Developers have always loved Apache Spark for providing APIs that are simple yet powerful, a combination of traits that makes complex analysis possible...