Engineering blog

Named Arguments for SQL Functions

Today, we announce the availability of named arguments for SQL functions. With this feature, you can invoke functions in more flexible ways...
Engineering blog

Introducing the Support of Lateral Column Alias

September 19, 2023 by Xinyi Yu, Wenchen Fan and Gengliang Wang in Engineering Blog
We are thrilled to introduce the support of a new SQL feature in Apache Spark and Databricks: Lateral Column Alias (LCA). This feature...
Engineering blog

Introducing Apache Spark™ 3.5

Today, we are happy to announce the availability of Apache Spark™ 3.5 on Databricks as part of Databricks Runtime 14.0. We extend our...
Engineering blog

Introducing Apache Spark™ 3.4 for Databricks Runtime 13.0

Today, we are happy to announce the availability of Apache Spark™ 3.4 on Databricks as part of Databricks Runtime 13.0. We extend...
Engineering blog

Introducing Apache Spark™ 3.3 for Databricks Runtime 11.0

Today we are happy to announce the availability of Apache Spark™ 3.3 on Databricks as part of Databricks Runtime 11.0. We want...
Engineering blog

Introducing Apache Spark™ 3.2

We are excited to announce the availability of Apache Spark™ 3.2 on Databricks as part of Databricks Runtime 10.0. We want to...
Engineering blog

Introducing Apache Spark™ 3.1

We are excited to announce the availability of Apache Spark 3.1 on Databricks as part of Databricks Runtime 8.0. We want to...
Engineering blog

A Comprehensive Look at Dates and Timestamps in Apache Spark™ 3.0

Apache Spark is a very popular tool for processing structured and unstructured data. When it comes to processing structured data, it supports many...
Company blog

Introducing Apache Spark 3.0

We’re excited to announce that the Apache Spark™ 3.0.0 release is available on Databricks as part of our new Databricks Runtime 7.0...
Engineering blog

Adaptive Query Execution: Speeding Up Spark SQL at Runtime

Engineering blog

Now on Databricks: A Technical Preview of Databricks Runtime 7 Including a Preview of Apache Spark 3.0

Introducing Databricks Runtime 7.0 Beta. We’re excited to announce that the Apache Spark™ 3.0.0-preview2 release is available on Databricks as part of...
Company blog

How to Work with Avro, Kafka, and Schema Registry in Databricks

February 15, 2019 by Wenchen Fan and Michael Armbrust in Company Blog
In the previous blog post, we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can...
Engineering blog

Apache Avro as a Built-in Data Source in Apache Spark 2.4

Apache Avro is a popular data serialization format. It is widely used in the Apache Spark and Apache...
Engineering blog

Introducing Apache Spark 2.4

November 8, 2018 by Wenchen Fan, Xiao Li and Reynold Xin in Engineering Blog
UPDATED: 11/19/2018. We are excited to announce the availability of Apache Spark 2.4 on Databricks as part of the Databricks Runtime 5.0...
Company blog

Learn about Apache Spark’s Memory Model and Spark’s State in the Cloud

September 20, 2017 by Wenchen Fan and Nicolas Poggi in Company Blog
Since Apache Spark 1.6, as part of Project Tungsten, we started an ongoing effort to substantially improve the memory and CPU...
Engineering blog

Cost Based Optimizer in Apache Spark 2.2

This is a joint engineering effort between Databricks’ Apache Spark engineering team (Sameer Agarwal and Wenchen Fan) and Huawei’s engineering team (Ron Hu...
Engineering blog

Scalable Partition Handling for Cloud-Native Architecture in Apache Spark 2.1

Apache Spark 2.1 is just around the corner: the community is going through the voting process for the release candidates. This blog post discusses...
Engineering blog

Introducing Apache Spark Datasets

Developers have always loved Apache Spark for providing APIs that are simple yet powerful, a combination of traits that makes complex analysis possible...