Simplify Data Ingestion With the New Python Data Source API

December 10, 2024 by Craig Lukasik and Allison Wang
Data engineering teams are frequently tasked with building bespoke ingestion solutions for myriad custom, proprietary, or industry-specific data sources. Many teams find that...

PySpark in 2023: A Year in Review

With the releases of Apache Spark 3.4 and 3.5 in 2023, we focused heavily on improving PySpark performance, flexibility, and ease of use...

Simplify PySpark testing with DataFrame equality functions

The DataFrame equality test functions were introduced in Apache Spark™ 3.5 and Databricks Runtime 14.2 to simplify PySpark unit testing. The full set...

Named Arguments for SQL Functions

Today, we introduce the new availability of named arguments for SQL functions. With this feature, you can invoke functions in more flexible ways...

Introducing Python User-Defined Table Functions (UDTFs)

Apache Spark™ 3.5 and Databricks Runtime 14.0 have brought an exciting feature to the table: Python user-defined table functions (UDTFs). In this blog...

Introducing Apache Spark™ 3.5

Today, we are happy to announce the availability of Apache Spark™ 3.5 on Databricks as part of Databricks Runtime 14.0. We extend our...

Introducing English as the New Programming Language for Apache Spark

We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™...

What’s New With SQL User-Defined Functions

Since their initial release, SQL user-defined functions have become hugely popular among both Databricks Runtime and Databricks SQL customers. This simple yet...

Introducing SQL User-Defined Functions

October 20, 2021 by Serge Rielau and Allison Wang
A user-defined function (UDF) is a means for a user to extend the native capabilities of Apache Spark™ SQL. SQL on Databricks has...

Faster SQL: Adaptive Query Execution in Databricks

October 21, 2020 by MaryAnn Xue and Allison Wang
Earlier this year, Databricks wrote a blog on the whole new Adaptive Query Execution framework in Spark 3.0 and Databricks Runtime 7.0. The...