We recently shipped Databricks Runtime version 4.1, powered by Apache Spark™. Version 4.1 brings improved read/write performance on sources like S3 or Parquet, improved caching, and many quality and feature improvements for the preview of Databricks Delta, focused on faster query execution and adaptive schema and type validation.
If you are participating in our preview of Databricks Delta on Azure Databricks or AWS, we highly recommend that you upgrade to version 4.1 today.
Let's take a closer look at some of the improvements:
- OPTIMIZE (Delta): The OPTIMIZE command improves reads by consolidating files. With this release, OPTIMIZE now executes in parallel, greatly reducing the time it takes to optimize a table.
- LIMIT (Delta): There are also improvements in limit pushdown that reduce the size of intermediate result sets.
- UPDATE, DELETE and MERGE (Delta): Writes with UPDATE, DELETE and MERGE statements in Delta can now use stats and perform data skipping for lower-latency execution.
- ALTER TABLE DDL. You can learn more about Schema Validation here.

Databricks Delta remains in Private Preview, but the updates in version 4.1 represent a candidate release in anticipation of the upcoming general availability (GA) release. If you are not already participating in the Databricks Delta preview, you can still sign up here.
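To make the data skipping mentioned for UPDATE, DELETE and MERGE a little more concrete, here is a minimal conceptual sketch in plain Python. It is not how Databricks Delta is implemented. The `FileStats` class, the `files_to_scan` function and the file names are all hypothetical, and the sketch assumes each data file carries the minimum and maximum value of one column. The idea is only that a statement with a range predicate can skip any file whose value range cannot contain matching rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    """Hypothetical per-file statistics: min and max of a single column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files: List[FileStats], low: int, high: int) -> List[str]:
    """Return the paths of files whose [min, max] range overlaps [low, high].

    Files whose range lies entirely outside the predicate cannot contain
    matching rows, so they can be skipped without being read.
    """
    return [f.path for f in files if f.max_value >= low and f.min_value <= high]


# Three illustrative files with non-overlapping value ranges (assumed data).
files = [
    FileStats("part-0", 1, 100),
    FileStats("part-1", 101, 200),
    FileStats("part-2", 201, 300),
]

# For a statement such as: DELETE ... WHERE id BETWEEN 150 AND 180,
# only files whose range overlaps [150, 180] need to be opened.
candidates = files_to_scan(files, 150, 180)
print(candidates)
```

In this toy setup only one of the three files overlaps the predicate, so the other two are never read. The benefit in practice depends on how well the data layout keeps value ranges narrow per file.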
This post touches on only a few select improvements in the 4.1 release. If you’d like to go over the full set of improvements, please visit the release notes for version 4.1 here.
If you’d like to hear more about the features here and more about Databricks Runtime, stop by our booth at the Spark + AI Summit in San Francisco.
Come find out what’s new in Spark, Data, and AI! Register now.