We’re excited to announce the general availability of Databricks Runtime 5.0, which includes Spark 2.4. This release offers substantial performance increases in key areas of the platform. Benchmarking workloads have shown a 16% improvement in total execution time, and Databricks Delta benefits from substantial improvements to metadata caching, improving query latency by 30%. Beyond these performance improvements, we’ve packed this release with many new features and improvements. We’ll highlight some of these now.
With Databricks Runtime 5.0, we’ve improved the usability of MERGE commands.
For further information on UPDATE and DELETE commands, please refer to the Databricks Delta Documentation.
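For readers who have not used MERGE with Databricks Delta, here is a minimal sketch of an upsert statement. The table names (`events`, `updates`), the key column (`event_id`), and the helper `build_merge_sql` are hypothetical and only serve the illustration; the specific MERGE improvements in this release are not shown here.

```python
def build_merge_sql(target, source, key, columns):
    """Build a MERGE INTO statement that upserts rows from `source` into `target`.

    All table and column names are supplied by the caller; nothing here is
    specific to Databricks Runtime 5.0.
    """
    # Columns to overwrite when a row with the same key already exists.
    set_clause = ", ".join(f"{target}.{c} = {source}.{c}" for c in columns)
    # Columns to populate when the key is new to the target table.
    all_cols = [key] + list(columns)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"{source}.{c}" for c in all_cols)
    return (
        f"MERGE INTO {target}\n"
        f"USING {source}\n"
        f"ON {target}.{key} = {source}.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


# Hypothetical tables: `events` is the Delta table, `updates` holds new rows.
merge_sql = build_merge_sql("events", "updates", "event_id", ["status", "updated_at"])
print(merge_sql)
```

In a notebook attached to a cluster, the resulting string could be passed to `spark.sql(merge_sql)`, assuming both tables exist.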
In addition to the new features in this release, we’ve invested heavily in improvements for Databricks Delta, including work to improve performance and stability for the OPTIMIZE command.
We’ve improved the isolation level for Databricks Delta queries. Any query with multiple references to a single Databricks Delta table (a self-join, for example) will read from the same snapshot, even if there are concurrent updates to the table.
Lastly, we want to point out the improved query latency for small Databricks Delta tables; details are in the release notes for Databricks Runtime 5.0.
We’ve upgraded the streaming source Kafka client to version 2.0.0, which is an important milestone. Databricks now supports kafka.isolation.level to read only committed records from Kafka topics that are written using a transactional producer.
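As a rough sketch of how this option might be used from Structured Streaming, the snippet below builds the source options and passes them to the Kafka reader. The broker address, topic name, and both helper functions are hypothetical; `read_committed` is the standard Kafka consumer value for this setting.

```python
def kafka_source_options(bootstrap_servers, topic, committed_only=True):
    """Return options for the Structured Streaming Kafka source.

    Options prefixed with `kafka.` are handed to the underlying Kafka consumer.
    """
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
    }
    if committed_only:
        # Skip records from aborted or still-open transactions.
        options["kafka.isolation.level"] = "read_committed"
    return options


def read_kafka_stream(spark, bootstrap_servers, topic):
    """Open a streaming DataFrame; `spark` is an existing SparkSession."""
    opts = kafka_source_options(bootstrap_servers, topic)
    return spark.readStream.format("kafka").options(**opts).load()


# Hypothetical broker and topic, shown only to illustrate the option set.
options = kafka_source_options("broker-1:9092", "payments")
print(options)
```

On a cluster, `read_kafka_stream(spark, "broker-1:9092", "payments")` would return a streaming DataFrame that only surfaces committed records, assuming the topic is written by a transactional producer.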
We’ve also included the new Azure Blob Storage streaming source, which is based on file notifications. Instead of listing a directory to find new files for processing, this streaming source reads file event notifications directly. This can significantly reduce listing costs for Structured Streaming queries on files in Azure Blob Storage.
To read more about the new features above and to see the full list of improvements included in Databricks Runtime 5.0, refer to the Databricks Runtime 5.0 release notes.
We recommend all customers upgrade to Databricks Runtime 5.0 to take advantage of these new features and performance optimizations.