We’re excited to announce Databricks Runtime 4.2, powered by Apache Spark™. Version 4.2 includes updated Spark internals, new features, and major performance upgrades to Databricks Delta, as well as general quality improvements to the platform. We are moving quickly toward the Databricks Delta general availability (GA) release and we recommend you upgrade to Databricks Runtime 4.2 to take advantage of these improvements.
I'd like to take a moment to highlight some of the work the team has done to continually improve Databricks Delta:
- Streaming Directly to Delta Tables: Streams can now be written directly to a Databricks Delta table registered in the Hive metastore using df.writeStream.table(...).
- Path Consistency for Delta Commands: All Databricks Delta commands and queries now support referring to a table using its path as an identifier (that is, delta.`/path/to/table`). Previously OPTIMIZE and VACUUM required non-standard use of string literals (that is, '/path/to/table').
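Together, these two improvements can be sketched as follows. This is a minimal illustration, not code from the release notes: the paths, table name, and checkpoint location are hypothetical, and it assumes a Databricks cluster where a SparkSession (`spark`) and a streaming Delta source already exist.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Hypothetical streaming source.
val events = spark.readStream
  .format("delta")
  .load("/data/events_raw")

// New in 4.2: write the stream directly to a Delta table
// registered in the Hive metastore.
events.writeStream
  .format("delta")
  .option("checkpointLocation", "/checkpoints/events") // placeholder
  .table("events")

// Path-based identifiers now work uniformly across Delta commands,
// including OPTIMIZE and VACUUM (no string-literal workaround needed):
spark.sql("OPTIMIZE delta.`/data/events`")
spark.sql("VACUUM delta.`/data/events`")
```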
We've also included powerful new features to Structured Streaming:
- Robust Streaming Pipelines with Trigger.Once: Trigger.Once is now supported in Databricks Delta. Previously, rate limits (for example, maxOffsetsPerTrigger or maxFilesPerTrigger) specified as source options or defaults could result in only partial processing of the available data. These options are now ignored when Trigger.Once is used, so all currently available data is processed in a single run. Documentation is available at: Trigger.Once in the Databricks Runtime 4.2 release notes.
- Flexible Streaming Sink to Many Storage Options with foreachBatch(): You can now define a function that processes the output of every micro-batch using DataFrame operations in Scala. Documentation is available at: foreachBatch(). Beyond the added flexibility, the key benefit is that foreachBatch() lets you write streaming output to a range of storage systems even if they don't support streaming as a sink.
- Support for streaming foreach() in Python has also been added. Documentation is available at: foreach().
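The Trigger.Once behavior above can be sketched as a run-once incremental job over a Delta source. This is an illustrative example only, assuming a running cluster; the paths and checkpoint location are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.getOrCreate()

val updates = spark.readStream
  .format("delta")
  // maxFilesPerTrigger would normally cap each micro-batch; with
  // Trigger.Once it is now ignored, so a single run drains all
  // data available when the query starts.
  .option("maxFilesPerTrigger", "100")
  .load("/data/updates") // hypothetical input path

updates.writeStream
  .format("delta")
  .trigger(Trigger.Once)
  .option("checkpointLocation", "/checkpoints/updates") // placeholder
  .start("/data/processed")
  .awaitTermination() // the query stops once the backlog is processed
```

Because the checkpoint tracks progress, rerunning this job (for example, on a schedule) processes only the data that arrived since the last run.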
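As a sketch of how foreachBatch() reaches sinks without native streaming support, the example below pushes each micro-batch to a JDBC table. The paths, connection URL, and table names are placeholders, not part of the release notes.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.getOrCreate()

val stream = spark.readStream
  .format("delta")
  .load("/data/events") // hypothetical input path

stream.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Each micro-batch arrives as an ordinary DataFrame, so any
    // batch writer works here, including sinks with no streaming API.
    batch.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://host:5432/db") // placeholder
      .option("dbtable", "events")
      .mode("append")
      .save()
  }
  .option("checkpointLocation", "/checkpoints/jdbc") // placeholder
  .start()
```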
We also included support for the SQL DENY command on table access control enabled clusters. Users can now deny specific permissions in the same way they are granted, and a denied permission supersedes a granted one. Detailed technical documentation is available at: SQL DENY.
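To illustrate how a deny supersedes a grant, consider the following sketch; the database, table, and group names are examples, not from the documentation.

```sql
-- Grant broad read access to a group.
GRANT SELECT ON DATABASE sales TO `analysts`;

-- Even with the database-level grant in place, this table stays
-- off-limits, because the deny supersedes the grant.
DENY SELECT ON TABLE sales.salaries TO `analysts`;
```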
To read more about the above new features and to see the full list of improvements included in Databricks Runtime 4.2, please refer to the release notes in the following locations:
- Amazon Web Services: Databricks Runtime 4.2 release notes
- Azure: Databricks Runtime 4.2 release notes