We’re excited to announce Databricks Runtime 4.2, powered by Apache Spark™.  Version 4.2 includes updated Spark internals, new features, and major performance upgrades to Databricks Delta, as well as general quality improvements to the platform.  We are moving quickly toward the Databricks Delta general availability (GA) release and we recommend you upgrade to Databricks Runtime 4.2 to take advantage of these improvements.

I'd like to take a moment to highlight some of the work the team has done to continually improve Databricks Delta:

  • Streaming Directly to Delta Tables: Streams can now be directly written to a Databricks Delta table registered in the Hive metastore using df.writeStream.table(...).
  • Path Consistency for Delta Commands: All Databricks Delta commands and queries now support referring to a table using its path as an identifier (that is, delta.`/path/to/table`). Previously OPTIMIZE and VACUUM required non-standard use of string literals (that is, '/path/to/table').
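
As a rough illustration of the path-based identifier described above, the following Python sketch builds OPTIMIZE and VACUUM statements that refer to a table by its path. The helper names (`delta_path_identifier`, `maintenance_statements`, `run_maintenance`) are hypothetical and not part of any Databricks API, and `spark` is assumed to be a SparkSession-like object exposing `sql()`.

```python
def delta_path_identifier(path):
    """Return the delta.`<path>` identifier form for a table path."""
    if "`" in path:
        # Keep the sketch simple: refuse paths that would need escaping.
        raise ValueError("path must not contain a backtick")
    return "delta.`{}`".format(path)


def maintenance_statements(path):
    """Build OPTIMIZE and VACUUM statements that address the table by path."""
    ident = delta_path_identifier(path)
    return ["OPTIMIZE {}".format(ident), "VACUUM {}".format(ident)]


def run_maintenance(spark, path):
    """Run each statement on a SparkSession-like object (assumed to have sql())."""
    for statement in maintenance_statements(path):
        spark.sql(statement)
```

In a notebook this might be called as `run_maintenance(spark, "/path/to/table")`; the same identifier form can be used in ordinary queries against the table.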

We've also added powerful new features to Structured Streaming:

  • Robust Streaming Pipelines with Trigger.Once: Trigger.Once is now supported in Databricks Delta. Previously, rate limits (for example, maxOffsetsPerTrigger or maxFilesPerTrigger) specified as source options or defaults could cause only part of the available data to be processed. These options are now ignored when Trigger.Once is used, so all currently available data is processed. Documentation is available in the Trigger.Once section of the Databricks Runtime 4.2 release notes.
  • Flexible Streaming Sink to Many Storage Options with foreachBatch(): You can now define a function that processes the output of every micro-batch using DataFrame operations in Scala. Most importantly, foreachBatch() lets you write to a range of storage options even if they don’t support streaming as a sink. Documentation is available at: foreachBatch().
  • Support for streaming foreach() in Python has also been added. Documentation is available at: foreach().
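
To make the Python foreach() support and Trigger.Once more concrete, here is a minimal sketch rather than a definitive implementation. It assumes the PySpark row-handler convention of an object with `open`, `process`, and `close` methods, and it assumes `stream_df` is a streaming DataFrame you have already created. The class name `LineWriter` and the function `start_once` are hypothetical names chosen for illustration. The example uses foreach() rather than foreachBatch() because the latter is described above as a Scala feature.

```python
class LineWriter:
    """Row handler exposing the open/process/close methods that foreach() accepts."""

    def __init__(self, emit=print):
        # emit is a stand-in for a write to an external system.
        self.emit = emit
        self.partition_id = None
        self.count = 0

    def open(self, partition_id, epoch_id):
        # Returning True signals that rows of this partition should be processed.
        self.partition_id = partition_id
        self.count = 0
        return True

    def process(self, row):
        # Called once per row of the partition.
        self.emit("partition {}: {}".format(self.partition_id, row))
        self.count += 1

    def close(self, error):
        # error is None when the partition finished without an exception.
        if error is not None:
            self.emit("partition {} failed: {}".format(self.partition_id, error))


def start_once(stream_df, writer):
    """Attach the writer and process all currently available data in one run."""
    return (stream_df.writeStream
            .foreach(writer)
            .trigger(once=True)
            .start())
```

Each executor works on its own copy of the handler, so any state kept on the object (such as `count`) is local to a partition and is not visible on the driver.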

We added support for the SQL DENY command on clusters with table access control enabled. Users can now deny specific permissions in the same way they are granted. A denied permission supersedes a granted one. Detailed technical documentation is available at: SQL DENY.
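
The precedence rule can be pictured with a toy model in Python. This is only an illustration of "a deny supersedes a grant" and not how Databricks evaluates permissions; it ignores groups and object hierarchies, and the function name `is_allowed` and the example principals are hypothetical.

```python
def is_allowed(principal, action, grants, denies):
    """Return True only if the action is granted and not denied."""
    key = (principal, action)
    if key in denies:
        # A denied permission wins over any matching grant.
        return False
    return key in grants


# Hypothetical permission sets for illustration.
grants = {("alice", "SELECT"), ("bob", "SELECT")}
denies = {("bob", "SELECT")}
```

Here `is_allowed("alice", "SELECT", grants, denies)` is true, while the same check for `"bob"` is false because the deny takes precedence over the grant.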

To read more about these new features and to see the full list of improvements included in Databricks Runtime 4.2, please refer to the Databricks Runtime 4.2 release notes.
