Databricks is pleased to announce the release of Databricks Runtime 5.4. This release includes Apache Spark 2.4.3 along with several important improvements and bug fixes. We recommend all users upgrade to take advantage of this new runtime release. This blog post gives a brief overview of some of the high-value new features that simplify manageability and improve usability in Databricks.
We continue to make advances in Databricks that simplify data and resource management.
Delta Lake is the best place to store and manage data in an open format. We've included a feature in public preview called Auto Optimize that removes administrative overhead by determining optimal file sizes and performing the necessary compaction at write time. It's configured as an individual table property and can be added to existing tables, after which those tables can be queried efficiently for analytics.
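As a minimal sketch, enabling Auto Optimize on an existing Delta table looks like the following. The table name is hypothetical and the property names follow the current documentation, so verify the exact syntax for your runtime version in the docs linked below:

```python
# "spark" is the SparkSession predefined in Databricks notebooks.
# "events" is a hypothetical table name; the two properties enable optimized
# writes and automatic compaction for this table.
spark.sql("""
    ALTER TABLE events
    SET TBLPROPERTIES (
        delta.autoOptimize.optimizeWrite = true,
        delta.autoOptimize.autoCompact = true
    )
""")
```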
To try out Auto Optimize, consult the Databricks documentation (Azure | AWS).
We've partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks. Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits: a single metadata catalog can be shared across multiple Databricks workspaces, and that metadata remains interoperable with other AWS services such as Amazon Athena and Amazon EMR.
Glue as the metastore is currently in public preview; to start using this feature, consult the Databricks documentation for configuration instructions.
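In broad strokes, enabling the integration comes down to a single Spark configuration flag set on the cluster (your cluster also needs IAM permissions to access Glue; see the documentation for the full setup):

```
spark.databricks.hive.metastore.glueCatalog.enabled true
```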
Databricks Runtime 5.4 includes several new features that improve usability.
A popular feature that has enjoyed wide adoption during its public preview, Databricks Connect is a framework that makes it possible to develop applications on Databricks Runtime from anywhere. This enables two primary use cases: developing interactively from the environment of your choice, such as an IDE or a notebook server, and building applications and services that run jobs directly against Databricks clusters.
For an in-depth description, refer to the Databricks Connect blog post. To try out Databricks Connect, refer to the getting started documentation (Azure | AWS).
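As a quick sketch of the getting-started flow: you install a client library locally whose version tracks your cluster's runtime, point it at your workspace, and from then on ordinary PySpark code runs against the remote cluster:

```python
# On your local machine, install and configure the client first (shell steps):
#   pip install -U databricks-connect==5.4.*
#   databricks-connect configure
from pyspark.sql import SparkSession

# With Databricks Connect configured, this session is backed by a remote
# Databricks cluster rather than a local Spark instance.
spark = SparkSession.builder.getOrCreate()

# The count below is computed on the remote cluster.
print(spark.range(100).count())
```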
Take advantage of the power of Conda for managing Python dependencies inside Databricks. Conda has become the package and environment management tool of choice in the data science community, and we're excited to bring this capability to Databricks. Conda is especially well suited to ML workloads, and Databricks Runtime with Conda lets you create and manage Python environments from within the scope of a user session. We provide two simplified Databricks Runtime pathways to get started: databricks-standard, which ships a curated set of popular Python packages, and databricks-minimal, which contains only the packages required to run Python notebooks and PySpark so you can build up your own environment.
For more in-depth information, visit the blog post introducing Databricks Runtime with Conda. To get started, refer to the Databricks Runtime with Conda documentation (Azure | AWS).
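As an illustrative, hedged sketch: the root Conda environment is selected per cluster via a cluster environment variable. The variable name below is our reading of the documentation, so verify it there before relying on it:

```
# Cluster environment variable selecting the minimal root Conda environment.
# Check the Databricks Runtime with Conda docs for the exact variable name.
DATABRICKS_ROOT_CONDA_ENV=databricks-minimal
```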
Databricks Library Utilities enable you to manage Python dependencies within the scope of a single user session. You can add, remove, and update libraries, and switch Python environments (if using our new Databricks Runtime with Conda), all from within the scope of a session. When you disconnect, the session is not persisted; it is garbage collected, and its resources are freed up for future user sessions. This has several important benefits: libraries you install don't affect other users sharing the same cluster, and you can change dependencies without restarting the cluster.
For an in-depth example, visit the blog post Introducing Library Utilities. For further information, refer to Library Utilities in the Databricks documentation (Azure | AWS).
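A minimal sketch of what this looks like in a notebook (the package and version are illustrative):

```python
# Install a specific package version scoped to this session only.
dbutils.library.installPyPI("scikit-learn", version="0.20.3")

# Restart the Python interpreter so the newly installed library is importable
# in subsequent cells; the cluster itself keeps running.
dbutils.library.restartPython()
```

Calling `dbutils.library.list()` afterwards shows the libraries added within the session's scope, and everything is cleaned up automatically when the session ends.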