With our launch of Jobs Orchestration, orchestrating pipelines in Databricks has become significantly easier. Splitting ETL or ML pipelines into multiple tasks makes them easier to create and manage. With this modular approach, teams can define and work on their respective responsibilities independently, and tasks can run in parallel to reduce overall execution time. This capability was a major step in transforming how our customers create, run, monitor, and manage sophisticated data and machine learning workflows across any cloud. Today, we are excited to share a further enhancement to our orchestration capabilities: the ability to reuse the same cluster across multiple tasks in a job run, saving even more time and money for our customers.

Until now, each task had its own cluster to accommodate different types of workloads. While this flexibility allows for fine-grained configuration, it can also add time and cost: every task pays for its own cluster startup, and clusters can sit underutilized while parallel tasks run.

To maintain this flexibility while further improving utilization, we are excited to announce cluster reuse. By sharing job clusters across multiple tasks, customers can reduce the time a job takes, reduce costs by eliminating startup overhead, and increase cluster utilization with parallel tasks.

When defining a task, customers have the option to either configure a new cluster or choose an existing one. With cluster reuse, the list of existing clusters now includes clusters defined in other tasks in the same job. When multiple tasks share a job cluster, the cluster is initialized when the first task that uses it starts, and it stays on until the last task using it has finished. As a result, there is no additional startup time after the initial cluster initialization, which reduces both time and cost, and the job clusters remain isolated from other workloads.
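
To make the sharing relationship concrete, here is a minimal sketch of what a job definition with a shared cluster might look like when written as a Python dictionary. It assumes the `job_clusters` / `job_cluster_key` layout used by the Jobs API 2.1; the job name, task names, notebook paths, and cluster settings are all hypothetical placeholders, and the `tasks_by_cluster` helper is purely illustrative, not part of any Databricks library.

```python
import json
from collections import defaultdict

# Hypothetical job definition. The field names assume the Jobs API 2.1 layout;
# every value (names, paths, cluster settings) is a placeholder.
job_spec = {
    "name": "example-etl-job",
    "job_clusters": [
        {
            # The cluster is defined once and referenced by key from the tasks.
            "job_cluster_key": "shared_etl_cluster",
            "new_cluster": {
                "spark_version": "<runtime-version>",
                "node_type_id": "<node-type>",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "ingest",
            "job_cluster_key": "shared_etl_cluster",
            "notebook_task": {"notebook_path": "/Jobs/ingest"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "job_cluster_key": "shared_etl_cluster",
            "notebook_task": {"notebook_path": "/Jobs/transform"},
        },
        {
            "task_key": "report",
            "depends_on": [{"task_key": "transform"}],
            "job_cluster_key": "shared_etl_cluster",
            "notebook_task": {"notebook_path": "/Jobs/report"},
        },
    ],
}


def tasks_by_cluster(spec):
    """Group task keys by the shared job cluster they reference.

    Illustrative helper only: it also checks that every referenced
    job_cluster_key is actually defined in the job's job_clusters list.
    """
    defined = {c["job_cluster_key"] for c in spec.get("job_clusters", [])}
    groups = defaultdict(list)
    for task in spec["tasks"]:
        key = task.get("job_cluster_key")
        if key is None:
            continue  # task uses its own cluster rather than a shared one
        if key not in defined:
            raise ValueError(f"task {task['task_key']!r} references unknown cluster {key!r}")
        groups[key].append(task["task_key"])
    return dict(groups)


shared = tasks_by_cluster(job_spec)
print(json.dumps(shared, indent=2))
```

In this sketch all three tasks reference the same key, so under the behavior described above the cluster would start once for `ingest` and stay up until `report` finishes, instead of starting three separate times.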

We hope you are as excited as we are about this new functionality. Learn more about cluster reuse and start using shared job clusters now to save startup time and cost. Please reach out if you have any feedback for us.
