As companies undertake more business intelligence (BI) and artificial intelligence (AI) initiatives, the need for simple, clear and reliable orchestration of data processing tasks has grown. Previously, Databricks customers had to choose between running these tasks in a single notebook or adopting a separate workflow tool, adding to the overall complexity of their environment.
Today, we are pleased to announce that Databricks Jobs now supports task orchestration in public preview: the ability to run multiple tasks as a directed acyclic graph (DAG). A job is a non-interactive way to run an application on a Databricks cluster, for example an ETL job or data analysis task that you want to run immediately or on a schedule. The ability to orchestrate multiple tasks in a job significantly simplifies the creation, management and monitoring of your data and machine learning workflows, at no additional cost. Benefits of this new capability include:
Simple task orchestration
Now, anyone can easily orchestrate tasks in a DAG using the Databricks UI and API. This eases the burden on data teams by enabling data scientists and analysts to build and monitor their own jobs, making key AI and ML initiatives more accessible. The following example shows a job that runs seven notebooks to train a recommender machine learning model.
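To give a sense of what this looks like through the API, here is a minimal sketch that creates a two-task DAG with the Jobs API. The workspace URL, access token, notebook paths and cluster settings are all placeholders, and the request format shown here is illustrative; consult the Jobs API documentation for the exact schema available in your workspace.

```python
import requests

# Placeholder workspace URL and personal access token; replace with your own.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# A minimal two-task DAG: "ingest" runs first, and "train" runs only after
# "ingest" succeeds. Notebook paths and cluster settings are illustrative.
job_spec = {
    "name": "recommender-pipeline",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/pipeline/ingest"},
            "new_cluster": {
                "spark_version": "8.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        },
        {
            "task_key": "train",
            # depends_on defines an edge in the DAG.
            "depends_on": [{"task_key": "ingest"}],
            "notebook_task": {"notebook_path": "/Repos/pipeline/train"},
            "new_cluster": {
                "spark_version": "8.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        },
    ],
}

# Create the job; the response includes the new job's ID.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

The same DAG can be assembled visually in the Jobs UI, where each task and its dependencies map directly onto the fields shown above.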
Orchestrate anything, anywhere
Jobs orchestration is fully integrated in Databricks and requires no additional infrastructure or DevOps resources. Customers can use the Jobs API or UI to create and manage jobs, along with features such as email alerts for monitoring. Your data team does not have to learn new skills to benefit from this feature. It also lets you orchestrate anything that has an API, outside of Databricks and across all clouds, for example pulling data from a CRM (see the sketch below).
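As an illustration, a notebook task in the DAG could pull records from an external CRM over its REST API before downstream tasks transform and load them. The endpoint, token and query parameters below are hypothetical; any external system with a REST API can be called the same way.

```python
import requests

# Hypothetical CRM endpoint and credentials; replace with your system's values.
CRM_URL = "https://api.example-crm.com/v1/contacts"
CRM_TOKEN = "<crm-api-token>"

def pull_crm_contacts():
    """Fetch recently updated contacts from an external CRM so that
    downstream tasks in the job can process them."""
    resp = requests.get(
        CRM_URL,
        headers={"Authorization": f"Bearer {CRM_TOKEN}"},
        params={"updated_since": "2021-07-01"},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumes the endpoint returns a JSON array of contact records.
    return resp.json()

contacts = pull_crm_contacts()
print(f"Pulled {len(contacts)} contacts for downstream tasks")
```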
Next steps
Task orchestration will begin rolling out to all Databricks workspaces as a public preview on July 13th. Over the following months, we will also enable you to reuse a cluster across tasks in a job and to restart a DAG so that it reruns only the tasks that previously failed.
Read more about orchestrating multiple tasks in a Databricks job, then go to the admin console of your workspace to enable the capability for free.