Announcing the General Availability of Orchestrating dbt Projects with Databricks Workflows
We are pleased to announce the General Availability (GA) of support for orchestrating dbt projects in Databricks Workflows. Since the start of the Public Preview, hundreds of customers have leveraged this integration to collaboratively transform, test, and document data in Databricks SQL warehouses.
With dbt support in Workflows, your dbt project is retrieved from a Git repository, and a single-node cluster is launched with dbt-core and your project's dependencies installed on it. The SQL generated by dbt runs on a serverless SQL warehouse, providing easy debugging and great performance. You also get robust operational capabilities, such as repairing failed runs and sending alerts to Slack or a webhook destination when a dbt task fails, as well as the ability to manage these jobs and retrieve dbt artifacts such as logs through the Jobs API.
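If you prefer to set this up programmatically, the sketch below creates such a job through the Jobs API 2.1 with a dbt task that pulls the project from Git and runs its SQL on a SQL warehouse. It is a minimal illustration, not a full job specification: the workspace URL, token, repository, and warehouse ID are placeholders, and you should confirm the exact fields (including the cluster definition, omitted here) against the Jobs API documentation for your cloud.

```python
# Minimal sketch: create a Workflows job with a dbt task via the Jobs API 2.1.
# Requires the `requests` package; all identifiers below are placeholders.
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                        # placeholder access token

job_spec = {
    "name": "dbt-nightly-run",
    # The dbt project is checked out from this Git repository at run time
    "git_source": {
        "git_url": "https://github.com/<org>/<dbt-project>.git",
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "dbt_run",
            "dbt_task": {
                # dbt commands executed in order; the generated SQL runs on the warehouse below
                "commands": ["dbt deps", "dbt seed", "dbt run"],
                "warehouse_id": "<sql-warehouse-id>",
            },
            # dbt-core and the adapter are installed on the single-node job cluster
            "libraries": [{"pypi": {"package": "dbt-databricks"}}],
            # A "new_cluster" (or "existing_cluster_id") definition goes here; omitted for brevity
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

The same job can then be triggered, repaired, and inspected through the corresponding Jobs API endpoints, which is also how dbt artifacts such as logs can be retrieved after a run.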
With GA, we have extended support to SQL Pro warehouses in addition to the existing support for serverless SQL warehouses. Moreover, we are happy to announce support for Databricks on Google Cloud Platform (GCP). Lineage from the transformations defined in dbt projects is also automatically captured in Unity Catalog. Finally, even more dbt community packages, such as dbt-artifacts, now work with Databricks.
To get started with dbt on Databricks, simply run "pip install dbt-databricks". This installs the open source dbt-databricks adapter, which is built together with dbt Labs and other contributors. You can follow our detailed guide to get started with an example project. Once you commit your source code to a Git repository, you can use Databricks Workflows to execute your dbt models in production (see our docs for AWS | Azure | GCP).
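When developing locally before committing, dbt connects to your SQL warehouse through a profile. Below is a minimal sketch that writes such a profile to ~/.dbt/profiles.yml for the dbt-databricks adapter; the profile name, catalog, schema, hostname, HTTP path, and token are placeholders for your own connection details, and the full set of options is documented with the adapter.

```python
# Minimal sketch: write a local dbt profile for the dbt-databricks adapter.
# All connection values are placeholders; replace them with your SQL warehouse details.
from pathlib import Path

import yaml  # PyYAML ships as a dbt dependency; otherwise `pip install pyyaml`

profile = {
    "my_dbt_project": {  # must match the profile name in dbt_project.yml
        "target": "dev",
        "outputs": {
            "dev": {
                "type": "databricks",                       # selects the dbt-databricks adapter
                "catalog": "main",                          # optional Unity Catalog catalog
                "schema": "analytics",                      # schema your models are built into
                "host": "<your-workspace>.cloud.databricks.com",
                "http_path": "/sql/1.0/warehouses/<warehouse-id>",
                "token": "<personal-access-token>",
            }
        },
    }
}

profiles_path = Path.home() / ".dbt" / "profiles.yml"
profiles_path.parent.mkdir(parents=True, exist_ok=True)
profiles_path.write_text(yaml.safe_dump(profile, sort_keys=False))
print(f"Wrote {profiles_path}")
```

With the profile in place, "dbt debug" and "dbt run" execute against the warehouse from your machine, and the same project can be committed to Git and scheduled with Workflows as described above.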