DLT offers a robust platform for building reliable, maintainable, and testable data processing pipelines within Databricks. By leveraging its declarative framework and automatically provisioning optimal serverless compute, DLT simplifies the complexities of streaming, data transformation, and management, delivering scalability and efficiency for modern data workflows.
We’re excited to announce a much-anticipated enhancement: the ability to publish tables to multiple schemas and catalogs within a single DLT pipeline. This capability reduces operational complexity, lowers costs, and simplifies data management by allowing you to consolidate your medallion architecture (Bronze, Silver, Gold) into a single pipeline while maintaining organizational and governance best practices.
With this enhancement, you can publish tables to multiple catalogs and schemas from a single pipeline, and you no longer need the LIVE syntax to denote dependencies between tables. Fully and partially qualified table names are supported, along with USE SCHEMA and USE CATALOG commands, just like in standard SQL.

“The ability to publish to multiple catalogs and schemas from one DLT pipeline - and no longer requiring the LIVE keyword - has helped us standardize on pipeline best practices, streamline our development efforts, and facilitate the easy transition of teams from non-DLT workloads to DLT as part of our large-scale enterprise adoption of the tooling.”
— Ron DeFreitas, Principal Data Engineer, HealthVerity
All pipelines created from the UI now default to supporting multiple catalogs and schemas. You can set a default catalog and schema at the pipeline level through the UI, the API, or Databricks Asset Bundles (DABs).
If you are creating a pipeline programmatically, you can enable this capability by specifying the schema field in the PipelineSettings. This replaces the existing target field, ensuring that datasets can be published across multiple catalogs and schemas.
To create a pipeline with this capability via the API, you can follow a code sample like the one below (note: Personal Access Token authentication must be enabled for the workspace).
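The sketch here calls the Pipelines REST API directly with the requests library; the workspace URL, token, catalog, schema, and notebook path are placeholders rather than values from this announcement.

```python
import os
import requests

# Minimal sketch: create a DLT pipeline through the Pipelines REST API.
# DATABRICKS_HOST and DATABRICKS_TOKEN (a Personal Access Token) are assumed
# to be set in the environment; all names below are placeholders.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

payload = {
    "name": "multi-schema-demo-pipeline",
    "catalog": "main",      # default catalog for the pipeline
    "schema": "bronze",     # setting "schema" (instead of "target") enables
                            # publishing to multiple catalogs and schemas
    "continuous": False,
    "libraries": [
        {"notebook": {"path": "/Pipelines/multi_schema_demo"}}  # hypothetical notebook
    ],
}

response = requests.post(
    f"{host}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
response.raise_for_status()
print(response.json())  # contains the new pipeline_id
```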
By setting the schema field, the pipeline will automatically support publishing tables to multiple catalogs and schemas without requiring the LIVE keyword.
To use this capability with Databricks Asset Bundles (DABs), specify the schema field in the pipeline YAML and remove the target field if it exists (see the sketch below).
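As a rough illustration, a bundle definition for a pipeline using the schema field might look like the following; the bundle, resource, catalog, schema, and notebook names are all assumptions.

```yaml
# databricks.yml (excerpt) - hypothetical bundle with one DLT pipeline resource
bundle:
  name: multi_schema_demo

resources:
  pipelines:
    sales_pipeline:
      name: sales-pipeline
      catalog: main        # default catalog for the pipeline
      schema: bronze       # use "schema" here and remove any "target" field
      libraries:
        - notebook:
            path: ./pipelines/sales_pipeline.py
```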
Then run databricks bundle validate to validate that the DAB configuration is valid, and databricks bundle deploy -t <environment> to deploy your first DPM pipeline!

“The feature works just like we expect it to work! I was able to split up the different datasets within DLT into our stage, core and UDM schemas (basically a bronze, silver, gold setup) within one single pipeline.”
— Florian Duhme, Expert Data Software Developer, Arvato
Once your pipeline is set up, you can define tables using fully or partially qualified names in both SQL and Python.
SQL Example
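For instance, a table can be created under a fully qualified three-level name or a partially qualified one; the catalog, schema, and table names below are placeholders, and samples.nyctaxi.trips stands in for your source data.

```sql
-- Fully qualified target: publish to an explicit catalog and schema
CREATE OR REFRESH MATERIALIZED VIEW my_catalog.bronze.trips_raw AS
SELECT * FROM samples.nyctaxi.trips;

-- Partially qualified target: resolved against the pipeline's default catalog
CREATE OR REFRESH MATERIALIZED VIEW silver.trips_clean AS
SELECT * FROM my_catalog.bronze.trips_raw
WHERE trip_distance > 0;
```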
Python Example
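A roughly equivalent Python sketch, again with placeholder catalog, schema, and table names (spark is the session available in pipeline notebooks):

```python
import dlt

# Fully qualified target table: published to an explicit catalog and schema
@dlt.table(name="my_catalog.bronze.trips_raw")
def trips_raw():
    return spark.read.table("samples.nyctaxi.trips")

# Partially qualified target table: resolved against the pipeline's default catalog
@dlt.table(name="silver.trips_clean")
def trips_clean():
    return spark.read.table("my_catalog.bronze.trips_raw").where("trip_distance > 0")
```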
You can reference datasets using fully or partially qualified names, with the LIVE keyword being optional for backward compatibility.
SQL Example
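For example, an upstream pipeline dataset can be read by its qualified name alone (placeholder names again); prefixing it with LIVE would still work but is no longer needed.

```sql
-- Reference an upstream dataset without the LIVE keyword
CREATE OR REFRESH MATERIALIZED VIEW my_catalog.gold.trips_by_zone AS
SELECT pickup_zip, COUNT(*) AS trip_count
FROM silver.trips_clean
GROUP BY pickup_zip;
```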
Python Example
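A comparable Python sketch, reading the upstream dataset by its qualified name:

```python
import dlt

@dlt.table(name="my_catalog.gold.trips_by_zone")
def trips_by_zone():
    # The upstream dataset is referenced by its qualified name; no LIVE prefix needed
    return (
        spark.read.table("silver.trips_clean")
        .groupBy("pickup_zip")
        .count()
    )
```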
With this new capability, key API methods have been updated to support multiple catalogs and schemas more seamlessly.
Previously, these methods could only reference datasets defined within the current pipeline. Now, they can reference datasets across multiple catalogs and schemas, automatically tracking dependencies as needed. This makes it easier to build pipelines that integrate data from different locations without additional manual configuration.
In the past, these methods required explicit references to external datasets, making cross-catalog queries more cumbersome. With the new update, dependencies are now tracked automatically, and the LIVE schema is no longer required. This simplifies the process of reading data from multiple sources within a single pipeline.
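As an illustration of the kind of cross-catalog read this enables, the sketch below joins datasets that live in two different catalogs inside one pipeline; all names are placeholders, and spark.read.table is just one of the read paths this applies to.

```python
import dlt

@dlt.table(name="analytics_catalog.gold.orders_enriched")
def orders_enriched():
    # Both reads resolve across catalogs; dependencies on other pipeline
    # datasets are tracked automatically, with no LIVE schema required.
    orders = spark.read.table("sales_catalog.silver.orders")
    customers = spark.read.table("crm_catalog.silver.customers")
    return orders.join(customers, "customer_id")
```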
Databricks SQL syntax now supports setting active catalogs and schemas dynamically, making it easier to manage data across multiple locations.
SQL Example
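For instance, USE CATALOG and USE SCHEMA can set the active context for the statements that follow; catalog, schema, and table names are placeholders.

```sql
USE CATALOG my_catalog;
USE SCHEMA silver;

-- Unqualified names below resolve against the active catalog and schema
CREATE OR REFRESH MATERIALIZED VIEW trips_clean AS
SELECT * FROM my_catalog.bronze.trips_raw
WHERE trip_distance > 0;
```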
Python Example
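In Python, one way to get the same effect is to issue the equivalent statements through spark.sql; this is a sketch under that assumption, with placeholder names.

```python
import dlt

# Set the active catalog and schema for subsequently defined datasets
spark.sql("USE CATALOG my_catalog")
spark.sql("USE SCHEMA silver")

@dlt.table(name="trips_clean")  # resolved against the active catalog and schema
def trips_clean():
    return spark.read.table("my_catalog.bronze.trips_raw").where("trip_distance > 0")
```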
This feature also allows pipeline owners to publish the event log to the Unity Catalog metastore for improved observability. To enable this, specify the event_log field in the pipeline settings JSON.
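As an example, the settings fragment might look like the following, assuming the event_log specification takes a target catalog, schema, and table name (all names here are placeholders):

```json
{
  "name": "multi-schema-demo-pipeline",
  "catalog": "main",
  "schema": "bronze",
  "event_log": {
    "catalog": "main",
    "schema": "ops",
    "name": "demo_pipeline_event_log"
  }
}
```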
With that, you can now issue GRANT statements on the event log table just like on any regular table.
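For example, using the placeholder names from the settings above and a hypothetical data-engineers group:

```sql
GRANT SELECT ON TABLE main.ops.demo_pipeline_event_log TO `data-engineers`;
```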
You can also create a view over the event log table:
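(The sketch below assumes standard event log columns such as timestamp, level, message, and details; table and view names are placeholders.)

```sql
CREATE VIEW main.ops.pipeline_errors AS
SELECT timestamp, level, message, details
FROM main.ops.demo_pipeline_event_log
WHERE level = 'ERROR';
```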
Besides all of the above, you are also able to stream from the event log table:
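(A minimal Python sketch; it assumes a Databricks notebook where spark is predefined, and the table names and checkpoint path are placeholders.)

```python
# Incrementally process new event log records as they arrive
events = spark.readStream.table("main.ops.demo_pipeline_event_log")

(events.filter("level = 'ERROR'")
       .writeStream
       .option("checkpointLocation", "/Volumes/main/ops/checkpoints/event_log_errors")
       .toTable("main.ops.event_log_errors"))
```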
Looking ahead, these enhancements will become the default for all newly created pipelines, whether created via UI, API, or Databricks Asset Bundles. Additionally, a migration tool will soon be available to help transition existing pipelines to the new publishing model.
Read more in the documentation here.