
Build governed pipelines with Delta Live Tables and Unity Catalog

Zoe Durand
Mukul Murthy
Jon Mio
Yuhong Chen

We are excited to announce the public preview of Unity Catalog support for Delta Live Tables (DLT). With this preview, any data team can define and execute fine-grained data governance policies on data assets produced by Delta Live Tables. We are bringing the power of Unity Catalog to data engineering pipelines: pipelines and Delta Live Tables can now be governed and managed alongside your other Unity Catalog assets.

Revolutionizing data engineering with Unity Catalog and Delta Live Tables

Unity Catalog is a comprehensive data governance solution designed for lakehouse architectures. Data lakes built on cloud object storage such as S3, ADLS, and GCS have become popular for storing and processing vast amounts of data because they are scalable and cost-effective, but governing the data in them has been a challenge. Unity Catalog addresses this by offering fine-grained permissions that can be managed with standard ANSI SQL or through a user-friendly UI. Organizations can control access at the row, column, or view level, which helps ensure compliance with their data governance policies. Unity Catalog also extends governance beyond tables to other data assets, including ML models and files, so enterprises can govern all their data and AI assets from one central platform.

Delta Live Tables (DLT) is an ETL (extract, transform, load) framework from Databricks. It lets data engineers and analysts build efficient, reliable pipelines for both streaming and batch workloads. Users declare their pipelines in SQL or Python, and DLT works out how the datasets depend on one another, so there is no need to stitch processing steps together by hand. This declarative approach streamlines the development, testing, deployment, and operation of data pipelines. DLT also automates infrastructure management, including cluster sizing, orchestration, error handling, and performance optimization. By taking over these operational tasks, DLT lets data engineers focus on transforming data and deriving insights from it.

Combining end-to-end data governance with streamlined data engineering processes

By combining Unity Catalog and Delta Live Tables, organizations can achieve end-to-end data governance while streamlining their data engineering. Data teams can develop and run pipelines with Delta Live Tables while adhering to the governance policies defined in Unity Catalog. This lets data engineers, analysts, and governance teams collaborate efficiently and helps ensure that data assets stay governed, secured, and compliant throughout the data lifecycle. Together, Unity Catalog and Delta Live Tables help organizations get the most out of their lakehouse architecture while maintaining high standards of data governance and security.

Block (formerly Square) has been one of our early preview customers for this integration. As an early adopter of Delta Live Tables for their enterprise data platform, Block is excited about the enormous possibilities afforded by Unity Catalog for their DLT pipelines:

"We are incredibly excited about the integration of Delta Live Tables with Unity Catalog. This integration will help us streamline and automate data governance for our DLT pipelines, helping us meet our sensitive data and security requirements as we ingest millions of events in real time. This opens up a world of potential and enhancements for our business use cases related to risk modeling and fraud detection."
— Yue Zhang, Staff Software Engineer, Block

How is UC enabled in Delta Live Tables?

When you create a Delta Live Tables pipeline in the UI, select "Unity Catalog" under the Destination options.

You will then be prompted to choose a target catalog and schema. All of the pipeline's live tables are published there, under the three-level namespace (catalog.schema.table).
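The destination you pick in the UI is stored in the pipeline's settings. The sketch below assumes the `catalog` and `target` fields that Unity Catalog-enabled pipelines use to name the destination catalog and schema; the pipeline name, notebook path, and the `qualified_name` helper are hypothetical, included only to show how a published table's three-level name is composed.

```python
import json


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Compose a catalog.schema.table name, backtick-quoting each part (hypothetical helper)."""
    parts = (catalog, schema, table)
    if not all(parts):
        raise ValueError("catalog, schema and table must all be non-empty")
    # Backticks allow characters such as '-' in identifiers; a literal
    # backtick inside a name is escaped by doubling it.
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)


# Assumed shape of a Unity Catalog-enabled DLT pipeline's settings:
# "catalog" and "target" choose where the live tables are published.
settings = {
    "name": "customer-interactions",  # hypothetical pipeline name
    "catalog": "main",  # destination catalog
    "target": "interactions",  # destination schema
    "libraries": [{"notebook": {"path": "/Repos/etl/interactions"}}],  # hypothetical path
    "continuous": False,
}

print(json.dumps(settings, indent=2))
# A live table named events_bronze would be published as:
print(qualified_name(settings["catalog"], settings["target"], "events_bronze"))
# prints `main`.`interactions`.`events_bronze`
```

Quoting every part keeps the helper safe for catalog or schema names that contain hyphens, at the cost of slightly noisier output.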


How can UC be used with DLT?

Read from any source: Hive Metastore and Unity Catalog tables, streaming sources

Unity Catalog expands the range of sources a DLT pipeline can read from. A DLT pipeline that publishes to Unity Catalog can read from:

  • Unity Catalog managed and external tables
  • Hive metastore tables and views
  • Streaming sources (Apache Kafka and Amazon Kinesis)
  • Cloud object storage with Databricks Auto Loader (cloud_files() in SQL)

For example, an organization may want to analyze customer interactions across multiple channels. With DLT, it can ingest and process customer interaction logs stored in Hive metastore tables, real-time streams from Kafka, and data in Unity Catalog managed tables. Combining these sources gives a comprehensive view of customer interactions for insights and analytics.
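In a Unity Catalog-enabled workspace, tables in the legacy Hive metastore are reachable through the `hive_metastore` catalog, so a pipeline can reference both kinds of table with three-level names. The sketch below uses a hypothetical `resolve_source` helper and made-up table and column names to build such a query for the customer-interaction example. Kafka and Kinesis streams are read through streaming connectors rather than by table name, so they are not shown.

```python
def resolve_source(name: str) -> str:
    """Expand a table reference to catalog.schema.table (hypothetical helper).

    Assumption for this sketch: two-part names (schema.table) refer to legacy
    Hive metastore tables, which Unity Catalog exposes under the
    `hive_metastore` catalog; three-part names are already Unity Catalog names.
    """
    parts = name.split(".")
    if len(parts) == 2:
        return "hive_metastore." + name
    if len(parts) == 3:
        return name
    raise ValueError(f"expected schema.table or catalog.schema.table, got {name!r}")


# Hypothetical sources for the customer-interaction example.
logs = resolve_source("web.interaction_logs")  # Hive metastore table
customers = resolve_source("main.crm.customers")  # Unity Catalog table

# Hypothetical column names; this query text could back a DLT dataset definition.
query = (
    "SELECT c.customer_id, l.channel, l.event_time "
    f"FROM {logs} AS l JOIN {customers} AS c ON l.customer_id = c.customer_id"
)
print(query)
```

Spelling out the catalog explicitly avoids depending on whichever catalog happens to be the session default when the pipeline runs.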

Fine-grained access control for DLT-published tables

Unity Catalog's fine-grained access control empowers pipeline creators to easily manage access to live tables. As a DLT pipeline developer, you have full control over who can access specific live tables within the catalog.

Granting or revoking access for a group in the metastore can be accomplished through a simple ANSI SQL command.

For instance, if you have created a live table in UC that contains sensitive customer data, you can selectively grant access to data analysts or data scientists who need to work with that specific table. By using SQL commands like "GRANT SELECT ON TABLE," you can specify the precise level of access and provide a secure and controlled environment for data exploration and analysis.
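As a hedged sketch of what those commands look like, the helpers below emit the statements for a hypothetical `main.interactions.customers_silver` live table and a hypothetical `data-analysts` group. Besides `SELECT` on the table, Unity Catalog requires `USE CATALOG` and `USE SCHEMA` on the parent catalog and schema, so all three grants are generated. The Python only builds SQL text, which you would then run in a notebook or the SQL editor.

```python
def grant_read(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Build the grants a principal needs to query one table (hypothetical helper).

    Unity Catalog checks USE CATALOG and USE SCHEMA on the parents in addition
    to SELECT on the table itself.
    """
    who = f"`{principal}`"  # backticks allow group names containing '-' or spaces
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {who}",
    ]


def revoke_read(catalog: str, schema: str, table: str, principal: str) -> str:
    """Build the statement that withdraws SELECT on the table again."""
    return f"REVOKE SELECT ON TABLE {catalog}.{schema}.{table} FROM `{principal}`"


# Hypothetical live table holding sensitive customer data, shared with analysts.
for stmt in grant_read("main", "interactions", "customers_silver", "data-analysts"):
    print(stmt + ";")
print(revoke_read("main", "interactions", "customers_silver", "data-analysts") + ";")
```

Revoking only `SELECT` leaves the group able to browse the catalog and schema but not read the table's rows.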

Enforce the physical isolation of data required by your company

Data isolation is crucial for many organizations to ensure compliance and security. DLT with Unity Catalog enables you to enforce physical separation of data by writing datasets to the appropriate catalog-level storage location.

With this capability, you can store and manage different datasets in distinct storage locations associated with each catalog, based on your organization's requirements. This feature ensures that sensitive data remains separate and isolated from other datasets, providing a strong foundation for data governance and compliance.
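One way to set this up, sketched under assumptions: create a catalog per isolation boundary, give each catalog its own managed storage location with Databricks SQL's `CREATE CATALOG ... MANAGED LOCATION`, and choose each pipeline's destination catalog by the sensitivity of the data it processes. The classification labels, catalog names, and bucket paths below are hypothetical, and the location usually has to fall under an external location you have been granted access to.

```python
# Hypothetical policy: each data classification maps to its own catalog,
# and each catalog keeps its managed tables in a separate storage path.
CATALOGS = {
    "restricted": ("pii", "s3://example-pii-bucket/managed"),
    "internal": ("analytics", "s3://example-analytics-bucket/managed"),
}


def create_catalog_sql(catalog: str, location: str) -> str:
    """Build a CREATE CATALOG statement that sets a catalog-level managed location."""
    return f"CREATE CATALOG IF NOT EXISTS {catalog} MANAGED LOCATION '{location}'"


def catalog_for(classification: str) -> str:
    """Pick the destination catalog for a pipeline handling data of this classification."""
    if classification not in CATALOGS:
        raise KeyError(f"no catalog configured for {classification!r}")
    return CATALOGS[classification][0]


for catalog, location in CATALOGS.values():
    print(create_catalog_sql(catalog, location) + ";")

# A pipeline that processes restricted data would use this as its destination catalog.
print(catalog_for("restricted"))
```

Failing loudly on an unknown classification keeps a pipeline from silently landing sensitive tables in the wrong catalog's storage.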

Stay tuned for more!

We are continuously working to enhance Delta Live Tables (DLT) and Unity Catalog (UC) to provide an even more robust, secure, and seamless data engineering experience. We will keep strengthening the integration between DLT and UC so that you can get the most out of your lakehouse architecture while maintaining strong governance and security.

Try it out today

To experience the power of Delta Live Tables and Unity Catalog firsthand, we encourage you to try them today.

Try Delta Live Tables in Unity Catalog today, or read the documentation (AWS | Azure)
