
What’s new with Unity Catalog at Data and AI Summit 2023

Lakehouse Federation, Governance for AI, Lakehouse Monitoring, Lakehouse Observability and More

The fundamental principles of governance (accountability, compliance, quality, and transparency) that are essential for data management are now equally imperative for AI. Databricks took a pioneering approach with Unity Catalog by releasing the industry's only unified solution for data and AI governance across clouds and data platforms.

Organizations can use Unity Catalog to securely discover, access, monitor and collaborate on files, tables, ML models, notebooks and dashboards across any data platform or cloud, while also leveraging AI to boost productivity and unlock the full potential of the lakehouse environment.

We are excited to announce cutting-edge advancements in Unity Catalog, including Lakehouse Federation, Governance for AI, AI-powered governance (Lakehouse Monitoring and Lakehouse Observability), and much more.

Databricks Unity Catalog

Lakehouse Federation: Discover, govern and query your data wherever it lives 

Lakehouse Federation in Unity Catalog enables organizations to build an open, performant, and secure data mesh architecture. With Lakehouse Federation, organizations get a consistent data management, discovery, and governance experience for all their data, all from within Databricks. This covers platforms such as MySQL, PostgreSQL, Amazon Redshift, Snowflake, Azure SQL Database, Azure Synapse, and Google BigQuery. Unity Catalog's advanced security features, such as row- and column-level access controls, also extend to these external data sources, as do its discovery features, such as tags and data lineage. The same governance practices therefore apply everywhere.
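
The core idea is a single catalog layer that checks permissions before any query reaches the underlying system. A small, self-contained sketch can illustrate it. The sketch assumes a toy in-memory model. `FEDERATED_CATALOGS`, `GRANTS`, `RoutedQuery`, and `route_query` are hypothetical names invented for this example and are not Databricks APIs. A real system would also execute the query against the source instead of returning a routing record.

```python
from dataclasses import dataclass

# Hypothetical mapping from a catalog name to the external system that
# actually stores the data (illustration only, not a Databricks API).
FEDERATED_CATALOGS = {
    "sales_pg": "postgresql",
    "finance_sf": "snowflake",
    "main": "delta",  # data stored natively in the lakehouse
}

# Hypothetical grants: (principal, catalog) pairs allowed to read.
GRANTS = {
    ("analysts", "sales_pg"),
    ("analysts", "main"),
    ("finance", "finance_sf"),
}


@dataclass(frozen=True)
class RoutedQuery:
    backend: str  # which source system would serve the read
    table: str    # schema.table inside that system


def route_query(principal: str, full_name: str) -> RoutedQuery:
    """Check one central policy, then route the read to its source system."""
    catalog, schema, table = full_name.split(".")  # catalog.schema.table
    if catalog not in FEDERATED_CATALOGS:
        raise KeyError(f"unknown catalog: {catalog}")
    # The permission check happens here, regardless of where the data lives.
    if (principal, catalog) not in GRANTS:
        raise PermissionError(f"{principal} may not read {full_name}")
    return RoutedQuery(FEDERATED_CATALOGS[catalog], f"{schema}.{table}")
```

For example, `route_query("analysts", "sales_pg.public.orders")` returns a `RoutedQuery` for the PostgreSQL backend. The same principal asking for `finance_sf.gl.entries` gets a `PermissionError`. In this toy model, the policy stays consistent across sources because the check lives in one place.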

Lakehouse Federation in Unity Catalog

Governance for AI: Unifying data and AI catalogs under one roof

We are also expanding the governance model within Unity Catalog so that AI assets and data can be managed together in a single, unified experience. All the necessary capabilities come together in one central location. This consolidation simplifies DataOps and MLOps processes and prepares organizations for AI compliance. Key enhancements include:

Feature Store and Model Registry in Unity Catalog

We announced the public preview of Model Registry in Unity Catalog, and the public preview of Feature Store follows later in July. With this capability, Unity Catalog is the only governance solution that brings all data and ML assets (from data and features to models) into one catalog. This gives full visibility and fine-grained access controls throughout the AI workflow. The unified approach provides automatic versioning and lineage tracking, centralized governance, and seamless cross-workspace collaboration, which simplifies MLOps and improves productivity. Together with advanced monitoring capabilities, it gives you better visibility, quality, understanding, and control over your entire AI workflow.
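
Automatic versioning and lineage tracking work like a registry that assigns the next version number on every registration and records which tables produced each version. The sketch below is a hypothetical in-memory stand-in. `ToyModelRegistry` and `ModelVersion` are not Databricks or MLflow classes, and the table names are invented. In practice, models in Unity Catalog are registered through MLflow, and the registry does this bookkeeping for you.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelVersion:
    name: str            # three-level name: catalog.schema.model
    version: int         # assigned automatically, starting at 1
    source_tables: tuple  # lineage: tables or feature tables used in training


@dataclass
class ToyModelRegistry:
    """Hypothetical in-memory registry; not a Databricks or MLflow class."""
    versions: dict = field(default_factory=lambda: defaultdict(list))

    def register(self, name: str, source_tables: list) -> ModelVersion:
        # Require a three-part name, mirroring catalog.schema.model.
        if name.count(".") != 2:
            raise ValueError(f"expected catalog.schema.model, got {name!r}")
        history = self.versions[name]
        mv = ModelVersion(name, len(history) + 1, tuple(source_tables))
        history.append(mv)
        return mv

    def lineage(self, name: str, version: int) -> tuple:
        # Return the upstream tables recorded for one model version.
        history = self.versions.get(name, [])
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1].source_tables
```

Registering `ml.churn.model` twice yields versions 1 and 2. Each version keeps its own training lineage, so you can later check which feature table version 1 was built from.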

Discover and govern ML models along with your data in Unity Catalog

Volumes in Unity Catalog: Govern any non-tabular data

There are many use cases, particularly for machine learning and data science workloads, which require access to non-tabular data, such as image, audio, video, or PDF files.

We also announced Volumes in Unity Catalog. A volume is a new type of object that catalogs a collection of files. Volumes help you build scalable file-based applications that read and process large collections of data regardless of format, whether unstructured, semi-structured, or structured. You can then manage, govern, and track lineage for non-tabular data alongside tabular data in Unity Catalog. Stay tuned for the public preview of Volumes, coming in the next few weeks!
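
A volume is a catalog object, so a file path inside it identifies both the governed object and the file. The sketch below assumes the `/Volumes/<catalog>/<schema>/<volume>/<path>` layout used in the Databricks documentation and splits a path into those parts. `parse_volume_path` and `VolumeFile` are hypothetical helpers for illustration only.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class VolumeFile(NamedTuple):
    catalog: str
    schema: str
    volume: str
    relative_path: str  # path of the file inside the volume

    @property
    def volume_name(self) -> str:
        # Three-level name of the volume, i.e. the object access is granted on.
        return f"{self.catalog}.{self.schema}.{self.volume}"


def parse_volume_path(path: str) -> VolumeFile:
    """Split /Volumes/<catalog>/<schema>/<volume>/<path> into its parts."""
    parts = PurePosixPath(path).parts  # ('/', 'Volumes', catalog, schema, volume, ...)
    if len(parts) < 6 or parts[:2] != ("/", "Volumes"):
        raise ValueError(f"not a file path inside a volume: {path}")
    catalog, schema, volume = parts[2:5]
    return VolumeFile(catalog, schema, volume, "/".join(parts[5:]))
```

For example, `/Volumes/main/raw/images/cats/001.png` resolves to the volume `main.raw.images` and the file `cats/001.png`. An application could check access once per volume rather than once per file.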

Govern any non-tabular data in Unity Catalog

AI for governance: Lakehouse Monitoring and Lakehouse Observability

Unity Catalog not only offers robust governance capabilities for AI but also harnesses the power of AI to optimize governance workflows. Key enhancements include:

Lakehouse Monitoring: Monitor the quality of your organization's data and AI assets

Ensuring trust in data and AI models is paramount for the success of any organization. To address this critical requirement, we have introduced Databricks Lakehouse Monitoring, an AI-driven monitoring service that encompasses the entire data pipeline, including data, ML models, and features.

Databricks Lakehouse Monitoring provides proactive alerts for quality issues and errors in data and ML model pipelines. It also automatically identifies and classifies personally identifiable information (PII) using AI-based data classification technology from Okera, our recent acquisition. In addition, data teams can easily share comprehensive data and ML quality reports with stakeholders through auto-generated dashboards.
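
Okera's AI-based classifier is not described here, so the sketch below swaps in a much simpler rule-based technique purely to show what "classifying a column as PII" means. The sketch samples each column's values and flags the column when most values match a regular expression for an email address or a US-style phone number. The patterns, the 0.8 threshold, `PII_PATTERNS`, and `classify_columns` are all assumptions for illustration.

```python
import re

# Simple illustrative patterns (assumptions); real PII detection is far more robust.
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "phone": re.compile(r"^\+?1?[ -]?\(?\d{3}\)?[ -]?\d{3}-?\d{4}$"),
}


def classify_columns(sample: dict, threshold: float = 0.8) -> dict:
    """Return {column: pii_label} for columns whose sampled values mostly match."""
    labels = {}
    for column, values in sample.items():
        non_null = [str(v) for v in values if v is not None]
        if not non_null:
            continue  # nothing to judge in an all-null sample
        for label, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in non_null if pattern.match(v))
            if hits / len(non_null) >= threshold:
                labels[column] = label
                break  # first matching label wins
    return labels
```

A threshold below 1.0 tolerates a few malformed values in a sample. The trade-off is that a loose threshold produces more false positives.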

Proactive alerts in Unity Catalog

Finally, data teams can debug any issues identified in the monitoring reports and assess their impact using Unity Catalog's real-time data lineage, down to the column level. This streamlines monitoring and diagnostic workflows into a comprehensive end-to-end solution.
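
Impact assessment over lineage is essentially a graph walk. Start from the column where a quality issue was detected and follow lineage edges downstream to find every table, feature, or model that depends on it. The sketch below models column-level lineage as a hypothetical in-memory adjacency map and uses breadth-first search. `LINEAGE` and the asset names are invented for illustration. Reversing the edges turns the same walk into a root-cause search.

```python
from collections import deque

# Hypothetical lineage edges: upstream asset -> assets derived from it.
LINEAGE = {
    "raw.orders.amount": ["silver.orders.amount_usd"],
    "silver.orders.amount_usd": ["gold.revenue.daily_total", "ml.features.avg_order"],
    "ml.features.avg_order": ["ml.models.churn"],
}


def downstream_impact(start: str, lineage: dict) -> list:
    """Breadth-first walk returning every asset reachable from `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def invert(lineage: dict) -> dict:
    """Reverse edges so the same walk finds upstream (root-cause) candidates."""
    reverse = {}
    for parent, children in lineage.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return reverse
```

Calling `downstream_impact("raw.orders.amount", LINEAGE)` lists the silver column, the gold aggregate, the feature, and the churn model. Calling `downstream_impact("ml.models.churn", invert(LINEAGE))` walks back to the raw source column.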

Root cause and impact assessment using lineage

Lakehouse Observability: System tables and dashboards for all aspects of the lakehouse

Observability is a critical aspect of any data and AI workload. To address this requirement, we announced the public preview of System Tables for auditing, lineage, and billing in Unity Catalog, with additional tables coming later this year.

System Tables serve as a centralized analytical store for operational data. They provide comprehensive cost and usage analytics, offering insight into resource consumption and expenditure. They also let users run audit analytics for jobs, notebooks, clusters, and SQL/ML endpoints, and track data lineage and access permissions. System Tables can be queried in Unity Catalog from any language, so users can build customized dashboards and notebooks and use AI to turn operational data into actionable business insights. Finally, users can operationalize this intelligence with DBSQL alerts to systematically drive ROI improvements across the lifecycle of their intelligent data applications.
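
A typical cost analysis is a rollup of usage records. The sketch below multiplies usage by a unit price and sums the result per workspace. It works on in-memory records shaped loosely like rows of a billing usage table. The field names (`workspace_id`, `sku_name`, `usage_quantity`), the SKU names, and the per-unit prices are assumptions for illustration. Check the actual system table schema before porting the logic. Against a real table, the same aggregation would normally be a SQL `GROUP BY`.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical usage records; field and SKU names are assumptions, not a schema reference.
USAGE = [
    {"workspace_id": "ws-1", "sku_name": "JOBS_COMPUTE", "usage_quantity": Decimal("12.5")},
    {"workspace_id": "ws-1", "sku_name": "SQL_COMPUTE", "usage_quantity": Decimal("4")},
    {"workspace_id": "ws-2", "sku_name": "JOBS_COMPUTE", "usage_quantity": Decimal("3")},
]

# Hypothetical unit prices, used only to make the arithmetic concrete.
PRICE_PER_UNIT = {"JOBS_COMPUTE": Decimal("0.15"), "SQL_COMPUTE": Decimal("0.22")}


def cost_by_workspace(records, prices) -> dict:
    """Estimated spend per workspace = sum of usage_quantity * unit price."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for r in records:
        totals[r["workspace_id"]] += r["usage_quantity"] * prices[r["sku_name"]]
    return dict(totals)
```

`Decimal` avoids binary floating-point rounding in money arithmetic. With the sample data, `ws-1` comes to 12.5 × 0.15 + 4 × 0.22 = 2.755, and `ws-2` comes to 0.45.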

Lakehouse Observability using System Tables in Unity Catalog

Additional advancements in governance on the Lakehouse

Row- and column-level data security

To enforce data security at a granular level, Unity Catalog provides row filtering and column masking. Users define row filters and column masks with standard SQL functions, enabling fine-grained access control over individual rows and columns. This functionality is in public preview on AWS, Azure, and GCP.
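
In Databricks, row filters and column masks are SQL functions attached to a table. The Python below only mimics their effect to make the semantics visible. A row filter decides whether a row is returned at all for the current user. A column mask rewrites a column's value for users who lack access. The group names, the `ssn` column, `region_filter`, `mask_ssn`, and `read_table` are hypothetical, and the filter-then-mask order is a choice made for this toy.

```python
from typing import Callable, Iterable

Row = dict


def region_filter(user_groups: set) -> Callable[[Row], bool]:
    """Row filter: 'admins' see every row; others only rows for their regions."""
    def keep(row: Row) -> bool:
        return "admins" in user_groups or f"region_{row['region']}" in user_groups
    return keep


def mask_ssn(value: str, user_groups: set) -> str:
    """Column mask: only the 'hr' group sees the full value."""
    return value if "hr" in user_groups else "***-**-" + value[-4:]


def read_table(rows: Iterable[Row], user_groups: set) -> list:
    """Apply the row filter, then mask the 'ssn' column of the surviving rows."""
    keep = region_filter(user_groups)
    return [
        {**row, "ssn": mask_ssn(row["ssn"], user_groups)}
        for row in rows
        if keep(row)
    ]
```

A user in `region_us` sees only US rows, with the SSN reduced to its last four digits. A user in both `admins` and `hr` sees the table unchanged. The policy lives with the table rather than in each query, so every reader gets the same treatment.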

Tags for data classification

Unity Catalog goes beyond discovery and provides contextual insights about the data, helping users jumpstart their work and accelerate analytics and AI initiatives. Users can describe and tag data assets to improve understanding. They can also see how popular an asset is, identify domain experts, and find frequently used notebooks, queries, and joins, which makes data enrichment much easier.

Data Insights with Unity Catalog

LakehouseIQ: The AI-powered engine that uniquely understands your business

We also announced LakehouseIQ, a knowledge engine that learns the unique nuances of your business and the complex layers of your data, enabling seamless natural language access to the right data at the right time. LakehouseIQ is powered by Unity Catalog, which provides the metadata and lineage leveraged by the AI while ensuring the organization's internal security and governance policies are consistently enforced.

Getting Started with Databricks Unity Catalog

By adopting Unity Catalog as the cornerstone of your lakehouse architecture, you can implement flexible, scalable governance that spans your entire data and AI estate. To get started, follow the Unity Catalog guides available for AWS, Azure, and GCP.

Watch the Data + AI Summit 2023 keynote from Matei Zaharia, co-founder and Chief Technology Officer at Databricks, to learn more. Register for Data + AI Summit and explore the top data and AI governance sessions.

 
