We are thrilled to announce that Databricks Unity Catalog is now generally available on Google Cloud Platform (GCP).
Unity Catalog provides a unified governance solution for data, analytics and AI on the lakehouse. With Unity Catalog, data & governance teams benefit from an enterprise wide data catalog with a single interface to manage permissions, centralize auditing, and share data across platforms, clouds and regions.
"At Press Ganey, we manage massive amounts of healthcare data on GCP for one of the most regulated complex data ecosystems. Unity Catalog empowers our data teams to closely collaborate while ensuring proper management of data governance and audit requirements. It is enabling us to manage data in a highly secure, sophisticated and convenient way." - Ed Holsinger, Distinguished Data Engineer, Press Ganey.
Enterprises can now benefit from a common governance model across all three major cloud providers (AWS, GCP, Azure). This simplifies the management of their multi-cloud data architecture and reduces the need to learn cloud-specific security and governance concepts, resulting in lower operational overhead.
Simplifying governance and sharing at scale for Data, Analytics and AI across clouds
With Unity Catalog at the center of your lakehouse architecture, you can achieve a flexible and scalable governance implementation.
Unified and secure view of the entire data estate
Unity Catalog offers a centralized metadata layer to catalog data assets across your lakehouse. In addition, Unity Catalog centralizes identity management, which includes service principals, users, and groups, providing a consistent view across multiple workspaces. Leveraging this centralized metadata layer and user management capabilities, data administrators can define access permissions on objects using a single interface across workspaces, all based on an industry-standard ANSI SQL dialect.
Moreover, Unity Catalog supports a privilege inheritance model, allowing admins to set access policies on entire catalogs or schemas of objects. This simplifies the management of access policies, especially if you have 100s or 1,000s of data objects in many databases. Unity Catalog also offers the same capabilities via REST APIs and Terraform modules to allow integration with existing entitlement request platforms or policies as code platforms.
End-to-end auditability
The Databricks Unity Catalog offers powerful auditing capabilities by capturing a detailed audit log of operations performed by users, including queries, on the data across all workloads running on Databricks. This helps you understand how your data is being used and helps ensure data security and compliance with regulatory requirements.
Unity Catalog also offers automated and real-time data lineage, down to the column level. for all workloads in any language supported by Databricks (Python, SQL, R, and Scala). This helps data teams track sensitive data for compliance and audit reporting, ensure data quality across all workloads, perform impact analysis and change management of any data changes across the lakehouse, and conduct root cause analysis of any errors in their data pipelines.
Additionally, Unity Catalog provides a user-friendly interface, APIs, and system tables to query access control lists (ACLs) on resources. This enables real-time auditing of access by resources or users, without waiting for system synchronization or audit data consolidation.
Open data collaboration with Delta Sharing
Unity Catalog natively supports Delta Sharing, an open standard for secure data sharing. This enables smooth exchange of data across clouds, regions, and data platforms, while maintaining a single copy of the data. With Delta Sharing, data providers can easily share their existing data with recipients using just a few UI clicks or SQL commands.
Data recipients can directly consume shared data in their Databricks workspaces without any ETL or interactive querying. Data recipients can also leverage a rich ecosystem of native connectors for various programming languages and BI tools, such as Microsoft Power BI, Tableau, Rust, C++, R, Go, Java, Node.JS, and Microsoft Excel. This allows them to access ready-to-query data from their preferred tool without any ETL requirement or needing to be on the Databricks platform.
Using Delta Sharing eliminates the need to load data into multiple data-sharing platforms with disparate and proprietary data formats. This reduces the cost of data sharing and avoids the hassle of maintaining multiple copies of the data for different recipients.
Getting Started with Unity Catalog on GCP
Unity Catalog is included at no extra cost with Databricks Premium tier on GCP. If you already are a Databricks customer, follow the quick start Guide. If you are not an existing Databricks customer, sign up for a free trial with a Premium workspace.
Download this free ebook on Data, analytics and AI governance to learn more about best practices to build an effective governance strategy for your data lakehouse.