As the volume, velocity and variety of data grow, organizations are increasingly relying on staunch data governance practices to ensure their core business outcomes are adequately met. Unity Catalog is a fine-grained governance solution for data and AI powering the Databricks Lakehouse. It helps simplify the security and governance of your enterprise data assets by providing a centralized mechanism to administer and audit data access.
Taking a journey down memory lane, before Unity Catalog unified the permission model for files and tables and added support for all languages, customers implemented fine-grained data access control on Databricks using the legacy workspace-level Table ACL (TACL), which was restricted to certain cluster configurations and worked only for Python and SQL. Both Unity Catalog and TACL let you control access to securable objects like catalogs, schemas (databases), tables and views, but there are some nuances in how each access model works.
A good understanding of the object access model is essential for implementing data governance at scale using Unity Catalog. This is even more important if you have already implemented the Table ACL model and are looking to upgrade to Unity Catalog to take advantage of its newest features, such as multi-language support, centralized access control and data lineage.
The Axioms of Unity Catalog access model
Some more complex axioms
Interesting patterns
There are many governance patterns that can be achieved using the Unity Catalog access model.
Example 1 - Consistent permissions across workspaces
Axiom 1 allows product teams to define permissions for their data product within their own workspace and have those permissions reflected and enforced across all other workspaces, no matter where their consumers are coming from.
Example 2 - Setting boundary for data sharing
Axiom 2 allows catalog/schema owners to set up default access rules for their data. For example, a schema owner can issue grants that enable the machine learning team to create tables within a schema and read each other's tables.
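The effect of such schema-level defaults can be illustrated with a toy model in Python. This is a minimal sketch, not Databricks code: it assumes that a privilege granted on a schema is inherited by every table inside it, and the names `ml_catalog`, `sandbox`, `ml_team`, `features_v2` and the helper `has_privilege` are all hypothetical.

```python
# Toy model of downward privilege inheritance (an illustration, not a Databricks API).
# Securables are dotted paths: "catalog", "catalog.schema", "catalog.schema.table".

def ancestors_and_self(securable):
    """Yield the securable and each of its parents, e.g. a.b.c -> a.b.c, a.b, a."""
    parts = securable.split(".")
    for end in range(len(parts), 0, -1):
        yield ".".join(parts[:end])

def has_privilege(grants, principal, privilege, securable):
    """True if the privilege was granted on the securable or on any of its parents."""
    return any(
        (principal, privilege, s) in grants for s in ancestors_and_self(securable)
    )

# Hypothetical grants, made once by the owner at the catalog and schema level.
grants = {
    ("ml_team", "USE CATALOG", "ml_catalog"),
    ("ml_team", "USE SCHEMA", "ml_catalog.sandbox"),
    ("ml_team", "CREATE TABLE", "ml_catalog.sandbox"),
    ("ml_team", "SELECT", "ml_catalog.sandbox"),
}

# A table created later in the schema is readable by the team without a new grant.
new_table = "ml_catalog.sandbox.features_v2"
team_can_read = has_privilege(grants, "ml_team", "SELECT", new_table)
outsider_can_read = has_privilege(grants, "analysts", "SELECT", new_table)
print(team_can_read, outsider_can_read)
```

In this model the owner never touches individual tables: the schema-level `SELECT` acts as the default rule for everything the team creates there.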
More interestingly, axiom 4 now allows catalog/schema owners to limit how far individual schema and table owners can share data they produce. A table owner granting SELECT to another user does not allow that user read access to the table unless they also have been granted USE CATALOG privileges on its parent catalog as well as USE SCHEMA privileges on its parent schema.
For example, suppose sample_catalog is owned by user A, and user B created the sample_schema schema and table 42 within it. Even though the USE SCHEMA and SELECT permissions are granted to the analysts team, they still cannot query the table because of the permission boundary set by user A: the analysts have not been granted USE CATALOG on sample_catalog.
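The boundary rule can be sketched as a small Python check. This is a toy model under stated assumptions, not how Unity Catalog is implemented: the function `can_query` and the table name `sample_table` (standing in for the table created by user B) are hypothetical, and ownership and inheritance are deliberately left out.

```python
# Toy model of the access boundary (an illustration, not a Databricks API).

def can_query(grants, principal, catalog, schema, table):
    """A read succeeds only with USE CATALOG, USE SCHEMA and SELECT together."""
    required = [
        (principal, "USE CATALOG", catalog),
        (principal, "USE SCHEMA", f"{catalog}.{schema}"),
        (principal, "SELECT", f"{catalog}.{schema}.{table}"),
    ]
    return all(r in grants for r in required)

# The analysts hold privileges on the schema and table, but none on the catalog.
grants = {
    ("analysts", "USE SCHEMA", "sample_catalog.sample_schema"),
    ("analysts", "SELECT", "sample_catalog.sample_schema.sample_table"),
}
args = ("analysts", "sample_catalog", "sample_schema", "sample_table")
before = can_query(grants, *args)

# Only once user A, the catalog owner, grants USE CATALOG does the query succeed.
grants.add(("analysts", "USE CATALOG", "sample_catalog"))
after = can_query(grants, *args)
print(before, after)
```

The catalog owner's single `USE CATALOG` grant is the switch that decides whether any grant made further down the hierarchy takes effect.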
Example 3 - Easier sharing of business logic
Data consumers need to share their workings and transformation logic, and a reusable way of doing so is to create views and share them with other consumers.
Axiom 5 unlocks the ability for data consumers to do this seamlessly, without requiring manual back and forth with the table owners.
Example 4 - No more data leakage
Thanks to axiom 6, data owners can be certain that there will be no unauthorized access to their data due to cluster misconfiguration. Any cluster that is not configured with the correct access mode will not be able to access data in Unity Catalog.
Users can check whether their clusters can access Unity Catalog data using a tooltip on the Create Clusters page.
Now that data owners can understand the data privilege model and access control, they can leverage Unity Catalog to simplify access policy management at scale.
There are upcoming features that will further empower data administrators and owners to author even more complex access policies: