by Vuong Nguyen, Zeashan Pappa and Mattia Zeni
Unity Catalog provides a unified governance solution for all data and AI assets in your lakehouse on any cloud. As customers adopt Unity Catalog, they want to do so programmatically and automatically, using an infrastructure-as-code approach. With Unity Catalog, there is a single metastore per region, which is the top-level container of objects in Unity Catalog. It stores data assets (tables and views) and the permissions that govern access.
This presents a new challenge for organizations that do not have centralized platform/governance teams to own the Unity Catalog management function. Specifically, teams within these organizations now have to share a single metastore, which raises the question of how each team can govern access and perform auditing in complete isolation from the others.
In this blog post, we will discuss how customers can leverage the support for Unity Catalog objects in the Databricks Terraform provider to manage a distributed governance pattern on the lakehouse effectively.
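We present two solutions: one in which each team creates its own catalogs and manages access to its data autonomously, and one in which teams request assets from a central admin team through a CI/CD pipeline.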
As a one-off bootstrap activity, customers need to create a Unity Catalog metastore per region they operate in. This requires an account administrator, which is a highly privileged role that is only accessed in break-glass scenarios, i.e. the username & password are stored in a secret vault that requires approval workflows before they can be used in automated pipelines.
An account administrator needs to authenticate using their username & password on AWS:
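A minimal sketch of such a provider block, assuming the AWS account console endpoint and hypothetical variable names whose values would be injected from the secret vault. The alias `mws` is also an assumption, and username/password authentication may be deprecated in newer provider versions:

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Hypothetical input variables; in practice these are supplied by the
# secret vault after the approval workflow has completed.
variable "databricks_account_id" {
  type = string
}

variable "databricks_account_username" {
  type = string
}

variable "databricks_account_password" {
  type      = string
  sensitive = true
}

# Account-level provider used only for the one-off bootstrap.
provider "databricks" {
  alias      = "mws"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
  username   = var.databricks_account_username
  password   = var.databricks_account_password
}
```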
Or using their AAD token on Azure:
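As a standalone sketch, assuming the account administrator has already signed in with the Azure CLI so that the provider can obtain an AAD token from that session. The alias and variable name are hypothetical:

```hcl
# Hypothetical input variable for the Databricks account ID.
variable "databricks_account_id" {
  type = string
}

# Account-level provider on Azure; "azure-cli" tells the provider to
# reuse the token of the current `az login` session.
provider "databricks" {
  alias      = "azure_account"
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.databricks_account_id
  auth_type  = "azure-cli"
}
```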
The Databricks Account Admin needs to provide:
The Terraform code will be similar to below (AWS example)
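A minimal sketch under stated assumptions: the S3 bucket and IAM role already exist and are passed in through hypothetical variables, and the metastore and data-access names are placeholders. Which provider configuration (account-level or workspace-level) these resources attach to depends on the provider version in use, so the `provider` argument is left out here:

```hcl
# Hypothetical inputs: a pre-created bucket and IAM role for the
# metastore's default managed storage.
variable "metastore_bucket" {
  type = string
}

variable "metastore_role_arn" {
  type = string
}

variable "unity_admin_group" {
  type    = string
  default = "unity_admin_group"
}

# One metastore per region; owned by the (normally empty) admin group.
resource "databricks_metastore" "this" {
  name         = "primary" # placeholder name
  storage_root = "s3://${var.metastore_bucket}/metastore"
  owner        = var.unity_admin_group
}

# Default identity used to access the metastore's storage root.
resource "databricks_metastore_data_access" "this" {
  metastore_id = databricks_metastore.this.id
  name         = "metastore-default-access" # placeholder name
  is_default   = true

  aws_iam_role {
    role_arn = var.metastore_role_arn
  }
}
```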
Teams can choose not to use this default location and identity for their tables by setting a location and identity for managed tables per individual catalog, or even more fine-grained at the schema level. When managed tables are created, the data will then be stored using the schema location (if present) falling back to the catalog location (if present), and only fall back to the metastore location if the prior two locations have not been set.
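The fallback order can be illustrated with a short sketch. The catalog, schema, and bucket names are hypothetical, and the paths are assumed to be covered by external locations the team already controls:

```hcl
# Catalog-level location: managed tables in this catalog avoid the
# metastore's default location.
resource "databricks_catalog" "team_x" {
  name         = "team_x"
  storage_root = "s3://team-x-bucket/catalog" # hypothetical path
}

# Schema-level location: takes precedence over the catalog location.
resource "databricks_schema" "sales" {
  catalog_name = databricks_catalog.team_x.name
  name         = "sales"
  storage_root = "s3://team-x-bucket/schemas/sales" # hypothetical path
}

# No storage_root here: managed tables fall back to the catalog location.
resource "databricks_schema" "staging" {
  catalog_name = databricks_catalog.team_x.name
  name         = "staging"
}
```

With this layout, managed tables in `sales` land under the schema path, tables in `staging` land under the catalog path, and only a catalog without a `storage_root` would use the metastore location.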
When creating a metastore, we nominated the unity_admin_group as the metastore administrator. To avoid having a central authority that can list and manage access to all objects in the metastore, we will keep this group empty. Users can be added to the group for exceptional break-glass scenarios which require a high-powered admin (e.g., setting up initial access, or changing ownership of a catalog if the catalog owner leaves the organization).
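One way to express this, sketched with a hypothetical `break_glass_user_id` variable: the group is declared with no members, and a membership is only created while the variable is set.

```hcl
# Hypothetical: ID of the user to elevate during a break-glass event.
# Leave as null in normal operation so the group stays empty.
variable "break_glass_user_id" {
  type    = string
  default = null
}

# The metastore admin group, intentionally without members.
resource "databricks_group" "unity_admin_group" {
  display_name = "unity_admin_group"
}

# Temporary membership, created only when a user ID is supplied.
resource "databricks_group_member" "break_glass" {
  count     = var.break_glass_user_id == null ? 0 : 1
  group_id  = databricks_group.unity_admin_group.id
  member_id = var.break_glass_user_id
}
```

Resetting the variable to null and re-applying removes the membership again.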
Each team is responsible for creating its own catalogs and managing access to its data. Initial bootstrap activities are required for each new team to get the necessary privileges to operate independently.
The account admin then needs to perform the following:
Create a new group called team-admins.
Grant CREATE CATALOG, CREATE EXTERNAL LOCATION, and optionally CREATE SHARE, CREATE PROVIDER, and CREATE RECIPIENT (if using Delta Sharing) to this group.
When a new team onboards, place the trusted team admins in the team-admins group.
Members of the team-admins group can now easily create new catalogs and external locations for their team without interaction from the account administrator or metastore administrator.
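The bootstrap steps above can be sketched as follows. The `metastore_id` variable is hypothetical, and provider wiring (account-level for the group, workspace-level for the grants) is omitted for brevity:

```hcl
# Hypothetical input: ID of the metastore created during bootstrap.
variable "metastore_id" {
  type = string
}

# Group holding the trusted admins of each onboarded team.
resource "databricks_group" "team_admins" {
  display_name = "team-admins"
}

# Metastore-level privileges for the group. Note that databricks_grants
# manages the full set of grants on the securable it targets.
resource "databricks_grants" "metastore" {
  metastore = var.metastore_id

  grant {
    principal = databricks_group.team_admins.display_name
    privileges = [
      "CREATE_CATALOG",
      "CREATE_EXTERNAL_LOCATION",
      # If using Delta Sharing, also add:
      # "CREATE_SHARE", "CREATE_PROVIDER", "CREATE_RECIPIENT",
    ]
  }
}
```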
During the process of adding a new team to Databricks, initial activities from an account administrator are required so that the new team is free to set up their workspaces / data assets to their preference:
Some organizations may not want to make teams autonomous in creating assets in their central metastore. Giving multiple teams the ability to create such assets can be difficult to govern: naming conventions cannot be enforced, and keeping the environment clean is hard.
In such a scenario, we suggest a model where each team files a request with a list of assets they want admins to create for them. The team will be made owner of the assets so they can be autonomous in assigning permissions to others.
To automate such requests as much as possible, we show how this can be done using a CI/CD pipeline. The admin team owns a central repository in their preferred version control system, containing all the scripts that deploy Databricks in their organization. Each team is allowed to create branches on this repository to add the Terraform configuration files for their own environments using a predefined template (Terraform module). When the team is ready, they create a pull request. At this point, the central admin team reviews the pull request (this can also be automated with the appropriate checks) and merges it to the main branch, which triggers the deployment of the resources for the team.
This approach gives more control over what individual teams do, but it involves some (limited, automatable) work by the central admin team.
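As an illustration, a team's request might be a single file that instantiates the admin team's template. The module path, its input names, and all values below are hypothetical and only show the shape such a request could take:

```hcl
# teams/team_x.tf, added by the team on a branch of the central repository.
module "team_x" {
  # Hypothetical module maintained by the central admin team.
  source = "../modules/team-onboarding"

  # Hypothetical inputs defined by that module.
  team_name   = "team_x"
  team_admins = ["alice@example.com", "bob@example.com"]
  catalogs    = ["team_x_dev", "team_x_prod"]
}
```

Because the template fixes which resources can be created and how they are named, reviewers only need to check the input values in the pull request.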
In this scenario, the Terraform scripts below are executed automatically by the CI/CD pipelines using a Service Principal (00000000-0000-0000-0000-000000000000), which is made an account admin. The one-off operation of making this service principal an account admin must be manually executed by an existing account admin, for example:
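A minimal sketch of that one-off step, assuming it is applied with an account-level provider authenticated as an existing account admin (provider wiring omitted). The resource names are placeholders:

```hcl
# Register the CI/CD service principal in the Databricks account.
resource "databricks_service_principal" "cicd" {
  application_id = "00000000-0000-0000-0000-000000000000"
}

# Promote it to account admin.
resource "databricks_service_principal_role" "cicd_account_admin" {
  service_principal_id = databricks_service_principal.cicd.id
  role                 = "account_admin"
}
```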
When a new team wants to be onboarded, they need to file a request that will create the following objects (Azure example):
A group called team_X_admins, which contains the Account Admin Service Principal (to allow future modifications to the assets) plus the members of the team.
Once these objects are created, the team is autonomous in developing the project and giving access to other team members and/or partners if necessary.
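The group and the ownership hand-over can be sketched as below. The variables and the catalog name are hypothetical, and the other objects in the request are not shown:

```hcl
# Hypothetical inputs: ID of the account admin service principal and
# IDs of the team members to add to the group.
variable "account_admin_sp_id" {
  type = string
}

variable "team_member_ids" {
  type    = list(string)
  default = []
}

resource "databricks_group" "team_x_admins" {
  display_name = "team_X_admins"
}

# The service principal stays in the group so the pipeline can modify
# the team's assets later.
resource "databricks_group_member" "service_principal" {
  group_id  = databricks_group.team_x_admins.id
  member_id = var.account_admin_sp_id
}

resource "databricks_group_member" "team_members" {
  for_each  = toset(var.team_member_ids)
  group_id  = databricks_group.team_x_admins.id
  member_id = each.value
}

# The team's group owns the catalog, so the team can grant access itself.
resource "databricks_catalog" "team_x" {
  name  = "team_x" # hypothetical catalog name
  owner = databricks_group.team_x_admins.display_name
}
```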
Teams are not allowed to modify assets autonomously in Unity Catalog either. To do this, they file a new request with the central team by modifying the files they have created and opening a new pull request.
The same applies if they need to create new assets such as storage credentials, external locations and catalogs.
Above, we walked through some guidelines on leveraging built-in product features and recommended best practices to handle enablement and ongoing management hurdles for Unity Catalog.
Visit the Unity Catalog documentation [AWS, Azure] and our Unity Catalog Terraform guide [AWS, Azure] to learn more.