We’re making it easier than ever for Databricks customers to run secure, scalable Apache Spark™ workloads on Unity Catalog Compute with Unity Catalog Lakeguard. In the past few months, we’ve simplified cluster creation, provided fine-grained access control everywhere, and enhanced service credential integrations—so that you can focus on building workloads, instead of managing infrastructure.
What’s new? Standard clusters (formerly shared) are the new default classic compute type, already trusted by over 9,000 Databricks customers. Dedicated clusters (formerly single-user) support fine-grained access control and can now be securely shared with a group. Plus, we’re introducing Unity Catalog Service Credentials for seamless authentication with third-party services.
Let’s dive in!
Databricks offers two classic compute access modes secured by Unity Catalog Lakeguard: Standard (formerly shared) and Dedicated (formerly single-user).
Along with the updated access mode names, we’re also rolling out Auto mode, a smart new default selector that automatically picks the recommended compute access mode based on your cluster’s configuration. The redesigned UI simplifies cluster creation by incorporating Databricks-recommended best practices, helping you set up clusters more efficiently and with greater confidence. Whether you're an experienced user or new to Databricks, this update ensures that you automatically choose the optimal compute for your workloads. Please see our documentation (AWS, Azure, GCP) for more information.
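If you provision compute through the API rather than the UI, the same access modes are exposed on the Clusters API. The sketch below uses the Databricks SDK for Python; the cluster name and node type are illustrative, and the DATA_SECURITY_MODE_AUTO value assumes an SDK version that already exposes the renamed access modes, so check the documentation links above for the exact values available in your workspace.

```python
# Minimal sketch: create a cluster and let Databricks pick the recommended
# access mode (Auto). Names and the enum value are illustrative assumptions.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

w = WorkspaceClient()  # picks up auth from the environment / .databrickscfg

cluster = w.clusters.create_and_wait(
    cluster_name="lakeguard-demo",  # hypothetical name
    spark_version=w.clusters.select_spark_version(latest=True),
    node_type_id=w.clusters.select_node_type(local_disk=True),
    num_workers=2,
    autotermination_minutes=30,
    # Auto mode: Databricks selects Standard or Dedicated based on the
    # cluster configuration (assumes an SDK version exposing this value).
    data_security_mode=DataSecurityMode.DATA_SECURITY_MODE_AUTO,
)
print(cluster.cluster_id)
```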
Dedicated clusters, used for workloads requiring privileged machine access, now support fine-grained access control and can be shared with a group!
Starting with Databricks Runtime (DBR) 15.4, dedicated clusters support secure READ operations on tables protected by row-level security and column masks (RLS/CM), as well as on views, dynamic views, materialized views, and streaming tables. We are also adding support for WRITE operations on tables with RLS/CM using MERGE INTO - sign up for the private preview!
Because Spark overfetches data when processing queries that access FGAC-protected data, such queries are transparently processed on serverless background compute to ensure that only data respecting UC permissions reaches the cluster. Serverless filtering is priced at the rate of serverless jobs: you pay based on the compute resources you use, keeping the pricing model cost-effective.
FGAC will automatically work when using DBR 15.4 or later with Serverless compute enabled in your workspace. For detailed guidance, refer to the Databricks FGAC documentation (AWS, Azure, GCP).
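As a concrete illustration of the read path, the sketch below defines a row filter on a table and then reads the table back; the catalog, schema, table, and function names are hypothetical, and the filter logic is just an example.

```python
# Illustrative sketch of FGAC on a dedicated cluster (DBR 15.4+, serverless
# compute enabled). Catalog/schema/table/function names are hypothetical.

# 1. Define a row filter: a row is visible only to members of the account
#    group whose name matches the row's region column.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.demo.region_filter(region STRING)
    RETURNS BOOLEAN
    RETURN is_account_group_member(region)
""")

# 2. Attach the row filter to the table.
spark.sql("""
    ALTER TABLE main.demo.sales
    SET ROW FILTER main.demo.region_filter ON (region)
""")

# 3. Reads from a dedicated cluster are transparently filtered on serverless
#    background compute, so only rows the user may see reach the cluster.
spark.table("main.demo.sales").show()
```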
We’re excited to announce that dedicated clusters can now be shared with a group, so that, for example, a data science team can share a cluster that uses the machine learning runtime and GPUs for development. This enhancement reduces administrative toil and lowers costs by eliminating the need to provision separate clusters for each user.
Because of their privileged machine access, dedicated clusters are “single-identity” clusters: they run as either a user or a group identity. When a cluster is assigned to a group, group members can attach to it automatically. When running workloads on a dedicated group cluster, an individual user’s permissions are scoped to the group’s permissions, enabling secure sharing of the cluster across members of the same group.
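If you manage clusters programmatically, assigning the cluster’s single identity to a group rather than an individual user might look like the sketch below; the group name is hypothetical and the exact field values can vary by SDK version, so treat this as an assumption and verify against the preview documentation.

```python
# Hedged sketch: a dedicated (single-identity) cluster assigned to a group.
# Group name, cluster name, and enum value are assumptions.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DataSecurityMode

w = WorkspaceClient()

w.clusters.create_and_wait(
    cluster_name="ml-team-dedicated",  # hypothetical name
    spark_version="15.4.x-scala2.12",  # DBR 15.4 or later
    node_type_id=w.clusters.select_node_type(local_disk=True),
    num_workers=4,
    data_security_mode=DataSecurityMode.SINGLE_USER,
    # The cluster's single identity: a group instead of an individual user,
    # so any member of data-science-team can attach and run workloads.
    single_user_name="data-science-team",
)
```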
Audit logs for commands executed on a dedicated group cluster capture both the group whose permissions were used to run the command (run_as) and the user who ran the command (run_by), in the new identity_metadata column of the audit system table, as illustrated below.
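For example, a query along the lines of the following sketch surfaces both identities from the audit system table (assuming it is available to you as system.access.audit in your workspace):

```python
# Sketch: inspect commands executed on dedicated group clusters.
# identity_metadata.run_as is the group whose permissions were used;
# identity_metadata.run_by is the user who ran the command.
spark.sql("""
    SELECT event_time,
           action_name,
           identity_metadata.run_as AS run_as,
           identity_metadata.run_by AS run_by
    FROM system.access.audit
    WHERE identity_metadata.run_as IS NOT NULL
    ORDER BY event_time DESC
    LIMIT 100
""").show(truncate=False)
```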
Dedicated group clusters are available in Public Preview when using DBR 15.4 or later, on AWS, Azure, and GCP. As a workspace admin, go to the Previews overview in your Databricks workspace to opt in, enable the feature, and start sharing clusters with your team for seamless collaboration and governance.
Unity Catalog Service Credentials, now generally available on AWS, Azure, and GCP, provide a secure, streamlined way to manage access to external cloud services (e.g., AWS Secrets Manager, Azure Functions, GCP Secret Manager) directly from within Databricks. UC Service Credentials eliminate the need to configure instance profiles on a per-compute basis. This enhances security, reduces misconfigurations, and enables per-user access control to cloud services (service credentials) instead of per-machine access control (instance profiles).
Service credentials can be managed via the UI, API, or Terraform. They are supported on all Unity Catalog compute (Standard and Dedicated clusters, SQL warehouses, Delta Live Tables (DLT), and serverless compute). Once configured, users can seamlessly access cloud services without modifying existing code, simplifying integrations and governance.
To try out UC Service Credentials, go to External Data > Credentials in Databricks Catalog Explorer to configure service credentials. You can also automate the process using the Databricks API or Terraform. Our official documentation pages (AWS, Azure, GCP) provide detailed instructions.
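As an example of what access from a notebook can look like on AWS, the sketch below obtains a credential provider for a service credential and uses it with boto3 to read a secret; the credential name, region, and secret name are hypothetical, and the Azure and GCP patterns differ, so follow the documentation links above for your cloud.

```python
# Hedged sketch (AWS): use a UC service credential from a Databricks notebook.
# Credential name, region, and secret name below are hypothetical.
import boto3

# dbutils is available in Databricks notebooks; the service credential
# "my-secrets-credential" must have been granted to the current user.
provider = dbutils.credentials.getServiceCredentialsProvider("my-secrets-credential")

session = boto3.Session(botocore_session=provider, region_name="us-west-2")
secrets = session.client("secretsmanager")

value = secrets.get_secret_value(SecretId="demo/api-key")  # hypothetical secret
print(value["Name"])
```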
We have more exciting updates coming in the next few months:
Check out these capabilities using the latest Databricks Runtime release. To learn more about compute best practices for running Apache Spark™ workloads, please refer to the compute configuration recommendation guides (AWS, Azure, GCP).