What’s new in Unity Catalog Compute

Simplified cluster creation, fine-grained access control everywhere, and service credentials!

Summary

  • Cluster creation is now simpler with clearer access modes: Standard (Shared), Dedicated (Single-User), and a new Auto mode for optimal selection.
  • Dedicated clusters now support fine-grained access control and can be securely shared with a group.
  • Unity Catalog Service Credentials are now available for managing access to external cloud services securely.

We’re making it easier than ever for Databricks customers to run secure, scalable Apache Spark™ workloads on Unity Catalog Compute with Unity Catalog Lakeguard. In the past few months, we’ve simplified cluster creation, provided fine-grained access control everywhere, and enhanced service credential integrations, so that you can focus on building workloads instead of managing infrastructure.

What’s new? Standard clusters (formerly shared) are the new default classic compute type, already trusted by over 9,000 Databricks customers. Dedicated clusters (formerly single-user) support fine-grained access control and can now be securely shared with a group. Plus, we’re introducing Unity Catalog Service Credentials for seamless authentication with third-party services.

Let’s dive in!

Simplified Cluster Creation with Auto Mode

Databricks offers two classic compute access modes secured by Unity Catalog Lakeguard:

  • Standard Clusters: Databricks’ default multi-user compute for workloads in Python, Scala, and SQL. Standard clusters are the base architecture for Databricks’ serverless products.
  • Dedicated Clusters: Compute designed for workloads requiring privileged machine access, such as ML, GPU, and R, exclusively assigned to a single user or group.

Along with the updated access mode names, we’re also rolling out Auto mode, a smart new default selector that automatically picks the recommended compute access mode based on your cluster’s configuration. The redesigned UI simplifies cluster creation by incorporating Databricks-recommended best practices, helping you set up clusters more efficiently and with greater confidence. Whether you’re an experienced user or new to Databricks, this update ensures you automatically get the optimal compute for your workloads. Please see our documentation (AWS, Azure, GCP) for more information.
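
To make this concrete, here is a minimal sketch of creating a Standard cluster with the Databricks Python SDK. The cluster name, runtime version, and node type are placeholders, and the exact enum values may differ by SDK version; this is an illustration, not the only way to pick an access mode (the UI’s Auto mode chooses one for you):

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.compute import DataSecurityMode

    w = WorkspaceClient()  # reads auth from the environment or ~/.databrickscfg

    # Create a Standard (formerly Shared) cluster and wait for it to start.
    cluster = w.clusters.create(
        cluster_name="team-standard-cluster",   # placeholder name
        spark_version="15.4.x-scala2.12",
        node_type_id="i3.xlarge",               # example AWS node type
        num_workers=2,
        data_security_mode=DataSecurityMode.USER_ISOLATION,  # Standard access mode
    ).result()
    print(cluster.cluster_id)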

Dedicated clusters: Fine-grained access control and sharing

Dedicated clusters, used for workloads requiring privileged machine access, now support fine-grained access control and can be shared with a group!

Fine-grained access control (FGAC) on dedicated clusters is GA

Starting with Databricks Runtime (DBR) 15.4, dedicated clusters support secure READ operations on tables protected by row-level security and column masking (RLS/CM), views, dynamic views, materialized views, and streaming tables. We are also adding support for WRITE operations on tables with RLS/CM using MERGE INTO - sign up for the private preview!
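
For context, row filters and column masks are defined as UC SQL functions and attached to tables. A minimal sketch from a notebook, assuming a main.default.sales table with region and email columns (all catalog, table, and group names are placeholders):

    # Define a row filter: non-admins only see US rows.
    spark.sql("""
        CREATE OR REPLACE FUNCTION main.default.us_only(region STRING)
        RETURN IF(is_account_group_member('admins'), TRUE, region = 'US')
    """)
    spark.sql("""
        ALTER TABLE main.default.sales
        SET ROW FILTER main.default.us_only ON (region)
    """)

    # Define a column mask: non-admins see a redacted email.
    spark.sql("""
        CREATE OR REPLACE FUNCTION main.default.mask_email(email STRING)
        RETURN IF(is_account_group_member('admins'), email, '***REDACTED***')
    """)
    spark.sql("""
        ALTER TABLE main.default.sales
        ALTER COLUMN email SET MASK main.default.mask_email
    """)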

Because Spark overfetches data when processing queries that access FGAC-protected data, such queries are transparently executed on serverless background compute, ensuring that only data permitted by UC permissions ever reaches the cluster. Serverless filtering is priced at the rate of serverless jobs - you pay only for the compute resources you use, keeping the pricing model cost-effective.

FGAC works automatically when using DBR 15.4 or later with serverless compute enabled in your workspace. For detailed guidance, refer to the Databricks FGAC documentation (AWS, Azure, GCP).

Dedicated group clusters to securely share compute

We’re excited to announce that dedicated clusters can now be shared with a group, so that, for example, a data science team can share a cluster using the machine learning runtime and GPUs for development. This enhancement reduces administrative toil and lowers costs by eliminating the need to provision a separate cluster for each user.

Due to privileged machine access, dedicated clusters are “single-identity” clusters: they run as either a user or a group identity. When the cluster is assigned to a group, group members can attach to it automatically. Workloads on a dedicated group cluster run with the group’s permissions rather than the individual user’s, enabling secure sharing of the cluster across members of the same group.
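
Under the hood, a dedicated group cluster is a dedicated cluster whose single identity is a group. A hedged sketch with the Python SDK, assuming a workspace group named data-science-team (names, runtime, and node type are placeholders):

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.compute import DataSecurityMode

    w = WorkspaceClient()

    # A Dedicated (formerly Single-user) cluster assigned to a group: any
    # member can attach, and workloads run with the group's permissions.
    cluster = w.clusters.create(
        cluster_name="ds-team-ml-cluster",
        spark_version="15.4.x-gpu-ml-scala2.12",         # example ML runtime
        node_type_id="g5.xlarge",                        # example GPU node type
        num_workers=1,
        data_security_mode=DataSecurityMode.SINGLE_USER, # Dedicated access mode
        single_user_name="data-science-team",            # group as the cluster identity
    ).result()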

Audit logs for commands executed on a dedicated group cluster capture both the group whose identity and permissions were used for the execution (run_as) and the user who ran the command (run_by), in the new identity_metadata column of the audit system table, as illustrated below.
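
For example, a query along these lines surfaces who ran what on a group cluster - a sketch against the audit system table; check the audit log schema documentation for your cloud:

    # Recent commands on group clusters: run_by is the user who ran the
    # command, run_as is the group whose permissions were used.
    recent = spark.sql("""
        SELECT event_time,
               action_name,
               identity_metadata.run_by AS run_by,
               identity_metadata.run_as AS run_as
        FROM system.access.audit
        WHERE identity_metadata.run_as IS NOT NULL
        ORDER BY event_time DESC
        LIMIT 20
    """)
    recent.show(truncate=False)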

Dedicated group clusters are available in Public Preview when using DBR 15.4 or later, on AWS, Azure, and GCP. As a workspace admin, go to the Previews overview in your Databricks workspace to opt in and enable them, then start sharing clusters with your team for seamless collaboration and governance.

Introducing Service Credentials for Unity Catalog compute

Unity Catalog Service Credentials, now generally available on AWS, Azure, and GCP, provide a secure, streamlined way to manage access to external cloud services (e.g., AWS Secrets Manager, Azure Functions, GCP Secret Manager) directly from within Databricks. UC Service Credentials eliminate the need to configure instance profiles on a per-compute basis. This enhances security, reduces misconfigurations, and enables per-user access control to cloud services (service credentials) instead of per-machine access control (instance profiles).

Service credentials can be managed via UI, API, or Terraform. They support all Unity Catalog compute (Standard and Dedicated clusters, SQL warehouses, Delta Live Tables (DLT), and serverless compute). Once configured, users can seamlessly access cloud services without modifying existing code, simplifying integrations and governance.
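
For instance, on AWS a notebook can obtain short-lived credentials through a service credential and hand them to boto3. A minimal sketch following the service-credentials documentation, assuming a credential named my-secrets-credential with access to AWS Secrets Manager (the credential name, region, and secret ID are placeholders):

    import boto3

    # Build a boto3 session backed by the UC service credential; Databricks
    # injects short-lived credentials, so no keys are stored in code.
    session = boto3.Session(
        botocore_session=dbutils.credentials.getServiceCredentialsProvider(
            "my-secrets-credential"
        ),
        region_name="us-west-2",
    )

    secrets = session.client("secretsmanager")
    value = secrets.get_secret_value(SecretId="my-app/db-password")
    print(value["Name"])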

To try out UC Service Credentials, go to External Data > Credentials in Databricks Catalog Explorer to configure service credentials. You can also automate the process using the Databricks API or Terraform. Our official documentation pages (AWS, Azure, GCP) provide detailed instructions.

What’s coming next

We have some exciting updates coming in the months ahead:

  • We are extending fine-grained access control on dedicated clusters to support writes to tables with RLS/CM using MERGE INTO - sign up for the private preview!
  • Single node configuration for Standard clusters will allow you to configure small jobs, clusters, or pipelines to use just one machine, reducing startup time and saving costs.
  • New features for UC Python UDFs (available on all UC compute)
    • Use custom dependencies for UC Python UDFs, from PyPI or from a wheel in UC Volumes or cloud storage
    • Secure authentication to cloud services using UC service credentials
    • Improve performance by processing batches of data using vectorized UDFs
  • We will expand ML support on Standard clusters, too! You will be able to run Spark ML workloads on Standard clusters - sign up for the private preview.
  • Updates to UC Volumes:
    • Cluster Log Delivery to Volumes (AWS, Azure, GCP) is available in Public Preview on all three clouds. You can now configure cluster log delivery to a Unity Catalog Volume destination for UC-enabled clusters with Standard (Shared) or Dedicated (Single-user) access mode, using the UI or API.
    • You can now upload and download files of any size to UC Volumes using the Python SDK (see the sketch after this list). The previous 5 GB limit has been removed - your only constraint is the cloud provider’s maximum object size. This feature is currently in Private Preview, with support for the Go and Java SDKs, as well as the Files API, coming soon.
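
As promised above, here is a minimal sketch of uploading and downloading a file on a UC Volume with the Python SDK (catalog, schema, and volume names are placeholders):

    import io
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    path = "/Volumes/main/default/landing/report.csv"  # placeholder Volume path

    # Upload an in-memory file; larger files are streamed the same way.
    w.files.upload(path, io.BytesIO(b"id,value\n1,42\n"), overwrite=True)

    # Download and read it back.
    resp = w.files.download(path)
    print(resp.contents.read().decode("utf-8"))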

Getting started

Check out these capabilities using the latest Databricks Runtime release. To learn more about compute best practices for running Apache Spark™ workloads, please refer to the compute configuration recommendation guides (AWS, Azure, GCP).
