by Siddharth Bhai, Lei Ni, Kelly Albano, and Anna Shrestinian
We are excited to share new identity and access management features that help admins simplify setting up and scaling Databricks. Unity Catalog is at the center of governance on the Databricks Data Intelligence Platform, and its identity and access management capabilities are designed with the following principles:
In this blog, we'll provide a refresher on existing identity and access management features and introduce new investments to simplify the Databricks admin experience. These investments include simple logins from Power BI and Tableau, simplified single sign-on setup via unified login, OAuth authorization, and running jobs using the identity of a service principal as a security best practice.
Power BI and Tableau are two of the most popular third-party data tools on Databricks. The ability to securely connect from Power BI and Tableau to Databricks with single sign-on is now generally available on AWS. Databricks leverages OAuth to allow users to access Databricks from these tools using single sign-on, which simplifies login for users and reduces the risk of leaked credentials. OAuth partner applications for Power BI and Tableau are enabled in your account by default.
To get started, check out our docs page or watch this demo video for Power BI.
Single sign-on (SSO) is a key security best practice and allows you to authenticate your users to Databricks using your preferred identity provider. At Databricks, we offer SSO across all three clouds. On Azure and GCP, we offer SSO for your account and workspaces by default in the form of Microsoft Entra ID (formerly Azure Active Directory) and Google Cloud Identity, respectively. On AWS, Databricks offers support for a variety of identity providers such as Okta, Microsoft Entra ID, and OneLogin using either SAML or OIDC.
This summer, we introduced unified login, a new feature that simplifies SSO for Databricks on AWS accounts and workspaces. Unified login lets you manage a single account-level SSO configuration that applies to every Databricks workspace associated with your account. Once SSO is activated on your account, you can enable unified login for all workspaces or only specific ones, simplifying user authentication across your account's workspaces. Unified login is already in use on thousands of production workspaces.
Unified login is GA and enabled automatically on accounts created after June 21, 2023. The feature is in public preview for accounts created before June 21, 2023. To enable unified login, see set up SSO in your Databricks account console.
We are excited to announce that OAuth for service principals is generally available on AWS. On Azure and GCP, we support OAuth via Azure and Google tokens, respectively. Service principals are Databricks identities for use with automated tools, jobs, and applications. It is a security best practice to use service principals instead of users for production automation workflows for the following reasons:
OAuth is an open standard protocol that authorizes users and applications to access APIs and other resources without exposing their credentials. OAuth for service principals uses the OAuth client credentials flow to generate OAuth access tokens that can be used to authenticate to Databricks APIs. OAuth for service principals has the following benefits for authenticating to Databricks:
To use OAuth for service principals, see Authentication using OAuth for service principals.
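As a sketch of the client credentials flow described above: a service principal exchanges its client ID and secret for a short-lived access token at the workspace's `/oidc/v1/token` endpoint with the `all-apis` scope, then sends that token as a bearer token on REST API calls. The hostname and credentials below are placeholders; verify the endpoint and scope against the docs linked above for your cloud.

```python
import base64
import json
import urllib.parse
import urllib.request

def token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the OAuth client-credentials token request for a Databricks workspace."""
    # Form body for the client credentials grant, scoped to the Databricks REST APIs.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    # The service principal authenticates with HTTP Basic auth (client_id:client_secret).
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )

def get_service_principal_token(host: str, client_id: str, client_secret: str) -> str:
    """Exchange service principal credentials for an OAuth access token.

    Requires network access and real credentials; the returned token is then
    sent on API calls as: {"Authorization": f"Bearer {token}"}.
    """
    req = token_request(host, client_id, client_secret)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```

Because the token is short-lived and minted on demand, there is no long-lived secret embedded in the automation itself beyond the service principal's credentials, which can be rotated independently of any user.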
Databricks Workflows orchestrates data processing, machine learning, and analytics pipelines in the Databricks Data Intelligence Platform. A Databricks job is a way to run your data processing and analysis applications in a Databricks workspace. By default, jobs run as the identity of the job owner. This means that the job assumes the permissions of the job owner and can only access data and Databricks objects that the job owner has permission to access.
We are excited to announce that you can now change the identity that a job runs as to a service principal. The job then assumes the permissions of that service principal instead of the owner's, which ensures the job will not be affected by a user leaving your organization or switching departments. Running a job as a service principal is generally available on AWS, Azure, and GCP. Check out Run a job as a service principal in the docs to get started.
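Besides the UI flow in the docs above, the run-as identity can be changed programmatically. A minimal sketch, assuming the Jobs API 2.1 shape where a job's settings carry a `run_as` block naming the service principal by its application ID (the job ID and application ID below are placeholders):

```python
import json

def run_as_payload(job_id: int, sp_application_id: str) -> dict:
    """Jobs API 2.1 update payload that switches a job's run-as identity
    from the job owner to a service principal."""
    return {
        "job_id": job_id,
        "new_settings": {
            # The service principal is referenced by its application (client) ID.
            "run_as": {"service_principal_name": sp_application_id},
        },
    }

# Sent as a POST to /api/2.1/jobs/update with a bearer token that has
# permission to manage the job (e.g. a workspace admin's token).
payload = run_as_payload(123, "11111111-2222-3333-4444-555555555555")
print(json.dumps(payload, indent=2))
```

Only the fields included in `new_settings` are changed, so an update like this leaves the job's tasks, schedule, and clusters untouched.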
"Running Databricks workflows using service principals allows us to separate the workflows permissions, their execution, and their lifecycle from users, and therefore making them more secure and robust"— George Moldovan, Product Owner, Raiffeisen Bank International
At Databricks, we are committed to scaling with you as your organization grows. We covered a lot in today's blog, highlighting our key investments in our identity and access management platform via Unity Catalog on Databricks. With a slew of new identity and access management features now available, you might wonder what "good" looks like as you build your data governance strategy with Databricks.
We recommend you check out our identity and access management docs pages for the latest best practices (AWS | Azure | GCP) or watch our Data + AI Summit 2023 session "Best Practices for Setting Up Databricks SQL at Enterprise Scale".