Azure Databricks is a Unified Data Analytics Platform that is part of the Microsoft Azure cloud. Built upon the foundations of Delta Lake, MLflow, Koalas, Redash and Apache Spark™, Azure Databricks is a first-party PaaS on Microsoft Azure that provides one-click setup, native integrations with other Azure services, an interactive workspace, and enterprise-grade security to power Data & AI use cases for customers ranging from small teams to large global enterprises. The platform enables collaboration between the different data personas in an enterprise, such as Data Engineers, Data Scientists, Data Analysts and SecOps / Cloud Engineering.
In this article, we share a list of cloud security features and capabilities that an enterprise data team can use to harden its Azure Databricks environment in line with its governance policy.
Azure Databricks security best practices
Security that unblocks the true potential of your data lake
Learn how Azure Databricks helps address the challenges that come with deploying, operating and securing a cloud-native data analytics platform at scale.
Bring your own network
Learn what the Azure Databricks platform architecture looks like and how you can set it up in your own enterprise-managed virtual network, so that you can apply the customizations required by your network security team.
Enable secure cluster connectivity
Deploy your Azure Databricks workspace in private subnets without any inbound access to your network. Clusters will utilize a secure connectivity mechanism to communicate with the Azure Databricks infrastructure, without requiring public IP addresses for the nodes.
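As a rough illustration of the two network settings above, the following Python sketch builds the parameters object of an ARM workspace definition. The parameter names customVirtualNetworkId, customPublicSubnetName, customPrivateSubnetName and enableNoPublicIp are the ones we understand the Microsoft.Databricks/workspaces resource type to accept; the helper name, the subnet names and the resource ID are hypothetical placeholders, so please verify everything against the current ARM template reference before deploying.

```python
import json


def workspace_network_parameters(vnet_id, public_subnet, private_subnet,
                                 no_public_ip=True):
    """Build the ARM 'parameters' object for a VNet-injected workspace.

    Hypothetical helper for illustration only. The keys below are assumed
    to match the Microsoft.Databricks/workspaces ARM schema.
    """
    return {
        # Bring your own network: the VNet and the two delegated subnets.
        "customVirtualNetworkId": {"value": vnet_id},
        "customPublicSubnetName": {"value": public_subnet},
        "customPrivateSubnetName": {"value": private_subnet},
        # Secure cluster connectivity: cluster nodes get no public IPs.
        "enableNoPublicIp": {"value": no_public_ip},
    }


# Placeholder values; replace with your own subscription and network names.
example = workspace_network_parameters(
    vnet_id="/subscriptions/<subscription-id>/resourceGroups/<rg>"
            "/providers/Microsoft.Network/virtualNetworks/<vnet-name>",
    public_subnet="databricks-public",
    private_subnet="databricks-private",
)
print(json.dumps(example, indent=2))
```

The resulting object would sit under properties.parameters in the workspace resource of your template.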
Control which networks are allowed to access a workspace
Configure allow-lists and block-lists to control the networks that are allowed to access your Azure Databricks workspace.
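To make the interplay between allow-lists and block-lists concrete, here is a minimal Python sketch of how such lists could be evaluated for an incoming connection. The evaluation order (block-lists first, then allow-lists, with everything allowed when no allow-list exists) reflects our understanding of the documented behaviour; the function is a hypothetical model, not part of any Azure Databricks SDK, and the addresses are documentation-range examples.

```python
import ipaddress


def is_allowed(client_ip, allow_list, block_list):
    """Model of IP access list evaluation (illustrative assumption).

    Entries may be single addresses or CIDR ranges.
    """
    ip = ipaddress.ip_address(client_ip)

    def matches(entries):
        # strict=False tolerates entries with host bits set, e.g. 10.0.0.5/24.
        return any(ip in ipaddress.ip_network(entry, strict=False)
                   for entry in entries)

    # 1. A match on any block-list rejects the connection outright.
    if matches(block_list):
        return False
    # 2. If at least one allow-list entry exists, the address must match one.
    if allow_list:
        return matches(allow_list)
    # 3. With no allow-list configured, every unblocked address is allowed.
    return True


corporate_range = ["203.0.113.0/24"]
blocked_host = ["203.0.113.10"]

print(is_allowed("203.0.113.25", corporate_range, blocked_host))
print(is_allowed("203.0.113.10", corporate_range, blocked_host))
print(is_allowed("198.51.100.7", corporate_range, blocked_host))
```

Under this model the first address is accepted, the second is rejected because the block-list wins, and the third is rejected because it falls outside the allow-list.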
Trust but verify with Azure Databricks
Get visibility into relevant platform activity in terms of who’s doing what and when, by configuring Azure Databricks Diagnostic Logs and other related audit logs in the Azure Cloud.
Securely accessing Azure Data sources from Azure Databricks
Understand the different ways of connecting Azure Databricks clusters in your private virtual network to your Azure Data Sources in a cloud-native secure manner.
Data exfiltration protection with Azure Databricks
Learn how to use cloud-native security constructs to create a battle-tested secure architecture for your Azure Databricks environment that helps you prevent data exfiltration. This is most relevant for organizations working with personally identifiable information (PII), protected health information (PHI) and other types of sensitive data.
Enable customer-managed key for managed services
Azure Databricks notebooks are stored in the scalable management layer powered by Microsoft and are encrypted by default with a Microsoft-managed key. You can also bring your own per-workspace key to encrypt the notebooks.
Enable customer-managed key for DBFS
Azure Databricks creates a root storage account (DBFS) per workspace in the customer’s subscription. By default, the storage account is encrypted with a Microsoft-managed key. You can also bring your own key to encrypt the DBFS storage account.
Simplify data lake access with Azure AD Credential Passthrough
Control who has access to what data by using seamless identity federation with Azure AD under the hood, and get cloud-native visibility into who is processing the data and when. For background, refer to cloud-native access control for ADLS Gen 2 and how to configure it using Azure Storage Explorer. Azure Databricks applies these access management controls, including role-based access controls, as outlined in the passthrough article.
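As a small sketch of what enabling passthrough can look like in a cluster definition, the Python helper below adds the Spark configuration key spark.databricks.passthrough.enabled to an existing spark_conf dictionary. We believe this is the key that turns the feature on, but the helper itself is hypothetical, and depending on the cluster mode additional settings may be required; the passthrough article remains the authoritative guide.

```python
def with_credential_passthrough(spark_conf=None):
    """Return a copy of spark_conf with Azure AD passthrough switched on.

    Hypothetical helper. The config key is our assumption of the setting
    used by Azure Databricks; other keys may be needed per cluster mode.
    """
    conf = dict(spark_conf or {})  # copy so the caller's dict is untouched
    conf["spark.databricks.passthrough.enabled"] = "true"
    return conf


base_conf = {"spark.sql.shuffle.partitions": "64"}
print(with_credential_passthrough(base_conf))
```

The returned dictionary could then be supplied as the spark_conf section of a cluster specification.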
Authenticate using Azure Active Directory tokens
Wherever possible, use Azure Active Directory (AAD) tokens to access the non-UI capabilities of your Azure Databricks workspace, including the REST API, Power BI connectivity and Databricks Connect. For running job workloads through the REST API, we recommend using Azure service principals with AAD tokens.
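To illustrate the service principal recommendation, the sketch below prepares (but does not send) an OAuth 2.0 client-credentials request against the Microsoft identity platform and shows the header you would then attach to REST API calls. The GUID is the application ID we understand to identify the Azure Databricks resource; the function names and the tenant, client and secret values are hypothetical placeholders. In practice, load the secret from a vault rather than from source code.

```python
import urllib.parse
import urllib.request

# Assumed well-known application ID of the Azure Databricks resource.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_token_request(tenant_id, client_id, client_secret):
    """Prepare a client-credentials token request for a service principal."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # '/.default' requests the permissions already granted to the app.
        "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def bearer_headers(aad_token):
    """Headers to attach to Azure Databricks REST API calls."""
    return {"Authorization": f"Bearer {aad_token}"}


request = build_token_request("<tenant-id>", "<client-id>", "<client-secret>")
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen should return a JSON document whose access_token field is the value to pass to bearer_headers.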
Token management for Personal Access Tokens
For use cases where you have to use Azure Databricks Personal Access Tokens (PATs), we recommend allowing only the required users to configure those tokens. If you cannot use AAD tokens for your job workloads, we recommend creating PATs for service principals rather than for individual users.
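One way an administrator might keep PAT usage in check is to review the tokens in a workspace periodically and flag the risky ones. The sketch below works on a list of token records that you would obtain from the token management capability. The field names token_id and expiry_time (epoch milliseconds, with -1 meaning no expiry) are our assumption about that response shape, and the 90-day threshold is an arbitrary example rather than an official recommendation.

```python
import time

MS_PER_DAY = 24 * 60 * 60 * 1000


def flag_long_lived_tokens(token_infos, max_lifetime_days=90, now_ms=None):
    """Return IDs of tokens that never expire or expire too far in the future.

    Hypothetical helper; the record field names are assumptions.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    limit_ms = max_lifetime_days * MS_PER_DAY

    flagged = []
    for info in token_infos:
        expiry = info.get("expiry_time", -1)
        never_expires = expiry == -1
        too_long = not never_expires and expiry - now_ms > limit_ms
        if never_expires or too_long:
            flagged.append(info["token_id"])
    return flagged


# Made-up sample records, evaluated against a fixed reference time.
reference_ms = 1_700_000_000_000
sample = [
    {"token_id": "short-lived", "expiry_time": reference_ms + 30 * MS_PER_DAY},
    {"token_id": "long-lived", "expiry_time": reference_ms + 365 * MS_PER_DAY},
    {"token_id": "no-expiry", "expiry_time": -1},
]
print(flag_long_lived_tokens(sample, now_ms=reference_ms))
```

Flagged tokens can then be reviewed with their owners and revoked or replaced with shorter-lived ones.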
Azure Databricks is HITRUST CSF Certified
Azure Databricks is HITRUST CSF Certified to meet the required level of security and risk controls to support the regulatory requirements of our customers. This is in addition to the HIPAA compliance that applies through the Microsoft Azure BAA.
What’s next?
Attend the Azure Databricks Security Best Practices Webinar and bookmark this page, as we’ll keep it updated with the new security-related capabilities & controls. If you want to try out the mentioned features, get started by creating an Azure Databricks workspace in your own managed VNET.