by Abhinav Garg and Anna Shrestinian
Azure Databricks is a Unified Data Analytics Platform that is part of the Microsoft Azure cloud. Built upon the foundations of Delta Lake, MLflow, Koalas, Redash and Apache Spark™, Azure Databricks is a first-party PaaS on Microsoft Azure that provides one-click setup, native integrations with other Azure cloud services, an interactive workspace, and enterprise-grade security to power Data & AI use cases for customers ranging from small teams to large global enterprises. The platform enables true collaboration between the different data personas in an enterprise, such as Data Engineers, Data Scientists, Data Analysts and SecOps / Cloud Engineering.
In this article, we share a list of cloud security features and capabilities that an enterprise data team can use to configure and harden their Azure Databricks environment in line with their governance policy.
Learn how Azure Databricks helps address the challenges that come with deploying, operating and securing a cloud-native data analytics platform at scale.
Learn what the Azure Databricks platform architecture looks like, and how you can set it up in your own enterprise-managed virtual network to make the customizations required by your network security team.
Deploy your Azure Databricks workspace in private subnets without any inbound access to your network. Clusters will utilize a secure connectivity mechanism to communicate with the Azure Databricks infrastructure, without requiring public IP addresses for the nodes.
Configure allow-lists and block-lists to control the networks that are allowed to access your Azure Databricks workspace.
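As a rough illustration of what such a list might look like when managed programmatically, the sketch below builds a request body for the workspace IP access lists REST API. It assumes the endpoint is `POST /api/2.0/ip-access-lists` and that the body uses the fields `label`, `list_type` (`ALLOW` or `BLOCK`) and `ip_addresses`; please verify these against the current API reference before relying on them. The helper name `build_ip_access_list`, the label and the address ranges are hypothetical, and nothing is sent over the network.

```python
import ipaddress
import json


def build_ip_access_list(label, list_type, addresses):
    """Build a request body for one IP access list (hypothetical helper).

    Field names are an assumption based on the IP access lists REST API.
    """
    if list_type not in ("ALLOW", "BLOCK"):
        raise ValueError("list_type must be 'ALLOW' or 'BLOCK'")
    for entry in addresses:
        # Raises ValueError if the entry is not a valid IP address or CIDR range.
        ipaddress.ip_network(entry, strict=False)
    return {"label": label, "list_type": list_type, "ip_addresses": list(addresses)}


# Example values only: documentation address ranges, not real networks.
payload = build_ip_access_list("office-vpn", "ALLOW", ["203.0.113.0/24", "198.51.100.7"])
print(json.dumps(payload, indent=2))
```

Validating the entries locally before calling the API makes it less likely that a typo in an allow-list locks legitimate users out of the workspace.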
Get visibility into relevant platform activity in terms of who’s doing what and when, by configuring Azure Databricks Diagnostic Logs and other related audit logs in the Azure Cloud.
Understand the different ways of connecting Azure Databricks clusters in your private virtual network to your Azure Data Sources in a cloud-native secure manner.
Learn how to use cloud-native security constructs to create a battle-tested, secure architecture for your Azure Databricks environment that helps you prevent data exfiltration. This is most relevant for organizations working with personally identifiable information (PII), protected health information (PHI) and other types of sensitive data.
Azure Databricks notebooks are stored in the scalable management layer powered by Microsoft, and are by default encrypted with a Microsoft-managed key. You can also bring your own customer-managed, per-workspace key to encrypt the notebooks.
Azure Databricks creates a root storage account (DBFS) per workspace in the customer's subscription. By default, the storage account is encrypted with a Microsoft-managed key. You can also bring your own customer-managed key to encrypt the DBFS storage account.
Control who has access to what data by using seamless identity federation with Azure AD under the hood, and get cloud-native visibility into who is processing the data and when. For details, refer to cloud-native access control for ADLS Gen 2 and how to configure it using Azure Storage Explorer. Such access management controls, including role-based access controls, are seamlessly utilized by Azure Databricks as outlined in the passthrough article.
Wherever possible, use Azure Active Directory (AAD) tokens to access the non-UI capabilities of your Azure Databricks workspace, including the REST API, Power BI connectivity and Databricks Connect. For running jobs workloads with the REST API, we recommend using Azure service principals with AAD tokens.
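To make the service principal recommendation more concrete, here is a minimal sketch of the two HTTP requests involved: one to obtain an AAD token through the OAuth 2.0 client credentials flow, and one to call the workspace REST API with that token as a bearer credential. It assumes the service principal has already been added to the workspace. The tenant ID, client ID, secret, workspace URL and token values are hypothetical placeholders, the helper names are ours, and the requests are only constructed, not sent.

```python
import urllib.parse
import urllib.request

# Application ID of the Azure Databricks resource in Azure AD.
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_token_request(tenant_id, client_id, client_secret):
    """Build the client-credentials token request for a service principal."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{DATABRICKS_RESOURCE_ID}/.default",
    }).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def build_api_request(workspace_url, path, aad_token):
    """Build a REST API request that authenticates with an AAD bearer token."""
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}{path}",
        headers={"Authorization": f"Bearer {aad_token}"},
    )


# Placeholder values for illustration; in practice, load the secret from a
# secure store such as Azure Key Vault rather than from source code.
token_req = build_token_request("contoso-tenant-id", "my-client-id", "my-client-secret")
api_req = build_api_request(
    "https://adb-1234567890123456.7.azuredatabricks.net",
    "/api/2.0/clusters/list",
    "aad-token-from-previous-step",
)
print(token_req.get_method(), token_req.full_url)
print(api_req.get_method(), api_req.full_url)
```

Passing `token_req` to `urllib.request.urlopen` would return a JSON document whose `access_token` field is the value to supply as `aad_token`. AAD tokens are short-lived, so a long-running job would need to request a fresh one periodically.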
For use cases where you have to use Azure Databricks Personal Access Tokens (PATs), we recommend allowing only the required users to configure those tokens. If you cannot use AAD tokens for your jobs workloads, we recommend creating PATs for service principals rather than for individual users.
Azure Databricks is HITRUST CSF Certified to meet the required level of security and risk controls to support the regulatory requirements of our customers. This is in addition to the HIPAA compliance that applies through the Microsoft Azure BAA.
Attend the Azure Databricks Security Best Practices Webinar and bookmark this page, as we’ll keep it updated with the new security-related capabilities & controls. If you want to try out the mentioned features, get started by creating an Azure Databricks workspace in your own managed VNET.