Azure Databricks is a Unified Data Analytics Platform that is part of the Microsoft Azure cloud. Built on Delta Lake, MLflow, Koalas and Apache Spark™, Azure Databricks is a first-party PaaS offering on Microsoft Azure that provides one-click setup, native integrations with other Azure services, an interactive workspace, and enterprise-grade security to power data and AI use cases for customers ranging from small businesses to large global enterprises. The platform enables true collaboration between the different data personas in an enterprise, such as data engineers, data scientists, business analysts, and SecOps / cloud engineering teams.
In this article, we share a list of cloud security features and capabilities that an enterprise data team can use to harden its Azure Databricks environment in line with its governance policy.
Azure Databricks Security Best Practices
Learn how Azure Databricks helps address the challenges that come with deploying, operating and securing a cloud-native data analytics platform at scale.
Learn what the Azure Databricks platform architecture looks like and how to deploy it in your own enterprise-managed virtual network, so you can make the customizations your network security team requires.
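When you deploy into your own virtual network, the address plan determines how large your clusters can grow. The sketch below checks a proposed plan and estimates the maximum number of cluster nodes. It assumes the sizing rules from the Azure Databricks VNet documentation as we understand them (VNet between /16 and /24, two non-overlapping subnets, often called public/host and private/container, each between /18 and /26, one IP per node in each subnet, and 5 addresses reserved by Azure in every subnet); please verify these against the current docs. The function name and CIDRs are hypothetical.

```python
import ipaddress

# Assumption: Azure reserves 5 addresses in every subnet.
AZURE_RESERVED_IPS_PER_SUBNET = 5


def max_cluster_nodes(vnet_cidr: str, public_subnet_cidr: str, private_subnet_cidr: str) -> int:
    """Hypothetical helper: validate a VNet plan and return the node limit it implies."""
    vnet = ipaddress.ip_network(vnet_cidr)
    subnets = [ipaddress.ip_network(public_subnet_cidr), ipaddress.ip_network(private_subnet_cidr)]

    # Assumed sizing rules; check the current Azure Databricks docs.
    if not 16 <= vnet.prefixlen <= 24:
        raise ValueError(f"VNet {vnet} should be between /16 and /24")
    for subnet in subnets:
        if not subnet.subnet_of(vnet):
            raise ValueError(f"{subnet} is not inside {vnet}")
        if not 18 <= subnet.prefixlen <= 26:
            raise ValueError(f"{subnet} should be between /18 and /26")
    if subnets[0].overlaps(subnets[1]):
        raise ValueError("public and private subnets must not overlap")

    # Each node takes one IP in each subnet, so the smaller subnet sets the limit.
    return min(s.num_addresses - AZURE_RESERVED_IPS_PER_SUBNET for s in subnets)
```

For example, a /23 VNet split into two /26 subnets (`"10.28.0.0/26"` and `"10.28.0.64/26"`) allows 64 - 5 = 59 nodes under these assumptions.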
Get visibility into relevant platform activity, in terms of who is doing what and when, by configuring Azure Databricks diagnostic logs and other related audit logs in Azure.
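Once diagnostic logs are delivered to a destination such as a storage account, answering "who did what and when" is mostly a matter of pulling a few fields out of each record. The sketch below summarizes JSON-lines records into (time, user, category, action) tuples. The field names (`time`, `category`, `identity.email`, `properties.actionName`) are our assumption based on the published diagnostic log schema, and the possibility that `identity` arrives as a JSON-encoded string is handled defensively; adjust to the records you actually receive.

```python
import json
from typing import Iterable, List, Optional, Tuple

# (time, user email, service category, action name)
Event = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def summarize_audit_events(lines: Iterable[str]) -> List[Event]:
    """Hypothetical helper: turn diagnostic-log JSON lines into a who/what/when list."""
    events: List[Event] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the export
        record = json.loads(line)

        identity = record.get("identity") or {}
        # Assumption: "identity" may be a JSON-encoded string; decode it if so.
        if isinstance(identity, str):
            identity = json.loads(identity)

        properties = record.get("properties") or {}
        events.append(
            (
                record.get("time"),
                identity.get("email"),
                record.get("category"),
                properties.get("actionName"),
            )
        )
    # ISO-8601 UTC timestamps sort chronologically as plain strings.
    return sorted(events, key=lambda e: e[0] or "")
```

In practice you would more likely query these fields in Log Analytics, but the same extraction logic applies.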
Understand the different ways to connect Azure Databricks clusters in your private virtual network to your Azure data sources securely, using cloud-native mechanisms.
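Alongside the network path, a cluster still has to authenticate to the data source. One common option for ADLS Gen2 is OAuth with an Azure AD service principal, configured through the ABFS driver's Spark settings. The sketch below only builds that configuration as a dictionary so it can be inspected and tested outside a cluster; the storage account, application ID, tenant ID, and secret values are hypothetical placeholders.

```python
from typing import Dict


def adls_oauth_conf(storage_account: str, client_id: str, client_secret: str, tenant_id: str) -> Dict[str, str]:
    """Hypothetical helper: ABFS OAuth (client credentials) settings for one storage account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}": (
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
        ),
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        # Never hard-code this; read it from a secret scope at runtime.
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}": (
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
        ),
    }
```

In a notebook, you would fetch the secret with `dbutils.secrets.get(scope=..., key=...)` from a secret scope of your choosing and apply each entry with `spark.conf.set(key, value)`. Keeping the secret in a secret scope, rather than in notebook source, is the main design point here.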
Learn how to use cloud-native security constructs to build a proven, secure architecture for your Azure Databricks environment that helps prevent data exfiltration. This is most relevant for organizations working with personally identifiable information (PII), protected health information (PHI), and other types of sensitive data.
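A common exfiltration-protection pattern, which we assume here, attaches a user-defined route table to the workspace subnets: a default route (0.0.0.0/0) sends outbound traffic to a firewall appliance for inspection and allowlisting, while more specific routes keep intra-VNet traffic local and let control-plane traffic through. Azure chooses among matching routes by longest prefix match, which the sketch below models. All prefixes and the firewall IP are hypothetical (203.0.113.0/28 is a documentation range standing in for your region's control-plane addresses).

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical route table: (address prefix, next hop type, next hop IP).
ROUTES: List[Tuple[str, str, Optional[str]]] = [
    ("10.28.0.0/23", "VnetLocal", None),           # workspace VNet stays local
    ("203.0.113.0/28", "Internet", None),          # stand-in for control-plane IPs
    ("0.0.0.0/0", "VirtualAppliance", "10.0.1.4"),  # everything else via firewall
]


def next_hop(destination: str, routes=ROUTES) -> Optional[Tuple[str, Optional[str]]]:
    """Return the (next hop type, next hop IP) Azure-style longest prefix match would pick."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop_type, hop_ip)
        for prefix, hop_type, hop_ip in routes
        if address in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    _, hop_type, hop_ip = max(matches, key=lambda m: m[0].prefixlen)
    return hop_type, hop_ip
```

With this table, an arbitrary public address such as `198.51.100.7` is forced through the firewall, which is where allowlisting actually blocks exfiltration attempts.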
Azure Databricks notebooks are stored in the scalable management layer operated by Microsoft and are encrypted by default with a Microsoft-managed, per-workspace key. You can also bring your own key to encrypt notebooks.
Control who has access to which data through seamless identity federation with Azure Active Directory (Azure AD), and get cloud-native visibility into who is processing the data and when. For background, see the cloud-native access control model for ADLS Gen2 and how to configure it using Azure Storage Explorer. As outlined in the credential passthrough article, Azure Databricks seamlessly honors these access controls, including role-based access control (RBAC).
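With credential passthrough, the user's own Azure AD identity is evaluated against the storage ACLs. A key rule of the ADLS Gen2 ACL model is that reading a file requires Execute (x) on every folder from the root down to the file's parent, plus Read (r) on the file itself. The sketch below models only that rule; it deliberately ignores RBAC role assignments (which, when they grant access, are evaluated before ACLs), groups, masks, and default ACLs. The paths, users, and ACL dictionary are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical ACLs: path -> {user: permission string like "r-x"}.
Acls = Dict[str, Dict[str, str]]


def can_read(user: str, path: str, acls: Acls) -> bool:
    """Simplified ADLS Gen2 check: x on every ancestor folder, r on the file."""
    file_path = PurePosixPath(path)
    # parents yields the immediate parent first and "/" last; walk from the root down.
    for folder in reversed(file_path.parents):
        if "x" not in acls.get(str(folder), {}).get(user, ""):
            return False
    return "r" in acls.get(str(file_path), {}).get(user, "")
```

This is why a user who has Read on a file can still be denied access: a missing Execute bit on any parent folder is enough to block the read.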
Azure Databricks is HITRUST CSF certified, meeting the level of security and risk controls required to support our customers' regulatory requirements. This is in addition to HIPAA compliance, which is available through the Microsoft Azure Business Associate Agreement (BAA).
Attend the Azure Databricks Security Best Practices webinar and bookmark this page, as we'll keep it updated with new security-related capabilities and controls. To try out the features mentioned here, get started by creating an Azure Databricks workspace in your own managed VNet.