Announcing the Public Preview of Azure Databricks support for Azure confidential computing
We are excited to announce Azure Databricks support for Azure confidential computing (ACC) in preview! With this announcement, customers can run their Azure Databricks workloads on Azure confidential virtual machines (VMs). With support for ACC, customers can build an end-to-end data platform on the Databricks Lakehouse with increased confidentiality and privacy by encrypting data in use. This builds on support for customer-managed keys (CMK) for encrypting data at rest.
This blog post will discuss confidential computing and its use cases, the security benefits of using Azure Databricks on Azure confidential computing (ACC), and our partnership with Microsoft.
What is Confidential Computing, and what are some potential customer use cases with Databricks?
Confidential computing is an industry term defined by the Confidential Computing Consortium (CCC), a community at the Linux Foundation dedicated to defining and accelerating the adoption of confidential computing. The CCC defines confidential computing as "the protection of data in use by performing computations in a hardware-based, attested Trusted Execution Environment (TEE)."
Organizations that require confidential computing are typically from regulated industries that handle and produce highly sensitive data subject to strict privacy laws and regulatory requirements. Confidential computing also attracts organizations with extremely valuable intellectual property that they want to keep secret.
By leveraging the advanced security of confidential computing, customers can process even their most sensitive data in the cloud, empowering them to unlock the full potential of AI. With this announcement, the Databricks Lakehouse platform offers customers a comprehensive solution for their data, analytics, and AI needs. Typical use cases that may require confidential computing include:
- Anti-money laundering: The rapid transition to digital banking has produced a staggering volume of highly sensitive banking transactions, creating a pressing need for enhanced data protection measures such as confidential computing to combat the growing scale of money laundering. The Databricks Lakehouse platform empowers organizations to implement scalable Anti-Money Laundering (AML) solutions by combining data lakes and data warehouses, facilitating efficient management and collaboration in AML processes.
- Fraud prevention: Confidential computing is crucial for fraud prevention as it safeguards sensitive data, enhances security during fraud detection processes, and fosters trust by preventing unauthorized access or tampering. With Databricks Lakehouse, financial services institutions can create machine learning fraud detection data pipelines and visualize the data in real-time by leveraging a framework for building modular features from large data sets.
- Adverse drug event detection: Confidential computing is essential for adverse drug event detection. It ensures the secure processing and analysis of sensitive patient information, preserving privacy and facilitating accurate identification of potential adverse drug reactions. Databricks Lakehouse provides a modern, scalable data and AI platform that can deliver scientifically rigorous, near real-time insights so that healthcare organizations can perform this detection effectively.
Build your Data and AI strategy for sensitive data sets with increased security from confidential computing
Customers can now feel empowered to use the Databricks Lakehouse platform for their most sensitive and regulated data. Azure Databricks on Azure confidential computing provides the following security and privacy benefits:
- Protect data in use: Secure your data with in-memory encryption in a hardware-based environment that is verified (attested) before your data is processed. This type of data protection complements existing security controls, such as customer-managed keys for data at rest, and Private Link and secure protocols like TLS and HTTPS for data in transit.
- Leverage other security, compliance, and privacy-enhancing products from Databricks:
  - Unity Catalog provides unified governance for all data, analytics, and AI assets, including files, tables, dashboards, and machine learning models in your lakehouse on any cloud. It creates a single pane of glass for managing access permissions and audit controls to map, secure, and audit your data.
  - Delta Sharing provides an open solution to securely share live data from your lakehouse to any computing platform. It allows you to confidently share data assets with suppliers and partners for better coordination of your business while meeting security and compliance needs.
  - Available on Azure this summer, Enhanced Security and Compliance ("ESC") provides enhanced hardening of the Databricks environment, specifically designed to protect the most sensitive data and provide the means to run cloud-ready HIPAA, PCI-DSS, and FedRAMP Moderate workloads.
A Powerful Collaboration in Confidential Computing
"Databricks and Microsoft have collaborated towards enabling customers with their Lakehouse workloads. We are pleased to be the first cloud provider to enable Databricks users to analyze their most sensitive data in the cloud by running their clusters on AMD SEV-SNP confidential VMs, allowing protection of this data while it is in use in memory." - Lindsey Allen, General Manager, Azure Databricks, Microsoft
We are excited to collaborate with Microsoft to bring Azure Databricks to Azure confidential computing. Microsoft has long been a thought leader in the field of confidential computing. When Azure introduced confidential computing in the cloud, it became the first cloud provider to offer confidential computing virtual machines and confidential container support in Kubernetes, so that customers can run their most sensitive workloads inside Trusted Execution Environments (TEEs).
Together, Databricks and Azure provide a robust and secure data platform for confidential computing. The confidential VMs used on ACC feature AMD EPYC™ processors that are designed to run a variety of workloads, including high performance computing, while protecting data with memory encryption provided by AMD SEV-SNP technology. These processors provide powerful, cost-effective delivery of a wide range of machine learning and AI workloads on confidential computing.
Getting Started with Azure Databricks on Azure confidential computing
These VMs will be rolled out and made available for Azure Databricks users over the next few days. Review our documentation or watch the demo below to see how easy it is to get up and running quickly - you simply select an ACC VM for your workloads.
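For teams that create clusters programmatically, selecting an ACC VM might look roughly like the sketch below. It only builds a cluster specification as a Python dictionary and does not call any service. The field names (`cluster_name`, `spark_version`, `node_type_id`, `driver_node_type_id`, `num_workers`) follow the Databricks Clusters API request body. The VM size `Standard_DC4as_v5`, the runtime string, the name pattern used to recognize confidential VM sizes, and the helper functions themselves are illustrative assumptions, so check the documentation for the sizes and runtimes actually available in your workspace.

```python
import json
import re

# Assumption: AMD SEV-SNP confidential VM sizes on Azure follow the
# DCasv5 / DCadsv5 / ECasv5 / ECadsv5 naming scheme. This pattern is
# illustrative only and is not an official or complete list.
_CONFIDENTIAL_VM_PATTERN = re.compile(r"^Standard_(DC|EC)\d+ad?s_v5$")


def is_confidential_vm(node_type_id: str) -> bool:
    """Return True if the VM size name matches the assumed ACC pattern."""
    return _CONFIDENTIAL_VM_PATTERN.match(node_type_id) is not None


def build_cluster_spec(
    cluster_name: str,
    node_type_id: str,
    num_workers: int,
    spark_version: str = "13.3.x-scala2.12",  # placeholder runtime version
) -> dict:
    """Build a cluster spec that uses the same confidential VM size for
    the driver and the workers (hypothetical helper, not a Databricks API)."""
    if not is_confidential_vm(node_type_id):
        raise ValueError(f"{node_type_id} does not look like a confidential VM size")
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "driver_node_type_id": node_type_id,
        "num_workers": num_workers,
    }


if __name__ == "__main__":
    spec = build_cluster_spec("acc-demo", "Standard_DC4as_v5", num_workers=2)
    print(json.dumps(spec, indent=2))
```

Rejecting non-confidential sizes up front is a design choice for this sketch: it keeps a sensitive workload from silently landing on a standard VM if a size name is mistyped.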
Tune into Microsoft Build this week to learn more about the recent innovations with Azure confidential computing. We hope to see you at our Data and AI Summit in San Francisco on June 26-29, 2023. Register today!