We’re excited to announce that PrivateLink connectivity for Databricks workspaces on AWS (Amazon Web Services) is now in public preview, with full support for production deployments. This release applies to all AWS regions supporting E2 architecture, as part of the Enterprise pricing tier. We have received great feedback from our global customers, including large financial services, healthcare and communications organizations, during the feature’s private preview period, as it allows them to deploy private workspaces of the Databricks Lakehouse Platform on AWS. Customers can enforce cloud-native, private-only connectivity for both front-end and back-end interfaces of Databricks workspaces, thus satisfying a major requirement of their enterprise governance policies.
Private Databricks workspaces with AWS PrivateLink overview
A Databricks workspace lets you take advantage of enhanced security capabilities through a simple and well-integrated architecture. AWS PrivateLink for Databricks E2 workspaces provides the following benefits:
- Private connectivity to front-end interfaces: Configure AWS VPC (virtual private cloud) Endpoints to Databricks front-end interfaces and ensure that all user/client traffic to Notebooks, SQL Endpoints, REST API (including CLI) and Databricks Connect transits over your private network and the AWS network backbone.
- Private connectivity to back-end interfaces: If you deploy a Databricks workspace in your own customer-managed VPC using secure cluster connectivity, you can configure AWS VPC Endpoints to Databricks back-end interfaces and ensure that all cluster traffic to the secure cluster connectivity relay and internal APIs transits over your private network and the AWS network backbone.
- Increased reliability and scalability: Your data platform is now more reliable and scalable for large and extra-large workloads, as there is no dependency on launching public IPs for cluster nodes and attaching them to the corresponding network interfaces. Additionally, the workspace traffic is not subject to bandwidth availability on public networks.
At a high level, the product architecture consists of a control/management plane and a data plane. The control plane resides in a Databricks AWS account and hosts services such as the web application, cluster manager, jobs service and SQL gateway. The data plane, which resides in your AWS account, consists of a customer-managed VPC (minimum two subnets), a Security Group and a root Amazon S3 bucket known as DBFS.
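As a rough illustration of the data plane components listed above, the following Python sketch models them as a small configuration object and checks the "minimum two subnets" requirement before any deployment is attempted. The class name, field names and the extra checks (distinct subnets, a non-empty bucket name) are assumptions made for this example and are not part of any Databricks or AWS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataPlaneConfig:
    """Hypothetical description of the customer-side (data plane) resources."""

    vpc_id: str
    subnet_ids: List[str]
    security_group_ids: List[str]
    root_bucket_name: str

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means the config looks sane."""
        problems = []
        # The architecture calls for a customer-managed VPC with at least two subnets.
        # Treating duplicates as a single subnet is an assumption of this sketch.
        if len(set(self.subnet_ids)) < 2:
            problems.append("at least two distinct subnets are required")
        if not self.security_group_ids:
            problems.append("at least one security group is required")
        if not self.root_bucket_name:
            problems.append("a root S3 bucket name is required")
        return problems


# Placeholder identifiers, for illustration only.
config = DataPlaneConfig(
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-aaa", "subnet-bbb"],
    security_group_ids=["sg-ccc"],
    root_bucket_name="my-workspace-root-bucket",
)
for problem in config.validate():
    print("config problem:", problem)
```

Running a check like this before provisioning is only a convenience; the authoritative requirements are those in the PrivateLink setup documentation.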
You can deploy a workspace with PrivateLink for both front-end and back-end interfaces using a combination of the E2 Account API and AWS CLI/CloudFormation, or using our technical-field managed Terraform Resource Provider. We recommend the latter if you already use Terraform to automate your infrastructure and configuration management.
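To make the Account API route a little more concrete, here is a minimal Python sketch that only builds the request for registering an already-created AWS VPC endpoint with a Databricks account. It performs no network calls. The base URL, the `vpc-endpoints` path and the JSON field names are assumptions based on our reading of the Account API and should be verified against the PrivateLink setup documentation; the function name and all identifiers in the usage lines are hypothetical.

```python
import json
from typing import Dict

# Assumed base URL for the E2 Account API; verify against the official docs.
ACCOUNTS_API_BASE = "https://accounts.cloud.databricks.com/api/2.0/accounts"


def vpc_endpoint_registration(
    account_id: str, name: str, region: str, aws_vpc_endpoint_id: str
) -> Dict[str, str]:
    """Build (but do not send) a request that registers an AWS VPC endpoint.

    The path and field names below are assumptions for illustration.
    """
    body = {
        "vpc_endpoint_name": name,
        "region": region,
        "aws_vpc_endpoint_id": aws_vpc_endpoint_id,
    }
    return {
        "method": "POST",
        "url": f"{ACCOUNTS_API_BASE}/{account_id}/vpc-endpoints",
        "body": json.dumps(body),
    }


# Placeholder values: one endpoint for the front-end, one for the back-end relay.
requests_to_send = [
    vpc_endpoint_registration("my-account-id", "front-end", "us-east-1", "vpce-front"),
    vpc_endpoint_registration("my-account-id", "back-end-relay", "us-east-1", "vpce-relay"),
]
for request in requests_to_send:
    print(request["method"], request["url"])
```

Separating request construction from sending keeps the sketch testable offline; in practice you would pass each request to an authenticated HTTP client, or let the Terraform provider manage the equivalent resources.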
Getting Started with Private Databricks Workspaces with AWS PrivateLink
Get started with the enhanced security capabilities by deploying Private Databricks Workspaces with AWS PrivateLink. Please refer to the following resources:
AWS PrivateLink Setup for Databricks Workspaces
Secure Cluster Connectivity Documentation
Data exfiltration protection architecture
Securely Accessing External Data Sources from Databricks on AWS
Please refer to Platform Security for Enterprises for a deeper view into how we bring a security-first mindset while building the most popular lakehouse platform on AWS.