Databricks is trusted by the world’s largest organizations to provide a powerful lakehouse platform with high security and scalability. The Databricks multi-tenant deployment on AWS with a premium-tier subscription provides enterprise security and compliance at scale.
The Databricks architecture is split into two separate planes to simplify your permissions, avoid data duplication and reduce risk. The control plane is the management plane where Databricks runs the workspace application and manages notebooks, configuration and clusters. The data plane runs inside your AWS account, processing your data without ever taking it out of your account space.
While certain data, such as your notebooks, configurations, logs, and user information, is present within the control plane, that information is encrypted at rest there, and communication to and from the control plane is encrypted in transit. You also have choices for where certain data lives: you can host your own store of metadata about your data tables (Hive metastore), store query results in your AWS account, and decide whether to use the Databricks Secrets API.
Network and server security
By default, workspace clusters are created in a single AWS Virtual Private Cloud (VPC) that Databricks creates and configures in your AWS account. You can optionally create your Databricks workspaces in your own VPC, a feature known as customer-managed VPC, which gives you more control over the infrastructure and helps you comply with the specific cloud security and governance standards your organization may require.
In Databricks, all data plane connections are outbound-only. Databricks does not rewrite or change your data structure in your storage, nor does it change or modify any of your security and governance policies. Local firewalls complement security groups to block unexpected inbound connections. Customers at the enterprise tier can also use the IP access list feature on the control plane to limit which IP addresses can connect to the web UI or REST API, for example to allow only VPN or office IPs.
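As a rough illustration of how an IP allow list behaves, the sketch below validates a set of VPN and office ranges and checks whether a client address falls inside them. The helper names (`build_allow_list_payload`, `is_allowed`) are hypothetical, and the payload field names (`label`, `list_type`, `ip_addresses`) are an assumption about the request body for the IP access list feature; check them against the current Databricks REST API reference before use. The matching logic shows allow-list semantics in general, not how Databricks evaluates them internally.

```python
import ipaddress


def build_allow_list_payload(label, cidrs):
    """Build a request body for an allow list (field names are assumptions).

    Each entry is parsed first, so a malformed address raises ValueError
    before anything would be sent to an API.
    """
    normalized = [str(ipaddress.ip_network(c, strict=False)) for c in cidrs]
    return {"label": label, "list_type": "ALLOW", "ip_addresses": normalized}


def is_allowed(client_ip, cidrs):
    """Return True if client_ip falls inside any of the listed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(c, strict=False) for c in cidrs)


# Example values use documentation-reserved address ranges.
payload = build_allow_list_payload(
    "office-vpn", ["198.51.100.0/24", "203.0.113.7"]
)
print(payload)
print(is_allowed("198.51.100.42", payload["ip_addresses"]))
print(is_allowed("192.0.2.1", payload["ip_addresses"]))
```

Validating entries locally before submitting them reduces the chance of locking out legitimate users with a mistyped range.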
Databricks clusters are typically short-lived (often terminated after a job completes) and do not persist data after they terminate. Clusters typically share the same permission level (excluding high concurrency clusters, where more robust security controls are in place). Your code is launched in an unprivileged container to maintain system stability. This security design provides protection against persistent attackers and privilege escalation.
Identity and access
Databricks supports robust ACLs, SAML 2.0 and SCIM. Many customers use built-in SAML integrations with Okta, Ping Identity, OneLogin, AAD, GSuite, ADFS or AWS. Customers can block non-SSO logins.
Databricks provides encryption, isolation and auditing. Customers can isolate users at multiple levels:
- Workspace level: Each team or department can use a separate workspace
- Cluster level: Cluster ACLs can restrict the users who can attach to a given cluster
- High-concurrency clusters: Process isolation, JVM whitelisting and limited languages (SQL, Python) allow for the safe coexistence of users of different privilege levels
- Single-user cluster: Users can create a private dedicated cluster
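Cluster-level isolation can be pictured as a list of user and permission pairs attached to a cluster. The following is a minimal sketch of building such a list. The function name `build_cluster_acl` is hypothetical, and the permission level names and the `access_control_list` body shape are assumptions modeled on the Databricks permissions API that should be verified against its reference documentation.

```python
# Permission levels assumed here for clusters; confirm against the API docs.
CLUSTER_PERMISSION_LEVELS = {"CAN_ATTACH_TO", "CAN_RESTART", "CAN_MANAGE"}


def build_cluster_acl(user_names, permission_level="CAN_ATTACH_TO"):
    """Build an ACL body granting one permission level to each listed user."""
    if permission_level not in CLUSTER_PERMISSION_LEVELS:
        raise ValueError(f"unknown permission level: {permission_level}")
    return {
        "access_control_list": [
            {"user_name": name, "permission_level": permission_level}
            for name in user_names
        ]
    }


# Example: only two analysts may attach notebooks to this cluster.
acl = build_cluster_acl(["ana@example.com", "raj@example.com"])
print(acl)
```

Users who are not named in the list would not be able to attach to the cluster, which is what restricts access at the cluster level.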
Databricks supports the following compliance standards on our AWS multi-tenant platform:
- SOC 2 Type II
- ISO 27001
- ISO 27018 (privacy-focused)
- GDPR and CCPA ready