Databricks Enterprise Security

for Apache Spark

A comprehensive security framework you can trust

Built on industry-leading infrastructure, designed with best-in-class security features, and rigorously audited, Databricks takes an innovative and holistic approach to addressing enterprise security for Spark natively within Databricks’ Unified Analytics Platform.

Databricks’ security program is based on the following guiding principles:

TRUST
Long history of third-party security attestations and security leaders who have considerable experience working with customer security teams.
TECHNOLOGY
Deliver a holistic approach to security across customer data, application, host, network, physical, logging and monitoring, policies, procedures and awareness.
TRANSPARENCY
Provide full attestation reports (SOC 2 Type 2, HIPAA, ISO 27001), detailed architecture overview, data flows, and penetration testing reports.

Dedicated Security Team

Databricks has a dedicated security team that is responsible for infrastructure security, application security, security operations, and compliance. The Security Team partners with engineering and is involved in all phases of the development process, including security design reviews, code testing, security testing of new features, and penetration testing. It also provides secure coding training. All members of the team have technical degrees and hold a variety of security certifications including CISSP, CISM, CISA, and CEH.

Defense in Depth

Databricks employs a multi-layered approach to security and data protection, defending your data and Apache Spark™-based systems against malicious attacks.

Databricks designs and implements its Defense in Depth strategy based on the AWS and Azure shared responsibility models and security best practices.

Data
  • Data Encryption: We use the latest version of TLS and strong encryption from AWS KMS and Azure Key Vault.
  • Access Controls: Fine-grained access control to notebooks, workspaces, jobs, and clusters.
  • Databricks Access: Automated control over Databricks access to customer data.
  • Data Governance: Customer data is persisted in designated AWS and Azure regions.
  • Backups: Automated scheduled backups of metadata and systems every 24 hours.
  • Retention and Deletion: Adherence to strict data retention policies in compliance with customer requirements.
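
As an illustration of the Data Encryption item above, the following is a minimal client-side sketch that refuses connections negotiated below a chosen TLS version. It uses only Python's standard ssl module. The function name make_strict_tls_context and the choice of TLS 1.2 as the floor are assumptions made for this example; they are not taken from Databricks documentation.

```python
import ssl


def make_strict_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects versions older than TLS 1.2."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumption: TLS 1.2 is the lowest acceptable version for this example.
    # Raise this to ssl.TLSVersion.TLSv1_3 if your policy requires it.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_strict_tls_context()
print("Minimum TLS version:", context.minimum_version.name)
print("Certificates verified:", context.verify_mode == ssl.CERT_REQUIRED)
```

A context built this way can be passed to standard library clients that accept one, for example http.client.HTTPSConnection(host, context=context).
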
Application
  • Secure System Development Lifecycle (SSDLC): Adhere to security processes and checks that are an integral part of development.
  • Security QA and Penetration Testing: Rid the platform of security defects with rigorous security QA and pen testing.
  • Developer Security Training: Educate developers on security principles essential for their role.
  • Threat Modeling: Assess major risks to design and implement preventative security controls.
Infrastructure
  • Access Control: Control over inbound and outbound traffic leveraging security groups (AWS) and network security groups (Azure).
  • Logging and Monitoring: Comprehensive logging and monitoring for security events.
Host
  • Hardening: All hosts run the latest stable release of the Ubuntu (Data Plane) or CoreOS (Control Plane) operating system and are hardened according to industry best practices.
  • Scanning: Hosts are scanned monthly for vulnerabilities.
  • Patching and Updates: Hosts are patched periodically for security updates and critical patch fixes.
Logging and Monitoring
  • Databricks has implemented logging and monitoring at all layers leveraging AWS and Azure native functionality, Databricks native functionality, as well as leading logging and monitoring tools.
  • Databricks provides its customers with application Audit Log features and best practices for logging and monitoring of events within customers’ Databricks deployments.
Physical
  • AWS and Azure data centers are frequently audited and comply with a comprehensive set of frameworks including ISO 27001, SOC 1, SOC 2, SOC 3, and PCI DSS.
  • AWS and Azure physical data centers are located in secret locations and have stringent physical access controls in place to prevent unauthorized access, including biometric access controls, twenty-four-hour armed guards, and video surveillance.

End User Security Features

Databricks takes a holistic approach to solving the enterprise security challenge by building all the facets of security (encryption, identity management, role-based access control, data governance, and compliance standards) natively into the Unified Analytics Platform.

  • Single Sign-On (SSO): Allows customers to authenticate their employees using the customer’s existing identity provider via the SAML 2.0 protocol. The following providers are officially supported:
    • Okta
    • Google for Work
    • OneLogin
    • Ping Identity
    • Microsoft Windows Active Directory
  • Role-based Access Controls: Allows customers to apply their access control policy leveraging AWS IAM roles for Databricks clusters and Microsoft Active Directory roles, together with notebook, workspace, job, cluster, and library ACLs.
  • Audit Logs: Provide customers with insight into events occurring within their Databricks deployment.
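
To make the relationship between access control lists and audit logs concrete, here is a minimal, self-contained sketch of a deny-by-default ACL check that records every decision. The Permission levels, the AccessControlList class, and the audit record fields are hypothetical and exist only for this illustration; they do not represent the Databricks API or its actual permission model.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Permission(IntEnum):
    """Hypothetical permission levels, ordered from weakest to strongest."""
    NO_ACCESS = 0
    READ = 1
    RUN = 2
    EDIT = 3
    MANAGE = 4


@dataclass
class AccessControlList:
    """Hypothetical ACL for one object, such as a notebook or a cluster."""
    object_id: str
    grants: dict = field(default_factory=dict)     # user name -> Permission
    audit_log: list = field(default_factory=list)  # one dict per decision

    def is_allowed(self, user: str, required: Permission) -> bool:
        # Users without an explicit grant get no access (deny by default).
        granted = self.grants.get(user, Permission.NO_ACCESS)
        allowed = granted >= required
        # Record every decision, allowed or not, so it can be reviewed later.
        self.audit_log.append({
            "object": self.object_id,
            "user": user,
            "required": required.name,
            "allowed": allowed,
        })
        return allowed


notebook_acl = AccessControlList(
    "notebook-42", {"alice": Permission.EDIT, "bob": Permission.READ}
)
for user, needed in [("alice", Permission.RUN), ("bob", Permission.EDIT)]:
    print(user, needed.name, notebook_acl.is_allowed(user, needed))
print(len(notebook_acl.audit_log), "audit events recorded")
```

Ordering the permission levels lets a single comparison decide whether a grant is sufficient, and logging denied requests alongside allowed ones is what makes the audit trail useful for review.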

Databricks Compliance & Assurance Program

Databricks engages with independent CPA firms to perform annual and semi-annual audits. Both firms are registered with the Public Company Accounting Oversight Board (PCAOB) and subject to strict auditing standards, inspections, and enforcement.

Below are our certifications and compliance attestations:

SOC 2 Type 2

The SOC 2 report focuses on a business’s non-financial reporting controls as they relate to the security, availability, processing integrity, confidentiality, and privacy of a system, as opposed to SOC 1/SSAE 16, which focuses on financial reporting controls. Each of the principles has defined criteria (controls) that must be met to demonstrate adherence to the principle and produce an unqualified opinion (no significant exceptions found during the audit). Because the criteria are predefined, business owners know what compliance requires, and users of the report can more easily read it and assess its adequacy. The Trust Service principles that Databricks is audited against are as follows:

  • Security: The system is protected, both logically and physically, against unauthorized access.
  • Availability: The system is available for operation and use as committed or agreed to.
  • Confidentiality: Information that is designated “confidential” is protected as committed or agreed to.

HIPAA

HIPAA (Health Insurance Portability and Accountability Act of 1996) is United States legislation that provides data privacy and security provisions for safeguarding medical information.

Databricks is architected in compliance with HIPAA’s Security Rule technical safeguards including end-to-end encryption, access and authentication, and comprehensive logging and monitoring controls.

Databricks will sign a business associate agreement (BAA) with customers upon request.

ISO 27001

ISO 27001 is a compliance framework that establishes Information Security Management System (ISMS) standards to identify and manage information risks through a comprehensive set of company-wide processes and controls. Additionally, an ISMS embodies principles of continuous improvement to keep abreast of changes in the threat landscape and address them proactively.


ISO 27018

Expected in Q2 2018.

ISO 27018 is a code of practice that focuses on protection of personal data in the cloud. It is based on ISO information security standard 27002 and provides implementation guidance on ISO 27002 controls applicable to public cloud Personally Identifiable Information (PII). It also provides a set of additional controls and associated guidance intended to address public cloud PII protection requirements not addressed by the existing ISO 27002 control set.

FedRAMP

Planned for 2018/2019.

The Federal Risk and Authorization Management Program, or FedRAMP, is a government-wide program that provides a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services. This approach uses a “do once, use many times” framework that saves an estimated 30-40% of government costs, as well as both time and staff required to conduct redundant agency security assessments. FedRAMP is the result of close collaboration with cybersecurity and cloud experts from the General Services Administration (GSA), National Institute of Standards and Technology (NIST), Department of Homeland Security (DHS), Department of Defense (DOD), National Security Agency (NSA), Office of Management and Budget (OMB), the Federal Chief Information Officer (CIO) Council and its working groups, as well as private industry.