GDPR FAQs - Databricks



What is the GDPR?

GDPR stands for the EU General Data Protection Regulation, and it codifies certain rights related to personal data originating from the European Economic Area (EEA). The GDPR replaces the EU Data Protection Directive (aka Directive 95/46/EC), an EU directive that had been in place regarding data protection since 1995. The GDPR is a regulation, rather than a directive, meaning that instead of prescribing results that must be obtained and allowing each member state of the EU to put in place its own laws, the GDPR mostly harmonizes the approach for data protection and privacy throughout the entire EEA by imposing specific requirements that must be met. It applies from May 25, 2018.

Is my company subject to the GDPR?

The short answer is almost certainly yes. You should confirm with your privacy legal counsel, but with few exceptions, the GDPR applies to any company that collects or processes personal data of individuals located in the EEA.

What does the GDPR require me to do?

The GDPR is extraordinarily complex (the regulation spans 99 articles across 88 pages of dense legal text). However, the obligations imposed by the GDPR boil down to seven core principles:

  1. Lawfulness, fairness and transparency. Personal data shall be processed lawfully, fairly and in a transparent manner in relation to the data subject.
  2. Purpose limitation. Personal data shall be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes.
  3. Data minimisation. Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed.
  4. Accuracy. Personal data shall be accurate and, where necessary, kept up to date.
  5. Storage limitation. Personal data shall be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed.
  6. Integrity and confidentiality. Personal data shall be processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures.
  7. Accountability. The controller shall be responsible for, and be able to demonstrate, compliance with the GDPR.

Can using Databricks help me be GDPR compliant?

While there’s no product out there that can make you GDPR compliant by itself, Databricks offers some truly unique functionality that may help you with your GDPR compliance, particularly if you’re using data lakes to store personal data that might be subject to a data subject request (DSR). Please see the blog we posted on this and a webinar where we discuss how using Databricks Delta can help you process DSRs in a data lake scenario that might otherwise be nearly impossible.

Is Databricks GDPR ready? What do I need to do about GDPR if I am using Databricks?

Databricks’ platform is GDPR ready. If your company determines that you are subject to the GDPR and you do not yet have in place an updated data processing addendum (DPA) with us, please review and complete the instructions on our DPA.

What steps has Databricks taken to make sure its platform enables customers to comply with the GDPR?

One of the most important steps Databricks takes to help you be GDPR compliant is minimizing the amount of data that we actually receive from you in the first place. Databricks is architected so that the vast majority of customer data, including personal data, does not leave the environments specified by the customer (e.g., their cloud storage). Many vendors require customers to copy data into the vendor’s environment, leaving the customer to worry whether the vendor will respond to a deletion request quickly enough for the customer to meet prescribed GDPR requirements. The Databricks platform, by contrast, is designed to allow customers to keep their data within their own cloud environment under their control. The primary exception is our notebooks, where a customer can delete individual cells or entire notebooks in response to a data subject request. So when the customer is required to delete data under a data subject request, or wants to make sure that it knows where its data is, the customer can rest easy knowing that the processes and controls they have already set up for their data remain applicable when using the Databricks platform. You can download the Databricks GDPR Compliance Primer for a brief overview of how we address GDPR requirements.

Databricks has been making improvements to our product over the last year to prepare for GDPR. Chief among those changes are:

  • Creating functionality within our UI that lets customers permanently delete notebooks and cells that may contain personal data, along with the corresponding revision history, and making sure that those contents are permanently purged within 30 days after being marked for deletion, without any need for customers to take further action;
  • Using pseudonymization techniques (in short – splitting user-specific records into a piece that cannot, by itself, identify the particular individual and a separately stored piece that can be used to return the data to an identifiable form only when needed) to add a layer of protection on personal data (like a user email address) that might be recorded in Databricks’ usage logs;
  • Offering Databricks Delta, a unified data management system built into the Databricks platform, that dramatically simplifies fulfilling data subject requests against data stored in data lakes; and
  • Putting in place systems to be able to process data subject requests in a timely manner.
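
To make the pseudonymization idea in the list above more concrete, here is a minimal Python sketch. It is not Databricks’ implementation: the `Pseudonymizer` class, its method names, the keyed-hash (HMAC-SHA256) token scheme and the in-memory lookup table are all assumptions chosen for illustration. It only shows the general pattern of writing a non-identifying token to a log while keeping the piece needed for re-identification in a separate store.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class Pseudonymizer:
    """Hypothetical helper: swaps an identifier for a keyed token and keeps
    the token-to-identifier mapping in a separate store."""

    def __init__(self, key: Optional[bytes] = None) -> None:
        # Secret key; in a real system it would be stored apart from the logs.
        self._key = key if key is not None else secrets.token_bytes(32)
        # Separately stored piece: the only way back to the identifier.
        self._lookup: Dict[str, str] = {}

    def _token(self, identifier: str) -> str:
        # Keyed hash: without the key, the token cannot be recomputed
        # from a guessed identifier.
        mac = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256)
        return mac.hexdigest()

    def pseudonymize(self, identifier: str) -> str:
        """Return the token to record in logs and remember the mapping."""
        token = self._token(identifier)
        self._lookup[token] = identifier
        return token

    def reidentify(self, token: str) -> str:
        """Return the original identifier; only possible via the lookup store."""
        return self._lookup[token]

    def forget(self, identifier: str) -> None:
        """Drop the mapping so existing tokens can no longer be resolved."""
        self._lookup.pop(self._token(identifier), None)


# The log record carries only the token, never the email address itself.
vault = Pseudonymizer()
log_record = {
    "user": vault.pseudonymize("alice@example.com"),
    "action": "run_notebook",
}
```

In this sketch, calling `vault.forget("alice@example.com")` removes the only route back from the token to the email address, which is one way such a split can support a deletion request without rewriting the logs themselves.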

Databricks is in the process of certifying to Privacy Shield and additionally has, as part of our ongoing commitment to providing and improving our security, certified our services under SOC 2 Type II (report available under NDA) and ISO27001, and recently attested to ISO27018, the internationally recognized industry standard approach for protecting personal data in the cloud.

Does Databricks assist its customers with Data Subject Requests?

Absolutely. Databricks has created many self-service options within our product that can help you satisfy a data subject request with respect to data that you’re holding about a data subject (e.g., deleting notebooks and cells that may contain personal data). Additionally, you (through the admin on your account) may request that we export or delete personal data on behalf of your users that we may hold. If we determine that a data subject request we receive directly relates to data about your users or the individual has let us know that they believe you hold data about them, we will attempt to notify you prior to responding to such request. Please contact us at with any questions.
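
As a rough illustration of the self-service deletion flow described above (mark a notebook for deletion, then have it permanently purged together with its revision history), here is a small Python sketch. It does not reflect how Databricks implements this. `NotebookStore`, `mark_for_deletion`, `purge_marked` and the in-memory storage are hypothetical names and choices; the only detail taken from this FAQ is the 30-day purge window.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Purge window stated in this FAQ; everything else here is an assumption.
PURGE_DEADLINE = timedelta(days=30)


@dataclass
class Notebook:
    cells: List[str]
    revisions: List[List[str]] = field(default_factory=list)
    marked_at: Optional[datetime] = None  # set when a user deletes the notebook


class NotebookStore:
    """Hypothetical in-memory store used only to illustrate the flow."""

    def __init__(self) -> None:
        self._notebooks: Dict[str, Notebook] = {}

    def save(self, name: str, cells: List[str]) -> None:
        existing = self._notebooks.get(name)
        if existing is None:
            self._notebooks[name] = Notebook(cells=list(cells))
        else:
            # Keep the previous version as revision history.
            existing.revisions.append(existing.cells)
            existing.cells = list(cells)

    def mark_for_deletion(self, name: str, now: datetime) -> datetime:
        """Mark a notebook and return the latest date it may still exist."""
        self._notebooks[name].marked_at = now
        return now + PURGE_DEADLINE

    def purge_marked(self) -> List[str]:
        """Permanently remove every marked notebook, revisions included."""
        purged = [n for n, nb in self._notebooks.items() if nb.marked_at is not None]
        for name in purged:
            del self._notebooks[name]
        return purged

    def names(self) -> List[str]:
        return sorted(self._notebooks)


store = NotebookStore()
store.save("analysis", ["SELECT email FROM users"])
store.save("scratch", ["print('hello')"])
deadline = store.mark_for_deletion("analysis", datetime(2018, 5, 25))
purged = store.purge_marked()
```

The sketch assumes a background job calls `purge_marked` on a schedule shorter than the 30-day window, so that every marked notebook is gone before its deadline without any further action from the customer.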

How can I contact Databricks’ EU representative?

Databricks has appointed a representative in the EU in accordance with Article 27 of the GDPR. You may contact our representative at