This is Blog #2 in a series of blog posts about Databricks security. My colleague David Cook, our CISO, laid out Databricks' approach to security in Blog #1. In this post, I will discuss our platform in detail.
Many companies today operate on homegrown, DIY big data and AI platforms composed of various open-source tools and technologies. These patchwork platforms pit data scientists against data engineers and put the entire organization at risk of a security breach. On one hand, data scientists demand the latest AI tools and prioritize speed and productivity. They see IT and security as slow and inflexible. On the other hand, data engineers are incentivized to maintain pristine data on a secure, supportable, production-ready platform. Moreover, each team requires its own toolset to do its job. These tools are often disjointed, so data workflows cannot be tracked end-to-end. As a result, data engineering and data science teams work in silos and face the following challenges:
From a productivity perspective:
From a security and regulatory perspective:
To overcome these obstacles, data scientists end up hiring data engineers into their teams, and data engineers hire data scientists into theirs, busting budgets and duplicating functions. In the end, you’ve got teams that can’t work together, aren’t accountable, and blame each other for creating an untenable situation. This can create significant exposure to data breaches and regulatory violations, putting companies at risk in ways they may not even have considered. I believe there’s a better way to meet the needs of data science teams than DIY AI.
Databricks brings data engineering and data science teams together in a unified analytics platform, giving data scientists the agility they want while providing data engineers a consistent, secure and reliable toolset. That unification is key. Unlike DIY solutions, we provide a coherent security model across the entire data workflow: you set up security once for your data, and it then applies to your data processing as well as your ML and AI workloads.
Our unified analytics platform provides the security you require while enabling teams to work together to drive innovation. At the core of our approach are the following:
1. Security as a Core Design Principle - Databricks is a cloud-native platform that was designed with security as a first-class citizen from day one and is
2. Consistent, Reliable and Compatible Workflows - Data scientists are constantly looking for the latest software and the latest models. Even a half-percent improvement in an ML model could affect millions of dollars in revenue. DIY requires teams to integrate their own data engineering tools (Spark, Hive, etc.) and data science tools (Spark ML, TensorFlow, Keras, etc.). Databricks does that work for you by providing a single unified platform with streamlined workflows, so you no longer need to worry about interoperability issues. Patching and configuration are taken care of, and the whole system is penetration-tested and monitored by world-leading experts. This makes it easy for IT and security teams to maintain security across the entire workflow.
3. Secure and Transparent Collaboration - Databricks lets data engineering and data science teams work together in a single shared workspace. Databricks notebooks can be shared across teams, with commenting and versioning much like Google Docs. Not only does this enhance collaboration, it also provides a single interface to control, track and audit user access to data. Fine-grained access controls let you govern data not only at the bucket and file level but also at the row and column level. With this level of control, you could give database access to a wide swath of people while masking specific sensitive columns such as credit card numbers and Social Security numbers.
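To make the row- and column-level idea concrete, here is a minimal plain-Python sketch of the policy logic. It is not Databricks' API: `User`, `apply_policy`, `SENSITIVE_COLUMNS`, the `can_read_pii` flag and the region-based row rule are all hypothetical, chosen only to illustrate how a single policy can filter rows and mask columns differently for different users of the same table.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Hypothetical policy: columns that only PII-cleared users may read in the clear.
SENSITIVE_COLUMNS: FrozenSet[str] = frozenset({"credit_card", "ssn"})

Row = Dict[str, str]


@dataclass(frozen=True)
class User:
    """Hypothetical user model used only for this illustration."""
    name: str
    can_read_pii: bool = False
    # None means the user may see rows from every region.
    regions: Optional[FrozenSet[str]] = None


def apply_policy(user: User, rows: List[Row]) -> List[Row]:
    """Filter rows (row-level control) and mask columns (column-level control)."""
    visible: List[Row] = []
    for row in rows:
        # Row-level control: drop rows outside the user's permitted regions.
        if user.regions is not None and row.get("region") not in user.regions:
            continue
        # Column-level control: redact sensitive columns unless the user is cleared for PII.
        visible.append({
            col: (val if user.can_read_pii or col not in SENSITIVE_COLUMNS else "REDACTED")
            for col, val in row.items()
        })
    return visible


# Dummy records; the values are made up for the example.
customers: List[Row] = [
    {"name": "Ada", "region": "EU", "credit_card": "4111-1111-1111-1111", "ssn": "123-45-6789"},
    {"name": "Bob", "region": "US", "credit_card": "5500-0000-0000-0004", "ssn": "987-65-4321"},
]

analyst = User("analyst", regions=frozenset({"EU"}))
auditor = User("auditor", can_read_pii=True)

print(apply_policy(analyst, customers))
print(apply_policy(auditor, customers))
```

Here the analyst sees only the EU row, with the card and Social Security numbers redacted, while the auditor sees the full table. In practice, rules like these would typically be enforced centrally by the platform rather than in application code, so every notebook and job reading the table is subject to the same policy.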
When leveraged to its full potential, data can be a true differentiator for your company. We want to empower your teams to generate business value using your preferred frameworks and libraries for AI while ensuring data security and regulatory compliance.
If you are currently working through the challenges posed by DIY AI, we can help identify security gaps in your current data analytics setup and show you how to address them more effectively with a single platform solution.