Adventures in the TCP stack: Uncovering performance regressions in the TCP SACKs vulnerability fixes
Last month, we announced that the Databricks platform was experiencing network performance regressions due to Linux patches for the TCP SACKs vulnerabilities. The regressions were observed in less than 0.2% of cases when running the Databricks Runtime (DBR) on the Amazon Web Services (AWS) platform. In this post, we will dive deeper into our analysis that determined the TCP stack was the source of the degradation. We will discuss the symptoms we were seeing, walk through how we debugged the TCP connections, and explain the root cause in the Linux source. As a quick note before we jump in, Canonical is working on an Ubuntu 16.04 image that resolves these performance regressions. We plan to update the Databricks platform once that image is available and has passed our regression tests.
A failing benchmark
We were first alerted when one of our benchmarks became 6x slower. The regression appeared after upgrading the Amazon Machine Image (AMI) we use to incorporate Ubuntu’s fixes for the TCP SACKs vulnerabilities.
Network performance regressions from TCP SACK vulnerability fixes
Update on Aug 2, 2019: Added a section at the end explaining our kernel patch and the additional details we found. On June 17, three vulnerabilities in Linux’s networking stack were published. The most severe one could allow remote attackers to impact the system’s availability. We believe in offering the most secure image available to our customers, so...
How Databricks IAM Credential Passthrough Solves Common Data Authorization Problems
In our first blog post, we introduced Databricks IAM Credential Passthrough as a secure, convenient way for customers to manage access to their data. In this post, we'll take a closer look at how passthrough compares to other Identity and Access Management (IAM) systems. If you’re not familiar with passthrough, we suggest reading the first...
Introducing Databricks AWS IAM Credential Passthrough
As more and more analytics move to the cloud, customers are faced with the challenge of how to control which users have access to what data. Cloud providers like AWS provide a rich set of features for Identity and Access Management (IAM) such as IAM users, roles, and policies. These features allow customers to securely...
Azure Databricks – Bring Your Own VNET
Azure Databricks Unified Analytics Platform is the result of a joint product/engineering effort between Databricks and Microsoft. It’s available as a managed first-party service on Azure Public Cloud. Along with one-click setup (manual/automated), managed clusters (including Delta), and collaborative workspaces, the platform has native integration with other Azure first-party services, such as Azure Blob Storage,...
Securely Accessing External Data Sources from Databricks for AWS
Databricks Unified Analytics Platform, built by the original creators of Apache Spark™, brings Data Engineers, Data Scientists and Business Analysts together with data on a single platform. It allows them to collaborate and create the next generation of innovative products and services. In order to create the analytics needed to power these next-gen products, Data...
Databricks Security Advisory: Critical Runc Vulnerability (CVE-2019-5736)
Databricks became aware of a new critical runc vulnerability (CVE-2019-5736) on February 12, 2019 that allows malicious container users to gain root access to the host operating system. This vulnerability affects many container runtimes, including Docker and LXC. The Databricks security team has evaluated the vulnerability and confirmed that, due to the Databricks platform architecture,...
Is your AI software vendor taking security shortcuts?
This is the final part of our blog series on security available here. In this blog, I will be talking about our culture of security at Databricks. Artificial intelligence software that can learn and improve human decision-making is transforming business. All sorts of companies are looking to AI to gain an edge over competitors. Unfortunately,...
Are you ready to scale your Data and AI initiatives? How will you scale your security?
This is Blog #3 in a series of blog posts about Databricks security. My colleagues David Cook (our CISO) and David Meyer (SVP products) laid out Databricks' approach to Security in blog #1 & blog #2. With this blog, I will be talking about deploying and operating Databricks at scale while minimizing human error. Democratize...
Do you DIY your AI? You are probably failing on security.
This is Blog #2 in a series of blog posts about Databricks security. My colleague David Cook (our CISO) laid out Databricks' approach to Security in blog #1. With this blog, I will be talking in detail about our platform. DIY Platforms: A Lack of Cohesion in Security Many companies today operate on homegrown DIY big...