Use Databricks Pools to Speed up your Data Pipelines and Scale Clusters Quickly
Data Engineering teams deploy short, automated jobs on Databricks. They expect their clusters to start quickly, execute the job, and terminate. Data Analytics teams run large auto-scaling, interactive clusters on Databricks. They expect these clusters to adapt to increased load and scale up quickly in order to minimize query latency. Databricks is pleased to announce...
Adventures in the TCP stack: Uncovering performance regressions in the TCP SACKs vulnerability fixes
Last month, we announced that the Databricks platform was experiencing network performance regressions due to Linux patches for the TCP SACKs vulnerabilities. The regressions were observed in less than 0.2% of cases when running the Databricks Runtime (DBR) on the Amazon Web Services (AWS) platform. In this post, we will dive deeper into our analysis that determined the TCP stack was the source of the degradation. We will discuss the symptoms we were seeing, walk through how we debugged the TCP connections, and explain the root cause in the Linux source. As a quick note before we jump in, Canonical is working on an Ubuntu 16.04 image that resolves these performance regressions. We plan to update the Databricks platform once that image is available and has passed our regression tests.
A failing benchmark
We were first alerted when one of our benchmarks became 6x slower. The regression appeared after upgrading the Amazon Machine Image (AMI) we use to incorporate Ubuntu’s fixes for the TCP SACKs vulnerabilities.Meltdown and Spectre: Exploits and Mitigation Strategies
In an earlier blog post, we analyzed the performance impact of Meltdown and Spectre on big data workloads in the cloud. In this blog post, we explain these exploits, their mitigation strategies and how they impact Databricks from a security and performance perspective. Meltdown Meltdown breaks a fundamental assumption in operating system security: an...
Meltdown and Spectre’s Performance Impact on Big Data Workloads in the Cloud
Last week, the details of two industry-wide security vulnerabilities, known as Meltdown and Spectre, were released. These exploits enable cross-VM and cross-process attacks by allowing untrusted programs to scan other programs’ memory. On Databricks, the only place where users can execute arbitrary code is in the virtual machines that run Apache Spark clusters. There,...