The current Spark scheduler relies on a single, centralized machine to make all scheduling decisions. However, as Spark is used on larger clusters and for shorter queries, the centralized scheduler will become a bottleneck. This talk will begin by discussing the throughput limitations of the current Spark scheduler. Next, I’ll present Sparrow, a new scheduler that uses a decentralized, random sampling approach to provide dramatically higher throughput than the current scheduler, while also providing scheduling delays of less than 10ms and fast scheduler failover. Sparrow’s superior performance make it the best choice for users who are pushing Spark to larger deployments and lower latencies.
In this talk, I’ll take a deep dive into Spark’s performance on two benchmarks (TPC-DS and the Big Data Benchmark from UC Berkeley) and one production workload and demonstrate that many commonly-held beliefs about performance bottlenecks do not hold. In particular, I’ll demonstrate that CPU (and not I/O) is often the bottleneck, that network performance can improve job completion time by a median of at most 4%, and that the causes of most stragglers can be identified and fixed. After describing the takeaways from the workloads I studied, I’ll give a brief demo of how the (open-source) tools that I developed can be used by others to understand why Spark jobs are taking longer than expected. I’ll conclude by proposing changes to Spark core that, based on my performance study, could significantly improve performance. This talk is based on a research talk that I’ll be giving at NSDI 2015.
This talk will describe Monotasks, a new architecture for the core of Spark that makes performance easier to reason about. In Spark today, pervasive parallelism and pipelining make it difficult to answer even simple performance questions like "what is the bottleneck for this workload?". As a result, it's difficult for developers to know what to optimize, and it's even more difficult for users to understand what hardware to use and what configuration parameters to set to get the best performance. Monotasks is a radically different execution model, designed to make performance easy to reason about. With this design, jobs are broken into units of work called monotasks that each use a single resource (e.g., CPU, disk, or network). Each resource is managed by a per-resource scheduler that assigns dedicated resources to each monotask; e.g., the CPU scheduler assigns each CPU monotask a dedicated CPU core. This design makes it simpler to reason about performance, because each resource is managed by a single scheduler that has complete visibility over how that resource is used. Per-resource schedulers can easily provide high utilization by scheduling concurrent tasks to match the concurrency of the underlying resource (e.g., 8 CPU monotasks to fully utilize 8 CPU cores), and allow the framework to auto-tune based on which resources are currently contended. For example, using monotasks, Spark can automatically determine whether to compress a particular dataset based on the length of the CPU and disk queues. We have implemented monotasks as an API-compatible replacement for Spark's execution layer. This talk will first present results illustrating that Monotasks makes it simple to reason about performance. Next, we'll present results demonstrating new performance optimizations enabled by Monotasks (e.g., automatically determining whether to compress data) that allow Monotasks to provide much better performance than today's Spark.