SparkLint: a Tool for Monitoring, Identifying and Tuning Inefficient Spark Jobs Across Your Cluster


Spark makes it easy to build and deploy complex data processing applications onto shared compute platforms, but tuning them is often overlooked. Left unchecked, this leads to over-specified resource requirements and unnecessary platform load, and increases the chance of resource contention, degrading overall performance. By identifying inefficient jobs, development teams and platform administrators can wrest back control of system resources and ameliorate the tragedy of the commons that can afflict a widely shared cluster. SparkLint uses the Spark metrics API and a custom event listener to analyze individual Spark jobs for over-specified or unbalanced resources, incorrect partitioning and suboptimal worker locality. It is easily attached to any Spark job, presents data for analysis through a web UI and provides recommendations for common performance pitfalls. SparkLint is currently a popular internal tool at Groupon; this presentation will be its OSS debut and will include details of the development roadmap and how to contribute.
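
To make the idea of "over-specified resources" concrete, here is a minimal Python sketch of the kind of calculation such a tool might perform. It is not SparkLint's actual implementation: the names `TaskInterval`, `core_utilization` and `peak_concurrency` are hypothetical, and the sketch assumes that task start and end times have already been collected, for example from the task-end events that a Spark event listener receives.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskInterval:
    """Hypothetical record of one task's lifetime, in epoch milliseconds."""
    start_ms: int
    end_ms: int


def core_utilization(tasks: List[TaskInterval], allocated_cores: int) -> float:
    """Fraction of allocated core-time spent running tasks.

    Assumes each task occupies exactly one core for its whole lifetime.
    """
    if not tasks or allocated_cores <= 0:
        return 0.0
    window = max(t.end_ms for t in tasks) - min(t.start_ms for t in tasks)
    if window <= 0:
        return 0.0
    busy = sum(t.end_ms - t.start_ms for t in tasks)
    return busy / (allocated_cores * window)


def peak_concurrency(tasks: List[TaskInterval]) -> int:
    """Largest number of tasks running at the same instant."""
    events = []
    for t in tasks:
        events.append((t.start_ms, 1))   # task starts: one more core in use
        events.append((t.end_ms, -1))    # task ends: one core released
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back tasks are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak
```

A job that requests many cores but shows low utilization and a peak concurrency well below its allocation is a candidate for a smaller resource request.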

About Simon Whitear

Simon is a Principal Engineer at Groupon specializing in high-throughput stream processing and distributed system architectures, whilst providing a consultative role across multiple teams and technologies. As a polyglot developer with wide business experience, he is adept at defining and driving solutions to complex problems across multiple domains. Previously, he had an extensive career in financial services engineering, building everything from large analytical databases and reporting tools, through complex integration systems, to HFT interfaces and data capture systems.