Running Apache Spark on Kubernetes: Best Practices and Pitfalls - Databricks

Since initial support was added in Apache Spark 2.3, running Spark on Kubernetes has been growing in popularity. Reasons include the improved isolation and resource sharing of concurrent Spark applications on Kubernetes, as well as the benefit of using a homogeneous, cloud-native infrastructure for a company's entire tech stack. But running Spark on Kubernetes in a stable, performant, cost-efficient and secure manner also presents specific challenges. In this talk, JY and Julien will go over lessons learned while building Data Mechanics, a serverless Spark platform powered by Kubernetes. Topics include:

  • Scalability bottlenecks of Spark on Kubernetes
  • Optimizations for highly concurrent interactive use cases
  • Specifics of data I/O on Kubernetes
  • Secure access to data via Kubernetes role-based access control
  • Automated job configuration tuning

About Jean-Yves Stephan

Data Mechanics

Jean-Yves is the CEO and Co-Founder of Data Mechanics, an automated performance tuning platform for Apache Spark that works on top of any cloud-based data platform. Before that, he was a software engineer and team lead at Databricks, where he scaled the management of Spark infrastructure from the early startup days to hundreds of thousands of nodes launched in the cloud per day. Jean-Yves is passionate about simplifying data engineering operations and making it easy for anyone to operate performant and stable data pipelines at scale. He graduated from Ecole Polytechnique and Stanford University.

About Julien Dumazert

Data Mechanics

Julien is the CTO and Co-Founder of Data Mechanics, a Y Combinator-backed startup whose mission is to automate the often tedious mechanical work that data engineers perform manually today, starting with Spark performance and stability tuning. He previously worked as a data scientist optimizing BlaBlaCar’s world-leading carpooling marketplace, and led the data team at ContentSquare, a website UX optimization platform. He graduated from Ecole Polytechnique and ETH Zurich.