Tuning Apache Spark for Large-Scale Workloads - Databricks



Apache Spark is a fast and flexible compute engine for a variety of diverse workloads. Optimizing performance for different applications often requires an understanding of Spark internals and can be challenging for Spark application developers. In this session, learn how Facebook tunes Spark to run large-scale workloads reliably and efficiently. The speakers will begin by explaining the tools and techniques they use to discover performance bottlenecks in Spark jobs. Next, you’ll hear about important configuration parameters and their experiments tuning these parameters on large-scale production workloads. You’ll also learn about Facebook’s new efforts toward automatically tuning several important configurations based on the nature of the workload. The speakers will conclude by sharing their results with automatic tuning and future directions for the project.
Session hashtag: #SFexp1
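The abstract does not name the specific parameters covered in the talk. As an illustration only, a handful of commonly tuned Spark configuration properties (all standard Spark settings, though the values below are arbitrary placeholders, not recommendations from the speakers) can be passed to a job at submission time:

```shell
# Illustrative spark-submit invocation; my_job.py is a hypothetical
# application, and the values are placeholders for tuning experiments.
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.executor.cores=4 \
  --conf spark.memory.fraction=0.6 \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf spark.shuffle.service.enabled=true \
  my_job.py
```

Settings like executor memory and cores, the unified memory fraction, and shuffle parallelism are typical starting points when profiling reveals memory pressure or shuffle bottlenecks in large jobs.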

About Gaoxiang Liu

Gaoxiang is a software engineer on the Ads Data Infrastructure team at Facebook. His team focuses on the computation and storage efficiency, stability, and operational intelligence of the data warehouse infrastructure used in Facebook Ads. He is a contributor to Apache Spark and is passionate about leveraging efficient batch compute engines to solve real-world problems at Facebook scale. Gaoxiang received an M.S. in EECS from the University of Michigan, Ann Arbor.