Deep Dive Into Apache Spark Multi-User Performance - Databricks

When you run an Apache Spark application on a large cluster, you want to make sure you’re getting the most from that cluster. Any CPU or memory left on the table represents either a waste of money or a lost opportunity to speed up your Spark jobs. What many people don’t realize is how sensitive Spark cluster utilization is to the resource manager. Resource managers decide how to allocate cluster resources among the many users and applications contending for them. In this deep dive session, we will discuss how Spark integrates with two common open source resource managers, YARN and Mesos, as well as a new commercial product called IBM Spectrum Conductor with Spark. You will learn how resource managers arbitrate resources in multi-user/multi-tenant Spark clusters, and how this affects application performance. You will come away with new techniques for tuning Spark resource management to optimize goals like speed and fairness. The session will include a demo of a new open source benchmark designed to help analyze Spark multi-user/multi-tenant performance. The benchmark uses Spark SQL and machine learning jobs to load the cluster, and can be used during a pre-production cycle to tune Spark and resource manager configurations.
Session hashtag: #SFdd1

About Mikhail Genkin

Mikhail Genkin is the DCOS Architect at the IBM Platform Symphony group. I focus on analyzing and optimizing open source resource manager software, such as Apache Mesos and YARN, and integrating it with IBM products. I specialize in performance benchmarking and analysis. In my past roles I managed incubation projects for high-performance analytics solutions and contributed to many IBM products, such as WebSphere Commerce, Rational Application Developer, WebSphere Application Server, WebSphere Process Server, WebSphere Portal, and Power Systems servers.

About Peter Lankford

Peter Lankford is founder and director of STAC®, the Securities Technology Analysis Center, which provides hands-on technology research and testing tools to the finance industry. STAC facilitates the STAC Benchmark Council™, a group of leading financial institutions and vendors that discusses technical challenges and specifies standard ways to assess technologies used in the financial markets. Prior to STAC, Peter was SVP of the $240M market data technology business at Reuters and held management positions at Citibank, First Chicago, and operating-system maker IGC. Peter has an MBA, a Master's in International Relations, and a Bachelor's in Chemistry from the University of Chicago.