Mikhail Genkin is the DCOS Architect in the IBM Platform Symphony group. He focuses on analyzing and optimizing open source resource manager software, such as Apache Mesos and YARN, and on integrating it with IBM products. He specializes in performance benchmarking and analysis. In past roles he managed incubation projects for high-performance analytics solutions and contributed to many IBM products, including WebSphere Commerce, Rational Application Developer, WebSphere Application Server, WebSphere Process Server, WebSphere Portal, and Power Systems servers.
When you run an Apache Spark application on a large cluster, you want to make sure you're getting the most from that cluster. Any CPU or memory left idle is either wasted money or a lost opportunity to speed up your Spark jobs. What many people don't realize is how sensitive Spark cluster utilization is to the choice and configuration of the resource manager. Resource managers decide how to allocate cluster resources among the many users and applications contending for them. In this deep dive session, we will discuss how Spark integrates with two common open source resource managers, YARN and Mesos, as well as with a new commercial product, IBM Spectrum Conductor with Spark. You will learn how resource managers arbitrate resources in multi-user/multi-tenant Spark clusters and how this affects application performance. You will come away with new techniques for tuning Spark resource management to meet goals such as speed and fairness. The session will include a demo of a new open source benchmark designed to help analyze Spark multi-user/multi-tenant performance. The benchmark uses Spark SQL and machine learning jobs to load the cluster, and can be used during a pre-production cycle to tune Spark and resource manager configurations. Session hashtag: #SFdd1