Spark’s YARN support allows scheduling Spark workloads on Hadoop alongside a variety of other data-processing frameworks. This talk will be a deep dive into the architecture and uses of Spark on YARN. We’ll cover the intersection between Spark’s and YARN’s resource management models, the supported deploy modes, and operational best practices. Finally, we’ll discuss roadmap items, such as executor container resizing and integration with YARN’s application history store.
Sandy is a senior data scientist at Cloudera, focusing on Apache Spark and its ecosystem, and an author of the recent O'Reilly publication "Advanced Analytics with Spark." He's a Spark committer and a member of the Apache Hadoop Project Management Committee. He graduated Phi Beta Kappa from Brown University.