While Spark and Mesos emerged together from the AMPLab at Berkeley, Mesos is now one of several clustering options for Spark, along with Hadoop YARN, which is growing in popularity, and Spark’s “standalone” mode. This talk describes in detail how Spark integrates with Mesos to run Spark jobs on a cluster, including the sequence of events that occurs during the life cycle of a typical Spark job. We’ll discuss recommendations for optimizing performance and resource utilization and for avoiding known limitations. We’ll also discuss possible future work for Spark on Mesos. Along the way, we’ll examine the general abstractions that Spark exposes for clustering. Finally, we’ll compare and contrast Spark on Mesos with Spark Standalone mode and Spark on YARN, and offer suggestions for when to choose one option over the others.
Timothy Chen is the CTO of Hyperpilot and a PMC member and committer on Apache Drill and Apache Mesos. Before joining Hyperpilot, Timothy was the lead engineer at Mesosphere working on the container runtime and Spark on Mesos.