No One Puts Spark in the Container - Databricks

The current Docker craze has everyone sticking their processes inside a container… but do you really understand cgroups and how they work? Do you understand the difference between CPU sets and CPU shares? Spark is a Scala application that lives inside a Java runtime; do you understand how cgroup constraints affect the JRE? This talk starts with a deep dive into Java’s memory management and GC characteristics, and how JRE behavior changes with core count. We then look at containers and how resource isolation works, detailing the difference between CPU sets and CPU shares as well as memory management. The session closes with the consequences of running the JRE in a CPU-share environment and the potential for pseudo-random behavior when running in a heterogeneous datacenter.
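As a rough illustration of the gap the abstract describes, the sketch below prints the resources a JVM believes it has and compares that with what a CPU-share weight guarantees under contention. It is not taken from the talk: the class name `JvmView` and the helper `guaranteedCores` are hypothetical, and the share arithmetic assumes the usual proportional rule (a cgroup's weight divided by the sum of all competing weights). What `availableProcessors()` reports inside a container depends on the JDK version and flags, so treat the printed values as something to inspect, not a fixed result.

```java
import java.util.concurrent.ForkJoinPool;

public class JvmView {

    /**
     * Hypothetical helper: the number of cores a cgroup is guaranteed when
     * every competing cgroup is busy, assuming CPU time is split in
     * proportion to share weights. When the host is idle, the cgroup may
     * use far more than this.
     */
    static double guaranteedCores(long shares, long totalShares, int hostCores) {
        if (shares <= 0 || totalShares < shares || hostCores <= 0) {
            throw new IllegalArgumentException("invalid share or core count");
        }
        return hostCores * (double) shares / totalShares;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Core count the JVM uses to size GC threads, JIT threads, and pools.
        int cpus = rt.availableProcessors();
        // Upper bound on the heap the JVM will try to use.
        long maxHeapMiB = rt.maxMemory() / (1024 * 1024);
        // Default parallelism of the shared fork/join pool.
        int poolParallelism = ForkJoinPool.commonPool().getParallelism();

        System.out.println("availableProcessors = " + cpus);
        System.out.println("maxMemory (MiB)     = " + maxHeapMiB);
        System.out.println("commonPool threads  = " + poolParallelism);

        // Example numbers only (assumed): weight 1024 out of 4096 total on
        // a 16-core host.
        System.out.println("guaranteed cores    = " + guaranteedCores(1024, 4096, 16));
    }
}
```

If the JVM reports the host's full core count while the share weight guarantees only a fraction of it, thread pools and GC workers sized from that count can oversubscribe the container whenever neighbors are busy, which is one way the behavior can vary from host to host.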

About Ken Sipe

Ken Sipe is a Cloud Solution Architect and Distributed Systems Engineer at Mesosphere, focused on helping companies simplify the development and operation of large-scale infrastructure and distributed systems with Apache Mesos and DC/OS. A JavaOne Rockstar, Ken is an author and award-winning international speaker on software architecture and engineering, continuous delivery, and agile practices. Ken is also an Apache Mesos contributor and an Apache committer on a number of Apache Mesos frameworks, including Marathon, Myriad, HDFS, Kafka, and Cassandra.

About Jörg Schad

Jörg Schad is a Developer Evangelist at Mesosphere who works on DC/OS and Apache Mesos. Prior to this he worked on SAP HANA and in the Information Systems Group at Saarland University. His passions are distributed (database) systems, data analytics, and distributed algorithms, and his speaking experience includes various meetups, international conferences, and lecture halls.