Notebooks: they enable our users, but they can cripple our clusters. Let's fix that.

Notebooks have soared in popularity at companies worldwide because they provide an easy, user-friendly way of tapping the cluster-computing power of Spark. But the more users you have hitting a cluster, the harder it becomes to manage its resources, as big, long-running jobs start to starve out small, short-running ones. While you could have each user spin up an EMR-style cluster, that undercuts the collaborative nature of notebooks, and it quickly becomes expensive as clusters sit idle for long stretches waiting on single users. What we want is fair, efficient resource utilization on a single large cluster shared by a large number of users.

In this talk we'll discuss dynamic allocation and best practices for configuring the current version of Spark as-is to help solve this problem. We'll also present new improvements we've made to address this use case, including decommissioning executors without losing cached data, proactively shutting down executors to prevent starvation, and improving the start times of new executors.
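As background for the dynamic allocation discussion, the feature is switched on with a handful of standard Spark properties. The values below are illustrative defaults for a shared notebook cluster, not the presenters' recommendations:

```properties
# Let Spark grow and shrink each application's executor pool
spark.dynamicAllocation.enabled                    true
# External shuffle service, so executors can be removed without losing shuffle files
spark.shuffle.service.enabled                      true
# Illustrative bounds; tune for your cluster size and workload
spark.dynamicAllocation.minExecutors               1
spark.dynamicAllocation.maxExecutors               50
# Release executors that have been idle this long (default is 60s)
spark.dynamicAllocation.executorIdleTimeout        60s
# By default, executors holding cached data are never released; a finite
# timeout here trades cache retention for fairness between notebook users
spark.dynamicAllocation.cachedExecutorIdleTimeout  30min
```

The `cachedExecutorIdleTimeout` setting is the stock workaround for the cached-data problem the talk addresses more directly with executor decommissioning.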
Session hashtag: #EUdev8
As an employee of The Weather Company, Craig started using Apache Spark in 2014. Most of his experience with Spark has centered around scaling and operationalizing a wide variety of applications. Since IBM acquired The Weather Company in 2016, he has been focused on building a platform others can use to solve problems in the big data space.
Brad is a member of the Spark Technology Center at IBM. Before that, he was a data engineer at The Weather Company, where he built data pipelines using Spark, Cassandra, Hadoop, and Parquet. These have opened up terabytes of data to TWC's data scientists and business analysts.