BoF Discussion: Scaling Spark to long-running and large workloads

Apache Spark usage is growing in both industry and academia as a performant and reliable framework for processing long-running jobs and/or large amounts of data. Our discussion will center on the challenges of pushing the scalability boundaries of Spark for these workloads, and we hope participants will share their experiences and lessons learned. Challenges to be discussed include running Spark jobs with hundreds of thousands of tasks, hundreds of stages, shuffles of hundreds of terabytes, and running times of many hours amidst hardware failures.
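As a concrete starting point for the discussion, a few Spark configuration properties that often come up when tuning for large shuffles and fault tolerance in long-running jobs (the values below are illustrative sketches, not recommendations):

```properties
# Tolerate more transient failures before failing a long-running job
spark.task.maxFailures=8
spark.stage.maxConsecutiveAttempts=8

# Keep shuffle files available if executors are lost to hardware failures
spark.shuffle.service.enabled=true

# Retry shuffle fetches more aggressively for very large shuffles
spark.shuffle.io.maxRetries=10
spark.shuffle.io.retryWait=30s

# Allow slow nodes to be worked around via speculative execution
spark.speculation=true

# Widen the network timeout for heavily loaded clusters
spark.network.timeout=600s
```

Appropriate values depend heavily on cluster size, workload shape, and failure rates, which is exactly the kind of experience we hope participants will compare.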