Apache Spark on Kubernetes Clusters - Databricks

Apache Spark on Kubernetes Clusters

Kubernetes is a fast-growing open-source platform that provides container-centric infrastructure. Conceived by Google in 2014, and leveraging over a decade of Google's experience running containers at scale internally, it is one of the fastest-moving projects on GitHub, with 1400+ contributors and 60,000+ commits. Kubernetes has first-class support on Google Cloud Platform, Amazon Web Services, and Microsoft Azure. Over the past year, the Kubernetes and Spark communities have collaborated to build a new native Kubernetes scheduler within Apache Spark.

In this talk, we explore the new capabilities that this native Kubernetes integration brings to Apache Spark. We also go over the roadmap and the features the Kubernetes community has planned for the scheduler over the next several Spark releases. The talk is technical and aimed at people looking to build modern data pipelines in a Kubernetes-native way. It assumes basic familiarity with cluster orchestration and containers.

Session hashtag: #ExpSAIS11

About Anirudh Ramanathan

Anirudh Ramanathan is a software engineer on the Kubernetes team at Google. He currently leads the big data efforts under SIG Big Data in the Kubernetes community, with a focus on running batch, data processing, and ML workloads. He has worked on native Kubernetes support within Spark, Airflow, TensorFlow, and JupyterHub. Prior to this, he worked on GGC (Google Global Cache), and before that, on the infrastructure team at NVIDIA.

About Sean Suchter

Sean is the co-founder and CTO of Pepperdata. Previously, Sean was the founding GM of Microsoft's Silicon Valley Search Technology Center, where he led the integration of Facebook and Twitter content into Bing search. Prior to Microsoft, Sean managed the Yahoo Search Technology team, the first production user of Hadoop.