Rajesh Thallam is a Machine Learning Specialist at Google Cloud, helping customers build data science platforms, deploy machine learning pipelines, and integrate them with data and analytics services on Google Cloud Platform. He provides guidance and hands-on work to advance and scale machine learning use cases and technologies. He works with product, engineering, and key customers to build repeatable architectures and drive product roadmaps. Previously, he was a data scientist and data architect working with customers in financial and health care organizations, building and shipping analytics pipelines and machine learning applications on distributed platforms.
May 28, 2021 11:40 AM PT
There is no doubt that Kubernetes has emerged as the next generation of cloud-native infrastructure, supporting a wide variety of distributed workloads. Apache Spark has evolved to run both machine learning and large-scale analytics workloads, and there is growing interest in running Apache Spark natively on Kubernetes. By combining the flexibility of Kubernetes with the scalable data processing of Apache Spark, you can run any data or machine learning pipeline on this infrastructure while making effective use of the resources at your disposal.
In this talk, Rajesh Thallam and Sougata Biswas will share how to effectively run your Apache Spark applications on Google Kubernetes Engine (GKE) and Google Cloud Dataproc, and how to orchestrate data and machine learning pipelines with managed Apache Airflow on GKE (Google Cloud Composer). The following topics will be covered:
- Understanding the key traits of Apache Spark on Kubernetes
- Things to know when running Apache Spark on Kubernetes, such as autoscaling
- A demonstration of running analytics pipelines on Apache Spark, orchestrated with Apache Airflow on a Kubernetes cluster