Staff Engineer, VMware
Enrique Corro has worked for VMware since 2006. He is currently a Staff Engineer focused on Data Science in VMware's Office of the CTO. Enrique is part of the team that drives new types of integrations between VMware and other leading IT companies to facilitate the adoption of Machine Learning and Artificial Intelligence by companies of any size and industry. He is currently pursuing a master's degree in Data Science at the University of Illinois.
In 2020, two significant IT platforms converge. On the one hand, Spark 3 becomes available with support for Kubernetes as a scheduler. On the other hand, VMware releases Project Pacific, an industry-grade Kubernetes that is natively integrated with the VMware vSphere 7 hypervisor. In this session, we present a reference architecture that integrates these two platforms. With the integration of Spark 3 and VMware Pacific, Spark clusters are deployed on the same Kubernetes and virtual machine platform that is used by tens of thousands of companies across the world. These are some of the main benefits:
- Scalable and straightforward deployments: With Kubernetes, IT operations teams can expand cluster capacity by adding nodes with a few simple command-line instructions. VMware Pacific delivers enterprise-grade Kubernetes that can run Spark clusters after a few simple setup steps.
- Reliability: Kubernetes keeps the specified number of workers running, even after hardware failures occur. It can self-heal and bring services back on track in a matter of seconds, which may significantly boost the availability of Spark clusters.
- Predictable performance: VMware Pacific delivers a Kubernetes runtime native to the vSphere hypervisor, which has proven to be more efficient and scalable than the Linux OS at scheduling containers. High performance and throughput are paramount for a big-data analytics platform such as Spark.
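As a rough illustration of what "Kubernetes as a scheduler" means in practice, the sketch below assembles a spark-submit command line that targets a Kubernetes API server and requests a fixed number of executors. It is a minimal sketch, not the reference architecture presented in the session: the API server address and container image name are hypothetical placeholders, and the example jar path assumes the layout of a stock Spark 3.0.0 image.

```python
import shlex
from typing import List


def spark_submit_args(api_server: str, image: str, executors: int,
                      namespace: str = "default") -> List[str]:
    """Build a spark-submit command line that targets a Kubernetes cluster."""
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return [
        "spark-submit",
        # The k8s:// prefix tells Spark to use Kubernetes as the scheduler.
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--name", "spark-pi",
        "--class", "org.apache.spark.examples.SparkPi",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        # Number of executor pods the Spark driver requests.
        "--conf", f"spark.executor.instances={executors}",
        # Assumed path of the examples jar inside the container image.
        "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar",
    ]


if __name__ == "__main__":
    # Hypothetical endpoint and image; replace with values from your cluster.
    args = spark_submit_args(
        "https://k8s.example.internal:6443",
        "registry.example.internal/spark:3.0.0",
        executors=4,
    )
    print(" ".join(shlex.quote(a) for a in args))
```

Changing the `executors` argument is the only edit needed to request more or fewer executor pods for the same job; adding nodes to the Kubernetes cluster itself is a separate, platform-level operation.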
Topics covered in this session:
- Introduction to VMware Pacific's Kubernetes architecture
- How to bring up a hypervisor-native Kubernetes cluster using VMware Pacific (demo)
- Architecture and sizing configuration of vSphere and Kubernetes to run Spark 3
- How to deploy and configure Spark 3 (demo)
- Running Machine Learning tasks using ML libraries from Spark 3 (demo)
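To give a sense of the kind of machine learning task the last item refers to, here is a minimal PySpark sketch that fits a logistic regression model with Spark's ML library. It is not the demo shown in the session: the toy data, the application name, and the function name `train_logistic_regression` are assumptions made for illustration, and the script expects PySpark 3 to be installed wherever it runs.

```python
from typing import List, Tuple

# Hypothetical toy data: (label, feature vector) pairs for illustration only.
TOY_ROWS: List[Tuple[float, List[float]]] = [
    (1.0, [0.0, 1.1, 0.1]),
    (0.0, [2.0, 1.0, -1.0]),
    (0.0, [2.0, 1.3, 1.0]),
    (1.0, [0.0, 1.2, -0.5]),
]


def train_logistic_regression(spark, rows=TOY_ROWS):
    """Fit a logistic regression model with Spark ML and return it."""
    # Imported lazily so this module loads even where PySpark is absent.
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors

    df = spark.createDataFrame(
        [(label, Vectors.dense(features)) for label, features in rows],
        ["label", "features"],
    )
    lr = LogisticRegression(maxIter=10, regParam=0.01)
    return lr.fit(df)


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    # The master URL is left to spark-submit, so the same script can run
    # locally or against a Kubernetes cluster without modification.
    spark = SparkSession.builder.appName("spark-ml-sketch").getOrCreate()
    model = train_logistic_regression(spark)
    print("coefficients:", model.coefficients)
    spark.stop()
```

Because the script does not hard-code a master, the choice of scheduler stays in the submission command rather than in the application code.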