Justin Murray - Databricks

Justin Murray

Technical Marketing Manager, VMware

Justin Murray works as a Technical Marketing Manager at VMware. Justin creates technical material and gives guidance to customers and the VMware field organization to promote the virtualization of big data workloads on VMware’s vSphere platform. Justin has worked closely with VMware’s partner ISVs (Independent Software Vendors) to ensure their products work well on vSphere and continues to bring best practices to the field as the customer base for big data expands.

UPCOMING SESSIONS

Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes - Summit 2020

In 2020, two significant IT platforms converge. On the one hand, Spark 3 becomes available with support for Kubernetes as a scheduler. On the other hand, VMware releases Project Pacific, an industry-grade Kubernetes that is natively integrated with the VMware vSphere 7 hypervisor. In this session, we present a reference architecture that integrates these two platforms. With the integration of Spark 3 and VMware Pacific, Spark clusters are deployed on the same Kubernetes and virtual machine platform that is used by tens of thousands of companies across the world. These are some of the main benefits:

  • Scalable and straightforward deployments: With Kubernetes, IT operations teams can expand cluster capacity by adding nodes with a few simple command-line instructions. VMware Pacific delivers enterprise-grade Kubernetes to run Spark clusters after a few simple setup steps.
  • Reliability: Kubernetes maintains the specified number of workers, even after hardware failures occur. It can self-heal and bring services back on track in a matter of seconds. As a result, Spark clusters may see a significant boost in availability.
  • Predictable performance: VMware Pacific delivers a Kubernetes runtime native to the vSphere hypervisor, which has proven to be more efficient and scalable than the Linux OS at scheduling containers. High performance and throughput are paramount for a big-data analytics platform such as Spark.
Session elements:
  • Introduction to VMware Pacific's Kubernetes architecture
  • How to bring up a hypervisor-native Kubernetes cluster using VMware Pacific (demo)
  • Architecture and sizing configuration of vSphere and Kubernetes to run Spark 3.
  • How to deploy and configure Spark 3 (demo)
  • Running Machine Learning tasks using ML libraries from Spark 3 (demo)
  • Q&A

PAST SESSIONS

Virtualizing Apache Spark and Machine Learning - Summit 2018

This talk explains why virtualizing Spark, in-house or elsewhere, is a requirement in today’s fast-moving and experimental world of data science and data engineering. Different teams want to spin up a Spark cluster “on the fly” to carry out some research and quickly answer business questions. They are not concerned with the availability of the server hardware, or with what any other team might be doing on it at the time. Virtualization provides the means of working within your own sandbox to try out a new query or Machine Learning algorithm. In-depth performance test results will be shown, demonstrating that Spark and ML programs perform as well on virtual machines as native implementations do. An early introduction is given to the best practices you should follow when virtualizing Spark.

Virtualizing Apache Spark - Summit 2017

This talk explains why virtualizing Spark, in-house or elsewhere, is a requirement in today’s fast-moving and experimental world of data science and data engineering. Different teams want to spin up a Spark cluster “on the fly” to carry out some research and quickly answer business questions. They are not concerned with the availability of the server hardware, or with what any other team might be doing on it at the time. Virtualization provides the means of working within your own sandbox to try out a new query or Machine Learning algorithm. In-depth performance test results will be shown, demonstrating that Spark and ML programs perform as well on virtual machines as native implementations do. An early introduction is given to the best practices you should follow when virtualizing Spark. If time allows, a short demo will show how to create an ephemeral, single-purpose Spark cluster, run an ML application test program on that cluster, and bring it down when finished.