This talk explains the reasons why virtualizing Spark, in-house or elsewhere, is a requirement in today’s fast-moving and experimental world of data science and data engineering. Different teams want to spin up a Spark cluster “on the fly” to carry out some research and quickly answer business questions. They are not concerned with the availability of the server hardware – or with what any other team might be doing on it at the time. Virtualization provides the means of working within your own sandbox to try out the new query or Machine Learning algorithm. Deep performance test results will be shown that demonstrate that Spark and ML programs perform equally well on virtual machines just like native implementations do. An early introduction is given to the best practices you should adhere to when you do this. If time allows, a short demo will be given of creating an ephemeral, single-purpose Spark cluster, running an ML application test program on that cluster, and bringing it down when finished.
Justin Murray works as a Technical Marketing Manager at VMware . Justin creates technical material and gives guidance to customers and the VMware field organization to promote the virtualization of big data workloads on VMware's vSphere platform. Justin has worked closely with VMware's partner ISVs (Independent Software Vendors) to ensure their products work well on vSphere and continues to bring best practices to the field as the customer base for big data expands.