ROCm and Distributed Deep Learning on Spark and TensorFlow - Databricks



ROCm, the Radeon Open Ecosystem, is an open-source software foundation for GPU computing on Linux. ROCm supports TensorFlow and PyTorch through MIOpen, a library of highly optimized GPU routines for deep learning. In this talk, we describe how Apache Spark is a key enabling platform for distributed deep learning on ROCm, because it allows different deep learning frameworks to be embedded in Spark workflows as part of a secure end-to-end machine learning pipeline. We will analyse the different frameworks for integrating Spark with TensorFlow on ROCm, from Horovod to HopsML to Databricks' Project Hydrogen. We will also examine the surprising places where bottlenecks can surface when training models (everything from object stores to the data scientists themselves) and investigate ways to get around them. The talk will include a live demonstration of training and inference for a TensorFlow application embedded in a Spark pipeline, written in a Jupyter notebook on Hopsworks with ROCm.



About Jim Dowling

Jim Dowling is the CEO of Logical Clocks AB, as well as an Associate Professor at KTH Royal Institute of Technology in Stockholm and a Senior Researcher at SICS RISE. He is the lead architect of Hops Hadoop, the world's fastest and most scalable Hadoop distribution and the only Hadoop platform with support for GPUs as a resource. His research concentrates on building systems support for machine learning at scale. He is a regular speaker at Big Data and AI industry conferences, and blogs at O'Reilly on AI.

About Ajit Mathews

As the Corporate Vice President of Machine Learning Software Engineering, Ajit is the engineering leader responsible for the design and development of ROCm (Radeon Open Compute) Machine Intelligence software, spanning Deep Learning Frameworks, Compilers, Language Runtimes, Libraries and the Linux Compute Kernel. Ajit is also responsible for the Machine Learning Software Roadmap and Strategy. Ajit is passionate about distributed machine learning and high performance computing. Ajit holds a Master's in Computer Science and an MBA from Kellogg.