Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner Enabled Apache Spark Clusters

The freedom to iterate quickly on distributed deep learning tasks is crucial for smaller companies seeking competitive advantage and market share against big tech giants. HorovodRunner brings this capability to relatively accessible Spark clusters. There have been, however, no benchmark tests on HorovodRunner per se, and only very limited scalability benchmark tests on Horovod, its predecessor, which requires custom-built GPU clusters. For the first time, we show that Databricks’ HorovodRunner achieves a significant lift in scaling efficiency for convolutional neural network (CNN) based tasks on both GPU and CPU clusters.

We also implemented the Rectified Adam optimizer for the first time in HorovodRunner. In addition to showing test results, we will discuss the lessons we learned along the way, such as:

  1. cluster settings
  2. distributed model retrieval
  3. how to get training time without using Horovod Timeline
  4. how to use Rectified Adam
  5. what type of models will gain scaling efficiency and why, etc.
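
Items 3 and 5 can be illustrated with a small sketch. The abstract does not say how the authors measured training time or how they define scaling efficiency, so the code below rests on two assumptions: that training time is taken as wall-clock time around the training call, and that scaling efficiency is the speedup over a single worker divided by the number of workers, which is a commonly used definition. The helper names `timed` and `scaling_efficiency` are hypothetical and are not part of Horovod or HorovodRunner.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def scaling_efficiency(t_single: float, t_multi: float, n_workers: int) -> float:
    """Assumed definition: (t_single / t_multi) / n_workers.

    t_single  -- training time on one worker, in seconds
    t_multi   -- training time on n_workers workers, in seconds
    n_workers -- number of workers used for the distributed run
    """
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("training times must be positive")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    speedup = t_single / t_multi
    return speedup / n_workers
```

In use, the training function would be wrapped with `timed` once per cluster size, and the resulting times passed to `scaling_efficiency`. A value near 1.0 would indicate near-linear scaling under this definition.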

 
About Jing Pan

eHealth Inc.

Dr. Jing Pan is a Sr. Staff Data Scientist/User Experience Researcher at eHealth Inc. She oversees all customer-facing modeling projects and technical evaluations of third-party services and mergers and acquisitions. She is passionate about the productionization of deep learning models on Spark clusters. In 2019, she was the first in the world to apply the Rectified Adam optimizer on HorovodRunner-enabled Spark clusters for distributed deep learning training. In 2017, at Fanatics Inc., she was perhaps the first in the world to serve a deep learning model trained with Keras in a distributed fashion on Spark worker nodes.

About Wendao Liu

eHealth Inc.

Wendao Liu received his master's degree from the prestigious LeBow College of Business at Drexel University. He is a Ph.D. student in Business Administration and at the same time works at eHealth, Inc. as a full-stack data scientist. With his rare combination of business mindset and strong technical skills, he can not only tackle data issues but also leverage data to drive business performance. He identifies business opportunities, optimizes product performance, and provides recommendations. He builds an end-to-end customer unification data product that leverages machine learning techniques to provide reliable linkage across disparate systems.