Dhabaleswar K (DK) Panda

Professor, The Ohio State University

Dhabaleswar K (DK) Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at The Ohio State University. He has published over 400 papers in the area of high-end computing and networking. The RDMA-enabled Apache Spark and Hadoop libraries, designed and developed by his team to exploit HPC technologies under the High-Performance Big Data (HiBD) project (http://hibd.cse.ohio-state.edu), are currently being used by more than 275 organizations in 35 countries. More than 24,500 downloads of these libraries have taken place from the project's site. More details on Prof. Panda are available at http://web.cse.ohio-state.edu/~panda.2/

DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks

Summit 2018

Deep Learning over Big Data (DLoBD) is becoming one of the most important research paradigms for mining value from the massive amounts of data being gathered. Many emerging deep learning frameworks have started running over Big Data stacks such as Hadoop and Spark. With the convergence of HPC, Big Data, and Deep Learning, these DLoBD stacks are taking advantage of RDMA and multi-/many-core CPUs and GPUs. Although there is a lot of activity in the field, there is a lack of systematic studies analyzing the impact of RDMA-capable networks and CPUs/GPUs on DLoBD stacks. To fill this gap, this talk will present a systematic characterization methodology and extensive performance evaluations of four representative DLoBD stacks (CaffeOnSpark, TensorFlowOnSpark, MMLSpark, and BigDL) to expose interesting trends in performance, scalability, accuracy, and resource utilization. Our observations show that an RDMA-based design for DLoBD stacks can achieve up to a 2.7x speedup compared to the IPoIB-based scheme. The RDMA scheme also scales better and utilizes resources more efficiently than the IPoIB scheme over InfiniBand clusters. In most cases, GPU-based deep learning outperforms CPU-based designs, but not always: for LeNet on MNIST, CPU + MKL achieves better performance than both GPU and GPU + cuDNN on 16 nodes. Our studies show that there is considerable room to improve the designs of current-generation DLoBD stacks. Finally, we will present some in-depth case studies on further accelerating deep learning workloads on the Spark framework.

Session hashtag: #Res7SAIS