Apache Spark-and-Tensorflow-as-a-Service - Databricks

From the RISE ICE Data Center in Sweden, at www.hops.site, we provide researchers with both Spark-as-a-Service and, more recently, TensorFlow-as-a-Service as part of the Hops platform. In this talk, we examine the different ways in which TensorFlow can be included in Spark workflows, from batch to streaming to structured streaming applications. We will analyse the different frameworks for integrating Spark with TensorFlow, from TensorFrames to TensorFlowOnSpark to Databricks' Deep Learning Pipelines. We introduce the programming models they support and highlight the importance of cluster support for managing different versions of Python libraries on behalf of users. We will also present cluster management support for sharing GPUs, including in Mesos and YARN (in Hops Hadoop). Finally, we will give a live demonstration of training and inference for a TensorFlowOnSpark application written in Jupyter that can read data from either HDFS or Kafka, transform the data in Spark, and train a deep neural network in TensorFlow. We will show how to debug the application using both the Spark UI and TensorBoard, and how to examine logs and monitor training.
Session hashtag: #EUai8

About Jim Dowling

Jim Dowling is the CEO of Logical Clocks AB, as well as an Associate Professor at KTH Royal Institute of Technology in Stockholm and a Senior Researcher at SICS RISE. He is the lead architect of Hops Hadoop, the world's fastest and most scalable Hadoop distribution and the only Hadoop platform with support for GPUs as a resource. His research concentrates on building systems support for machine learning at scale. He is a regular speaker at Big Data and AI industry conferences, and blogs at O'Reilly on AI.