Apache Arrow support is new in Spark 2.3 and offers faster interchange between Spark and Python. Apache Arrow also has connections to TensorFlow (and even without those, TensorFlow can be fed from Pandas). This talk will look at how to use Arrow to accelerate copying data from Spark to TensorFlow, and how to expose basic functionality in Scala for working with TensorFlow. From there we will dive into how to construct new deep learning ML pipeline stages in Python and make them available to our friends in Scala land.
Session hashtag: #DL7SAIS
Holden is an Apache Spark committer and PMC member who focuses on PySpark and Kubernetes support. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Her current side project is a book to teach children distributed systems, http://www.distributedcomputing4kids.com/.