We present HTTP on Spark, a novel integration between Spark and the widely used Hypertext Transfer Protocol (HTTP). This library can be used to bring any framework capable of communicating over HTTP into the Spark ecosystem. Furthermore, HTTP on Spark enables distributed, fault-tolerant microservice architectures that compose with Spark's dynamic allocation and Streaming capabilities. Building on this work, we release a library of idiomatic Spark bindings for a wide array of Microsoft Cognitive Services. These bindings allow users to easily add *any* cognitive service to their existing Spark and SparkML machine learning pipelines. Finally, we show how to use these services to build a large class of custom image classification and object detection systems that can learn without human-labeled training examples. We illustrate the power of these new releases with an automated snow leopard detection system.
Anand is the GM and Chief of Staff for Microsoft AI. Previously he was the Chief of Staff for the Microsoft Azure Data Group, covering Data Platforms and Machine Learning. Over the last decade, he led the product management and development teams for Azure Data Services, Visual Studio, and Windows Server User Experience at Microsoft. Anand holds a PhD in computational fluid mechanics and worked for several years as a researcher before joining Microsoft.
Mark is a software engineer and researcher who actively maintains Microsoft ML for Apache Spark (http://aka.ms/spark), a distributed machine learning library. Mark is based at Microsoft's New England R+D Center, where he has worked on scalable deep learning for snow leopard conservation and poacher recognition. His research interests include unsupervised learning, distributed systems, and abstract algebra. Mark loves the outdoors and takes every opportunity to ski, hike, and climb.