Mark is a software engineer on Microsoft’s Applied AI team and a machine learning PhD student at the MIT Computer Science and AI Lab. Mark leads Microsoft ML for Apache Spark (http://aka.ms/spark), a distributed machine learning and microservice orchestration library. He has applied this work to problems in wildlife conservation, accessibility, and art museum outreach. Mark is currently researching how information theory and abstract algebra can yield new deep learning architectures in Professor William T. Freeman’s lab.
As AI becomes more ubiquitous and scalable, we aim to apply these technologies to help improve the planet. This talk will explore Microsoft's latest contributions to the Apache Spark and machine learning communities, with a special focus on AI for environmental and social impact. In particular, we will share how to use Azure Databricks, Azure Machine Learning, and Microsoft ML for Apache Spark to explore over 5,000 years of human creativity with the Metropolitan Museum of Art, and how Microsoft uses Apache Spark to help protect endangered species.
We present the Azure Cognitive Services on Spark, a simple, easy-to-use extension of the SparkML library to all Azure Cognitive Services. This integration allows Spark users to embed cloud intelligence directly into their Spark computations, enabling a new generation of intelligent applications on Spark. Furthermore, we show that with our new Containerized Cognitive Services, one can embed cloud intelligence directly into the Spark cluster for ultra-low latency, on-prem, and offline applications. We show how, using our integration, one can compose these cognitive services with other services, SQL computations, and deep networks to create sophisticated and intelligent heterogeneous applications. Moreover, we show how to redeploy these compositions as RESTful services with Spark Serving. We will also explore the architecture of these contributions, which build on HTTP on Spark, a novel integration between Spark and the widely used Hypertext Transfer Protocol (HTTP). This library can integrate any framework that is capable of communicating through HTTP into the Spark ecosystem. Finally, we demonstrate how to use these services to create a large class of intelligent applications such as custom search engines, real-time facial recognition systems, and unsupervised object detectors.
We present Spark Serving, a new Spark computing mode that enables users to deploy any Spark computation as a sub-millisecond latency web service backed by any Spark cluster. Attendees will explore the architecture of Spark Serving and discover how to deploy services on a variety of cluster types such as Azure Databricks, Kubernetes, and Spark Standalone. We will also demonstrate its simple yet powerful API for RESTful SparkSQL, SparkML, and deep network deployment, which is the same API used for batch and streaming workloads. In addition, we will explore the "dual architecture": HTTP on Spark. This architecture converts any Spark cluster into a distributed web client with the familiar and pipelinable SparkML API. Together, these two contributions provide the fundamental Spark communication primitives needed to integrate and deploy any computation framework into the Spark ecosystem. We will explore how Microsoft has used this work to leverage Spark as a fault-tolerant microservice orchestration engine in addition to an ETL and ML platform. We will also walk through two examples drawn from Microsoft's ongoing work: Cognitive Service composition and unsupervised object detection for snow leopard recognition.
We present a novel deep learning approach to create a robust object detection network for use in an infrared, UAV-based poacher recognition system. More specifically, we have used Microsoft AirSim to generate thousands of hours of simulated drone footage in the African savannah. We then used deep domain adaptation to translate our simulation into a form that is adversarially indistinguishable from real infrared drone footage. This yields a programmable data generator that can be used to dramatically improve the accuracy of algorithms without requiring expensive human-curated annotations. Furthermore, we extend this work and contribute a photorealism extension to AirSim, automating much of the domain-specific expertise needed for computer graphics work and enabling the generation of limitless quantities of photorealistic data for use in reinforcement learning and autonomous vehicles. Session hashtag: #SAISDD2
We present HTTP on Spark, a novel integration between Spark and the widely used Hypertext Transfer Protocol (HTTP). This library can be used to integrate any framework that is capable of communicating through HTTP into the Spark ecosystem. Furthermore, HTTP on Spark enables distributed and fault-tolerant microservice architectures that commute with Spark’s dynamic allocation and streaming capabilities. We build upon this work and release a library of idiomatic Spark bindings for a wide array of Microsoft Cognitive Services. These bindings allow users to easily add *any* cognitive service as a part of their existing Spark and SparkML machine learning pipelines. Finally, we demonstrate how to use these services to create a large class of custom image classification and object detection systems that can learn without requiring human-labeled training examples. We demonstrate the power of these new releases with an automated snow leopard detection system.
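The idea of treating an HTTP-capable service as a per-row transformation can be illustrated with a small sketch. The code below is not the HTTP on Spark API and does not use Spark at all: it is a plain-Python stand-in, using only the standard library, in which a list of dictionaries plays the role of a distributed dataset and a thread pool plays the role of parallel workers. The service (`EchoLengthHandler`), the helper names (`call_service`, `http_transform`, `run_demo`), and the JSON fields are all hypothetical and chosen only for illustration.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Opener that ignores proxy environment variables, so local calls stay local.
_opener = build_opener(ProxyHandler({}))


class EchoLengthHandler(BaseHTTPRequestHandler):
    """Hypothetical external service: replies with the length of the posted text."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        text = json.loads(body)["text"]
        payload = json.dumps({"length": len(text)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence per-request logging


def call_service(url, row):
    """Send one row to the service and return the row with the response attached."""
    request = Request(
        url,
        data=json.dumps({"text": row["text"]}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request, timeout=5) as response:
        result = json.loads(response.read())
    return {**row, "length": result["length"]}


def http_transform(url, rows, workers=4):
    """Apply the HTTP service to every row in parallel, preserving row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: call_service(url, row), rows))


def run_demo():
    """Start the stand-in service on a free local port and transform two rows."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoLengthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        rows = [{"id": 1, "text": "snow leopard"}, {"id": 2, "text": "spark"}]
        return http_transform(url, rows)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the transformation only needs a URL and a way to serialize each row, which is the sense in which anything that speaks HTTP can be plugged in. A real cluster would additionally have to handle retries, batching, and worker failures, none of which is modeled here.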