Miruna Oprescu is a Software Engineer at Microsoft specializing in tools and infrastructure for big data and machine learning. Her goal is to make machine learning simple for both developers and end users. As an active MMLSpark (Microsoft Machine Learning for Spark) contributor, she has been working on Python/R wrapper generation for Spark pipeline stages and a robust testing framework for Spark pipelines using Jupyter Notebooks.
With the rapid growth of available datasets, it is imperative to have good tools for extracting insight from big data. The Spark ML library has excellent support for at-scale data processing and machine learning experiments, but more often than not, data scientists find themselves struggling with issues such as low-level data manipulation, lack of support for image processing, text analytics and deep learning, and the inability to use Spark alongside other popular machine learning libraries. To address these pain points, Microsoft recently released the Microsoft Machine Learning Library for Apache Spark (MMLSpark), an open-source machine learning library built on top of SparkML that seeks to simplify the data science process and integrate SparkML Pipelines with deep learning and computer vision libraries such as the Microsoft Cognitive Toolkit (CNTK) and OpenCV. With MMLSpark, data scientists can build models with 1/10th of the code through Pipeline objects that compose seamlessly with other parts of the SparkML ecosystem. In this session, we explore some of the main lessons learned from building MMLSpark. Join us if you would like to know how to extend Pipelines to ensure seamless integration with SparkML, how to auto-generate Python and R wrappers from Scala Transformers and Estimators, how to integrate and use previously non-distributed libraries in a distributed manner, and how to efficiently deploy a Spark library across multiple platforms. Session hashtag: #EUai7
Azure Machine Learning is an integrated, end-to-end data science experience designed for professionals to prepare data and create, manage and deploy machine learning models at any scale. Azure Machine Learning was developed with the conviction that the scale of the problem you are trying to solve shouldn’t matter, that integrating Spark into your regular workflow shouldn’t present any barriers, and that you, the professional data scientist, should be able to focus on solving machine learning problems rather than software engineering problems. In this session, we demonstrate the power of Apache Spark on Azure Machine Learning by training a model on a variety of targets at the push of a button, tracking the history of the model, and operationalizing it, all in just under 15 minutes.