Keynotes from the European Spark + AI Summit 2019 - Day Two

OCTOBER 17, 2019 – Amsterdam


Simplifying Model Management with MLflow


Matei Zaharia (Databricks), Corey Zumar (Databricks)

Last summer, Databricks launched MLflow, an open source platform to manage the machine learning lifecycle, including experiment tracking, reproducible runs and model packaging. MLflow has grown quickly since then, with over 120 contributors from dozens of companies, including major contributions from RStudio and Microsoft. It has also gained new capabilities such as automatic logging from TensorFlow and Keras, Kubernetes integrations, and a high-level Java API. In this talk, we’ll cover some of the new features that have come to MLflow, and then focus on a major upcoming feature: model management with the MLflow Model Registry. Many organizations face challenges tracking which models are available in the organization and which ones are in production. The MLflow Model Registry provides a centralized database to keep track of these models, share and describe new model versions, and deploy the latest version of a model through APIs. We’ll demonstrate how these features can simplify common ML lifecycle tasks.
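The registry workflow the abstract describes (a centralized record of models, versioned and promoted through stages, with the latest production version retrievable for deployment) can be sketched as a toy in plain Python. The class and method names below are hypothetical illustrations of the concept, not the actual MLflow API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    version: int
    description: str
    stage: str = "None"  # e.g. "Staging" or "Production"

@dataclass
class ToyModelRegistry:
    # Hypothetical stand-in for a centralized model database.
    models: dict = field(default_factory=dict)

    def register(self, name: str, description: str) -> ModelVersion:
        # Registering the same name again creates a new version.
        versions = self.models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, description=description)
        versions.append(mv)
        return mv

    def transition(self, name: str, version: int, stage: str) -> None:
        # Promote a specific version, e.g. to "Production".
        self.models[name][version - 1].stage = stage

    def latest(self, name: str, stage: str = "Production") -> ModelVersion:
        # Fetch the newest version currently in the given stage.
        candidates = [v for v in self.models[name] if v.stage == stage]
        return max(candidates, key=lambda v: v.version)

registry = ToyModelRegistry()
registry.register("churn-model", "baseline logistic regression")
registry.register("churn-model", "gradient-boosted trees")
registry.transition("churn-model", 2, "Production")
print(registry.latest("churn-model").version)  # → 2
```

The point of the design is that deployment code only asks the registry for "the latest production version" and never hard-codes a model file path.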

Scalable AI for Good


Mark Hamilton (Microsoft), Christina Lee (Microsoft)

As AI becomes more ubiquitous and scalable, we aim to apply these technologies to help improve the planet. This talk will explore Microsoft’s latest contributions to the Apache Spark and Machine Learning communities with a special focus on AI for environmental and social impact. In particular, we will share how to use Azure Databricks, Azure Machine Learning and Microsoft ML for Apache Spark to explore over 5,000 years of human creativity with the Metropolitan Museum of Art, and how Microsoft uses Apache Spark to help protect endangered species.

Forecasting 'What-if' Scenarios in Retail Using ML-Powered Interactive Tools


Johan Vallin (Electrolux)

Reinventing Payments at HSBC with a Unified Platform for Data and AI in the Cloud


Alessio Basso (PayMe)

Democratizing Machine Learning: Perspective from a scikit-learn Creator


Gael Varoquaux

Once an obscure branch of applied mathematics, machine learning is now the darling of tech. I will talk about lessons learned while democratizing machine learning: how libraries like scikit-learn were designed to empower users, simplifying interfaces while avoiding ambiguous behaviors; how the Python data ecosystem was built from scientific computing tools, and the importance of good numerics; and how some machine-learning patterns easily provide value in real-world situations. I will also discuss the challenges that remain and the progress we are making. Scaling up brings bottlenecks beyond numerics. Integrating data into statistical models, a hurdle in data-science practice, requires rethinking data-cleaning pipelines.

This talk will draw on my experience as a scikit-learn developer, but also as a researcher in machine learning and its applications.
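The user-empowering design the talk alludes to is visible in scikit-learn's uniform estimator interface: every model is constructed, fit, and used for prediction the same way, so components compose without ambiguity. A minimal sketch (the dataset here is a synthetic stand-in, not from the talk):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic toy data standing in for a real-world problem.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One consistent interface: construct, fit, predict.
# A pipeline chains preprocessing and modeling behind that same interface.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
preds = model.predict(X_test)
print(preds.shape)  # one prediction per test sample
```

Because the pipeline itself exposes `fit`/`predict`, it can be dropped into cross-validation or grid search unchanged, which is much of what "empowering users" means in practice.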

Imaging the Unseen: Taking the First Picture of a Black Hole


Katie Bouman

This talk will present the methods and procedures used to produce the first image of a black hole from the Event Horizon Telescope. It has been theorized for decades that a black hole will leave a “shadow” on a background of hot gas. Taking a picture of this black hole shadow could help to address a number of important scientific questions, both on the nature of black holes and the validity of general relativity.

Unfortunately, because the black hole’s shadow appears so small from Earth, traditional imaging approaches would require a radio telescope the size of the Earth. In this talk, I discuss techniques we have developed to photograph a black hole using the Event Horizon Telescope, a network of telescopes scattered across the globe. Imaging a black hole’s structure with this computational telescope required us to reconstruct images from sparse measurements, heavily corrupted by atmospheric error. The resulting image is the distilled product of an observation campaign that collected approximately five petabytes of data over four evenings in 2017.
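The core difficulty above, recovering an image from far fewer measurements than pixels, can be illustrated with a toy regularized least-squares reconstruction in NumPy. This is only a sketch of the general idea: the EHT pipeline uses far more sophisticated methods that also model atmospheric error, and the measurement matrix, sizes, and regularizer here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth "image": a flattened 8x8 grid with a few bright pixels.
n = 64
truth = np.zeros(n)
truth[[18, 19, 26, 29, 34, 37, 42, 43]] = 1.0

# Sparse, noisy measurements: fewer measurements (40) than pixels (64),
# loosely analogous to a handful of telescopes sampling the sky.
m = 40
A = rng.normal(size=(m, n))
b = A @ truth + 0.01 * rng.normal(size=m)

# Tikhonov-regularized least squares: many images fit the data, and the
# regularizer selects one plausible reconstruction among them.
lam = 0.1
x = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# The reconstruction should correlate strongly with the true image.
print(np.corrcoef(x, truth)[0, 1])
```

The underdetermined system is the crux: with more unknowns than measurements, the choice of regularizer (here a simple L2 penalty) is what makes the problem solvable at all.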