Thunder Shiviah - Databricks

Thunder Shiviah

Solutions Architect, Databricks

Databricks Solutions Architect and ex-McKinsey Machine Learning Engineer focused on productionizing machine learning at scale.

PAST SESSIONS

Distributed Models Over Distributed Data with MLflow, PySpark, and Pandas (Summit Europe 2019)

Does more data always improve ML models? Is it better to use distributed ML instead of single node ML?

In this talk I will show that while more data often improves DL models in high-variance problem spaces (with semi-structured or unstructured data) such as NLP, image, and video, more data does not significantly improve high-bias problem spaces where traditional ML is more appropriate. Additionally, even in the deep learning domain, single-node models can still outperform distributed models via transfer learning.

Data scientists face pain points in running many models in parallel, automating the experimental setup, and getting others (especially analysts) within an organization to use their models. Databricks solves these problems using pandas UDFs, the ML Runtime, and MLflow.
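The pandas UDF pattern alluded to above boils down to: split the data by key, fit one model per group, and collect the results. A minimal single-machine sketch of that pattern, using only the Python standard library (the group keys, toy data, and `fit_group` helper are illustrative, not the Databricks or Spark API):

```python
import statistics
from concurrent.futures import ThreadPoolExecutor

def fit_group(item):
    # One "model" per group: here just the group mean, standing in
    # for a real estimator. With Spark pandas UDFs, each group would
    # instead arrive as a pandas DataFrame on a cluster executor.
    key, values = item
    return key, statistics.mean(values)

# Toy data: one list of observations per store.
data = {
    "store_a": [10.0, 12.0, 11.0],
    "store_b": [3.0, 4.0, 5.0],
}

# Fit all per-group models concurrently and collect them by key.
with ThreadPoolExecutor() as pool:
    models = dict(pool.map(fit_group, data.items()))
```

On Spark the same shape appears as a grouped pandas UDF applied after `groupBy`, which is what lets one driver program coordinate thousands of per-group models.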

Managing the Complete Machine Learning Lifecycle with MLflow (Summit Europe 2019)

ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.

To address these challenges, Databricks last year unveiled MLflow, an open source project that aims to simplify the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

In the past year, the MLflow community has grown quickly: over 120 contributors from over 40 companies have contributed code to the project, and over 200 companies are using MLflow.

In this tutorial, we will show you how using MLflow can help you:

  • Keep track of experiment runs and results across frameworks.
  • Execute projects remotely on a Databricks cluster, and quickly reproduce your runs.
  • Quickly productionize models using Databricks production jobs, Docker containers, Azure ML, or Amazon SageMaker.
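The record MLflow Tracking keeps per run is conceptually simple: the parameters, the metrics, and any artifacts, stored so the experiment can be queried later. The sketch below illustrates that record using only the standard library; it is not the real mlflow API (whose calls are `mlflow.start_run()`, `mlflow.log_param()`, and `mlflow.log_metric()`), just the idea behind it:

```python
import json
import tempfile
import uuid
from pathlib import Path

class Run:
    """Toy stand-in for one MLflow Tracking run: a directory holding
    the run's params and metrics as a queryable JSON record."""

    def __init__(self, root):
        self.dir = Path(root) / uuid.uuid4().hex
        self.dir.mkdir(parents=True)
        self.params, self.metrics = {}, {}

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value

    def finish(self):
        # Persist the record so later tools can compare runs.
        path = self.dir / "run.json"
        path.write_text(json.dumps(
            {"params": self.params, "metrics": self.metrics}))
        return path

root = tempfile.mkdtemp()
run = Run(root)
run.log_param("alpha", 0.1)
run.log_metric("rmse", 0.92)
record = json.loads(run.finish().read_text())
```

The real Tracking server adds run metadata, artifact storage, and a query UI on top of essentially this structure.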

We will demo the building blocks of MLflow as well as the most recent additions since the 1.0 release.

What you will learn:

  • Understand the three main components of open source MLflow (MLflow Tracking, MLflow Projects, MLflow Models) and how each helps address challenges of the ML lifecycle.
  • How to use MLflow Tracking to record and query experiments: code, data, config, and results.
  • How to use the MLflow Projects packaging format to reproduce runs on any platform.
  • How to use the MLflow Models general format to send models to diverse deployment tools.
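An MLflow Project is just a directory with an MLproject file declaring its environment and entry points, which is what makes runs reproducible on any platform. A minimal example, with illustrative names and parameters (the project, script, and `alpha` parameter here are made up for the sketch):

```yaml
name: my_tutorial_project

conda_env: conda.yaml        # environment the run executes in

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha}"
```

Given such a file, `mlflow run` can recreate the environment and execute the entry point locally or on a remote cluster.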

Prerequisites:

  • A fully-charged laptop (8-16GB memory) with Chrome or Firefox
  • Python 3 and pip pre-installed
  • Pre-register for a Databricks Standard Trial
  • Basic knowledge of the Python programming language
  • Basic understanding of machine learning concepts

Managing the Complete Machine Learning Lifecycle with MLflow, Continued (Summit Europe 2019)

Navigating the ML Pipeline Jungle with MLflow: Notes from the Field (Summit Europe 2018)

Plumbing has been a key focus of modern software engineering in our API-, services-, containers-, and DevOps-driven landscape, so it may come as a surprise that plumbing is where AI projects tend to fail. But it is precisely because modern software development focuses on decoupled plumbing that we have struggled to handle the rise of AI. Specifically, companies are able to use AI effectively when they can create end-to-end AI model factories that explicitly account for the coupling between data, models, and code. In this talk, I will walk through what a model factory is and how MLflow's design supports the creation of end-to-end model factories. I will also share best practices I've observed while helping customers from startups to Fortune 50s create, productionize, and scale end-to-end ML pipelines, and watching those pipelines produce serious, game-changing business impact. Session hashtag: #SAISDS11