It has been two years since we launched MLflow, an open source platform for the full machine learning lifecycle, and we are thrilled and humbled by the adoption and impact it has gained in the data science and data engineering community. With more than 2 million monthly downloads, 200 code contributors, and over 100 contributing organizations, MLflow is the fastest growing and most widely used open source machine learning platform, confirming the need for an open source approach to managing the complete ML lifecycle.
At a recent virtual conference focused on ML platforms, we gave an overview of how MLflow helps manage the ML lifecycle, drawing on use cases from a diverse set of industries. We have much more coming at Spark + AI Summit. Below is a list of sessions, tutorials, and trainings on MLflow for you to dive into.
Keynote
Join Matei Zaharia on Thursday, June 25th for his keynote, Simplifying Model Development and Management with MLflow, to learn about the newest MLflow features. Specifically, he will cover what's new in MLflow to further streamline the ML lifecycle: simplified experiment tracking, model management, and model deployment with the new MLflow Model Registry. Many organizations struggle to track which models are available in the organization and which ones are in production. The MLflow Model Registry provides a centralized database to keep track of these models, share and describe new model versions, and deploy the latest version of a model through APIs.
Talks
We have a fantastic lineup of speakers and sessions throughout the conference on MLflow. Join experts from Accenture, ExxonMobil, Zynga, Atlassian, Databricks and more for real-life examples and deep dives on MLflow (in chronological order):
- AutoML Toolkit – Deep Dive with Daniel Tomes of Databricks
- Productionalizing Models through CI/CD Design with MLflow with Mary Grace Moesta and Peter Tamisin of Databricks
- Tuning ML Models: Scaling, Workflows, and Architecture with Joseph Bradley of Databricks
- The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Production with Nathan Buesgens of Accenture
- Productionizing Machine Learning Pipelines with Databricks & Azure ML with Trace Smith and Amirhessam Tahmassebi of ExxonMobil
- Advertising Fraud Detection at Scale at T-Mobile with Eric Yatskowitz and Phan Chuong of T-Mobile
- Continuous Delivery of ML-enabled Pipelines on Databricks using MLflow with Michael Shtelma and Thunder Shiviah of Databricks
- Saving Energy in Homes with a Unified Approach to Data and AI with Dr. Stephen Galsworthy and Erni Durdevic of Quby
- Productionizing Deep Reinforcement Learning with Spark and MLflow with Patrick Halina and Curren Pangler of Zynga
- Scaling Production Machine Learning Pipelines with Databricks with Max Cantor and James Evers of Conde Nast
- Translating Models to Medicine, a Minimal Example Using Open COVID-19 Data with Andrew Bauman and James Hibbard of Seattle Children's
- Automated & Explainable Deep Learning for Clinical Language Understanding at Roche with David Talby of Pacific AI and Vishakha Sharma and Yogesh Pandit of Roche
- Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on Quick-Insight Analytics and Demand Modelling with Patryk Oleniuk and Sandhya Raghavan of Virgin Hyperloop One
- Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ground to cloud using SQL Server with Daniel Coelho of Microsoft
- Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow with Perry Stephenson of Atlassian
- Patterns and Anti-patterns for Memorializing Data Science Project Artifacts with Derrick Higgins and Sonjia Waxmonsky of Blue Cross / Blue Shield of Illinois
- Scaling Data and ML with Apache Spark and Feast at Gojek with Willem Pienaar of GOJEK
- Continuous Delivery of Deep Transformer-based NLP Models Using MLflow and AWS Sagemaker for Enterprise AI Scenarios with Yong Liu and Andrew Brooks of Outreach Corporation
- Enabling Scalable Data Science Pipeline with MLflow at Thermo Fisher Scientific with Allison Wu of Thermo Fisher
- Machine Learning Data Lineage with MLflow and Delta Lake with Richard Zang and Denny Lee of Databricks
- Scaling up AI Research to Production with PyTorch and MLflow with Joe Spisak of Facebook
- Operationalizing Machine Learning at Scale at Starbucks with Balaji Venkataraman of Starbucks and Denny Lee of Databricks
- Accelerating MLflow Hyper-parameter Optimization Pipelines with RAPIDS with John Zedlewski of NVIDIA
Free Tutorial
Last but not least, you can join Using MLflow for end-to-end machine learning on Databricks, a free 80-minute tutorial presented by Sean Owen of Databricks. In this session, we'll take a look at a simple example where health data can be used to predict life expectancy. It will start with data engineering in Apache Spark™, then move on to data exploration, model tuning, and logging with hyperopt and MLflow. It will continue with examples of how the model registry governs model promotion, and finish with simple deployment to production with MLflow as a job or dashboard.
Next Steps
You can also browse all of our sessions in the Spark + AI Summit 2020 schedule.
To get started with open source MLflow, follow the instructions at mlflow.org or check out the release code on GitHub. We are excited to hear your feedback!
If you’re an existing Databricks user, you can start using managed MLflow on Databricks by importing the Quick Start Notebook for Azure Databricks or AWS. If you’re not yet a Databricks user, visit https://www.databricks.com/product/managed-mlflow to learn more and start a free trial of Databricks and managed MLflow.