ModelDB: A System to Manage Machine Learning Models - Databricks

Building a machine learning model is an iterative process. A data scientist typically builds tens to hundreds of models before arriving at one that meets the acceptance criteria. However, the current style of model building is ad hoc: there is no practical way for a data scientist to manage the models built over time, and no means to run complex queries on models and their related data.
In this talk, we present ModelDB, a novel end-to-end system for managing machine learning (ML) models. Using client libraries, ModelDB automatically tracks and versions ML models in their native environments (e.g., spark.ml, scikit-learn). A common set of abstractions enables ModelDB to capture models and pipelines built across different languages and environments. The structured representation of models and metadata then gives users a platform for issuing complex queries across modeling artifacts. Our rich web frontend provides a way to query ModelDB at varying levels of granularity.
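
To make the idea of tracking models and then querying them concrete, here is a minimal sketch in Python. It is not ModelDB's client API: `ToyModelStore`, `log_model`, and `query` are hypothetical names invented for illustration, and the in-memory SQLite table is only an assumed stand-in for a structured store of models and metadata.

```python
import json
import sqlite3


class ToyModelStore:
    """Hypothetical toy store for model metadata; NOT the ModelDB API."""

    def __init__(self):
        # An in-memory SQLite database stands in for a real metadata backend.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE models ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "model_type TEXT, hyperparams TEXT, "
            "metric_name TEXT, metric_value REAL)"
        )

    def log_model(self, model_type, hyperparams, metric_name, metric_value):
        """Record one trained model with its hyperparameters and one metric."""
        cur = self.conn.execute(
            "INSERT INTO models "
            "(model_type, hyperparams, metric_name, metric_value) "
            "VALUES (?, ?, ?, ?)",
            (model_type, json.dumps(hyperparams, sort_keys=True),
             metric_name, metric_value),
        )
        self.conn.commit()
        return cur.lastrowid

    def query(self, metric_name, min_value):
        """Return models whose metric is at least min_value, best first."""
        rows = self.conn.execute(
            "SELECT id, model_type, hyperparams, metric_value FROM models "
            "WHERE metric_name = ? AND metric_value >= ? "
            "ORDER BY metric_value DESC",
            (metric_name, min_value),
        ).fetchall()
        return [
            {"id": r[0], "model_type": r[1],
             "hyperparams": json.loads(r[2]), "metric_value": r[3]}
            for r in rows
        ]


# Made-up example values, logged as if from three training runs.
store = ToyModelStore()
store.log_model("LogisticRegression", {"C": 1.0}, "accuracy", 0.81)
store.log_model("LogisticRegression", {"C": 0.1}, "accuracy", 0.78)
store.log_model("RandomForest", {"n_estimators": 100}, "accuracy", 0.86)

# Query across all logged models: which ones reach at least 0.80 accuracy?
best = store.query("accuracy", 0.80)
for m in best:
    print(m["id"], m["model_type"], m["hyperparams"], m["metric_value"])
```

The point of the sketch is only that once every model is recorded in a structured form, a question such as "which models exceeded a given accuracy" becomes a query rather than a search through notebooks and log files.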

ModelDB has been open-sourced at https://github.com/mitdbg/modeldb.

Learn more:

  • Visualizing Machine Learning Models
  • Machine Learning – Getting Started with Apache Spark on Databricks
  • Building Large Scale Machine Learning Applications with Pipelines

About Manasi Vartak

Manasi Vartak is a PhD student in the Database Group at MIT CSAIL, advised by Samuel Madden. Her research focuses on novel systems to support fast and interactive data analysis. She is currently working on systems to manage machine learning models and enable easy debugging of models. She has previously worked on visualization recommendation systems. Manasi is a recipient of the Facebook Graduate Fellowship and the Google Anita Borg Scholarship.