Lessons Learned Building an Open Deep Learning Model Exchange - Databricks

The popular view of applying deep learning is that you take an open-source or research model, train it on raw data and deploy the resulting model as a fully self-contained artefact. The reality is far more complex. In the training phase, users face an array of challenges, including handling varied deep learning frameworks, hardware requirements and configurations, not to mention code quality, consistency and packaging. In the deployment phase, they face another set of challenges, ranging from custom requirements for data pre- and post-processing, to inconsistencies across frameworks, to a lack of standardization in serving APIs. The goal of the IBM Code Model Asset eXchange (MAX) is to remove these barriers so that developers can obtain, train and deploy open-source deep learning models for their enterprise applications. In building the exchange, we encountered all these challenges and more.

For the training phase, we aim to leverage the Fabric for Deep Learning (FfDL: https://github.com/IBM/FfDL), an open-source project providing framework-independent training of deep learning models on Kubernetes. For the deployment phase, MAX provides container-based, fully self-contained model artefacts, encompassing the end-to-end deep learning predictive pipeline and exposing a standardized REST API.
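
To make the idea of a self-contained predictive pipeline behind a standardized API more concrete, here is a minimal sketch in Python. It is an illustration only, not the MAX implementation: the class names (`ModelWrapper`, `ToySentimentModel`), the `handle_request` helper, the toy word-list "model" and the response envelope with `status` and `predictions` fields are all assumptions made for this example. The point is that pre-processing, inference and post-processing live together in one artefact, so every model can answer requests in the same shape.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ModelWrapper(ABC):
    """Hypothetical base class: one end-to-end pipeline per model."""

    @abstractmethod
    def pre_process(self, raw: Any) -> Any:
        """Turn the raw request input into model-ready features."""

    @abstractmethod
    def run_model(self, features: Any) -> Any:
        """Framework-specific inference (TensorFlow, PyTorch, ...)."""

    @abstractmethod
    def post_process(self, output: Any) -> List[Dict[str, Any]]:
        """Turn raw model output into JSON-serializable predictions."""

    def predict(self, raw: Any) -> Dict[str, Any]:
        # Every model returns the same envelope, whatever runs inside it.
        try:
            features = self.pre_process(raw)
            output = self.run_model(features)
            return {"status": "ok", "predictions": self.post_process(output)}
        except Exception as exc:
            return {"status": "error", "message": str(exc)}


class ToySentimentModel(ModelWrapper):
    """Stand-in for a real deep learning model (assumption for the sketch)."""

    POSITIVE = {"good", "great", "love"}
    NEGATIVE = {"bad", "poor", "hate"}

    def pre_process(self, raw: Any) -> List[str]:
        if not isinstance(raw, str):
            raise ValueError("input must be a string")
        return raw.lower().split()

    def run_model(self, features: List[str]) -> int:
        positives = sum(token in self.POSITIVE for token in features)
        negatives = sum(token in self.NEGATIVE for token in features)
        return positives - negatives

    def post_process(self, output: int) -> List[Dict[str, Any]]:
        if output > 0:
            label = "positive"
        elif output < 0:
            label = "negative"
        else:
            label = "neutral"
        return [{"label": label, "score": output}]


def handle_request(model: ModelWrapper, body: str) -> str:
    """What a REST endpoint handler might do with a JSON request body."""
    payload = json.loads(body)
    return json.dumps(model.predict(payload.get("input")))
```

In a container-based deployment, a small web server would call something like `handle_request` for each incoming request, and a client would see the same request and response shape regardless of which framework the wrapped model uses.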

This talk explores the process of building MAX: the challenges encountered, the solutions developed, the lessons learned along the way, and future directions and best practices for cross-framework, standardized deep learning model training and deployment.

Session hashtag: #SAISDL6

About Nick Pentreath

Nick Pentreath is a principal engineer in IBM's Center for Open-source Data & AI Technology (CODAIT), where he works on machine learning. Previously, he cofounded Graphflow, a machine learning startup focused on recommendations. He has also worked at Goldman Sachs, Cognitive Match, and Mxit. He is a committer and PMC member of the Apache Spark project and author of Machine Learning with Spark. Nick is passionate about combining commercial focus with machine learning and cutting-edge technology to build intelligent systems that learn from data to add business value.