Productionizing Deep Learning Pipelines with Databricks & Azure Machine Learning - Databricks

Deploying modern machine learning applications can require significant time, resources, and experience to design and implement, which adds overhead for small-scale machine learning projects. In this tutorial, we present a reproducible framework for quickly jumpstarting data science projects using Databricks and Azure Machine Learning workspaces, one that makes production-ready app deployment easy, particularly for data scientists. Although the example presented in the session focuses on deep learning, the workflow can be extended to traditional machine learning applications as well. The tutorial will include sample code with templates, a recommended project organization structure and tools, and key learnings from our experience deploying machine learning pipelines into production and distributing a repeatable framework within our organization.

What you will learn:

  • Understand how to develop pipelines for continuous integration and deployment within Azure Machine Learning using Azure Databricks.
  • Learn how to execute Apache Spark jobs using Databricks Connect and integrate source code with Azure DevOps for version control.
  • Gain exposure to using Apache Spark and Koalas to extract and preprocess data for modeling.
  • Gain hands-on experience building deep learning models for time series classification, and address challenges of the ML lifecycle by using MLflow to track model parameters and results, package code for reproducibility, and deploy models.
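
The abstract does not show the tutorial's preprocessing code. As a hedged illustration of one common step in time series classification, the sketch below slices a series into fixed-length, possibly overlapping windows shaped (samples, timesteps, features), which is the input layout Keras recurrent and 1-D convolutional layers expect. It assumes the data has already been extracted (for example with Spark or Koalas) and fits in memory as a NumPy array. The function name `make_windows`, its `window` and `step` parameters, and labeling each window by its last timestep are assumptions for illustration, not the presenters' method.

```python
import numpy as np


def make_windows(series, labels, window, step):
    """Slice a series into fixed-length windows for classification.

    Hypothetical helper (not from the tutorial).
    series: array of shape (n_timesteps,) or (n_timesteps, n_features)
    labels: array of shape (n_timesteps,), one class label per timestep
    Returns X with shape (n_windows, window, n_features) and y with one
    label per window, taken (by assumption) from the window's last timestep.
    """
    series = np.asarray(series, dtype=float)
    labels = np.asarray(labels)
    if series.ndim == 1:
        series = series[:, np.newaxis]  # treat a 1-D series as one feature
    if len(series) != len(labels):
        raise ValueError("series and labels must have the same length")
    if window > len(series):
        raise ValueError("window is longer than the series")

    # Window start positions; consecutive windows overlap when step < window.
    starts = range(0, len(series) - window + 1, step)
    X = np.stack([series[s:s + window] for s in starts])
    y = np.array([labels[s + window - 1] for s in starts])
    return X, y


# Usage: 10 timesteps of one feature, class 0 for the first half, 1 after.
values = np.arange(10, dtype=float)
labels = np.array([0] * 5 + [1] * 5)
X, y = make_windows(values, labels, window=4, step=2)
print(X.shape)  # (4, 4, 1)
print(y)        # [0 1 1 1]
```

Labeling by the last timestep suits "what state is the system in now" tasks; a majority vote over the window is a common alternative when labels change mid-window.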

Prerequisites:

  • Microsoft Azure Account
  • Azure Machine Learning Workspace
  • Azure DevOps configured
  • Pre-register for a Databricks Standard Trial (runtime > 6.0)
  • Python 3.7.1 virtual environment with the following libraries:
    • databricks-connect==6.1.*
    • koalas==0.23.0
    • pandas==0.25.3
    • keras==2.3.1
    • mlflow==1.4.0
    • More will be added later.
  • Basic knowledge of Python and Apache Spark
  • Basic understanding of deep learning concepts

About Trace Smith

ExxonMobil

Trace is a lead Data Scientist at ExxonMobil who leverages big data and machine learning to help solve complex problems for upstream business units. His experience includes building and deploying machine learning applications, and his interests include real-time predictive maintenance, anomaly detection, and natural language processing. Trace holds an M.S. in Petroleum Engineering from Louisiana State University and an M.S. in Data Science from Southern Methodist University.