Deep Learning Tutorial Demonstrates How to Simplify Distributed Deep Learning Model Inference Using Delta Lake and Apache Spark™
On October 10th, our team hosted a live webinar—Simple Distributed Deep Learning Model Inference—with Xiangrui Meng, Software Engineer at Databricks.
Model inference, unlike model training, is usually embarrassingly parallel and hence simple to distribute. However, in practice, complex data scenarios and compute infrastructure often make this "simple" task hard to do from data source to sink.
In this webinar, we presented a reference end-to-end pipeline for distributed deep learning model inference built on the latest features of Apache Spark and Delta Lake. While the reference pipeline applies to many deep learning scenarios, we focused on image applications, walking through specific pain points and proposed solutions.
The walkthrough starts with data ingestion and ETL, using the binary file data source in Apache Spark to load raw image files and store them in a Delta Lake table. A small code change then turns the batch job into a Spark Structured Streaming job that continuously discovers and imports new images, keeping the table up to date. From the Delta Lake table, a Pandas UDF wraps the single-node code and performs distributed model inference in Spark. The sketches below illustrate each of these steps.
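As a rough illustration of the ingestion step, here is a minimal PySpark sketch, not the webinar's exact notebook code. It assumes Delta Lake is available on the cluster (built in on Databricks) and uses placeholder paths: /tmp/raw_images for incoming files and /tmp/delta/images for the Delta table.

```python
# Batch ingestion sketch: load raw image files with the binary file data source
# and store them in a Delta Lake table. All paths are placeholders.
from pyspark.sql import SparkSession

# Assumes Delta Lake is on the cluster (built in on Databricks, or added via
# the delta-spark package on open source Spark).
spark = SparkSession.builder.appName("image-ingest").getOrCreate()

raw_dir = "/tmp/raw_images"        # placeholder: directory where images land
delta_path = "/tmp/delta/images"   # placeholder: Delta table location

# Each row carries the file path, modification time, length, and raw bytes.
images = (
    spark.read.format("binaryFile")
    .option("pathGlobFilter", "*.jpg")
    .option("recursiveFileLookup", "true")
    .load(raw_dir)
)

images.write.format("delta").mode("append").save(delta_path)
```

The streaming variant is essentially the same code with read/write swapped for readStream/writeStream. A streaming file source needs an explicit schema; the binary file source's schema is fixed, so it can be spelled out directly. This sketch assumes the binary file format can be used as a streaming file source in your Spark version, and the checkpoint location is a placeholder.

```python
# Streaming variant: the same pipeline with read/write swapped for
# readStream/writeStream, so new images are discovered and appended continuously.
from pyspark.sql.types import (BinaryType, LongType, StringType,
                               StructField, StructType, TimestampType)

# Fixed schema of the binary file data source, required for streaming reads.
binary_schema = StructType([
    StructField("path", StringType(), False),
    StructField("modificationTime", TimestampType(), False),
    StructField("length", LongType(), False),
    StructField("content", BinaryType(), True),
])

new_images = (
    spark.readStream.format("binaryFile")
    .schema(binary_schema)
    .option("pathGlobFilter", "*.jpg")
    .load(raw_dir)
)

(
    new_images.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/delta/images/_checkpoint")  # placeholder
    .trigger(once=True)  # or a processing-time trigger for continuous ingestion
    .start(delta_path)
)
```

For the inference step, a Scalar Iterator Pandas UDF lets the model load once per task and score Arrow batches of rows. The sketch below assumes PySpark 3.x with PyArrow installed, and uses torchvision's pretrained MobileNetV2 purely as a stand-in for whatever single-node model you want to wrap.

```python
# Inference sketch: a Scalar Iterator Pandas UDF wraps single-node PyTorch code
# so the model is loaded once per task and applied to batches of rows.
import io
from typing import Iterator

import pandas as pd
import torch
from PIL import Image
from torchvision import models, transforms

from pyspark.sql.functions import col, pandas_udf
from pyspark.sql.types import ArrayType, FloatType

# Standard ImageNet preprocessing for the stand-in MobileNetV2 model.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@pandas_udf(ArrayType(FloatType()))
def predict_udf(content_batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the model once per task, then score each incoming batch of images.
    model = models.mobilenet_v2(pretrained=True)
    model.eval()
    with torch.no_grad():
        for content in content_batches:
            batch = torch.stack([
                preprocess(Image.open(io.BytesIO(raw)).convert("RGB"))
                for raw in content
            ])
            yield pd.Series(model(batch).numpy().tolist())

images = spark.read.format("delta").load(delta_path)
predictions = images.select(
    col("path"),
    predict_udf(col("content")).alias("prediction"),
)
# Write the scores back out as another Delta table (placeholder location).
predictions.write.format("delta").mode("overwrite").save("/tmp/delta/predictions")
```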
We demonstrated these concepts in the Simple Distributed Deep Learning Model Inference Notebooks and Tutorials.
Here are some additional deep learning tutorials and resources available from Databricks:
- Deep Learning Documentation
- Deep Learning Fundamentals Series
- Simple Steps to Distributed Deep Learning
If you’d like free access to the Databricks Unified Analytics Platform and want to try our notebooks on it, you can sign up for a free trial here.