
On October 10th, our team hosted a live webinar—Simple Distributed Deep Learning Model Inference—with Xiangrui Meng, Software Engineer at Databricks.

Simple Distributed Deep Learning Model Inference Webinar

Model inference, unlike model training, is usually embarrassingly parallel and hence simple to distribute. However, in practice, complex data scenarios and compute infrastructure often make this "simple" task hard to do from data source to sink.
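"Embarrassingly parallel" means the following here. Each input is scored on its own, so you can split the inputs into partitions, score the partitions concurrently, and concatenate the results. The answer is the same as a single sequential pass. The minimal sketch below uses only the Python standard library. `toy_model` is a hypothetical stand-in for a trained network, and thread-pool partitions stand in for Spark tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_model(x):
    """Hypothetical stand-in for a trained model: scores one input independently."""
    return 2.0 * x + 1.0


def predict_partition(partition):
    """Score one partition; no state is shared with any other partition."""
    return [toy_model(x) for x in partition]


def split(items, num_partitions):
    """Split a non-empty list into contiguous chunks of roughly equal size, preserving order."""
    size = -(-len(items) // num_partitions)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_inference(items, num_partitions=4):
    """Score partitions concurrently and flatten the results back into input order."""
    if not items:
        return []
    partitions = split(items, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # pool.map yields results in partition order, so flattening keeps input order.
        results = pool.map(predict_partition, partitions)
        return [y for part in results for y in part]


print(parallel_inference(list(range(5))))  # → [1.0, 3.0, 5.0, 7.0, 9.0]
```

Partitions never communicate, so adding workers or machines changes only throughput, not results. In practice, the hard part is getting data into and out of this loop, not the loop itself.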

In this webinar, we provided a reference end-to-end pipeline for distributed deep learning model inference using the latest features from Apache Spark and Delta Lake. While the reference pipeline applies to various deep learning scenarios, we focused on image applications, demonstrating specific pain points and proposing solutions.

The walkthrough starts with data ingestion and ETL. It uses the binary file data source in Apache Spark to load raw image files and store them in a Delta Lake table. A small code change then lets Spark Structured Streaming continuously discover and import new images, keeping the table up to date. From the Delta Lake table, a Pandas UDF wraps single-node code to perform distributed model inference in Spark.

We demonstrated these concepts using the Simple Distributed Deep Learning Model Inference notebooks and tutorials.

Here are some additional deep learning tutorials and resources available from Databricks.

If you’d like free access to the Databricks Unified Analytics Platform to try our notebooks, you can start a free trial here.

