Deep Learning is now the standard in object detection, but it is not easy to analyze large amounts of images, especially in an interactive fashion. Traditionally, there has been a gap between Deep Learning frameworks, which excel at image processing, and more traditional ETL and data science tools, which are usually not designed to handle huge batches of complex data types such as images.
In this talk, we show how manipulating large corpora of images can be accomplished in a few lines of code because of recent developments in Apache Spark. Thanks to Spark’s unique ability to blend different libraries, we show how to start from satellite images and rapidly build complex queries on high level information such as houses or buildings. This is possible thanks to Magellan, a geospatial package, and Deep Learning Pipelines, a library that streamlines the integration of Deep Learning frameworks in Spark. At the end of this session, you will walk away with the confidence that you can solve your own image detection problems at any scale thanks to the power of Spark.
Session hashtag: #SAISDL1
Tim Hunter is a software engineer at Databricks and contributes to the Apache Spark MLlib project, as well as the GraphFrames, TensorFrames and Deep Learning Pipelines libraries. He has been building distributed Machine Learning systems with Spark since version 0.2, before Spark was an Apache Software Foundation project.
Raela Wang is a Solutions Architect at Databricks, where she works with clients to address their challenges with scaling their current data workloads, and ensuring that they are successful in building their big data and machine learning applications. Previously, she spent a few years at Credit Suisse, and has worn different hats at various early stage startups.