Spark in the Wild: An In-Depth Analysis of 50+ Production Deployments

Since its inception in 2009, Spark has progressed from an academic endeavor to the most active open source Big Data project, with over 400 contributors. Along the way, it has emerged as a popular option for powering enterprise data pipelines, with hundreds of production deployments. However, as with any relatively new technology experiencing rapid adoption, one of the most common questions interested enterprises ask about Spark is who is using it, what they are using it for, and what lessons they learned along the way. This talk synthesizes our experience from being directly involved with over 50 production Spark deployments across a broad spectrum of industries to provide insights into the following:

* What were the primary drivers of Spark adoption?
* What are the most common Spark workflows and use cases, and do they vary by vertical?
* What were the main stumbling blocks and lessons learned?