Spark Streaming at Bing Scale

Hundreds of millions of search queries hit Bing.com every day, generating a massive volume of logs and signals that must be collected, processed, and enriched in near real time to monitor quality of service, analyze user engagement, and act on revenue opportunities in a timely manner. We have used Apache Spark Streaming to implement the data processing pipeline for this scenario, and we run it in production. In this talk, we will cover (1) the architecture of the stream processing pipeline and (2) the key challenges and lessons learned in building streaming data pipelines with Spark and Kafka.