Hundreds of millions of search queries hit Bing.com every day. To enable teams in Bing to monitor and analyze user engagement and to act upon revenue opportunities in markets around the world, the Shared Data team must collect the logs and signals associated with every single search query, then process and enrich the data in near real time. Apache Spark Streaming is the solution that empowers us to fulfill this mission. In this presentation, we will walk through the top five lessons we learned in building and running large-scale streaming applications successfully in production.
Renyi is a Senior Software Engineer in the Shared Data team at Microsoft. In addition to being a key contributor to the Mobius project, which introduces a C# language binding to Apache Spark, Renyi is busy building a reference Spark streaming pipeline at Microsoft Bing scale. Renyi joined the Microsoft Ads team eight years ago to work on Big Data pipelines. For the last two years, Renyi has been focusing on streaming solutions, with expertise in Apache Storm, Apache Kafka, and other internal Microsoft streaming solutions. Prior to joining Microsoft, Renyi worked at a start-up building a real-time system with Erlang/OTP.