Apache Spark Streaming + Kafka 0.10: An Integration Story

Spark Streaming has supported Kafka since its inception, but a lot has changed since then, on both the Spark and Kafka sides, to make this integration more fault-tolerant and reliable. Apache Kafka 0.10 (in fact, starting with 0.9) introduced a new Consumer API, built on top of a new group coordination protocol provided by Kafka itself. As a result, a new Spark Streaming integration has arrived, with a design similar to the 0.8 Direct DStream approach. However, there are notable differences in usage, and many exciting new features. In this talk, we will cover the main differences between this new integration and the previous one (for Kafka 0.8), and why Direct DStreams have replaced Receivers for good. We will also see how to achieve different delivery semantics (at least once, at most once, exactly once) with code examples. Finally, we will briefly introduce how this integration is used at Billy Mobile to ingest and process the continuous stream of events from our AdNetwork.
Session hashtag: #EUstr5

About Joan Viladrosa Riera

Joan is a Senior Big Data Architect and Tech Lead at Billy Mobile, where he has been leading the transition to the Hadoop ecosystem over the last two years. Previously, he worked at Trovit Search, where he developed Big Data solutions for programmatic buying in SEM platforms like Google AdWords and Microsoft Bing.