Giselle van Dongen is Lead Data Scientist at Klarrio specializing in real-time data analysis, processing and visualization. Concurrently she is a PhD researcher at Ghent University, teaching and benchmarking real-time distributed processing systems such as Spark Streaming, Structured Streaming, Flink and Kafka Streams.
Due to the increasing interest in real-time processing, many stream processing frameworks were developed. However, no clear guidelines have been established for choosing a framework for a specific use case. In this talk, two different scenarios are taken and the audience is guided through the thought process and questions that one should ask oneself when choosing the right tool. The stream processing frameworks that will be discussed are Spark Streaming, Structured Streaming, Flink and Kafka Streams.
The main questions are:
For each of these questions, we look at how each framework tackles this and what the main differences are. The content is based on the PhD research of Giselle van Dongen in benchmarking stream processing frameworks in several scenarios using latency, throughput and resource utilization.