Josef Adersberger has been a software engineering fanatic for over 10 years. He studied computer science in Rosenheim and Munich and holds a doctoral degree in software engineering. He’s the founder and CTO of QAware, a German software development company, and is a lecturer at several German universities. His main area of interest is cloud computing.
Users leave thousands of traces per second on a successful e-commerce site. Analysing this stream of trace events and reacting to it in real time is very useful; this is called clickstream analysis. In the talk I'll present a software architecture based on Apache Spark that can process thousands of clickstream events per second. A product based on this architecture has been in production since mid-2015. Besides Spark, the building blocks of the architecture are Kafka to handle the inbound event stream, Spark Streaming for initial stream processing, and Parquet as the serialization format. I'll argue why we chose these technologies and share the experiences we had developing, launching and operating the product.