Bas is a programmer, scientist, and IT manager. At Rabobank, he works as Technology Lead in the AI, Data, and Analytics domain. His academic background is in Artificial Intelligence and Informatics. His research on reference architectures for big data solutions was published at the IEEE conference ICITST 2013. Bas has a background in software development, design and architecture with a broad technical view from C++ to Prolog to Scala. He occasionally teaches programming courses and is a regular speaker on conferences and informal meetings, where he brings a mixture of market context, his own vision, business cases, architecture and source code in an enthusiastic way towards his audience.
Streaming Analytics (or Fast Data processing) is becoming an increasingly popular subject in the financial sector. There are two main reasons for this development. First, more and more data has to be analyze in real-time to prevent fraud; all transactions that are being processed by banks have to pass and ever-growing number of tests to make sure that the money is coming from and going to legitimate sources. Second, customers want to have friction-less mobile experiences while managing their money, such as immediate notifications and personal advise based on their online behavior and other users’ actions.
A typical streaming analytics solution follows a 'pipes and filters' pattern that consists of three main steps: detecting patterns on raw event data (Complex Event Processing), evaluating the outcomes with the aid of business rules and machine learning algorithms, and deciding on the next action. At the core of this architecture is the execution of predictive models that operate on enormous amounts of never-ending data streams.
In this talk, I'll present an architecture for streaming analytics solutions that covers many use cases that follow this pattern: actionable insights, fraud detection, log parsing, traffic analysis, factory data, the IoT, and others. I'll go through a few architecture challenges that will arise when dealing with streaming data, such as latency issues, event time vs server time, and exactly-once processing. The solution is build on the KISSS stack: Kafka, Ignite, and Spark Structured Streaming. The solution is open source and available on GitHub.