Serge Smertin - Databricks

Serge Smertin

Software Engineer, Adyen

Serge Smertin is a full-cycle software engineer focusing on data solutions, security, and heterogeneous system integration. During his career he has worked on projects ranging from ETL systems for the e-commerce industry to large-scale malware forensic analysis platforms for the cyber-threat intelligence industry, with hands-on work spanning scripting in Perl and Bash to distributed services in Java and Scala. Serge currently works for the payments service provider Adyen, where he leads the Monitoring Stream, which focuses on real-time anomaly detection and decision support systems. Previously he built a data science platform using Apache Spark and Jupyter to scale up data initiatives.


Scalable Time Series Forecasting and Monitoring using Apache Spark and ElasticSearch at Adyen

Summit Europe 2019

Adyen enables the companies that integrate with it to accept payments from their customers using any payment method over any sales channel. We have designed and implemented a time series forecasting algorithm that predicts the volume for each integration with a confidence estimate, so that we can flag anomalies such as traffic drops or abnormally low traffic. We use Apache Spark as our computational engine, both to make this data available to the training process and to train over years of data in a scalable way. Prediction performance is benchmarked, and the models are served in production through a custom real-time monitoring and alerting infrastructure that uses ElasticSearch as hot storage. With this solution, Adyen knows whether a problem has happened and can alert the operational teams in record time. This presentation covers the journey we took, with a focus on the mathematical concepts, the present time constraints, the prediction performance, and the architecture needed to make this happen. We'll go over lessons learned, pitfalls, and best practices discovered while modeling time series datasets with Apache Spark. Data scientists will gain insight into applying effective, real-life seasonality modeling techniques. We'll also share our approaches to sub-millisecond model serving, which may inspire data engineers working on related problems.