Andreu Mora is a Senior Data Scientist at Adyen. His career has spanned data processing, data analysis, space mission engineering, product management, team leadership and, more recently, fintech data science. Before Adyen, Andreu worked for the European Space Agency as a technical product manager and data analysis engineering team leader, where he helped develop algorithms to answer mission engineering questions. Andreu holds an MSc in Telecommunication Engineering from Universitat Politècnica de Catalunya, with a focus on maths and signal and image processing.
Adyen enables the companies that integrate with it to accept payments from their customers using any payment method over any sales channel. We have designed and implemented a time series forecasting algorithm that predicts the payment volume of each integration together with a confidence level, which lets us flag anomalies such as a traffic drop or abnormally low traffic. We use Apache Spark as our computational engine, both to make this data available to the training process and to train over years of data in a scalable way. Prediction performance is benchmarked, and the models are served in production through a custom real-time monitoring and alerting infrastructure that uses Elasticsearch as hot storage. With this state-of-the-art solution, Adyen knows whether a problem has happened and can alert the operational teams accordingly in record time. This presentation will cover the journey we took, with a focus on the mathematical concepts, the time constraints involved, the prediction performance, and the architecture needed to make this happen. We'll go over lessons learned, pitfalls, and best practices discovered while modeling time series datasets with Apache Spark. Data scientists will gain insights into applying effective, real-life seasonality modeling techniques. We'll also share our approaches to sub-millisecond model serving, which may inspire data engineers who work on related problems.
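
To make the idea of forecasting volume with confidence and flagging a traffic drop concrete, here is a minimal Python sketch. It is not the algorithm presented in the talk: it assumes a simple per-slot seasonal mean with a standard-deviation band, and the names `fit_seasonal_profile`, `is_traffic_drop`, `period` and `k` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def fit_seasonal_profile(history, period):
    """Estimate (mean, standard deviation) for each slot of a seasonal cycle.

    history: transaction counts sampled at regular intervals (assumption).
    period:  number of samples in one seasonal cycle, e.g. 168 for hourly
             data with weekly seasonality (assumption).
    """
    buckets = defaultdict(list)
    for index, value in enumerate(history):
        buckets[index % period].append(value)
    return {
        slot: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for slot, values in buckets.items()
    }


def is_traffic_drop(profile, slot, observed, k=3.0):
    """Flag an observation that falls below the lower confidence bound.

    The bound is mean - k * std for the given slot; k is a tunable,
    illustrative threshold, not a value taken from the talk.
    """
    expected, spread = profile[slot]
    return observed < expected - k * spread


# Three cycles of a toy series with four slots per cycle.
history = [100, 200, 300, 400, 110, 190, 310, 390, 90, 210, 290, 410]
profile = fit_seasonal_profile(history, period=4)
```

With this toy data, slot 0 has an expected volume of 100 and a standard deviation of 10, so with `k=3.0` any observation below 70 in that slot would be flagged. A production system would replace the per-slot mean with a proper forecasting model and train it per integration, for instance by grouping the data by integration in Spark.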