Gevorg is the Lead AI Research Engineer at Altocloud. He received his BA from the Applied Math faculty of Yerevan State University and his MSc from the American University of Armenia. With many years of experience building software systems in various industries, he specializes in machine learning, deep learning, and functional programming. Gevorg won the 2013 Best Master Student in Technology and Sciences award, presented by the President of the Republic of Armenia.
Spark ML Pipelines provide a comprehensive framework for predictive modeling, including feature engineering, batch model training, and real-time predictions on streams of data. For example, a model that predicts the likelihood of cart abandonment may be trained periodically on features derived from customers' Web activity and then applied to a stream of Web events to make real-time predictions for live users. In multi-tenant environments, however, where streams contain events from different sources, applying ML Pipelines becomes difficult. The pipeline paradigm can still be applied to model training, using datasets in which events are separated by source. Generating real-time predictions in Spark Streaming is harder, because a single micro-batch contains events that require evaluation by different pipelines. In this talk we will show how Altocloud applies Spark Pipelines to train hundreds of predictive models and to enable real-time predictions on high-throughput heterogeneous data streams. In particular, we will focus on:
1. Training multiple models for activity streams from different sources.
2. Applying these models in real time to a heterogeneous stream of events containing behavioural data for millions of users.
3. Automated training, validation, selection, and deployment of multiple predictive models in a multi-tenant environment at scale.
Session hashtag: #EUds4
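The core difficulty described in the abstract, a single micro-batch whose events belong to different per-source models, can be illustrated with a minimal sketch. It uses plain Python rather than Spark Streaming and is not Altocloud's implementation. Every name in it (`Event`, `make_toy_model`, `score_micro_batch`, the tenant ids, and the `pages_viewed` feature) is a hypothetical stand-in chosen for illustration, and the "models" are toy functions in place of fitted Spark pipelines.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    tenant_id: str     # source of the event (hypothetical field name)
    user_id: str
    pages_viewed: int  # toy feature standing in for real Web-activity features


# A "model" here is any function mapping a list of events to one score each.
Model = Callable[[List[Event]], List[float]]


def make_toy_model(weight: float) -> Model:
    """Return a stand-in for a trained per-tenant pipeline (assumption)."""
    def predict(events: List[Event]) -> List[float]:
        return [min(1.0, weight * e.pages_viewed) for e in events]
    return predict


def score_micro_batch(
    batch: List[Event], models: Dict[str, Model]
) -> List[Tuple[str, str, float]]:
    """Group a mixed batch by tenant, then score each group with its own model."""
    by_tenant: Dict[str, List[Event]] = defaultdict(list)
    for event in batch:
        by_tenant[event.tenant_id].append(event)

    results: List[Tuple[str, str, float]] = []
    for tenant_id, events in by_tenant.items():
        model = models.get(tenant_id)
        if model is None:
            continue  # no trained model for this tenant yet; skip its events
        for event, score in zip(events, model(events)):
            results.append((event.tenant_id, event.user_id, score))
    return results


# One model per tenant; "shop-c" deliberately has no model.
models = {"shop-a": make_toy_model(0.1), "shop-b": make_toy_model(0.25)}

# A heterogeneous micro-batch mixing events from three tenants.
batch = [
    Event("shop-a", "u1", 3),
    Event("shop-b", "u2", 2),
    Event("shop-a", "u3", 12),
    Event("shop-c", "u4", 5),
]

predictions = score_micro_batch(batch, models)
for tenant_id, user_id, score in predictions:
    print(tenant_id, user_id, round(score, 3))
```

Grouping by tenant before scoring means each model is invoked once per micro-batch instead of once per event. In a real Spark deployment the grouping and model lookup would have to happen inside the streaming job, which is where the challenges named in the abstract arise.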