Elena Boiarskaia - Databricks

Elena Boiarskaia

Data Scientist, H2O.ai

As a Senior Solutions Engineer with H2O.ai, Elena is passionate about helping customers solve advanced data science problems while maximizing business value. Coming from a diverse quantitative background, Elena earned her PhD from the University of Illinois at Urbana-Champaign with a dissertation focused on using machine learning to find patterns in accelerometer data and predict health outcomes. Previously, Elena worked on a variety of customer data science use cases at Databricks and built machine learning models to identify manipulative activity in the US markets as the Lead Data Scientist at the Financial Industry Regulatory Authority (FINRA). In her spare time, Elena trains Brazilian Jiu-Jitsu and dances classical ballet.

PAST SESSIONS

Detecting Financial Fraud at Scale with Machine Learning - Summit Europe 2019

Detecting fraudulent patterns at scale is a challenge given the massive amounts of data to sift through, the complexity of constantly evolving techniques, and the very small number of actual examples of fraudulent behavior. In finance, added security concerns and the importance of explaining how fraudulent behavior was identified further increase the difficulty of the task. Legacy systems rely on rule-based detection that is difficult to implement and run at scale. The resulting code is very complex and brittle, making it difficult to update to keep up with new threats.

In this talk, we will go over how to convert a rule-based financial fraud detection program to use machine learning on Spark as part of a scalable, modular solution. We will examine how to identify appropriate features and labels and how to create a feedback loop that will allow the model to evolve and improve over time. We will also look at how MLflow may be leveraged throughout this effort for experiment tracking and model deployment.

Specifically, we will discuss:
-How to create a fraud-detection data pipeline
-How to leverage a framework for building features from large datasets
-How to create modular code to re-use and maintain new machine learning models
-How to choose appropriate models and algorithms for a given fraud-detection problem