Claudiu Branzan

Principal Engineer, Atigeo

Claudiu Branzan is a principal engineering lead at Atigeo, leading a team of data scientists and software engineers who tackle complex challenges in machine learning, data mining, information retrieval, and statistics. Claudiu has over 10 years of real-world data science experience across industries including finance, healthcare, legal, mobile, and retail. He has co-authored multiple patents and holds a master’s degree in industrial intelligent systems from the Polytechnic University of Timișoara.

SESSIONS

Online Predictive Modeling of Fraud Schemes from Multiple Live Streams

Fraud detection is a classic adversarial learning challenge: as soon as an automated system learns to stop one scheme, fraudsters move on to another. Each scheme requires different signals (i.e., features) to detect; each is relatively rare (one in millions for finance or e-commerce); and a single case may take months to investigate (in healthcare or tax, for example), which makes quality training data scarce. This talk will cover, via live demo and code walk-through, the key lessons we've learned while building such real-world software systems over the past few years. We'll build machine learning models from streams of free text, transactions, and user feedback. The models will be continuously updated (online learning) so that they adapt to evolving fraud schemes, anomalies, and changes in behavior. The models will include features derived from natural language processing, graph (link) analysis, and time series analysis; the code walk-through shows how to compute and combine them. Apache Spark is used to build and run these models at scale, while Kafka and Spark Streaming are used to process the incoming data streams. We'll discuss the data model, computation, and feedback workflows, resulting in an end-to-end, field-tested reference architecture. This talk will benefit anyone interested in adversarial learning, hybrid analytics, and online learning at scale on top of Apache Spark. The demo's datasets and code will be made publicly available after the talk.
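The online-learning idea in this abstract (a model updated continuously as labeled feedback arrives) can be illustrated with a minimal plain-Python sketch. It is not the speakers' code and does not use Spark, Kafka, or any Spark API. It assumes that each event has already been turned into a dictionary of numeric features (the names `new_payee` and `amount_zscore` are hypothetical stand-ins for text, graph, or time-series features) and that analyst feedback arrives as a 0/1 label. The model is logistic regression updated by one stochastic-gradient step per labeled event.

```python
import math
from typing import Dict


class OnlineLogisticRegression:
    """Logistic regression updated one labeled event at a time (SGD on log loss)."""

    def __init__(self, learning_rate: float = 0.1, l2: float = 0.0) -> None:
        self.learning_rate = learning_rate
        self.l2 = l2  # L2 penalty applied to the weights only, not the bias
        self.weights: Dict[str, float] = {}  # sparse: unseen features count as 0
        self.bias = 0.0

    def predict_proba(self, features: Dict[str, float]) -> float:
        z = self.bias + sum(self.weights.get(name, 0.0) * value
                            for name, value in features.items())
        # Numerically stable sigmoid: avoids overflow in math.exp for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, features: Dict[str, float], label: int) -> None:
        # For log loss, the gradient with respect to the logit is (p - y).
        error = self.predict_proba(features) - label
        for name, value in features.items():
            w = self.weights.get(name, 0.0)
            self.weights[name] = w - self.learning_rate * (error * value + self.l2 * w)
        self.bias -= self.learning_rate * error


# Usage with hypothetical events and analyst feedback.
model = OnlineLogisticRegression(learning_rate=0.1)
fraud_event = {"new_payee": 1.0, "amount_zscore": 3.0}
normal_event = {"new_payee": 0.0, "amount_zscore": 0.0}
for _ in range(200):
    model.update(fraud_event, 1)   # analyst confirmed fraud
    model.update(normal_event, 0)  # analyst confirmed legitimate
print(model.predict_proba(fraud_event), model.predict_proba(normal_event))
```

Keeping the learning rate constant (rather than decaying it) is one simple way to keep such a model responsive when fraudsters switch schemes; that choice is an assumption of this sketch, not something the abstract specifies.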