Deep Learning for Large-Scale Online Fraud Detection—Fighting Fraudsters Among Billions of Users - Databricks

Here we present a real-time, scalable online fraud detection solution backed by deep learning techniques. Today, most deep learning applications appear in actively studied fields such as computer vision and natural language processing. Our solution is one of the few production examples in which deep learning models are applied to security problems. Our results demonstrate that a deep learning solution significantly outperforms traditional blacklist and machine learning approaches at terabyte data scale.

Online fraud is largely orchestrated by organized crime rings. Coordinated malicious user accounts, either newly created or obtained by hijacking legitimate users, actively target modern online services for real-world financial gain. Existing fraud solutions either rely on reputation lists to block known suspicious activity or require extensive feature engineering by human analysts for model training. These approaches neither adapt well to changing fraud patterns nor scale to large data volumes. At DataVisor, we analyze activities from billions of accounts across global online services to detect fraud and abuse. These data give us unique insights into the online fraud landscape that allow us to tackle coordinated fraud attacks holistically.

Our deep learning solution is based on digital information commonly collected by online services, including IP addresses, user-agent strings, email domains, and user nicknames. We build a general fraud detection framework that can identify fraudulent activities in log data containing all or a subset of this common digital information. By leveraging common digital information, the model is agnostic to the specific application or service from which data queries originate. We discuss the design and implementation of our deep learning pipeline, based on Spark and TensorFlow, which is built to fit our multi-cloud, real-time production requirements. We also demonstrate how our system outperforms traditional solutions, including blacklists and machine learning methods.
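
As a rough illustration of how a model input can stay agnostic to which of these fields a given service logs, the sketch below hashes each field into a fixed-size bucket and reserves 0 for missing fields. It is not the DataVisor pipeline: the field names, the bucket count, and the functions `bucket` and `encode_event` are all assumptions made for this example.

```python
import hashlib

# Hypothetical field names; the talk does not specify a log schema.
FIELDS = ("ip", "user_agent", "email_domain", "nickname")
NUM_BUCKETS = 1024  # assumed size of each field's hashed vocabulary


def bucket(field, value, num_buckets=NUM_BUCKETS):
    """Map a (field, value) pair to a stable integer in [1, num_buckets]."""
    digest = hashlib.md5(f"{field}={value}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets + 1


def encode_event(event):
    """Return one bucket id per field; 0 marks a field absent from the log."""
    return [bucket(f, event[f]) if event.get(f) else 0 for f in FIELDS]
```

Because every event is encoded to the same fixed-length list of integer ids, a downstream model (for example, one that feeds each id into an embedding lookup) could in principle consume logs from services that record only some of the fields.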

Session hashtag: #DLSAIS17

About Arthur Meng

Arthur Meng currently works at DataVisor as Tech Lead on the Algorithm Platform team. His work includes building machine learning infrastructure, developing deep learning algorithms, and improving various aspects of unsupervised machine learning algorithms for fraud detection. Prior to DataVisor, Arthur Meng earned his Ph.D. degree from Stanford University, where his first-author work was published in journals including Nature and Nature Communications.

About Ting-Fang Yen

Ting-Fang Yen is Director of Research at DataVisor. Her work focuses on the detection of online threats, including malware, malicious insiders, intrusions, and online fraud. Her research has shaped product directions and has been published at top industry and academic security conferences. She was previously a principal research scientist at RSA Labs and a threat scientist at E8 Security. Dr. Yen received her Ph.D. degree from Carnegie Mellon University.