Wei is a data scientist at Salesforce building detection engines for cybersecurity. Previously he worked as applied researcher at Microsoft and Ebay for shipping machine learning products for a variety of business problems, including speech recognition, recommendation system and marketing campaign. Wei holds a Ph.D. in Applied Mathematics and M.S. degree in Computer Science at Michigan State University with a focus of machine learning and AI.
Salesforce recently invented and deployed a real-time, scalable, terabyte data-level and low false positive personalized anomaly detection system. Anomaly detection on user in-app behavior at terabyte-data scale is extremely challenging because traditional techniques like clustering methods suffer serious production performance issues. Salesforce's method tackles the traditional challenges through three phases: 1) Leveraging Principal Component Analysis (PCA) to extract high-variance and low-variance feature subsets. The low-variance feature subset is valuable in cybersecurity because we want to determine if a user deviates from his or her stable behavior. The high-variance one is used for dimension reduction; 2) On each feature subset, they build a profile for each user to characterize the user’s baseline behavior and legitimate abnormal behavior; 3) During detection, for each incoming event, their method will compare it with the user’s profile and produce an anomaly score. The computation complexity of the detection module for each incoming event is constant. st cloud computing platforms; the novelty of our user behavior profiling based anomaly detection technique and the challenges of implementing and deploying it with Apache Spark in production. We will also demonstrate how our system outperforms the other traditional machine learning algorithms. Session Hashtag: #SFml5