Zhengyi Le is a Deputy Director at Suning R&D Palo Alto, where she leads a big data team building a unified big data machine learning platform that serves product lines across the company. She is a believer in “data power” and enjoys engaging in large-scale software R&D, building innovative products, and bringing them to market.
Boosted by Apache Spark's data processing engine, machine learning as a service (MLaaS) is now faster and more powerful. However, Spark MLlib is still under development and offers only a limited set of data preprocessing algorithms. In this session, learn how Suning R&D's MLaaS platform abstracted, standardized, and implemented a rich machine learning pipeline on top of Spark, spanning data preprocessing, supervised and unsupervised modeling, performance evaluation, and model deployment. The featured Spark extensions are: (1) a rich set of data preprocessing functions, such as missing data treatment, many types of sampling, outlier detection, and advanced binning; (2) time series analysis and modeling algorithms; (3) a domain-specific library for finance, including a cost-sensitive decision tree for fraud detection; and (4) a user-friendly, codeless drag-and-play modeling canvas. Session hashtag: #SFexp14