An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining with Spark Streaming - Databricks

Real-time/online machine learning is an integral part of the machine learning landscape, particularly with regard to unsupervised learning. Areas such as focused advertising, stock price prediction, recommendation engines, network evolution, and IoT streams in smart cities and smart homes are growing in demand and scale. Continuously updating models with efficient update methodologies, accurate labeling, feature extraction, and modularity for mixed models are integral to maintaining scalability, precision, and accuracy in high-demand scenarios.
This session explores a real-time/online learning algorithm and its implementation using Spark Streaming in a hybrid batch/semi-supervised setting. It presents an easy-to-use, highly scalable architecture with advanced customization and performance optimization. Within this framework, we will examine some of the key methodologies for implementing the algorithm, including partitioning and aggregation schemes, feature extraction, model evaluation and correction over time, and our approaches to minimizing loss and improving convergence. The result is a simple, accurate pipeline that can be easily adapted and scaled to a variety of use cases.
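To make the idea of per-batch updates with partitioning and aggregation concrete, here is a minimal pure-Python sketch. It is not the algorithm presented in the session: it assumes a plain least-squares model updated by mini-batch gradient descent, and it simulates the stream and the partitions locally instead of using Spark Streaming. The names `partition_gradient`, `update_on_batch`, and `simulated_stream` are hypothetical, chosen only for illustration.

```python
import random


def partition_gradient(weights, rows):
    """Sum the squared-error gradients over the rows of one partition."""
    grad = [0.0] * len(weights)
    for features, label in rows:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return grad, len(rows)


def update_on_batch(weights, batch, num_partitions=4, step=0.5):
    """Update the model once per micro-batch.

    The batch is split into partitions (assumed round-robin here), each
    partition computes a partial gradient, and the partial results are
    aggregated into a single averaged gradient step.
    """
    partitions = [batch[i::num_partitions] for i in range(num_partitions)]
    partials = [partition_gradient(weights, p) for p in partitions if p]
    total_rows = sum(count for _, count in partials)
    if total_rows == 0:
        return weights  # empty micro-batch: leave the model unchanged
    summed = [sum(grad[i] for grad, _ in partials) for i in range(len(weights))]
    return [w - step * g / total_rows for w, g in zip(weights, summed)]


def simulated_stream(num_batches, batch_size, true_weights, seed=0):
    """Yield micro-batches of (features, label) pairs from a noisy linear model."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            features = [1.0, rng.uniform(-1.0, 1.0)]  # bias term plus one feature
            label = sum(w * x for w, x in zip(true_weights, features))
            batch.append((features, label + rng.gauss(0.0, 0.01)))
        yield batch


# The model is carried from one micro-batch to the next and never retrained from scratch.
weights = [0.0, 0.0]
for batch in simulated_stream(num_batches=200, batch_size=32, true_weights=[0.5, 2.0]):
    weights = update_on_batch(weights, batch)
```

In a Spark Streaming job, the per-partition step would typically run on the executors and the aggregation would combine the partial results on the driver; the control flow above only mimics that shape on a single machine.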

The performance of the algorithm will be compared against existing implementations in both linear and logistic prediction. The session will also cover real-time use cases of the streaming pipeline using real time-series data and present strategies for optimization and implementation to improve both accuracy and efficiency in a semi-supervised setting.
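One common way to combine semi-supervised updates with evaluation over time is self-training with test-then-train scoring, sketched below for a logistic model. This is an illustrative assumption, not the method evaluated in the session: unlabeled records are pseudo-labeled only when the current model is confident, and accuracy is measured on each labeled record before the model trains on it. The function names, the confidence threshold, and the synthetic data are all hypothetical.

```python
import math
import random


def predict_proba(weights, features):
    """Logistic probability of class 1, with the logit clipped for numerical safety."""
    z = sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(weights, features, label, step):
    """One stochastic gradient step on the logistic loss."""
    error = predict_proba(weights, features) - label
    return [w - step * error * x for w, x in zip(weights, features)]


def run_stream(records, num_features, confidence=0.9, step=0.5):
    """Process (features, label_or_None) records in arrival order.

    Labeled records are scored first and then used for training
    (test-then-train). Unlabeled records are used only if the model's
    predicted class probability reaches `confidence`.
    """
    weights = [0.0] * num_features
    correct = evaluated = pseudo_labeled = 0
    for features, label in records:
        prob = predict_proba(weights, features)
        predicted = 1 if prob >= 0.5 else 0
        if label is not None:
            correct += int(predicted == label)
            evaluated += 1
        elif max(prob, 1.0 - prob) >= confidence:
            label = predicted  # confident prediction becomes a pseudo-label
            pseudo_labeled += 1
        else:
            continue  # not confident enough: skip this unlabeled record
        weights = sgd_step(weights, features, label, step)
    accuracy = correct / evaluated if evaluated else float("nan")
    return weights, accuracy, pseudo_labeled


def simulated_records(count, labeled_fraction=0.2, seed=1):
    """Synthetic stream: class 1 when the feature is positive; most labels are hidden."""
    rng = random.Random(seed)
    for _ in range(count):
        value = rng.uniform(-1.0, 1.0)
        truth = 1 if value > 0 else 0
        label = truth if rng.random() < labeled_fraction else None
        yield [1.0, value], label


weights, accuracy, pseudo_labeled = run_stream(simulated_records(2000), num_features=2)
```

The running accuracy gives a simple signal for correction over time: a sustained drop could be used to trigger retraining on a recent batch, though the threshold for doing so would depend on the application.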

Session hashtag: #SFds19

About J White Bear

University of Michigan: Computer Science; Databases, Machine Learning/Computational Biology, Cryptography
University of California San Francisco: Computational Biology/Bioinformatics; Machine Learning/Multi-Objective Optimization/Statistical Mechanics for Protein-Protein Interactions
McGill University: Machine Learning/Multi-Objective Optimization for Path Planning/Cryptography