Structured Streaming is a new API in Spark 2.0 that simplifies the end-to-end development of continuous applications. One such continuous application is online model updates: online models are incrementally updated with new data and can be queried continuously while they are being updated. As a result, they can be fast to train and can take advantage of new data sooner than offline algorithms. In this talk, we give a brief introduction to the area of online learning and describe how online model updates can be built using the Structured Streaming APIs. The end result is a robust pipeline for updating models that is scalable, fast, and fault-tolerant.
I am the Product Manager for open source efforts at Databricks. Previously, I was a Spark and Data Science Architect at Hortonworks and a Principal Research Scientist at Yahoo, where I focused on large-scale data mining and machine learning for search and display advertising. I am an Apache Spark PMC member and committer.