BoF: How to bring the pipeline built in Notebook / Apache Spark to production, and machine learning deployment cycles

Notebooks are a widely used tool for data scientists to analyze data, find insights, and build learning models. However, there is a gap in bringing a notebook into a production pipeline. How do we streamline the process of deploying a notebook into production? If we find something that needs to be improved in production, how do we shorten the cycle? How do we make sure there is no discrepancy between online feature generation, which is used by the online service, and the features generated offline for model training?

About DB Tsai

DB Tsai is an Apache Spark PMC member and committer, and an open source and big data engineer at Apple. He implemented several algorithms in Apache Spark, including linear models with Elastic-Net (L1/L2) regularization using L-BFGS/OWL-QN optimizers. Prior to joining Apple, DB worked on personalized recommendation ML algorithms at Netflix. DB was a Ph.D. candidate in Applied Physics at Stanford University. He holds a Master's degree in Electrical Engineering from Stanford.