Spark and MLlib form a terrific toolkit for fitting very-large-scale classification, regression, collaborative filtering, and clustering models. However, translating a vague problem statement like "learn a classifier" into a working model presently requires a high degree of hand tuning and trial and error. Moreover, when models are expensive to train, shortening this process can significantly reduce the total investment in developing a model.
We present our work on the MLbase optimizer, a system designed to quickly search the hyperparameter space and find a good model without manual effort from the user. Our system, built on Spark, offers a 10x speedup over naive methods for model search by leveraging performance enhancements, better search algorithms, and statistical heuristics.
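To make "naive methods for model search" concrete, here is a minimal sketch of one such baseline: plain random search over two hyperparameters. It is not the MLbase optimizer and does not use Spark or MLlib. The function names, the hyperparameter names and ranges, and the toy `validation_error` objective (a stand-in for actually training and evaluating a model) are all assumptions made for illustration.

```python
import math
import random


def validation_error(learning_rate, regularization):
    """Hypothetical stand-in for training a model and measuring its error.

    A smooth bowl in log-space, minimized at learning_rate=0.1 and
    regularization=0.01. A real search would fit a model here instead.
    """
    return (math.log10(learning_rate) + 1.0) ** 2 + (
        math.log10(regularization) + 2.0
    ) ** 2


def random_search(objective, budget, seed=0):
    """Naive baseline: sample `budget` configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_error = None, float("inf")
    for _ in range(budget):
        # Sample each hyperparameter log-uniformly from [1e-4, 1].
        config = {
            "learning_rate": 10 ** rng.uniform(-4, 0),
            "regularization": 10 ** rng.uniform(-4, 0),
        }
        error = objective(**config)  # one full "training run" per sample
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


best_config, best_error = random_search(validation_error, budget=50)
print(best_config, best_error)
```

Every sampled configuration costs one full training run, so the total cost grows linearly with the budget. That per-configuration cost is what makes faster search worthwhile when models are expensive to train.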