Nonlinear methods are widely used to produce higher performance compared with linear methods; however, nonlinear methods are generally more expensive in model size, training time, and scoring phase. With proper feature engineering techniques like polynomial expansion, the linear methods can be as competitive as those nonlinear methods. In the process of mapping the data to higher dimensional space, the linear methods will be subject to overfitting and instability of coefficients which can be addressed by penalization methods including Lasso and Elastic-Net. Finally, we’ll show how to train linear models with Elastic-Net regularization using MLlib.
DB Tsai is an Apache Spark PMC and committer and a Senior Research Engineer working on Personalized Recommendation Algorithms at Netflix. He implemented several algorithms including linear models with Elastici-Net (L1/L2) regularization using LBFGS/OWL-QN optimizers in Apache Spark. Prior to joining Netflix, DB was a Lead Machine Learning Engineer at Alpine Data Labs, where he led a team to develop innovative large-scale distributed learning algorithms, and then contributed back to open source Apache Spark project. DB was a Ph.D. candidate in Applied Physics at Stanford University. He holds a Master's degree in Electrical Engineering from Stanford.