To predict vehicle defects at BMW, a machine learning pipeline that evaluates several thousand features was implemented. Because the most important features help in analyzing specific defects, a feature selection approach was used. To assess feature importance further, several feature selection techniques (filters and wrappers) were implemented as Spark ML PipelineStages that operate on DataFrames, so that they can be incorporated into a complete Spark ML Pipeline that includes preprocessing and classification. This session presents the general steps for building custom Spark ML Estimators, demonstrates the API of the newly implemented feature selection techniques, and shows the results of a performance analysis. It also shares experiences gained and pitfalls to avoid.
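As a rough illustration of the Estimator pattern mentioned above, the following plain-Python sketch mirrors its shape: an unfitted stage holds only parameters, and `fit` returns a fitted model whose `transform` keeps the selected columns. It deliberately does not use Spark, and it is not the implementation from the session. The class names `CorrelationSelector` and `CorrelationSelectorModel`, the parameter `num_top_features`, and the choice of absolute Pearson correlation as the filter criterion are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


@dataclass
class CorrelationSelectorModel:
    """Hypothetical fitted stage: remembers which feature columns to keep."""
    selected: List[int]

    def transform(self, rows: Sequence[Sequence[float]]) -> List[List[float]]:
        # Project every row onto the selected feature indices.
        return [[row[i] for i in self.selected] for row in rows]


@dataclass
class CorrelationSelector:
    """Hypothetical unfitted stage (filter method): holds only parameters."""
    num_top_features: int

    def fit(self, rows: Sequence[Sequence[float]],
            labels: Sequence[float]) -> CorrelationSelectorModel:
        num_features = len(rows[0])
        # Score each feature column by |correlation| with the label.
        scores = [abs(pearson([r[j] for r in rows], labels))
                  for j in range(num_features)]
        ranked = sorted(range(num_features),
                        key=lambda j: scores[j], reverse=True)
        # Keep the top-k indices, restored to their original column order.
        return CorrelationSelectorModel(sorted(ranked[: self.num_top_features]))


# Toy data: feature 0 tracks the label, feature 1 is constant,
# feature 2 is only weakly related.
rows = [[1.0, 5.0, 0.3], [2.0, 5.0, 0.1], [3.0, 5.0, 0.4], [4.0, 5.0, 0.2]]
labels = [0.0, 0.0, 1.0, 1.0]

model = CorrelationSelector(num_top_features=1).fit(rows, labels)
reduced = model.transform(rows)
```

Separating the parameter-only selector from the fitted model is what lets such a stage sit inside a pipeline: the selection is learned once on training data and then applied unchanged to new data. A wrapper method would follow the same shape but score feature subsets by training a classifier instead of computing a per-feature statistic.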
Session hashtag: #EUds16
Marc is a master's student in the Faculty of Electrical Engineering at the Technical University of Munich. He has a special interest in automation, robotics, and machine learning, and is currently using Spark to model vehicle defects at BMW.