Mohak Shah is an analytics leader with over 15 years of experience in organization formation, strategy, and end-to-end data science engagements. He has led analytics and IoT engagements in domains including automotive, aviation, energy, and healthcare, managing diverse teams drawn from research, software, and business. As a scientist, Mohak has developed novel machine learning algorithms with high-impact business applications. He is the author of “Evaluating Learning Algorithms: A Classification Perspective” and has published more than 45 research articles and patents. He was the Chair of KDD 2016 and holds an adjunct position with the University of Illinois at Chicago.
Apache Spark is rapidly becoming the de facto framework for big-data analytics. Spark’s built-in, large-scale Machine Learning Library (MLlib) uses traditional stochastic gradient descent (SGD) to fit standard ML models. However, MLlib currently provides limited coverage of ML algorithms. Further, the convergence of the adopted SGD approach depends heavily on factors such as step-size selection and the conditioning of the problem, which makes it difficult for non-expert end users to adopt. In this session, the speakers introduce a large-scale ML tool built on the Alternating Direction Method of Multipliers (ADMM) on Spark to solve a broad range of ML problems. The proposed approach decomposes most ML problems into smaller sub-problems suitable for distributed computation in Spark. Learn how this toolkit provides a wider range of ML algorithms, better accuracy compared to MLlib, robust convergence criteria, and a simple Python API suitable for data scientists, making it easy for end users to develop advanced ML algorithms at scale without worrying about the underlying intricacies of the optimization solver. It is a useful addition to the data scientist's ML toolkit on Spark.
Session hashtag: #SFds15
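To make the decomposition idea in the abstract concrete, here is a minimal sketch of consensus ADMM applied to the lasso, written in plain NumPy. It is not the speakers' toolkit or its API, which the abstract does not describe. The function names (`soft_threshold`, `consensus_admm_lasso`) and the parameters `lam`, `rho`, `n_iter`, and `tol` are hypothetical, and the list of data blocks only stands in for Spark partitions. Each block solves a small regularized least-squares sub-problem independently, and a cheap averaging step ties the local solutions together.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def consensus_admm_lasso(blocks, lam=0.1, rho=1.0, n_iter=200, tol=1e-6):
    """Minimize sum_i 0.5 * ||A_i x - b_i||^2 + lam * ||x||_1 by consensus ADMM.

    `blocks` is a list of (A_i, b_i) pairs. Here they are in-memory arrays
    standing in for data partitions (an assumption made for illustration).
    """
    n_blocks = len(blocks)
    n_features = blocks[0][0].shape[1]
    z = np.zeros(n_features)                            # shared consensus variable
    us = [np.zeros(n_features) for _ in blocks]         # scaled dual variables

    # Per-block quantities that do not change across iterations.
    cached = [
        (A.T @ A + rho * np.eye(n_features), A.T @ b)
        for A, b in blocks
    ]

    for _ in range(n_iter):
        # Local step: each block solves its own ridge-like sub-problem.
        # These solves are independent, so they could run in parallel.
        xs = [
            np.linalg.solve(gram, Atb + rho * (z - u))
            for (gram, Atb), u in zip(cached, us)
        ]

        # Global step: average the local results and apply the L1 proximal map.
        z_old = z
        z = soft_threshold(
            np.mean(xs, axis=0) + np.mean(us, axis=0),
            lam / (rho * n_blocks),
        )

        # Dual step: accumulate each block's disagreement with the consensus.
        us = [u + x - z for u, x in zip(us, xs)]

        # Stop when the primal and dual residuals are both small.
        primal = np.sqrt(sum(np.sum((x - z) ** 2) for x in xs))
        dual = rho * np.sqrt(n_blocks) * np.linalg.norm(z - z_old)
        if primal < tol and dual < tol:
            break

    return z
```

In this sketch the only tuning knob for the solver is the penalty `rho`, and the stopping rule is based on primal and dual residuals rather than a hand-picked step size, which illustrates the kind of convergence criterion the abstract contrasts with SGD. In a distributed setting the local step would map over partitions and the global step would be a reduction, but that mapping is an assumption here and not something the abstract specifies.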