Sauptik Dhar is a Senior Research Scientist at Robert Bosch LLC. He has authored several peer-reviewed and invited articles in renowned journals and conferences. He is currently an Associate Editor of Neural Processing Letters and has served as a reviewer for several other international journals, including Neural Networks, Pattern Recognition, Neurocomputing, PLoS ONE, Neural Processing Letters, and IEEE Systems, Man, and Cybernetics. He has also served on the program committees of KDD and ICMLA and has been a reviewer for IJCNN and NIPS. He has been an expert speaker at various big-data symposiums. He received his PhD in 2013.
Apache Spark is rapidly becoming the de facto framework for big-data analytics. Spark's built-in large-scale machine learning library (MLlib) uses traditional stochastic gradient descent (SGD) to solve standard ML algorithms. However, MLlib currently provides limited coverage of ML algorithms. Further, convergence of the adopted SGD approach depends heavily on issues such as step-size selection and the conditioning of the problem, making it difficult for non-expert end users to adopt. In this session, the speakers introduce a large-scale ML tool built on the Alternating Direction Method of Multipliers (ADMM) on Spark to solve a gamut of ML algorithms. The proposed approach decomposes most ML problems into smaller sub-problems suitable for distributed computation in Spark. Learn how this toolkit provides a wider range of ML algorithms, better accuracy than MLlib, robust convergence criteria, and a simple Python API suitable for data scientists, making it easy for end users to develop advanced ML algorithms at scale without worrying about the underlying intricacies of the optimization solver. It is a useful addition to a data scientist's ML ecosystem on Spark. Session hashtag: #SFds15
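To give a flavor of the decomposition the session describes, here is a minimal NumPy sketch of consensus ADMM for ridge regression over data partitions. This is an illustrative assumption, not the toolkit's actual implementation or API: each partition solves a small local least-squares subproblem (the map step Spark would run on executors), and a global averaging step on the driver updates the consensus variable.

```python
import numpy as np

def ridge_consensus_admm(parts, lam=0.5, rho=1.0, n_iter=300):
    """Consensus ADMM sketch for ridge regression:
      minimize (1/2)||A x - b||^2 + (lam/2)||x||^2,
    with rows of (A, b) split into partitions `parts`.
    Each partition holds (A_i, b_i) and updates a local x_i;
    the driver averages them into the consensus variable z.
    """
    d = parts[0][0].shape[1]
    N = len(parts)
    z = np.zeros(d)
    xs = [np.zeros(d) for _ in parts]   # local primal variables
    us = [np.zeros(d) for _ in parts]   # scaled dual variables
    for _ in range(n_iter):
        # Local x-updates: one small linear solve per partition
        # (this is the embarrassingly parallel "map" step).
        for i, (A, b) in enumerate(parts):
            xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(d),
                                    A.T @ b + rho * (z - us[i]))
        # Global z-update on the driver (the "reduce" step).
        z = rho * sum(x + u for x, u in zip(xs, us)) / (lam + N * rho)
        # Dual updates enforce consensus x_i = z over iterations.
        for i in range(N):
            us[i] += xs[i] - z
    return z

# Tiny usage example: 4 partitions of a random regression problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 3))
b = rng.standard_normal(40)
parts = [(A[i:i + 10], b[i:i + 10]) for i in range(0, 40, 10)]
z = ridge_consensus_admm(parts)
# Centralized closed-form ridge solution for comparison.
z_direct = np.linalg.solve(A.T @ A + 0.5 * np.eye(3), A.T @ b)
```

In Spark, the per-partition solves would run as a `mapPartitions`-style operation and the averaging as a reduce on the driver; the key point the abstract makes is that each sub-problem stays small enough to solve exactly, avoiding SGD's step-size tuning.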