Masato Asahara

Computer Scientist, NEC

Masato Asahara (Ph.D.) currently leads the development of Spark-based machine learning and data analytics systems that fully automate predictive modeling. Masato received his Ph.D. from Keio University and has worked at NEC for seven years as a researcher in distributed computing systems and computing resource management technologies.


Distributed Heterogeneous Mixture Learning On Spark

NEC has developed a unique machine learning algorithm, FAB/HME (hierarchical mixtures of experts learned via factorized asymptotic Bayesian inference), and is expanding its data analytics business to enterprise customers. FAB/HMEs are highly accurate, interpretable models that combine rule-based partitioning of the feature space with sparse linear models in the individual partitions (a.k.a. piecewise sparse linear models). As the core technology behind NEC's "Heterogeneous Mixture Learning," FAB/HMEs have already achieved many enterprise successes in real-world predictive analytics, e.g., energy and water demand forecasting to make the most of natural resources, sales forecasting to minimize food waste in retail stores, and repair-parts demand prediction to optimize logistics inventory.

In this session, we share the lessons learned from developing dFAB, a distributed learning algorithm for FAB/HMEs, and from implementing it on Spark. To scale out on Spark, dFAB is carefully designed to reduce communication between the driver and the executors and to improve multicore CPU utilization on each worker. dFAB has two key design features: an RDD layout that lets it exploit modern matrix-computation libraries such as BLAS and Breeze, and an efficient Spark implementation that reduces data transfers and CPU idle time. We will also present experimental results demonstrating dFAB's scale-out performance as well as its higher accuracy and interpretability compared with the algorithms currently implemented in Spark MLlib.
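
The abstract does not include code, but a minimal Scala sketch of the general pattern it describes might look like the following: each partition's rows are packed into one Breeze matrix block so BLAS-backed operations work on whole blocks, model parameters are broadcast once per iteration, and per-partition statistics are merged with treeAggregate on the executors before reaching the driver. All names, the dummy data, and the gradient-style statistic are illustrative assumptions, not NEC's dFAB implementation.

```scala
import breeze.linalg.{DenseMatrix, DenseVector, norm}
import org.apache.spark.sql.SparkSession

object BlockedAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("dFAB-style sketch").getOrCreate()
    val sc = spark.sparkContext
    val dim = 50

    // Hypothetical input: feature vectors, packed into one dense matrix per
    // partition so Breeze (backed by BLAS) can operate on whole blocks.
    val rows = sc.parallelize(Seq.fill(10000)(DenseVector.rand(dim)), numSlices = 8)
    val blocks = rows.mapPartitions { it =>
      val data = it.toArray
      if (data.isEmpty) Iterator.empty
      else Iterator.single(DenseMatrix(data.map(_.toArray): _*)) // rows stacked into a block
    }.cache()

    // Broadcast the current model parameters (here just a weight vector) once,
    // instead of shipping them inside every task closure.
    val weights = sc.broadcast(DenseVector.rand(dim))

    // Per-partition statistics via a single matrix-vector product per block;
    // treeAggregate merges partial results on executors before the driver.
    val (statSum, count) = blocks.treeAggregate((DenseVector.zeros[Double](dim), 0L))(
      seqOp = { case ((s, n), x) =>
        val scores = x * weights.value       // one BLAS-level GEMV per block
        (s + (x.t * scores), n + x.rows)     // block-level contribution
      },
      combOp = { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }
    )

    println(s"aggregated statistic norm = ${norm(statSum / count.toDouble)}")
    spark.stop()
  }
}
```

Compared with a row-at-a-time `map`, this blocked layout replaces many small vector operations with a few large matrix operations, which is the kind of design the talk attributes to dFAB for keeping driver-executor traffic low and CPU cores busy.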