Flyby: Improved Dense Matrix Multiplication

Download Slides

At Thomson Reuters Reseach and Development, we have found several business tasks with the same computational dependencies as distributed matrix multiplication, such as computing user-product recommendations and interpreting LDA topics. While matrix multiplication is an elementary distributed algorithm, Spark is inefficient on it. This talk proposes a formalism to express these kinds of tasks and demonstrates gains from optimizing it. In naive dense matrix multiplication, a cartesian or join generates n**3 data, which are shuffled and reduced to n**2 data. The shuffle is expensive; moreover, the partitioning of the reduction is known ahead of time because of the key assignment pattern, and this can be used to optimize the procedure. We propose a method called “flyby,” which allows the reduction to be localized and avoids the shuffle. I well introduce flyby and provide timing comparisons on business tasks and synthetic problems.