Fred Reiss is Chief Architect at the IBM Spark Technology Center in San Francisco and is one of the founding employees of the Center. Fred received his Ph.D. from UC Berkeley in 2006, then worked for IBM Research Almaden for the next nine years. At Almaden, Fred worked on the SystemML and SystemT projects, as well as on the research prototype of DB2 with BLU Acceleration. Fred has over 25 peer-reviewed publications and six patents.
Machine learning in the enterprise is an iterative process. A data scientist will tweak or replace her learning algorithm until she finds an approach that works for the business problem and the available data. Apache SystemML is a new system that accelerates this kind of exploratory algorithm development for large-scale machine learning problems. SystemML provides a high-level language to quickly implement and run machine learning algorithms on Spark. SystemML's cost-based optimizer takes care of low-level decisions about how to use Spark's parallelism, allowing users to focus on the algorithm and the real-world problem that the algorithm is trying to solve. This talk will explain how SystemML automates the design decisions involved in translating a high-level algorithm into Spark API calls. The explanation will center on a three-line snippet of R code. We'll start by explaining several different ways that one could implement this code snippet on Spark. We'll show how, depending on the characteristics of the data and the Spark cluster, each of these approaches might work very well or not work at all. Then we'll explain how SystemML's optimizer enumerates these different execution strategies and chooses one that works. Along the way, we will walk through how the code changes as it passes through each stage of SystemML's compilation chain, finally reaching the SystemML runtime for Spark. The talk will conclude with pointers on how the audience can try out Apache SystemML or learn more about the parts of SystemML's optimizer that weren't covered in the talk.
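To make the idea of enumerating execution strategies and choosing one that works more concrete, here is a minimal Python sketch of cost-based plan selection for a single matrix multiplication. It is a toy illustration only, not SystemML's optimizer: the two strategy names ("broadcast" and "shuffle"), the cost formulas, and all function and parameter names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    cost: float      # toy metric: estimated bytes moved over the network
    feasible: bool   # whether the plan fits within the memory budget


def matrix_bytes(rows, cols, bytes_per_cell=8):
    """Size of a dense matrix of doubles (hypothetical size estimate)."""
    return rows * cols * bytes_per_cell


def enumerate_plans(left_shape, right_shape, num_workers, worker_memory_bytes):
    """List candidate strategies for left %*% right with toy cost estimates."""
    left = matrix_bytes(*left_shape)
    right = matrix_bytes(*right_shape)
    small = min(left, right)
    return [
        # Ship the smaller operand to every worker; only works if it fits in memory.
        Plan("broadcast", cost=small * num_workers,
             feasible=small <= worker_memory_bytes),
        # Repartition both operands across the cluster; always possible, but
        # both matrices cross the network once.
        Plan("shuffle", cost=left + right, feasible=True),
    ]


def choose_plan(left_shape, right_shape, num_workers, worker_memory_bytes):
    """Pick the cheapest plan among those that can run at all."""
    plans = enumerate_plans(left_shape, right_shape, num_workers, worker_memory_bytes)
    feasible = [p for p in plans if p.feasible]
    return min(feasible, key=lambda p: p.cost)


if __name__ == "__main__":
    one_gb = 1_000_000_000
    # Tall matrix times a small vector: the vector is cheap to broadcast.
    print(choose_plan((1_000_000, 100), (100, 1), 10, one_gb).name)
    # Two large matrices: neither fits in worker memory, so broadcast is ruled out.
    print(choose_plan((1_000_000, 1000), (1000, 1_000_000), 10, one_gb).name)
```

The same operation gets a different plan depending on the shapes of its inputs and the memory available per worker, which is the kind of decision the abstract describes the optimizer making on the user's behalf.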
Apache SystemML is an open-source language and compiler that makes it dramatically easier to build custom machine learning solutions that scale automatically to massive data sizes. This talk will show how to deploy and use SystemML for algorithm development. I will start with some instructive examples of the importance of algorithm customization in machine learning. I'll show how algorithms that appear very similar can produce dramatically different results. Then I'll walk through the process of building a custom algorithm using Apache SystemML, starting with the software download and ending with running the new algorithm in parallel on Spark.