Fred Reiss is the Chief Architect at IBM’s Center for Open-Source Data and AI Technologies in San Francisco. Fred received his Ph.D. from UC Berkeley in 2006, then worked for IBM Research Almaden for the next nine years. At Almaden, Fred worked on the SystemML and SystemT projects, as well as on the research prototype of DB2 with BLU Acceleration. Fred has over 25 peer-reviewed publications and six patents.
We’ve all heard that AI is going to become as ubiquitous in the enterprise as the telephone, but what does that mean exactly? Everyone at IBM has a telephone, and everyone knows how to use it, yet IBM isn’t a phone company. How do we bring AI to the same standard of ubiquity, where everyone in a company has access to AI and knows how to use it, yet the company is not an AI company? In this talk, we’ll break down the challenges a domain expert faces today in applying AI to real-world problems. We’ll talk about what a domain expert needs to overcome in order to go from “I know a model of this type exists” to “I can tell an application developer how to apply this model to my domain.” We’ll conclude with a live demo that showcases how a domain expert can cut through the five stages of model deployment in minutes instead of days using IBM and other open source tools.
Machine learning in the enterprise is an iterative process. A data scientist will tweak or replace her learning algorithm until she finds an approach that works for the business problem and the available data. Apache SystemML is a new system that accelerates this kind of exploratory algorithm development for large-scale machine learning problems. SystemML provides a high-level language for quickly implementing and running machine learning algorithms on Spark. SystemML's cost-based optimizer takes care of low-level decisions about how to use Spark's parallelism, allowing users to focus on the algorithm and the real-world problem that the algorithm is trying to solve. This talk will explain how SystemML automates the design decisions involved in translating a high-level algorithm into Spark API calls. The explanation will center on a three-line snippet of R code. We'll start by explaining several different ways that one could implement this code snippet on Spark. We'll show how, depending on the characteristics of the data and the Spark cluster, each of these approaches might work very well or not work at all. Then we'll explain how SystemML's optimizer enumerates these different execution strategies and chooses one that works. Along the way, we will walk through how the code changes as it passes through each stage of SystemML's compilation chain, finally reaching the SystemML runtime for Spark. The talk will conclude with pointers to how the audience can try out Apache SystemML and learn more about the parts of SystemML's optimizer that weren't covered in the talk.
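As a rough illustration of the kind of decision such a cost-based optimizer automates, here is a toy strategy chooser. All names, the memory model, and the budget threshold are illustrative assumptions for this sketch, not SystemML's actual internals: it picks a local (single-node) or distributed plan for a matrix multiply based on the operands' estimated memory footprint.

```python
# Toy sketch of a cost-based plan choice, loosely in the spirit of what a
# system like SystemML does internally. Everything here (names, the 8-bytes-
# per-double size model, the 1 GB driver budget) is an illustrative assumption.

def estimate_bytes(rows, cols, sparsity=1.0):
    """Rough size estimate: 8 bytes per (expected) nonzero double value."""
    return rows * cols * sparsity * 8

def choose_strategy(a_dims, b_dims, driver_budget_bytes=1 << 30):
    """Pick an execution strategy for A (m x k) times B (k x n)."""
    (m, k), (k2, n) = a_dims, b_dims
    assert k == k2, "inner dimensions must match"
    # Account for both inputs plus the output matrix.
    footprint = (estimate_bytes(m, k)
                 + estimate_bytes(k2, n)
                 + estimate_bytes(m, n))
    return "local" if footprint <= driver_budget_bytes else "distributed"

# A small product fits comfortably on one node...
print(choose_strategy((1000, 100), (100, 10)))          # -> local
# ...while a large one forces a distributed plan.
print(choose_strategy((10**7, 10**4), (10**4, 10**3)))  # -> distributed
```

A real optimizer reasons over many more operators, sparsity estimates, and cluster configuration parameters, but comparing estimated footprints against memory budgets captures the basic idea of why the same code can compile to very different Spark execution strategies.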
Apache SystemML is an open-source language and compiler that makes it dramatically easier to build custom machine learning solutions that scale automatically to massive data sizes. This talk will show how to deploy and use SystemML for algorithm development. I will start with some instructive examples of the importance of algorithm customization in machine learning. I'll show how algorithms that appear very similar can produce dramatically different results. Then I'll walk through the process of building a custom algorithm using Apache SystemML, starting with the software download and ending with running the new algorithm in parallel on Spark.
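To make the point about near-identical algorithms diverging concrete, here is a small, self-contained illustration (my own example, not necessarily one from the talk): fitting a single constant to the same data by minimizing squared error versus absolute error yields the mean versus the median, and the two answers diverge sharply once an outlier is present.

```python
# Illustrative example (not from the talk): two one-parameter "models" that
# differ only in their loss function can give dramatically different fits.
# Minimizing sum((x - c)^2) over c yields the mean; minimizing sum(|x - c|)
# yields the median.
import statistics

data = [1.0, 2.0, 2.0, 3.0, 100.0]  # one large outlier

fit_squared = statistics.mean(data)     # squared-error fit
fit_absolute = statistics.median(data)  # absolute-error fit

print(fit_squared)   # 21.6 -- pulled far toward the outlier
print(fit_absolute)  # 2.0  -- robust to the outlier
```

This is exactly the kind of customization decision that matters in practice: a seemingly minor change to an algorithm's objective can change its behavior on real data, which is why an easy path to implementing custom algorithms is valuable.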