Nick Pentreath is a principal engineer in IBM’s Center for Open-source Data & AI Technology (CODAIT), where he works on machine learning. Previously, he cofounded Graphflow, a machine learning startup focused on recommendations. He has also worked at Goldman Sachs, Cognitive Match, and Mxit. He is a committer and PMC member of the Apache Spark project and author of ‘Machine Learning with Spark’. Nick is passionate about combining commercial focus with machine learning and cutting-edge technology to build intelligent systems that learn from data to add business value.
In the last few years, deep learning has achieved dramatic success in a wide range of domains, including computer vision, artificial intelligence, speech recognition, natural language processing and reinforcement learning. However, good performance comes at a significant computational cost. This makes scaling training expensive, but an even more pressing issue is inference, in particular for real-time applications (where runtime latency is critical) and edge devices (where computational and storage resources may be limited). This talk will explore common techniques and emerging advances for dealing with these challenges, including best practices for batching; quantization and other methods for trading off computational cost at training time against inference performance; and architecture optimization and graph manipulation approaches.
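The following is a minimal sketch of the quantization idea mentioned above. It assumes a simple symmetric, per-tensor scheme, which is one of several in common use and not necessarily the one covered in the talk. Float weights are mapped to 8-bit integer codes plus a single scale factor. Storing int8 instead of float32 cuts weight storage roughly fourfold, at the cost of a bounded rounding error. The function names are illustrative only; production frameworks ship their own quantization tooling.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: each w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [scale * v for v in q]


weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_err)  # rounding error is at most scale / 2 per weight
```

The largest-magnitude weight always maps to ±127, so the full int8 range is used. A single outlier weight, however, inflates `scale` for the whole tensor, which is one reason per-channel schemes also exist.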
A deep learning model is often viewed as fully self-contained, freeing practitioners from the burden of data processing and feature engineering. However, in most real-world applications of AI, these models have data pre-processing, feature extraction and transformation requirements that are just as complex as those of more traditional ML models. Any non-trivial use case requires care to ensure that no model skew exists between the training-time data pipeline and the inference-time data pipeline.
This is not merely a theoretical concern: small differences or errors can be difficult to detect, yet can have a dramatic impact on the performance and efficacy of the deployed solution. Despite this, there are currently few widely accepted, standard solutions for simple deployment of end-to-end deep learning pipelines to production. Recently, the Open Neural Network Exchange (ONNX) standard has emerged for representing deep learning models in a standardized format.
While this is useful for representing the core model inference phase, we need to go further to encompass deployment of the end-to-end pipeline. In this talk I will introduce ONNX for exporting deep learning computation graphs, as well as the ONNX-ML component of the specification, for exporting both 'traditional' ML models and common feature extraction, data transformation and post-processing steps.
I will cover how to use ONNX and the growing ecosystem of exporter libraries for common frameworks (including TensorFlow, PyTorch, Keras, scikit-learn and now Apache SparkML) to deploy complete deep learning pipelines.
Finally, I will explore best practices for working with and combining these disparate exporter toolkits, and highlight the gaps, issues and missing pieces that need to be taken into account and have yet to be addressed.
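A parity test is one simple guard against the training/inference skew described above. The idea is to run the same records through the training-time preprocessing and through the version used at inference time (for example, a re-implementation or an exported graph), then flag any record whose features differ beyond a tolerance. The sketch below is not tied to any ONNX exporter API. It assumes hypothetical feature definitions (`amount`, `age`), and plain Python functions stand in for both pipelines.

```python
import math
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]
Pipeline = Callable[[Record], List[float]]


def train_preprocess(r: Record) -> List[float]:
    # Hypothetical training-time features: log-scaled amount and standardized age.
    return [math.log1p(r["amount"]), (r["age"] - 40.0) / 12.0]


def serve_preprocess(r: Record) -> List[float]:
    # Hypothetical inference-time re-implementation of the same features.
    return [math.log(1.0 + r["amount"]), (r["age"] - 40.0) / 12.0]


def serve_preprocess_skewed(r: Record) -> List[float]:
    # Hypothetical buggy variant: uses a stale standard deviation for age.
    return [math.log1p(r["amount"]), (r["age"] - 40.0) / 10.0]


def check_parity(f: Pipeline, g: Pipeline, records: Sequence[Record], tol: float = 1e-6) -> List[int]:
    """Return the indices of records whose features differ between f and g by more than tol."""
    mismatches = []
    for i, r in enumerate(records):
        a, b = f(r), g(r)
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            mismatches.append(i)
    return mismatches


records = [{"amount": 0.0, "age": 30.0}, {"amount": 150.0, "age": 52.0}]
print(check_parity(train_preprocess, serve_preprocess, records))
print(check_parity(train_preprocess, serve_preprocess_skewed, records))
```

The tolerance matters: exported graphs often use different numeric kernels, so exact equality is usually too strict, while a loose tolerance can hide genuine skew.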
Continuous integration and deployment have become increasingly standard practice in software development. However, applying them to machine learning models and applications introduces many challenges. Not only do we need to account for standard code quality and integration testing; we also need to track how model performance metrics change in response to changes in code, the deployment framework or mechanism, pre- and post-processing steps and the data, not to mention the core deep learning model itself.
In addition, deep learning presents particular challenges:
* model sizes are often extremely large and take significant time and resources to train
* models are often more difficult to understand and interpret, making it harder to debug issues
* inputs to deep learning are often very different from the tabular data involved in most 'traditional machine learning' models
* model formats, frameworks and the state-of-the-art models and architectures themselves are changing extremely rapidly
* many disparate tools are usually combined to create the full end-to-end pipeline for training and deployment, making it trickier to plug these components together and track down issues.
We also need to take into account the impact of changes on wider aspects such as model bias, fairness, robustness and explainability. And we need to track all of this over time and in a standard, repeatable manner. This talk explores best practices for handling these myriad challenges to create a standardized, automated, repeatable pipeline for continuous deployment of deep learning models and pipelines. I will illustrate this through the work we are undertaking within the free and open-source IBM Model Asset eXchange.
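The following sketches one building block for such a pipeline, assuming that each build produces a dictionary of evaluation metrics. It compares the candidate against a stored baseline and reports every metric that regressed beyond a per-metric tolerance. The metric names, thresholds and the fairness-style `parity_gap` metric are hypothetical placeholders.

```python
from typing import Dict


def metric_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerances: Dict[str, float],
    higher_is_better: Dict[str, bool],
) -> Dict[str, float]:
    """Return {metric: regression size} for every metric that got worse by more than its tolerance."""
    failures: Dict[str, float] = {}
    for name, tol in tolerances.items():
        if name not in candidate:
            failures[name] = float("inf")  # a metric that was not produced counts as a failure
            continue
        delta = candidate[name] - baseline[name]
        regression = -delta if higher_is_better[name] else delta  # positive means "worse"
        if regression > tol:
            failures[name] = regression
    return failures


baseline = {"accuracy": 0.91, "latency_ms": 35.0, "parity_gap": 0.04}
candidate = {"accuracy": 0.905, "latency_ms": 48.0, "parity_gap": 0.045}
failures = metric_regressions(
    baseline,
    candidate,
    tolerances={"accuracy": 0.01, "latency_ms": 5.0, "parity_gap": 0.02},
    higher_is_better={"accuracy": True, "latency_ms": False, "parity_gap": False},
)
print(failures)
```

In this example only the latency regression exceeds its tolerance. A CI job would typically fail the build whenever the returned dictionary is non-empty, and store the candidate metrics so they can be tracked over time.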
In the last few years, recurrent neural networks (RNNs) have achieved significant success in modeling time series and sequence data, in particular within the speech, language and text domains. Recently, these techniques have begun to be applied to session-based recommendation tasks, with very promising results. This talk explores the latest research advances in this domain, as well as practical applications. I will provide an overview of RNNs, covering common architectures and applications, before diving deeper into RNNs for session-based recommendations. I will pay particular attention to the challenges inherent in common personalization tasks and the specific adjustments to models and optimization techniques required for success. Session hashtag: #SAISDD1
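Session-based recommendation is commonly framed as next-item prediction: given the items a user has interacted with so far in a session, predict the next one. The minimal sketch below assumes integer item IDs, with `0` reserved for padding. Each session is turned into (prefix, next item) training pairs, with fixed-length, left-padded prefixes suitable for batching into an RNN. The RNN itself is not shown, and the function name and `max_len` value are illustrative.

```python
from typing import List, Tuple

PAD = 0  # assumed padding ID; real item IDs are assumed to start at 1


def next_item_examples(session: List[int], max_len: int = 5) -> List[Tuple[List[int], int]]:
    """Turn one session into (left-padded prefix, next item) pairs for next-item prediction."""
    examples = []
    for t in range(1, len(session)):
        prefix = session[max(0, t - max_len):t]  # keep at most the last max_len items
        padded = [PAD] * (max_len - len(prefix)) + prefix
        examples.append((padded, session[t]))
    return examples


session = [12, 7, 7, 31]  # hypothetical item IDs in click order
examples = next_item_examples(session, max_len=3)
for prefix, target in examples:
    print(prefix, "->", target)
```

This prints three pairs, the last being `[12, 7, 7] -> 31`. Repeated items such as `7, 7` are kept, since repeat consumption is itself a signal in many session datasets.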
The popular view of applying deep learning is that you take an open-source or research model, train it on raw data and deploy the resulting model as a fully self-contained artifact. However, the reality is far more complex. For the training phase, users face an array of challenges, including handling varied deep learning frameworks, hardware requirements and configurations, not to mention code quality, consistency and packaging. For the deployment phase, they face another set of challenges, ranging from custom requirements for data pre- and post-processing, to inconsistencies across frameworks, to a lack of standardization in serving APIs. The goal of the IBM Code Model Asset eXchange (MAX) is to remove these barriers to entry, so that developers can obtain, train and deploy open-source deep learning models for their enterprise applications. In building the exchange, we encountered all these challenges and more. For the training phase, we aim to leverage the Fabric for Deep Learning (FfDL: https://github.com/IBM/FfDL), an open-source project providing framework-independent training of deep learning models on Kubernetes. For the deployment phase, MAX provides container-based, fully self-contained model artifacts that encompass the end-to-end deep learning predictive pipeline and expose a standardized REST API. This talk explores the process of building MAX, the challenges and problems encountered, the solutions developed, the lessons learned along the way, and the future and best practices for cross-framework, standardized deep learning model training and deployment. Session hashtag: #SAISDL6
In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP and reinforcement learning. However, until recently, deep learning in recommender systems received relatively little attention. This talk explores recent advances in this area in both research and practice. I will explain how deep learning can be applied to recommendation settings, cover architectures for handling contextual data, side information and time-based models, compare deep learning approaches to other cutting-edge contextual recommendation models, and finally explore scalability issues and model-serving challenges. Session hashtag: #AISAIS13
Tuning a Spark ML model with cross-validation can be an extremely computationally expensive process. As the number of hyperparameter combinations increases, so does the number of models being evaluated. By default, Spark evaluates these models one by one to select the best-performing one. When running this process with a large number of models, if training and evaluating a single model does not fully utilize the available cluster resources, that waste is compounded for each model and leads to long run times. Model parallelism in Spark cross-validation, available from Spark 2.3, allows more than one model to be trained and evaluated at the same time, making better use of cluster resources. We will go over how to enable this setting in Spark, what effect it has on an example ML pipeline, and best practices to keep in mind when using this feature. Additionally, we will discuss ongoing work to reduce the amount of computation required when tuning ML pipelines by eliminating redundant transformations and intelligently caching intermediate datasets. This can be combined with model parallelism to further reduce the run time of cross-validation for complex machine learning pipelines. Session hashtag: #DS6SAIS
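In Spark 2.3 and later, this behavior is controlled by the `parallelism` parameter of `CrossValidator` (and `TrainValidationSplit`). It is set via `setParallelism` in Scala or `parallelism=` in PySpark, and defaults to 1, meaning models are fitted serially. The Spark tuning guide advises choosing the value with the available cluster resources in mind. The stdlib-only sketch below illustrates the underlying idea of fitting several parameter maps concurrently from a driver-side thread pool; it is not Spark's implementation. `fit_and_evaluate` is a hypothetical stand-in for training and scoring one model, and its sleep simulates a job that leaves resources idle.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

ParamMap = Dict[str, float]


def param_grid(grid: Dict[str, List[float]]) -> List[ParamMap]:
    """Expand a grid of candidate values into every parameter combination."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*(grid[k] for k in keys))]


def fit_and_evaluate(params: ParamMap) -> float:
    """Hypothetical stand-in for fitting one model and computing its CV metric (higher is better)."""
    time.sleep(0.05)  # simulates a training job that does not saturate the cluster
    return -((params["regParam"] - 0.1) ** 2) - 0.01 * params["elasticNetParam"]


def cross_validate(grid: Dict[str, List[float]], parallelism: int = 1) -> Tuple[ParamMap, float]:
    """Evaluate all parameter maps, up to `parallelism` at a time, and return the best."""
    maps = param_grid(grid)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        scores = list(pool.map(fit_and_evaluate, maps))  # results keep the input order
    best = max(range(len(maps)), key=lambda i: scores[i])
    return maps[best], scores[best]


grid = {"regParam": [0.01, 0.1, 1.0], "elasticNetParam": [0.0, 0.5]}
for p in (1, 3):
    start = time.perf_counter()
    best_params, best_score = cross_validate(grid, parallelism=p)
    print(f"parallelism={p}: best={best_params} in {time.perf_counter() - start:.2f}s")
```

With `parallelism=3`, the six parameter maps are evaluated in about two rounds rather than six, and the selected parameters match the serial run. On a real cluster, the speedup depends on how much headroom each individual fit leaves.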
The common perception of machine learning is that it starts with data and ends with a model. In real-world production systems, the traditional data science and machine learning workflow of data preparation, feature engineering and model selection, while important, is only one aspect. A critical missing piece is the deployment and management of models, as well as the integration between the model creation and deployment phases. This is particularly challenging in the case of deploying Apache Spark ML pipelines for low-latency scoring. While MLlib's DataFrame API is powerful and elegant, it is relatively ill-suited to the needs of many real-time predictive applications, in part because it is tightly coupled with the Spark SQL runtime. In this talk I will introduce the Portable Format for Analytics (PFA) for portable, open and standardized deployment of data science pipelines & analytic applications. I'll also introduce and evaluate Aardpfark, a library for exporting Spark ML pipelines to PFA, as well as compare and contrast it to other available alternatives including PMML, MLeap, ONNX and Apple's CoreML. Session hashtag: #ML1SAIS
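The minimal sketch below illustrates the decoupling argument above. A fitted scaler + logistic regression pipeline is exported to a JSON document, and a single record is then scored using only that document and the standard library, with no Spark session. The document layout is a deliberately simplified, hypothetical format, not PFA, PMML or ONNX syntax; PFA in particular defines its own type system and function library.

```python
import json
import math
from typing import List


def export_pipeline(means: List[float], stds: List[float],
                    coefficients: List[float], intercept: float) -> str:
    """Serialize fitted scaler + logistic regression parameters to a hypothetical JSON format."""
    doc = {
        "steps": [
            {"type": "standard_scaler", "mean": means, "std": stds},
            {"type": "logistic_regression", "coefficients": coefficients, "intercept": intercept},
        ]
    }
    return json.dumps(doc)


def score(doc_json: str, features: List[float]) -> float:
    """Score one record using only the exported document (no Spark runtime needed)."""
    x = list(features)
    for step in json.loads(doc_json)["steps"]:
        if step["type"] == "standard_scaler":
            x = [(v - m) / s for v, m, s in zip(x, step["mean"], step["std"])]
        elif step["type"] == "logistic_regression":
            margin = step["intercept"] + sum(w * v for w, v in zip(step["coefficients"], x))
            return 1.0 / (1.0 + math.exp(-margin))
        else:
            raise ValueError(f"unsupported step: {step['type']}")
    raise ValueError("pipeline has no final model step")


doc = export_pipeline(means=[10.0, 0.5], stds=[2.0, 0.25], coefficients=[0.8, -1.2], intercept=0.1)
print(score(doc, [12.0, 0.5]))
```

The scorer can only evaluate step types it knows, which mirrors a real limitation of portable formats: coverage of the source framework's transformers determines what can be exported.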
The talk will cover how Graphflow uses Spark to power its real-time recommendation and customer intelligence platform. We will show how we use Spark and MLlib to process and analyze customer behavior data for recommendation and predictive analytics models, and give an overview of using Spark and Shark to power data aggregation and analytics for customer insights and front-end data visualization apps.