In the last few years, deep learning has achieved dramatic success in a wide range of domains, including computer vision, artificial intelligence, speech recognition, natural language processing and reinforcement learning. However, good performance comes at a significant computational cost. This makes scaling training expensive, but an even more pertinent issue is inference, in particular for real-time applications (where runtime latency is critical) and edge devices (where computational and storage resources may be limited). This talk will explore common techniques and emerging advances for dealing with these challenges, including best practices for batching; quantization and other methods for trading off computational cost at training vs inference performance; architecture optimization and graph manipulation approaches.
Nick Pentreath is a principal engineer in IBM's Center for Open-source Data & AI Technology (CODAIT), where he works on machine learning. Previously, he cofounded Graphflow, a machine learning startup focused on recommendations. He has also worked at Goldman Sachs, Cognitive Match, and Mxit. He is a committer and PMC member of the Apache Spark project and author of Machine Learning with Spark. Nick is passionate about combining commercial focus with machine learning and cutting-edge technology to build intelligent systems that learn from data to add business value.