Over the last two decades Cory has worked across the spectrum of software engineering, from embedded systems to massive, high-volume Kafka pipelines. Building systems that directly address user needs drives him to work on cutting-edge products. Cory joined Verta from New Relic, where he spent more than six years as a Lead Engineer working on multiple products, including Insights, Mobile, and SixthSense. This gives him a unique perspective on monitoring at scale.
May 27, 2021 05:00 PM PT
Application performance monitoring (APM) has become a cornerstone of software engineering, allowing engineering teams to quickly identify and remedy production issues. However, as the world moves to intelligent applications built using machine learning, traditional APM quickly becomes insufficient for diagnosing and fixing the problems these modern systems encounter in production.
As a lead software engineer at New Relic, my team built high-performance monitoring systems including Insights, Mobile, and SixthSense. When I transitioned to building ML monitoring software, I found that the architectural principles and design choices underlying APM were not a good fit for this brand-new world. In fact, blindly following APM designs led us down paths that would have been better left unexplored.
In this talk, I draw upon my (and my team’s) experience building an ML monitoring system from the ground up and deploying it on customer workloads running large-scale ML training with Spark as well as real-time inference systems. I will highlight how the key principles and architectural choices of APM don’t apply to ML monitoring. You’ll learn why, understand what ML monitoring can successfully borrow from APM, and hear what it takes to build a scalable, robust ML monitoring architecture.