Murali Kaundinya is a senior strategist in technology and architecture with extensive leadership and management consulting experience. He has served in leadership roles conceiving, executing, and delivering transformational programs for Fortune 100 enterprises in financial services, health and life sciences, insurance, and advanced technology. He was a Technology Fellow at Goldman Sachs, where he transformed the firm’s distributed software engineering practices into a centrally managed platform optimized for innovation, productivity, risk management, and compliance. He has held leadership roles at Sun Microsystems, Inc. (now part of Oracle), where he provided strategy consulting services to CXOs of Sun’s top clients across the world. Murali started his career at NASA's Goddard Space Flight Center in Greenbelt, Maryland, and has published and presented extensively on many technology topics. He holds several patents in RFID and in the field of telemedicine.
May 27, 2021 03:50 PM PT
Classic event, incident, problem, and change management are ITSM practices that are being integrated with DevOps/SRE and ML through a competency known as AIOps. Large streams of data from logs, metrics, and traces are organized and processed with machine learning algorithms to extract insights into anomalies in system behavior that could affect end users and business transactions. Businesses cannot afford to have their end users affected by those anomalies, so they want to predict the likelihood of systems regressing and take corrective action well before any material impact.
In this talk, we show how to use simple linear regression and multivariate linear regression to predict the likelihood of system behavior deviating by one or two standard deviations (sigma) from its expected value. We show how to make these predictions with FOSS tools, using decision trees integrated with high-performance streaming and monitoring platforms such as Apache Flink, Apache Beam, Prometheus, and Grafana, which make it much easier to visualize alerts and trace them back to a root cause. These systems are backed by Apache Kafka for its streaming and distributed computing capabilities: data is partitioned for staged analysis, and some stages can run in parallel depending on the use case. We present a fully integrated architecture that helps you realize a commercial-grade AIOps capability without licensing expensive software products. This open architecture lets you implement ML algorithms as needed, and it is agnostic to programming languages and tools.
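As a rough illustration of the simple linear regression idea, the sketch below fits a trend line to a short metric history and reports how many standard deviations a new sample lies from the predicted value. It is not the speaker's implementation: the function names (`fit_line`, `deviation_in_sigmas`, `sigma_bucket`), the latency-style sample data, and the choice of the residual standard deviation as "sigma" are all assumptions made for this example.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def deviation_in_sigmas(xs, ys, x_new, y_new):
    """Distance of a new sample from the fitted trend, in units of sigma.

    Sigma here is the population standard deviation of the residuals
    on the history (an assumption made for this sketch).
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = pstdev(residuals)
    error = abs(y_new - (slope * x_new + intercept))
    if sigma == 0:
        # Perfectly linear history: any deviation at all is unbounded in sigmas.
        return 0.0 if error == 0 else float("inf")
    return error / sigma


def sigma_bucket(z):
    """Map a sigma distance to a hypothetical alert level."""
    if z >= 2:
        return "two-sigma"
    if z >= 1:
        return "one-sigma"
    return "normal"


# Hypothetical history: time step versus observed latency.
history_x = [0, 1, 2, 3]
history_y = [1.0, 3.5, 4.5, 7.0]
level = sigma_bucket(deviation_in_sigmas(history_x, history_y, 4, 10.0))
```

In a streaming setup, a function like `deviation_in_sigmas` would typically be applied per window or per partition of the metric stream, with the resulting level exported as an alert. The multivariate case follows the same pattern with more than one predictor.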
The talk combines these techniques with demos and is aimed at practicing engineers and developers who are familiar with ML.