Contemporary applications and infrastructure software leave behind a tremendous volume of metric and log data. This aggregated “digital exhaust” is inscrutable to humans and difficult for computers to analyze, since it is vast, complex, and not explicitly structured.
This session will introduce the log processing domain and provide practical advice for analyzing log data with Apache Spark, including:
– how to impose a uniform structure on disparate log sources;
– machine-learning techniques to detect infrastructure failures automatically and characterize the text of log messages;
– best practices for tuning Spark, training models against structured data, and ingesting data from external sources like ElasticSearch; and
– a few relatively painless ways to visualize your results.
You’ll have a better understanding of the unique challenges posed by infrastructure log data after this session. You’ll also learn the most important lessons from our efforts both to develop analytic capabilities for an open-source log aggregation service and to evaluate these at enterprise scale.
William Benton is passionate about making it easier for machine learning practitioners to benefit from advanced infrastructure and making it possible for organizations to manage machine learning systems. His recent roles have included defining product strategy and professional services offerings related to data science and machine learning, leading teams of data scientists and engineers, and contributing to many open source communities related to data, ML, and distributed systems. Will was an early advocate of building machine learning systems on Kubernetes and developed and popularized the “intelligent applications” idiom for machine learning systems in the cloud. He has also conducted research and development related to static program analysis, language runtimes, cluster configuration management, and music technology.