Apache Spark and its ecosystem provide many instrumentation points, metrics, and monitoring tools that you can use to improve the performance of your jobs and to understand how your Spark workloads use the available system resources. Spark 3.0 comes with several important additions and improvements to the monitoring system. This talk will cover the new features, review some readily available solutions for using them, and provide examples and feedback from production usage at the CERN Spark service. Topics covered will include Spark executor metrics for fine-grained memory monitoring and extensions to the Spark monitoring system using Spark 3.0 Plugins. Plugins allow us to deploy custom metrics that extend the Spark monitoring system to measure, among other things, I/O metrics for cloud file systems like S3, OS metrics, and custom metrics provided by external libraries.
Speaker: Luca Canali
Luca is a data engineer at CERN working with the Hadoop, Spark, streaming, and database services. He has 20 years of experience architecting, deploying, and supporting enterprise-level database and data services, with a special interest in methods and tools for performance troubleshooting. Luca develops and supports solutions for data analytics and ML for the CERN community, including the LHC experiments, the accelerator sector, and CERN IT. He enjoys taking part in, and sharing knowledge with, the open source, science, and industry data communities at large.