Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins

This talk covers practical aspects of Apache Spark monitoring, focusing on measuring Spark workloads running in cloud environments, with the aim of empowering Spark users to do data-driven performance troubleshooting. Spark metrics expose important information about Spark’s internal execution. In addition, Apache Spark 3 introduced an improved plugin interface that extends metrics collection to third-party APIs. This is particularly useful when running Spark in cloud environments, as it allows measuring OS and container metrics such as CPU usage, I/O, memory usage, and network throughput, as well as metrics related to cloud filesystem access. Participants will learn how to use this instrumentation to build and run an Apache Spark performance dashboard, which complements the existing Spark WebUI for advanced monitoring and performance troubleshooting.
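As a rough illustration of the kind of setup described above, the sketch below assembles spark-submit options that route Spark metrics to a Graphite-compatible endpoint and load a plugin. It relies on Spark's built-in GraphiteSink, on the spark.metrics.conf.* configuration prefix, and on the spark.plugins setting. The host name, port, prefix, and plugin class are hypothetical placeholders, and this is not necessarily the configuration used in the talk.

```python
# Minimal sketch: build spark-submit "--conf" flags for metrics and plugins.
# Host, port, prefix, and plugin class names are hypothetical placeholders.

def metrics_conf(graphite_host, graphite_port, prefix, plugins=()):
    """Return Spark settings that enable the Graphite sink and optional plugins."""
    sink = "spark.metrics.conf.*.sink.graphite"
    conf = {
        # Built-in sink that pushes metrics to a Graphite-compatible endpoint.
        f"{sink}.class": "org.apache.spark.metrics.sink.GraphiteSink",
        f"{sink}.host": graphite_host,
        f"{sink}.port": str(graphite_port),
        # Report every 10 seconds (an arbitrary choice for this sketch).
        f"{sink}.period": "10",
        f"{sink}.unit": "seconds",
        f"{sink}.prefix": prefix,
    }
    if plugins:
        # spark.plugins takes a comma-separated list of plugin class names.
        conf["spark.plugins"] = ",".join(plugins)
    return conf


def to_submit_args(conf):
    """Flatten a settings dict into a list of spark-submit arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = metrics_conf(
        "graphite.example.org",          # hypothetical endpoint
        2003,
        "spark-demo",                    # hypothetical metrics prefix
        ["com.example.MyMetricsPlugin"]  # hypothetical plugin class
    )
    print(" ".join(to_submit_args(conf)))
```

The resulting arguments can be appended to a spark-submit command line. A dashboard would then read from whatever time-series store receives the Graphite-format data.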

About Luca Canali

Luca is a data engineer at CERN, working with the Hadoop, Spark, streaming, and database services. He has 20+ years of experience designing, deploying, and supporting enterprise-level database and data services, with a special interest in methods and tools for performance troubleshooting. Luca is active in developing and supporting platforms for data analytics and ML for the CERN community, including the LHC experiments, the accelerator sector, and CERN IT. He enjoys sharing experience and knowledge with data communities in science and industry at large.