While there are many extension points in Spark, there are still situations where one wishes for one more configuration setting, another listener type, or a new hook to collect custom metrics. Although it is possible to modify the Spark source code and run a patched version to achieve such custom extensions, doing so is neither easy nor maintainable, and it is not advisable in production. That's where Java agents come into play. Java agents make it possible to enhance or modify core Spark functionality in reliable and efficient ways. This talk will present the concept of Java agents and show what it takes to write one. It will demonstrate their capabilities with agents that log extra information, collect custom metrics, and auto-tune the Spark configuration on the fly. The talk will also cover how Java agents can be packaged and distributed throughout the cluster in easy, non-intrusive ways. Finally, it will be useful to Spark developers and researchers who want to rapidly prototype new Spark features and experiment with them.
Jaroslav is currently a member of the engineering team at Unravel. Focusing primarily on low-overhead collection of application metrics, he draws on his experience working on the JVM platform as a member of the JVM serviceability team, and on JVM performance tools such as VisualVM and NetBeans Profiler. He is also the maintainer of BTrace, a dynamic instrumentation tool for the JVM.