Edwin Barnes is the performance architect at Sqrrl Data, where he leads the design and deployment of large-scale performance testing systems for high-throughput distributed data processing and analytics applications. He previously worked at a range of top tech and data companies, including Vertica Systems, Unidesk, and Dataupia. He has extensive experience in cloud computing and is a leading expert in quality assurance and performance optimization of distributed systems.
We all dread the “Lost task” and “Container killed by YARN for exceeding memory limits” messages in our scaled-up Spark on YARN applications. Even answering the question “How much memory did my application use?” is surprisingly tricky in a distributed YARN environment. Sqrrl has developed a testing framework for observing the vital statistics of Spark jobs, including executor-by-executor memory and CPU usage over time for both the JVM and Python portions of PySpark YARN containers. This talk will detail the methods we use to collect, store, and report Spark on YARN resource usage. This information has proved invaluable for performance and regression testing of the Spark jobs in Sqrrl Enterprise.
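To make the idea of executor-by-executor memory tracking concrete, here is a minimal sketch in Python. It is not Sqrrl's framework, whose internals the abstract does not describe. It assumes a Linux host where each container process exposes its resident set size through the `VmRSS` field of `/proc/<pid>/status`, and the function names, the `"jvm"`/`"python"` labels, and the sample tuple layout are all hypothetical.

```python
import re
from collections import defaultdict


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None.

    Assumption: Linux procfs format, e.g. "VmRSS:      2048 kB".
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def peak_rss_by_executor(samples):
    """Reduce periodic samples to the peak RSS per (executor, process kind).

    `samples` is an iterable of (executor_id, kind, rss_kb) tuples, where
    `kind` is a hypothetical label such as "jvm" or "python" for the two
    halves of a PySpark container.
    """
    peaks = defaultdict(int)
    for executor_id, kind, rss_kb in samples:
        key = (executor_id, kind)
        peaks[key] = max(peaks[key], rss_kb)
    return dict(peaks)
```

In use, a collector running on each node might read `/proc/<pid>/status` for every process in a container at a fixed interval, pass the text through `parse_vmrss_kb`, and feed the resulting samples to `peak_rss_by_executor` when the job finishes. Keeping the JVM and Python figures separate matters because YARN enforces its limit on the container as a whole, so either half can push it over.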