Kaushik Tadikonda

Software Engineer, IBM

Kaushik Tadikonda is a software engineer for Enterprise Performance Management at IBM, where he builds applications that identify problems with ETL pipelines. His day-to-day work involves optimizing Spark jobs and designing, deploying, and monitoring infrastructure. He likes to understand how things work at a low level, which frequently causes more problems than it solves.

Past sessions

Summit 2020: Fine Tuning and Enhancing Performance of Apache Spark Jobs

June 24, 2020 05:00 PM PT

Apache Spark's defaults provide decent performance for large data sets, but tuning parameters to match the available resources and the job can yield significant gains. We'll dive into best practices drawn from solving real-world problems, and the steps we took as we added resources. Topics include garbage collector selection, serialization, tuning the number of workers and executors, partitioning data, detecting skew, choosing partition sizes, scheduling pools and the fair scheduler, and Java heap parameters. We'll also cover reading the Spark UI execution DAG to identify bottlenecks and their solutions, optimizing joins and partitioning, and best practices for rollups in Spark SQL, including avoiding them where possible.
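As a rough illustration of how several of these knobs translate into configuration, here is a minimal Python sketch that turns a cluster description into `spark-submit --conf` flags and flags skewed partitions from a list of partition sizes. The Spark configuration keys are standard, but the five-cores-per-executor rule of thumb, the 1 core and 1 GB reserved per node, the ~10% memory overhead, the ~128 MB target partition size, and the 5x skew threshold are all assumptions chosen for illustration, not settings taken from the session.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class Cluster:
    nodes: int
    cores_per_node: int
    memory_gb_per_node: int


def size_executors(cluster, cores_per_executor=5):
    """Rule-of-thumb executor sizing (an assumption, not a Spark default).

    Reserves 1 core and 1 GB per node for the OS and node daemons, leaves
    one executor slot for the YARN application master, and sets aside
    roughly 10% of each executor's memory for off-heap overhead.
    """
    usable_cores = cluster.cores_per_node - 1
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("node has too few cores for this executor size")
    instances = executors_per_node * cluster.nodes - 1
    memory_per_slot_gb = (cluster.memory_gb_per_node - 1) / executors_per_node
    heap_gb = math.floor(memory_per_slot_gb / 1.10)
    return instances, cores_per_executor, heap_gb


def shuffle_partitions(shuffle_mb, total_cores, target_mb=128):
    # Aim for ~target_mb per partition, but never fewer partitions than cores.
    return max(math.ceil(shuffle_mb / target_mb), total_cores)


def spark_submit_conf(cluster, shuffle_mb):
    instances, cores, heap_gb = size_executors(cluster)
    conf = {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{heap_gb}g",
        # G1 is often worth trying instead of the default collector on large heaps.
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
        # Kryo is usually faster and more compact than Java serialization.
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        # FAIR lets concurrent jobs in one application share executors.
        "spark.scheduler.mode": "FAIR",
        "spark.sql.shuffle.partitions": str(
            shuffle_partitions(shuffle_mb, instances * cores)
        ),
    }
    # Flatten into ["--conf", "key=value", "--conf", ...].
    return [arg for k, v in conf.items() for arg in ("--conf", f"{k}={v}")]


def skewed_partitions(sizes_mb, ratio=5.0):
    # Indices of partitions more than `ratio` times the median size.
    mid = median(sizes_mb)
    return [i for i, s in enumerate(sizes_mb) if mid > 0 and s > ratio * mid]


if __name__ == "__main__":
    cluster = Cluster(nodes=10, cores_per_node=16, memory_gb_per_node=64)
    print(" ".join(spark_submit_conf(cluster, shuffle_mb=100 * 1024)))
    print(skewed_partitions([120, 130, 110, 900, 125]))
```

For a hypothetical 10-node cluster with 16 cores and 64 GB per node, this yields 29 executors with 5 cores and 19 GB of heap each, and 800 shuffle partitions for about 100 GB of shuffle data. In practice the partition sizes would come from the Spark UI stage details, and every threshold here should be adjusted against the actual job.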