Architecture Analysis for ETL Processing: CPU vs GPU
OVERVIEW
EXPERIENCE | In Person |
---|---|
TYPE | Lightning Talk |
TRACK | Data Lakehouse Architecture |
INDUSTRY | Enterprise Technology |
TECHNOLOGIES | Apache Spark, ETL |
SKILL LEVEL | Advanced |
DURATION | 20 min |
GPUs are well known as accelerators for deep learning (DL) and machine learning (ML) workloads. This session describes how GPUs can also accelerate batch ETL operations. We review CPU and GPU architectures, including an overview of their memory subsystems, and provide a roofline analysis for individual database operations such as joins, aggregations, and data compression. We discuss why these operations are well suited to GPU acceleration and can achieve up to an order of magnitude speedup. Using industry-standard benchmark queries, we demonstrate full end-to-end SQL query acceleration on GPUs in a prototype query engine and compare the results to existing CPU solutions. Finally, we review the performance of the same queries at 3 TB scale with the RAPIDS Accelerator for Apache Spark™, a plugin for Apache Spark™ that enables GPU acceleration with no code changes.
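For readers unfamiliar with roofline analysis, the following is a minimal sketch of the standard roofline bound: attainable throughput is the smaller of peak compute and memory bandwidth multiplied by operational intensity (operations per byte moved). It is not the speakers' analysis. The function names and all device figures are hypothetical round numbers chosen only to show the arithmetic, not measurements of any real CPU or GPU.

```python
def attainable_throughput(peak_gops, bandwidth_gbs, intensity):
    """Roofline bound in Gops/s.

    peak_gops     -- peak compute rate (Gops/s)
    bandwidth_gbs -- peak memory bandwidth (GB/s)
    intensity     -- operational intensity (ops per byte moved)
    """
    return min(peak_gops, bandwidth_gbs * intensity)


def ridge_point(peak_gops, bandwidth_gbs):
    """Intensity (ops/byte) above which a kernel becomes compute-bound."""
    return peak_gops / bandwidth_gbs


# Hypothetical devices: placeholder numbers, NOT real hardware specifications.
cpu = {"peak_gops": 1000.0, "bandwidth_gbs": 100.0}
gpu = {"peak_gops": 10000.0, "bandwidth_gbs": 1000.0}

# Assumed intensity for a scan-like, memory-bound operation (illustrative).
intensity = 0.25

cpu_bound = attainable_throughput(cpu["peak_gops"], cpu["bandwidth_gbs"], intensity)
gpu_bound = attainable_throughput(gpu["peak_gops"], gpu["bandwidth_gbs"], intensity)

print(f"CPU bound: {cpu_bound} Gops/s")
print(f"GPU bound: {gpu_bound} Gops/s")
print(f"Ratio of bounds: {gpu_bound / cpu_bound}")
```

Under these assumptions both devices sit on the bandwidth-limited side of their ridge points, so the ratio of the two bounds equals the ratio of their memory bandwidths. This is the kind of reasoning a roofline analysis applies to low-intensity operations such as joins and aggregations.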
SESSION SPEAKERS
Jason Lowe
Distinguished System Software Engineer
NVIDIA
Nikolay Sakharnykh
Senior AI Developer Technology Manager
NVIDIA