Scaling XGBoost With Spark Connect ML on Grace Blackwell

Overview

Experience: In Person
Type: Breakout
Track: Artificial Intelligence
Industry: Retail and CPG - Food, Financial Services
Technologies: Apache Spark
Skill Level: Intermediate
Duration: 40 min

XGBoost is one of the most widely used off-the-shelf gradient-boosting algorithms for analyzing tabular datasets. Unlike deep learning models, gradient-boosted decision trees require the entire dataset to be in memory for efficient model training. To overcome this limitation, XGBoost features a distributed out-of-core implementation that fetches data in batches, which benefits significantly from the latest NVIDIA GPUs and the ultra-high bandwidth of NVLink-C2C.
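
As a rough illustration of the out-of-core path, recent XGBoost releases expose an external-memory DataIter API that streams the dataset into training one batch at a time. The sketch below is ours, not the speakers' code: the shard file names, the .npz loader, and the batch count are hypothetical placeholders, and the exact iterator signature may vary across XGBoost versions.

```python
# Sketch: out-of-core XGBoost training via the external-memory DataIter API.
# Shard paths and the .npz loader below are hypothetical placeholders.
import os
import numpy as np
import xgboost as xgb


class BatchIter(xgb.DataIter):
    """Streams the dataset one batch at a time instead of loading it all."""

    def __init__(self, batch_files):
        self._files = batch_files
        self._idx = 0
        # cache_prefix tells XGBoost where to keep its on-disk cache pages.
        super().__init__(cache_prefix=os.path.join(".", "xgb_cache"))

    def next(self, input_data):
        # Return 0 (False) once all batches have been consumed.
        if self._idx == len(self._files):
            return 0
        batch = np.load(self._files[self._idx])  # hypothetical .npz shards
        input_data(data=batch["X"], label=batch["y"])
        self._idx += 1
        return 1

    def reset(self):
        # Called by XGBoost before each new pass over the data.
        self._idx = 0


it = BatchIter([f"part_{i}.npz" for i in range(8)])  # hypothetical shards
dtrain = xgb.DMatrix(it)  # the cache is built batch by batch, not in one load
booster = xgb.train(
    # device="cuda" runs the hist tree method on the GPU; out-of-core
    # batches are then fetched over the CPU-GPU link during training.
    {"tree_method": "hist", "device": "cuda"},
    dtrain,
    num_boost_round=100,
)
```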

In this talk, we will share our work on optimizing XGBoost using the Grace Blackwell superchip. The fast chip-to-chip link between the CPU and the GPU enables XGBoost to scale up without compromising performance. Our work has effectively increased XGBoost's training capacity to over 1.2 TB on a single node.

The approach scales out to GPU clusters using Spark, enabling XGBoost to handle terabytes of data efficiently. We will demonstrate how to combine XGBoost's out-of-core algorithms with the new Spark Connect ML in Spark 4.0 for large-scale model training workflows.
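
As a hedged sketch of this kind of workflow, PySpark 4.0's Spark Connect client can drive the distributed XGBoost estimator from the xgboost.spark package through a remote session. This is not the speakers' demo code: the connect endpoint, dataset path, column names, and worker count below are hypothetical, and some parameter names (e.g., device) may differ across XGBoost versions.

```python
# Sketch: GPU-accelerated distributed XGBoost driven over Spark Connect.
# Endpoint, dataset path, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from xgboost.spark import SparkXGBClassifier

# Spark Connect runs the ML workload through a remote session on the cluster.
spark = (
    SparkSession.builder.remote("sc://spark-connect-host:15002").getOrCreate()
)

df = spark.read.parquet("/data/train.parquet")  # hypothetical dataset

clf = SparkXGBClassifier(
    features_col="features",  # assumes an assembled vector column
    label_col="label",
    num_workers=4,            # one XGBoost worker per GPU in the cluster
    device="cuda",            # train the hist tree method on GPUs
)
model = clf.fit(df)
predictions = model.transform(df)
```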

Session Speakers

Bobby Wang

Engineer
NVIDIA

Jiaming Yuan

Engineer
NVIDIA