Scaling XGBoost With Spark Connect ML on Grace Blackwell

Overview

Experience: In Person
Type: Breakout
Track: Artificial Intelligence
Industry: Retail and CPG - Food, Financial Services
Technologies: Apache Spark
Skill Level: Intermediate
Duration: 40 min

XGBoost is one of the most widely used off-the-shelf gradient-boosting algorithms for analyzing tabular datasets. Unlike deep learning models, gradient-boosted decision trees require the entire dataset to be in memory for efficient model training. To overcome this limitation, XGBoost features a distributed out-of-core implementation that fetches data in batches, which benefits significantly from the latest NVIDIA GPUs and the ultra-high bandwidth of NVLink-C2C.
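
As a rough illustration of the out-of-core path, recent XGBoost releases expose an external-memory DataIter API that streams the dataset into training one batch at a time. The sketch below is ours, not the speakers' code: the shard file names, the .npz loader, and the batch count are hypothetical placeholders, and the exact iterator signature may vary across XGBoost versions.

```python
# Sketch: out-of-core XGBoost training via the external-memory DataIter API.
# Shard paths and the .npz loader below are hypothetical placeholders.
import os
import numpy as np
import xgboost as xgb


class BatchIter(xgb.DataIter):
    """Streams the dataset one batch at a time instead of loading it all."""

    def __init__(self, batch_files):
        self._files = batch_files
        self._idx = 0
        # cache_prefix tells XGBoost where to keep its on-disk cache pages.
        super().__init__(cache_prefix=os.path.join(".", "xgb_cache"))

    def next(self, input_data):
        # Return 0 (False) once all batches have been consumed.
        if self._idx == len(self._files):
            return 0
        batch = np.load(self._files[self._idx])  # hypothetical .npz shards
        input_data(data=batch["X"], label=batch["y"])
        self._idx += 1
        return 1

    def reset(self):
        # Called by XGBoost before each new pass over the data.
        self._idx = 0


it = BatchIter([f"part_{i}.npz" for i in range(8)])  # hypothetical shards
dtrain = xgb.DMatrix(it)  # the cache is built batch by batch, not in one load
booster = xgb.train(
    # device="cuda" runs the hist tree method on the GPU; out-of-core
    # batches are then fetched over the CPU-GPU link during training.
    {"tree_method": "hist", "device": "cuda"},
    dtrain,
    num_boost_round=100,
)
```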

In this talk, we will share our work on optimizing XGBoost using the Grace Blackwell superchip. The fast chip-to-chip link between the CPU and the GPU enables XGBoost to scale up without compromising performance. Our work has effectively increased XGBoost's training capacity to over 1.2 TB on a single node.

The approach scales out to GPU clusters using Spark, enabling XGBoost to handle terabytes of data efficiently. We will demonstrate how to combine XGBoost's out-of-core algorithms with the new Spark Connect ML in Spark 4.0 for large-scale model training workflows.
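
As a hedged sketch of this kind of workflow, PySpark 4.0's Spark Connect client can drive the distributed XGBoost estimator from the xgboost.spark package through a remote session. This is not the speakers' demo code: the connect endpoint, dataset path, column names, and worker count below are hypothetical, and some parameter names (e.g., device) may differ across XGBoost versions.

```python
# Sketch: GPU-accelerated distributed XGBoost driven over Spark Connect.
# Endpoint, dataset path, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from xgboost.spark import SparkXGBClassifier

# Spark Connect runs the ML workload through a remote session on the cluster.
spark = (
    SparkSession.builder.remote("sc://spark-connect-host:15002").getOrCreate()
)

df = spark.read.parquet("/data/train.parquet")  # hypothetical dataset

clf = SparkXGBClassifier(
    features_col="features",  # assumes an assembled vector column
    label_col="label",
    num_workers=4,            # one XGBoost worker per GPU in the cluster
    device="cuda",            # train the hist tree method on GPUs
)
model = clf.fit(df)
predictions = model.transform(df)
```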

Session Speakers

Bobby Wang

Engineer
NVIDIA

Jiaming Yuan

Engineer
NVIDIA