Efficient Stable Diffusion Pre-Training on Billions of Images with Ray
OVERVIEW
| EXPERIENCE | In Person |
|---|---|
| TYPE | Breakout |
| TRACK | Data Science and Machine Learning |
| INDUSTRY | Enterprise Technology |
| TECHNOLOGIES | AI/Machine Learning, GenAI/LLMs |
| SKILL LEVEL | Intermediate |
| DURATION | 40 min |
Stable Diffusion demonstrates an impressive ability to consistently produce high-quality images. However, pre-training a Stable Diffusion model is challenging: it is a long-running workload that ingests billions of images, applies complex preprocessing logic, and runs on hundreds of GPUs. Maximizing performance and cost efficiency at this scale requires scaling out data preprocessing, improving GPU utilization, ensuring fault tolerance, and managing heterogeneous clusters. In this talk, we will show how to use Ray Data and Ray Train to build an end-to-end pre-training solution that achieves state-of-the-art performance at large scale.

Takeaways:

- Implement an end-to-end Stable Diffusion pre-training pipeline on billions of images using Ray.
- Improve the efficiency and stability of large-scale multimodal data processing with Ray Data.
- Scale online preprocessing and distributed training across different GPU types to increase GPU utilization and reduce costs.
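To make the pipeline shape concrete, here is a minimal sketch (not the speakers' exact code) of how Ray Data can feed online preprocessing into a Ray Train `TorchTrainer`. The dataset path, worker count, and batch sizes are illustrative placeholders, and the training loop body is elided.

```python
# Sketch of an end-to-end pre-training pipeline with Ray Data + Ray Train.
# Path, worker counts, and batch sizes are hypothetical placeholders.
import numpy as np


def preprocess(batch: dict) -> dict:
    """Preprocessing run by Ray Data workers: here, a stand-in
    normalization of uint8 pixels into the [-1, 1] range."""
    images = batch["image"].astype(np.float32) / 127.5 - 1.0
    return {"image": images, "caption": batch["caption"]}


def run_pretraining():
    # Imported lazily so `preprocess` stays usable without a Ray cluster.
    import ray
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    # Streaming ingest: Ray Data reads and preprocesses shards lazily,
    # overlapping CPU preprocessing with GPU training.
    ds = ray.data.read_parquet("s3://my-bucket/image-dataset/")  # hypothetical path
    ds = ds.map_batches(preprocess, batch_format="numpy")

    def train_loop_per_worker(config):
        import ray.train
        shard = ray.train.get_dataset_shard("train")
        for _epoch in range(config["epochs"]):
            for _batch in shard.iter_torch_batches(batch_size=config["batch_size"]):
                pass  # forward/backward pass of the diffusion model goes here

    trainer = TorchTrainer(
        train_loop_per_worker,
        train_loop_config={"epochs": 1, "batch_size": 256},
        scaling_config=ScalingConfig(num_workers=64, use_gpu=True),
        datasets={"train": ds},
    )
    trainer.fit()
```

Because preprocessing runs in separate Ray Data workers, the CPU (or auxiliary GPU) pool can be scaled independently of the training GPUs, which is how heterogeneous clusters help keep expensive training GPUs saturated.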
SESSION SPEAKERS
Yunxuan Xiao
Software Engineer
Anyscale Inc.
Hao Chen
Staff Software Engineer
Anyscale Inc.