MosaicML Delivers Leading NLP Performance in MLPerf v2.1

MosaicML leads the MLPerf NLP results, delivering a score of 7.9 minutes on 8x NVIDIA A100 GPUs in the Open Division, thanks to...

Mosaic LLMs: GPT-3 quality for <$500k

September 29, 2022 by Abhi Venigalla and Linden Li
Training large language models (LLMs) costs less than you think. Using the MosaicML platform, we show how fast, cheap, and easy it is...

Mosaic LLMs (Part 1): Billion-Parameter GPT Training Made Easy

August 11, 2022 by Abhi Venigalla and Linden Li
In Part 1 of this LLM blog post series, we use the MosaicML platform to train vanilla GPT-3 models up to 1.3B params...

Behind the Scenes: Setting a Baseline for Image Segmentation Speedups

July 27, 2022 by Landan Seguin
We establish a new semantic segmentation baseline of 45.56 mIoU on the ADE20k segmentation benchmark in 3.5 hours on a system with 8x...

Mosaic ResNet Deep Dive

July 18, 2022 by Matthew Leavitt
TL;DR: We recently released a set of recipes which can accelerate training of a ResNet-50 on ImageNet by up to 7x over standard...

MosaicML Satisfies the Need for Speed with MLPerf Results

MosaicML’s Open Division submission to the MLPerf Image Classification benchmark delivers a score of 23.8 minutes (4.5x speed-up relative to our baseline) on...

Farewell, CUDA OOM: Automatic Gradient Accumulation

June 23, 2022 by Mihir Patel and Erica Ji Yuen
With automatic gradient accumulation, Composer lets users seamlessly change GPU types and number of GPUs without having to worry about batch size. CUDA...

Blazingly Fast Computer Vision Training with the Mosaic ResNet and Composer

Match benchmark accuracy on ImageNet (He et al., 2015) in 27 minutes, a 7x speedup (ResNet-50 on 8x A100s). Reach higher levels of accuracy...

Efficiently Estimating Pareto Frontiers with Cyclic Learning Rate Schedules

April 7, 2022 by Jacob Portes
Benchmarking the tradeoff between model accuracy and training time is computationally expensive. Cyclic learning rate schedules can construct a tradeoff curve in a...