Benchmarking Domain Intelligence

Large language models are improving rapidly; to date, this improvement has largely been measured via academic benchmarks. These benchmarks, such as MMLU and...

Introducing Llama2-70B-Chat with MosaicML Inference

Llama2-70B-Chat is a leading AI model for text completion, comparable to ChatGPT in quality. Today, organizations can leverage this state-of-the-art model...

Announcing MPT-7B-8K: 8K Context Length for Document Understanding

July 18, 2023 by Sam Havens and Erica Ji Yuen
Today, we are releasing MPT-7B-8K, a 7B parameter open-source LLM with 8k context length trained with the MosaicML platform. MPT-7B-8K was pretrained starting...

How We Trained Stable Diffusion for Less than $50k (Part 3)

In our previous blog post, we showed how we used the MosaicML platform, Streaming datasets, and the Composer library to train a Stable...

Training Stable Diffusion from Scratch for <$50k with MosaicML (Part 2)

We've replicated Stable Diffusion 2 for less than $50k, and we've open-sourced the training code so you can too! This is a 3x...

MosaicBERT: Pretraining BERT from Scratch for $20

With the MosaicBERT architecture + training recipe, you can now pretrain a competitive BERT-Base model from scratch on the MosaicML platform for $20...

MosaicML StreamingDataset: Fast, Accurate Streaming of Training Data from Cloud Storage

Loading training data becomes an escalating challenge as datasets grow and training scales to more nodes. We built StreamingDataset...
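
A minimal sketch of the pattern, assuming the `mosaicml-streaming` package is installed and the dataset has already been converted to its MDS shard format; the S3 path and cache directory below are placeholders:

```python
from torch.utils.data import DataLoader
from streaming import StreamingDataset

# Shards are fetched from the remote on demand and cached locally, so
# training can begin before the full dataset is downloaded.
dataset = StreamingDataset(
    remote='s3://my-bucket/my-dataset',  # hypothetical shard location
    local='/tmp/dataset-cache',          # local cache for downloaded shards
    shuffle=True,
    batch_size=32,                       # per-device batch size, used for partitioning
)

loader = DataLoader(dataset, batch_size=32)
for batch in loader:
    ...  # training step consumes samples streamed from cloud storage
```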

5x Faster Image Segmentation Training with MosaicML Recipes

Can't stop, won't stop. Earlier this year, we shared a new baseline for semantic segmentation (basically, classifying an image at the pixel level)...

MosaicML Delivers Leading NLP Performance in MLPerf v2.1

MosaicML leads the MLPerf v2.1 NLP results, delivering a time-to-train of 7.9 minutes on 8x NVIDIA A100 GPUs in the Open Division, thanks to...

Farewell, CUDA OOM: Automatic Gradient Accumulation

June 23, 2022 by Mihir Patel and Erica Ji Yuen
With automatic gradient accumulation, Composer lets users seamlessly change GPU type and GPU count without worrying about batch size. CUDA...
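
A minimal sketch of what this looks like in practice, assuming Composer is installed (`pip install mosaicml`) and training runs on CUDA GPUs; the toy model and random data are placeholders, and newer Composer releases expose the same behavior via `device_train_microbatch_size='auto'`:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from composer import Trainer
from composer.models import ComposerClassifier

# Toy classifier and random data so the sketch is self-contained.
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model = ComposerClassifier(net, num_classes=10)

data = TensorDataset(torch.randn(512, 1, 28, 28), torch.randint(0, 10, (512,)))
loader = DataLoader(data, batch_size=128)

# grad_accum='auto' tells Composer to catch CUDA OOMs and retry the step
# with more accumulation splits, so the same optimization batch size works
# across different GPU types and counts.
trainer = Trainer(
    model=model,
    train_dataloader=loader,
    max_duration='1ep',
    grad_accum='auto',
)
trainer.fit()
```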