Today, we are releasing MPT-7B-8K, a 7B-parameter open-source LLM with an 8k context length, trained on the MosaicML platform. MPT-7B-8K was pretrained starting...
Last month, the Allen Institute for AI (AI2) announced the development of an open, state-of-the-art generative language model: AI2 OLMo (Open Language Model)...
Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave
The research and engineering teams here at MosaicML collaborated with CoreWeave, one of...
With the MosaicBERT architecture and training recipe, you can now pretrain a competitive BERT-Base model from scratch on the MosaicML platform for $20...