
Fine-tuning Llama 3.1 with Long Sequences

We are excited to announce that Mosaic AI Model Training now supports the full context length of 131K tokens when fine-tuning the Meta...

Inference-Friendly Models with MixAttention

Transformer models, the backbone of modern language AI, rely on the attention mechanism to process context when generating output. During inference, the attention...
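As a rough illustration of the inference setting the teaser refers to, here is a minimal sketch (not taken from the post) of single-head scaled dot-product attention with a growing key/value cache; the function name attend_with_cache and the tensor sizes are illustrative assumptions, and a real model would use multi-head attention and batched masking.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (illustrative): one decoding step of single-head attention
# over a key/value cache that grows with every generated token, which is why
# inference memory and compute scale with context length.
def attend_with_cache(q, k_new, v_new, kv_cache=None):
    # q, k_new, v_new: (batch, 1, d) tensors for the current token
    if kv_cache is None:
        k, v = k_new, v_new
    else:
        k = torch.cat([kv_cache[0], k_new], dim=1)  # (batch, t, d)
        v = torch.cat([kv_cache[1], v_new], dim=1)
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # (batch, 1, t)
    weights = F.softmax(scores, dim=-1)
    out = weights @ v                                       # (batch, 1, d)
    return out, (k, v)

batch, d = 2, 64
cache = None
for _ in range(4):  # decode four tokens
    q, k, v = (torch.randn(batch, 1, d) for _ in range(3))
    out, cache = attend_with_cache(q, k, v, cache)
print(out.shape, cache[0].shape)  # torch.Size([2, 1, 64]) torch.Size([2, 4, 64])
```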

Training MoEs at Scale with PyTorch and Databricks

Mixture-of-Experts (MoE) has emerged as a promising LLM architecture for efficient training and inference. MoE models like DBRX, which use multiple expert...
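For readers unfamiliar with the MoE idea, the following is a minimal sketch of top-k expert routing, the general mechanism behind MoE layers; the class name TinyMoE, the expert count, and the sizes are assumptions for illustration and do not reflect DBRX's actual configuration or implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch (illustrative): a mixture-of-experts layer that routes each
# token to its top-k experts and combines their outputs using the router's
# softmax weights over the selected experts.
class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                     # x: (tokens, d_model)
        logits = self.router(x)               # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e      # tokens sending this slot to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```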

Bringing MegaBlocks to Databricks

At Databricks, we’re committed to building the most efficient and performant training tools for large-scale AI models. With the recent release of DBRX...

Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1)

April 27, 2023 by Daya Khudia and Vitaliy Chiley
The research and engineering teams here at MosaicML collaborated with CoreWeave, one of...