
Fine-tuning Llama 3.1 with Long Sequences

We are excited to announce that Mosaic AI Model Training now supports the full context length of 131K tokens when fine-tuning the Meta...

Training MoEs at Scale with PyTorch and Databricks

Mixture-of-Experts (MoE) has emerged as a promising LLM architecture for efficient training and inference. MoE models like DBRX, which use multiple expert...