
Fine-tuning Llama 3.1 with Long Sequences

We are excited to announce that Mosaic AI Model Training now supports the full context length of 131K tokens when fine-tuning the Meta...

Inference-Friendly Models with MixAttention

Transformer models, the backbone of modern language AI, rely on the attention mechanism to process context when generating output. During inference, the attention...