Fine-tuning Llama 3.1 with Long Sequences
September 19, 2024 | by Saaketh Narayan, Irene Dea, Brian Chu, Shashank Rajput and Vitaliy Chiley | in Generative AI
We are excited to announce that Mosaic AI Model Training now supports the full context length of 131K tokens when fine-tuning the Meta...
Inference-Friendly Models with MixAttention
September 18, 2024 | by Shashank Rajput, Ying Sheng (Stanford University), Sean Owen and Vitaliy Chiley | in Mosaic Research
Transformer models, the backbone of modern language AI, rely on the attention mechanism to process context when generating output. During inference, the attention...