Fine-tuning Llama 3.1 with Long Sequences
September 19, 2024 | Saaketh Narayan, Irene Dea, Brian Chu, Shashank Rajput, and Vitaliy Chiley | Generative AI
We are excited to announce that Mosaic AI Model Training now supports the full context length of 131K tokens when fine-tuning the Meta...
Training MoEs at Scale with PyTorch and Databricks
July 1, 2024 | Brian Chu, Mihir Patel, Vitaliy Chiley, and Evan Racah | Mosaic Research
Mixture-of-Experts (MoE) has emerged as a promising LLM architecture for efficient training and inference. MoE models like DBRX, which use multiple expert...
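To make the "multiple experts" idea concrete, here is a minimal sketch of a top-k gated MoE layer in PyTorch. It is illustrative only and not the DBRX or Databricks implementation; the class name SimpleMoE, its parameters, and the dense per-expert loop are assumptions made for readability, whereas production systems use fused, sparse expert kernels.

```python
# Minimal top-k gated Mixture-of-Experts layer (illustrative sketch, not DBRX).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    """Route each token to its top-k experts and mix their outputs."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is a small feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.shape[-1])
        gate_logits = self.router(tokens)                       # (num_tokens, num_experts)
        weights, expert_ids = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                    # normalize over the chosen experts

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, slot] == e                 # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    # Usage: a single forward pass over random activations.
    layer = SimpleMoE(d_model=64, d_hidden=256)
    y = layer(torch.randn(2, 16, 64))
    print(y.shape)  # torch.Size([2, 16, 64])
```

Because only top_k of the num_experts feed-forward blocks run per token, a model can grow its total parameter count without a proportional increase in per-token compute, which is the efficiency argument the excerpt above alludes to.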