How Long Should You Train Your Language Model?
July 19, 2024 by Nikhil Sardana, Jacob Portes and Sasha Doubov in Mosaic Research
How long should you train your language model? How large should your model be? In today's generative AI landscape, these are multi-million dollar...

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs
January 30, 2024 by Nikhil Sardana, Julian Quevedo and Daya Khudia in Mosaic Research
Quantization is a technique for making machine learning models smaller and faster. We quantize Llama2-70B-Chat, producing an equivalent-quality model that generates 2.2x more...

LLM Inference Performance Engineering: Best Practices
October 12, 2023 by Megha Agarwal, Asfandyar Qureshi, Nikhil Sardana, Linden Li, Julian Quevedo and Daya Khudia in Mosaic Research
In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs)...

MosaicML Delivers Leading NLP Performance in MLPerf v2.1
November 9, 2022 by Daya Khudia, Nikhil Sardana, Sam Havens, Alex Trott and Erica Ji Yuen in Mosaic Research
MosaicML leads the MLPerf NLP results, delivering a score of 7.9 minutes on 8x NVIDIA A100 GPUs in the Open Division, thanks to...