Fast, Secure and Reliable: Enterprise-grade LLM Inference

After a whirlwind year of developments in 2023, many enterprises are eager to adopt increasingly capable generative AI models to supercharge their...

Integrating NVIDIA TensorRT-LLM with the Databricks Inference Stack

Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface that integrates with a web server to deliver fast, efficient LLM inference. In this post, we highlight some key areas where our collaboration with NVIDIA has been particularly important.

Introducing Mixtral 8x7B with Databricks Model Serving

Today, Databricks is excited to announce support for Mixtral 8x7B in Model Serving. Mixtral 8x7B is a sparse Mixture of Experts (MoE)...

LLM Inference Performance Engineering: Best Practices

In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open-source large language models (LLMs)...

Mosaic LLMs: GPT-3 quality for <$500k

September 29, 2022 by Abhi Venigalla and Linden Li
Training large language models (LLMs) costs less than you think. Using the MosaicML platform, we show how fast, cheap, and easy it is...

Mosaic LLMs (Part 1): Billion-Parameter GPT Training Made Easy

August 11, 2022 by Abhi Venigalla and Linden Li
In Part 1 of this LLM blog post series, we use the MosaicML platform to train vanilla GPT-3 models up to 1.3B params...