Fast, Secure and Reliable: Enterprise-grade LLM Inference

After a whirlwind year of developments in 2023, many enterprises are eager to adopt increasingly capable generative AI models to supercharge their...

Integrating NVIDIA TensorRT-LLM with the Databricks Inference Stack

Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface that integrates with a web server for fast, efficient LLM inference. In this post, we highlight some key areas where our collaboration with NVIDIA has been particularly important.
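To give a feel for that Python interface, here is a minimal sketch using the high-level `LLM` API available in recent TensorRT-LLM releases. This is not the configuration described in the post; the model name and sampling settings below are illustrative placeholders.

```python
# Minimal sketch of TensorRT-LLM's high-level Python API (recent releases).
# The model name and sampling values are placeholders, not the settings
# used in the Databricks inference stack.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds (or loads) a TensorRT engine for the model and sets up the runtime.
    llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

    # Decoding parameters; these values are examples only.
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

    # TensorRT-LLM batches and schedules the requests internally.
    prompts = [
        "What is machine learning?",
        "Explain GPU inference in one sentence.",
    ]
    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```

A web server would typically wrap calls like `llm.generate` behind an HTTP endpoint; the library's streaming and in-flight batching features are what make that pairing efficient in practice.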

LLM Inference Performance Engineering: Best Practices

In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs)...