
Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

Quantization is a technique for making machine learning models smaller and faster. We quantize Llama2-70B-Chat, producing an equivalent-quality model that generates 2.2x more...
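The core idea in the teaser above can be sketched in a few lines: map each float weight to an 8-bit integer plus a shared scale factor, so the model stores one byte per weight instead of four. This is a minimal illustrative example of symmetric int8 quantization, not the exact scheme used in the post; the function names are hypothetical.

```python
# Minimal sketch of symmetric int8 quantization (illustrative only).
# Each float is mapped to an integer in [-128, 127] via a shared scale.

def quantize_int8(weights):
    """Quantize a list of floats to int8 values plus a scale factor."""
    scale = max(abs(w) for w in weights) / 127.0  # map the max magnitude to 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.08, 0.99]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
# approx is close to weights, but each value now fits in one byte instead of four
```

The quality cost comes from the rounding step; the serving speedup comes from moving 4x less data through memory per weight.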

Building and Customizing GenAI with Databricks: LLMs and Beyond

Generative AI has opened new worlds of possibilities for businesses and is being emphatically embraced across organizations. According to a recent MIT Tech...

LLM Training and Inference with Intel Gaudi 2 AI Accelerators

January 4, 2024 by Abhi Venigalla and Daya Khudia
At Databricks, we want to help our customers build and deploy generative AI applications on their own data without sacrificing data privacy or...

Integrating NVIDIA TensorRT-LLM with the Databricks Inference Stack

Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface to integrate with a web server for fast, efficient inference performance with LLMs. In this post, we're highlighting some key areas where our collaboration with NVIDIA has been particularly important.

Patronus AI: Using LLMs to Detect Business-Sensitive Information

November 1, 2023 by Emily Hutson
EnterprisePII is a first-of-its-kind large language model (LLM) data set aimed at detecting business-sensitive information. The challenge of detecting and redacting sensitive business...

Training LLMs at Scale with AMD MI250 GPUs

October 30, 2023 by Abhi Venigalla
Four months ago, we shared how AMD had emerged as a capable platform for generative AI and demonstrated how to easily and...

LLM Training on Unity Catalog data with MosaicML Streaming Dataset

Large Language Models (LLMs) have given us a way to generate text, extract information, and identify patterns in industries from healthcare to...

LLM Inference Performance Engineering: Best Practices

In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs)...

Introducing Llama2-70B-Chat with MosaicML Inference

Llama2-70B-Chat is a leading AI model for text completion, comparable with ChatGPT in terms of quality. Today, organizations can leverage this state-of-the-art model...

End-to-End Secure Evaluation of Code Generation Models

With MosaicML, you can now evaluate LLMs and Code Generation Models on code generation tasks (such as HumanEval, with MBPP and APPS coming...