Generative AI | Databricks Blog

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

January 30, 2024 by Nikhil Sardana, Julian Quevedo and Daya Khudia in Mosaic Research

Quantization is a technique for making machine learning models smaller and faster. We quantize Llama2-70B-Chat, producing an equivalent-quality model that generates 2.2x more...

Building and Customizing GenAI with Databricks: LLMs and Beyond

January 22, 2024 by Ari Kaplan, Emily Hutson and Nicolas Pelaez in Generative AI

Generative AI has opened new worlds of possibilities for businesses and is being emphatically embraced across organizations. According to a recent MIT Tech...

LLM Training and Inference with Intel Gaudi 2 AI Accelerators

January 4, 2024 by Abhi Venigalla and Daya Khudia in Mosaic Research

At Databricks, we want to help our customers build and deploy generative AI applications on their own data without sacrificing data privacy or...

Integrating NVIDIA TensorRT-LLM with the Databricks Inference Stack

December 21, 2023 by Linden Li, Megha Agarwal, Kobie Crawford and Daya Khudia in Mosaic Research

Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface that integrates with a web server for fast, efficient LLM inference. In this post, we highlight some key areas where our collaboration with NVIDIA has been particularly important.

Patronus AI: Using LLMs to Detect Business-Sensitive Information

November 1, 2023 by Emily Hutson in Mosaic Research

EnterprisePII is a first-of-its-kind large language model (LLM) data set aimed at detecting business-sensitive information. The challenge of detecting and redacting sensitive business...

Training LLMs at Scale with AMD MI250 GPUs

October 30, 2023 by Abhi Venigalla in Mosaic Research

Four months ago, we shared how AMD had emerged as a capable platform for generative AI and demonstrated how to easily and...

LLM Training on Unity Catalog data with MosaicML Streaming Dataset

October 17, 2023 by Xiaohan Zhang, Maddie Dawson and Karan Jariwala in Mosaic Research

Large Language Models (LLMs) have given us a way to generate text, extract information, and identify patterns in industries from healthcare to...

LLM Inference Performance Engineering: Best Practices

October 12, 2023 by Megha Agarwal, Asfandyar Qureshi, Nikhil Sardana, Linden Li, Julian Quevedo and Daya Khudia in Mosaic Research

In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs)...

Introducing Llama2-70B-Chat with MosaicML Inference

August 24, 2023 by Hagay Lupesko, Margaret Qian, Daya Khudia, Sam Havens, Daniel King and Erica Ji Yuen in Mosaic Research

Llama2-70B-Chat is a leading AI model for text completion, comparable with ChatGPT in terms of quality. Today, organizations can leverage this state-of-the-art model...

End-to-End Secure Evaluation of Code Generation Models

August 10, 2023 by Rishab Parthasarathy in Mosaic Research

With MosaicML, you can now evaluate LLMs and Code Generation Models on code generation tasks (such as HumanEval, with MBPP and APPS coming...