Large Language Models (LLMs)
What are Large Language Models (LLMs)?
Large language models (LLMs) are a new class of natural language processing (NLP) models that have significantly surpassed their predecessors in performance and capability across a variety of tasks, such as answering open-ended questions, chat, content summarization, execution of near-arbitrary instructions, translation, and content and code generation. LLMs are trained on massive data sets using advanced machine learning algorithms to learn the patterns and structures of human language.
How do large language models (LLMs) work?
Large language models or LLMs typically have three architectural elements:
- Encoder: After a tokenizer converts large amounts of text into tokens, which are numerical values, the encoder creates meaningful embeddings of tokens that put words with similar meanings close together in vector space.
- Attention mechanisms: These mechanisms let the model focus on the parts of the input text that are most relevant to each token, such as related words elsewhere in the sequence. Attention is built into the encoder and decoder rather than being a separate component.
- Decoder: The decoder generates output one token at a time, repeatedly predicting the next word across millions of examples; the tokenizer then converts those tokens back into human-readable text. Once trained, the model can accomplish new tasks such as answering questions, translating languages, performing semantic search and more. A minimal sketch of this loop follows below.
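To make this pipeline concrete, here is a minimal sketch of the tokenize, predict, decode loop using the Hugging Face transformers library. GPT-2 stands in here, purely as an illustrative assumption, for any decoder-style LLM.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# GPT-2 is used only as a small, freely available stand-in for an LLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tokenizer: text -> numerical token IDs
inputs = tokenizer("Large language models are", return_tensors="pt")

# Model: repeatedly predict the next token (greedy decoding here)
output_ids = model.generate(**inputs, max_new_tokens=10)

# Tokenizer again: token IDs -> human-readable text
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```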
[Figure: a simplified version of the LLM training process]
Learn more about transformers, the foundation of every LLM
What is the history of large language models (LLMs)?
The techniques used in LLMs are a culmination of research and work in the field of artificial intelligence that originated in the 1940s.
1940s
The first scientific paper on neural networks was published in 1943.
1989
Yann LeCun published a paper on digit recognition showing that a backpropagation network could be applied to image-recognition problems.
2012
A paper from Hinton et al. showed deep neural networks significantly outperforming any previous models for speech recognition.
A convolutional neural network (AlexNet) halved the existing error rate on ImageNet visual recognition, becoming the first model to break 75% accuracy. It highlighted new techniques, including the use of GPUs to train models.
2017
The groundbreaking paper "Attention Is All You Need" introduced the transformer architecture, which underlies virtually all modern LLMs.
2018
Google introduces BERT (Bidirectional Encoder Representations from Transformers), which is a big leap in architecture and paves the way for future large language models.
2020
OpenAI releases GPT-3, which at 175 billion parameters becomes the largest language model of its time and sets a new performance benchmark for language-related tasks.
2022
ChatGPT is launched, turning GPT-3 and similar models into a service that is widely accessible through a web interface and kicking off a huge increase in public awareness of LLMs and generative AI.
2023
Open source LLMs show increasingly impressive results with releases such as Llama 2, Falcon and MosaicML MPT. GPT-4 is also released, setting a new performance benchmark.
What are the use cases for LLMs?
LLMs can drive business impact across use cases and different industries. Example use cases include:
- Chatbots and virtual assistants: LLMs power chatbots that give customers and employees the ability to hold open-ended conversations, helping with customer support, website lead follow-up and personal-assistant tasks.
- Code generation and debugging: LLMs can generate useful code snippets, identify and fix errors in code and complete programs based on input instructions.
- Sentiment analysis: LLMs can automatically determine the sentiment of a piece of text, making it possible to measure customer satisfaction at scale.
- Text classification and clustering: LLMs can organize, categorize and sort large volumes of data to identify common themes and trends to support informed decision-making.
- Language translation: LLMs can translate documents and web pages into different languages.
- Summarization and paraphrasing: LLMs can summarize papers, articles, customer calls or meetings and surface the most important points.
- Content generation: LLMs can develop an outline or write new content that can be a good first draft to build from.
What are customer examples where LLMs have been deployed effectively?
JetBlue
JetBlue has deployed "BlueBot," a chatbot that uses open source generative AI models complemented by corporate data, powered by Databricks. This chatbot can be used by all teams at JetBlue to access data that is governed by role. For example, the finance team can see data from SAP and regulatory filings, but the operations team will only see maintenance information.
Chevron Phillips
Chevron Phillips Chemical uses Databricks to support their generative AI initiatives, including document process automation.
Thrivent Financial
Thrivent Financial is looking at generative AI to improve search, produce better-summarized and more accessible insights, and boost engineering productivity.
Why are large language models (LLMs) suddenly becoming popular?
There are many recent technological advancements that have propelled LLMs into the spotlight:
- Advancement of machine learning technologies
- LLMs build on many advancements in ML techniques, most notably the transformer architecture, which underlies most modern LLMs.
- Increased accessibility
- The release of ChatGPT opened the door for anyone with internet access to interact with one of the most advanced LLMs through a simple web interface, letting the world see the power of these models firsthand.
- Increased computational power
- The availability of more powerful computing resources, like graphics processing units (GPUs), and better data processing techniques allowed researchers to train much larger models.
- Quantity and quality of training data
- The availability of large data sets and the ability to process them have improved model performance dramatically. For example, GPT-3 was trained on roughly 500 billion tokens, including high-quality subsets such as the WebText2 data set (17 million documents), which contains publicly crawled web pages with an emphasis on quality.
How do I customize an LLM with my organization’s data?
There are four architectural patterns to consider when customizing an LLM application with your organization’s data. These techniques are outlined below and are not mutually exclusive. Rather, they can (and should) be combined to take advantage of the strengths of each.
| Method | Definition | Primary use case | Data requirements | Advantages | Considerations |
|---|---|---|---|---|---|
| Prompt engineering | Crafting specialized prompts to guide LLM behavior | Quick, on-the-fly model guidance | None | Fast, cost-effective, no training required | Less control than fine-tuning |
| Retrieval augmented generation (RAG) | Combining an LLM with external knowledge retrieval | Dynamic data sets and external knowledge | External knowledge base or database (e.g., vector database) | Dynamically updated context, enhanced accuracy | Increases prompt length and inference computation |
| Fine-tuning | Adapting a pre-trained LLM to specific data sets or domains | Domain or task specialization | Thousands of domain-specific or instruction examples | Granular control, high specialization | Requires labeled data, computational cost |
| Pre-training | Training an LLM from scratch | Unique tasks or domain-specific corpora | Large data sets (billions to trillions of tokens) | Maximum control, tailored for specific needs | Extremely resource-intensive |
Regardless of the technique selected, building a solution in a well-structured, modularized manner ensures organizations will be prepared to iterate and adapt. Learn more about this approach and more in The Big Book of MLOps.
What does prompt engineering mean as it relates to large language models (LLMs)?
Prompt engineering is the practice of adjusting the text prompts given to an LLM to elicit more accurate or relevant responses. Because different models respond differently to the same prompt, prompt engineering is partly model-specific. Some generalized tips that work for a variety of models are listed below, followed by an example prompt that puts them into practice:
- Use clear, concise prompts, which may include an instruction, context (if needed), a user query or input, and a description of the desired output type or format.
- Provide examples in your prompt (“few-shot learning”) to help the LLM understand what you want.
- Tell the model how to behave, such as telling it to admit if it cannot answer a question.
- Tell the model to think step-by-step or explain its reasoning.
- If your prompt includes user input, use techniques to prevent prompt hacking, such as making it very clear which parts of the prompt correspond to your instruction vs. user input.
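Putting these tips together, here is a hedged illustration of a structured prompt in Python; the classification task, few-shot examples and labels are hypothetical.

```python
# Instruction, behavior guidance, few-shot examples, and clearly
# delimited user input combined into one prompt template
prompt = """Instruction: Classify the sentiment of the review as Positive or Negative.
If you cannot tell, answer "Unknown" rather than guessing.

Examples:
Review: "The battery lasts all day." -> Positive
Review: "It broke within a week." -> Negative

Review: "{user_input}" ->"""

# Keeping user input inside explicit delimiters makes it harder for a
# malicious input to override your instructions (prompt hacking)
print(prompt.format(user_input="Setup was quick and painless."))
```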
What does retrieval augmented generation (RAG) mean as it relates to large language models (LLMs)?
Retrieval augmented generation, or RAG, is an architectural approach that can improve the efficacy of large language model (LLM) applications by leveraging custom data. It works by retrieving data/documents relevant to a question or task and providing them to the LLM as context. RAG has shown success in support chatbots and Q&A systems that need to maintain up-to-date information or access domain-specific knowledge.
Learn more about RAG here.
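As a rough sketch of how the retrieval step works, the following assumes a tiny in-memory document store and the sentence-transformers library; the model name and documents are illustrative, and a production system would typically use a vector database over a much larger knowledge base.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical knowledge base; in practice this lives in a vector database
docs = [
    "Our support line is open 9am-5pm ET on weekdays.",
    "Refunds are processed within 10 business days.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

question = "How long do refunds take?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]

# Retrieve the document whose embedding is most similar to the question
best_doc = docs[int(np.argmax(doc_vecs @ q_vec))]

# Provide the retrieved text to the LLM as context
prompt = f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```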
What does it mean to fine-tune large language models (LLMs)?
Fine-tuning is the process of adapting a pre-trained LLM on a comparatively smaller data set that is specific to an individual domain or task. During fine-tuning, the model continues training for a short time, possibly adjusting only a relatively small number of weights compared to the entire model.
The term “fine-tuning” can refer to several concepts, with the two most common forms being:
- Supervised instruction fine-tuning: This approach involves continued training of a pre-trained LLM on a data set of input-output training examples, typically thousands of them (see the sketch after this list).
- Continued pre-training: This fine-tuning method does not rely on input and output examples but instead uses domain-specific unstructured text to continue the same pre-training process (e.g., next token prediction, masked language modeling).
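Below is a hedged sketch of supervised instruction fine-tuning with the Hugging Face Trainer. The GPT-2 base model, the train.jsonl file of {"text": ...} examples and the hyperparameters are all illustrative assumptions, not a prescribed recipe.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Assumes a JSONL file of {"text": "<instruction + response>"} examples
data = load_dataset("json", data_files="train.jsonl")["train"]
data = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=data.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-model", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=data,
    # mlm=False -> standard next-token (causal) language modeling labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # continues training the pre-trained weights on your data
```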
What does it mean to pre-train a large language model (LLM)?
Pre-training an LLM from scratch refers to training a language model on a large corpus of data (e.g., text, code) without using any prior knowledge or weights from an existing model. This is in contrast to fine-tuning, where an already pre-trained model is further adapted to a specific task or data set. The output of full pre-training is a base model that can be used directly or further fine-tuned for downstream tasks. Pre-training is typically the largest and most expensive training task one would encounter, and it is not something most organizations undertake.
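The contrast with fine-tuning is visible in code: pre-training starts from a randomly initialized model defined only by a configuration, not from an existing checkpoint. The GPT-2-style configuration below is purely illustrative.

```python
from transformers import AutoModelForCausalLM, GPT2Config

# Pre-training: random weights, no prior knowledge
config = GPT2Config(n_layer=12, n_head=12, n_embd=768)
model = AutoModelForCausalLM.from_config(config)

# Fine-tuning, by contrast, loads weights learned during pre-training:
# model = AutoModelForCausalLM.from_pretrained("gpt2")
```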
What are the most common LLMs and how are they different?
The field of large language models is crowded with many options to choose from. Generally speaking, you can group LLMs into two categories: proprietary services and open source models.
Proprietary services
The most popular LLM service is ChatGPT from OpenAI, which was released with much fanfare. ChatGPT provides a friendly chat interface where users can feed prompts and typically receive a fast and relevant response. Developers can access the ChatGPT API to integrate this LLM into their own applications, products or services. Other services include Google Bard and Claude from Anthropic.
Open source models
Another option is to self-host an LLM, typically using a model that is open source and available for commercial use. The open source community has quickly caught up to the performance of proprietary models. Popular open source LLMs include Llama 2 from Meta and MPT from MosaicML (acquired by Databricks).
How to evaluate the best choice
The biggest considerations when choosing between a closed third-party vendor's API and self-hosting your own open source (or fine-tuned) LLM are future-proofing, managing costs and leveraging your data as a competitive advantage. Proprietary models can be deprecated and removed, breaking your existing pipelines and vector indexes; open source models remain accessible to you forever. Open source and fine-tuned models offer more choice and can be tailored to your application, enabling better performance-cost trade-offs. Planning to fine-tune your own models lets you use your organization's data as a competitive advantage, building better models than are publicly available. Finally, proprietary models may raise governance concerns, as these "black box" LLMs permit less oversight of their training processes and weights.
Hosting your own open source LLM does require more work than using a proprietary LLM. MLflow from Databricks makes it easier for anyone with Python experience to pull any transformer model and use it as a Python object.
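For example, here is a hedged sketch using MLflow's transformers flavor; exact arguments can vary by MLflow version, and the summarization model is an illustrative choice.

```python
import mlflow
from transformers import pipeline

# Pull an open source transformer model as an ordinary Python object
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# Log it to MLflow so it can be versioned, served and shared
with mlflow.start_run():
    info = mlflow.transformers.log_model(
        transformers_model=summarizer,
        artifact_path="summarizer",
    )

# Load it back through the generic pyfunc interface and run inference
loaded = mlflow.pyfunc.load_model(info.model_uri)
print(loaded.predict(["Large language models are trained on massive text corpora."]))
```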
How do I choose which LLM to use based on a set of evaluation criteria?
Evaluating LLMs is a challenging and evolving domain, primarily because LLMs often demonstrate uneven capabilities across different tasks. An LLM might excel in one benchmark, but slight variations in the prompt or problem can drastically affect its performance.
Some prominent tools and benchmarks used to evaluate LLM performance include:
- MLflow
- Provides a set of LLMOps tools for model evaluation (see the sketch after this list).
- Mosaic Model Gauntlet
- An aggregated evaluation approach, categorizing model competency into six broad domains rather than distilling everything into a single monolithic metric.
- Hugging Face
- Gathers hundreds of thousands of models from open LLM contributors.
- BIG-bench (Beyond the Imitation Game benchmark)
- A dynamic benchmarking framework, currently hosting over 200 tasks, with a focus on adapting to future LLM capabilities.
- EleutherAI LM Evaluation Harness
- A holistic framework that assesses models on over 200 tasks, merging evaluations like BIG-bench and MMLU, promoting reproducibility and comparability.
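As a hedged sketch of what MLflow-based evaluation can look like, the snippet below uses mlflow.evaluate (available in MLflow 2.4+); the sample questions and registered-model URI are illustrative assumptions.

```python
import mlflow
import pandas as pd

eval_data = pd.DataFrame({
    "inputs": ["What is MLflow?", "What is a vector database?"],
    "ground_truth": [
        "MLflow is an open source platform for managing the ML lifecycle.",
        "A vector database stores embeddings for similarity search.",
    ],
})

results = mlflow.evaluate(
    model="models:/my-llm/1",         # hypothetical registered model URI
    data=eval_data,
    targets="ground_truth",
    model_type="question-answering",  # enables built-in QA metrics
)
print(results.metrics)
```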
Also read the Best Practices for LLM Evaluation of RAG Applications.
How do you operationalize the management of large language models (LLMs) via large language model ops (LLMOps)?
Large language model ops (LLMOps) encompasses the practices, techniques and tools used for the operational management of large language models in production environments.
LLMOps allows for the efficient deployment, monitoring and maintenance of large language models. LLMOps, like traditional machine learning ops (MLOps), requires a collaboration of data scientists, DevOps engineers and IT professionals. See more details of LLMOps here.
Where can I find more information about large language models (LLMs)?
There are many resources available to find more information on LLMs, including:
Training
- LLMs: Foundation Models From the Ground Up (edX and Databricks Training) — Free training from Databricks that dives into the details of foundation models in LLMs
- LLMs: Application Through Production (edX and Databricks Training) — Free training from Databricks that focuses on how to build LLM-focused applications with the latest and most well-known frameworks
eBooks
- The Big Book of MLOps
Technical blogs
- Best Practices for LLM Evaluation of RAG Applications
- Using MLflow AI Gateway and Llama 2 to Build Generative AI Apps (Achieve greater accuracy using retrieval augmented generation (RAG) with your own data)
- Deploy Your LLM Chatbot With Retrieval Augmented Generation (RAG), llama2-70B (MosaicML Inferences) and Vector Search
- LLMOps: Everything You Need to Know to Manage LLMs
Next steps
- Contact Databricks to schedule a demo and talk to someone about your large language model (LLM) projects
- Read about Databricks’ offerings for LLMs
- Read more about the retrieval augmented generation (RAG) use case (the most common LLM architecture)