Thousands of enterprises already use Llama models on the Databricks Data Intelligence Platform to power AI applications, agents, and workflows. Today, we’re excited to partner with Meta to bring you their latest model series, Llama 4, available now in many Databricks workspaces and rolling out across AWS, Azure, and GCP.
Llama 4 marks a major leap forward in open, multimodal AI—delivering industry-leading performance, higher quality, larger context windows, and improved cost efficiency from the Mixture of Experts (MoE) architecture. All of this is accessible through the same unified REST API, SDK, and SQL interfaces, making it easy to use alongside all your models in a secure, fully governed environment.
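For example, here is a minimal sketch of calling Llama 4 Maverick through the OpenAI-compatible chat completions interface. The endpoint name `databricks-llama-4-maverick` and the workspace URL are assumptions; substitute the values shown in your workspace.

```python
# Minimal sketch: query Llama 4 Maverick via the OpenAI-compatible
# Foundation Model API. Endpoint name and workspace URL are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],  # Databricks personal access token
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-llama-4-maverick",  # assumed pay-per-token endpoint name
    messages=[
        {"role": "system", "content": "You are a concise analytics assistant."},
        {"role": "user", "content": "Summarize last quarter's churn drivers in three bullets."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```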
The Llama 4 models raise the bar for open foundation models, delivering significantly higher quality and faster inference than any previous Llama model.
At launch, we’re introducing Llama 4 Maverick, the largest and highest-quality model in today’s release from Meta. Maverick is purpose-built for developers creating sophisticated AI products, combining multilingual fluency, precise image understanding, and safe assistant behavior. It enables:
And you can now build all of this with significantly better performance. Compared to Llama 3.3 (70B), Maverick delivers:
Coming soon to Databricks is Llama 4 Scout—a compact, best-in-class multimodal model that fuses text, image, and video from the start. With up to 10 million tokens of context, Scout is built for advanced long-form reasoning, summarization, and visual understanding.
“With Databricks, we could automate tedious manual tasks by using LLMs to process one million+ files daily for extracting transaction and entity data from property records. We exceeded our accuracy goals by fine-tuning Meta Llama and, using Mosaic AI Model Serving, we scaled this operation massively without the need to manage a large and expensive GPU fleet.”— Prabhu Narsina, VP Data and AI, First American
Connect Llama 4 to your enterprise data using Unity Catalog-governed tools to build context-aware agents. Retrieve unstructured content, call external APIs, or run custom logic to power copilots, RAG pipelines, and workflow automation. Mosaic AI makes it easy to iterate, evaluate, and improve these agents with built-in monitoring and collaboration tools—from prototype to production.
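As a sketch of what tool use can look like, the snippet below exposes a single hypothetical function, `lookup_invoice`, to Llama 4 with OpenAI-style function calling. In practice the tool could wrap a Unity Catalog function or an external API; the tool name, workspace URL, and endpoint name here are assumptions for illustration.

```python
# Sketch: OpenAI-style function calling against a Llama 4 serving endpoint.
# "lookup_invoice" is a hypothetical tool used only for illustration.
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_invoice",
        "description": "Fetch an invoice record by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="databricks-llama-4-maverick",  # assumed endpoint name
    messages=[{"role": "user", "content": "What is the status of invoice INV-1042?"}],
    tools=tools,
)

# If the model chose to call the tool, run it and send the result back;
# otherwise just print the direct answer.
msg = resp.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(msg.content)
```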
Apply Llama 4 at scale—summarizing documents, classifying support tickets, or analyzing thousands of reports—without needing to manage any infrastructure. Batch inference is deeply integrated with Databricks workflows, so you can use SQL or Python in your existing pipeline to run LLMs like Llama 4 directly on governed data with minimal overhead.
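Here is one way this can look from a notebook, using the ai_query SQL function over a governed table. The table, column names, and endpoint name below are placeholders; `spark` is the session that Databricks notebooks provide by default.

```python
# Sketch: batch inference with ai_query over a governed Delta table.
# Table, columns, and endpoint name are placeholders.
categorized = spark.sql("""
    SELECT
      ticket_id,
      ai_query(
        'databricks-llama-4-maverick',
        CONCAT('Classify this support ticket as billing, bug, or how-to: ', body)
      ) AS category
    FROM support.tickets
""")

# Persist the results back to Unity Catalog for downstream use.
categorized.write.mode("overwrite").saveAsTable("support.ticket_categories")
```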
Customize Llama 4 to better fit your use case—whether it’s summarization, assistant behavior, or brand tone. Use labeled datasets or adapt models using techniques like Test-Time Adaptive Optimization (TAO) for faster iteration without annotation overhead. Reach out to your Databricks account team for early access.
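As a rough sketch of the labeled-dataset path, the snippet below reshapes a hypothetical labeled table into chat-style training examples. The source table, column names, and output schema are assumptions made for illustration; check the fine-tuning documentation for the exact format and for Llama 4 availability in your workspace.

```python
# Sketch: turn a hypothetical labeled table (document, summary) into
# chat-formatted training rows for supervised fine-tuning.
from pyspark.sql import functions as F

labeled = spark.table("support.labeled_summaries")  # hypothetical labeled dataset

train = labeled.select(
    F.array(
        F.struct(F.lit("user").alias("role"), F.col("document").alias("content")),
        F.struct(F.lit("assistant").alias("role"), F.col("summary").alias("content")),
    ).alias("messages")
)

train.write.mode("overwrite").saveAsTable("support.llama4_finetune_train")
```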
“With Databricks, we were able to quickly fine-tune and securely deploy Llama models to build multiple GenAI use cases like a conversation simulator for counselor training and a phase classifier for maintaining response quality. These innovations have improved our real-time crisis interventions, helping us scale faster and provide critical mental health support to those in crisis.”— Matthew Vanderzee, CTO, Crisis Text Line
Ensure safe, compliant model usage with Mosaic AI Gateway, which adds built-in logging, rate limiting, PII detection, and policy guardrails—so teams can scale Llama 4 securely like any other model on Databricks.
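As an illustration, the sketch below enables usage tracking, a per-user rate limit, and PII blocking on a Llama 4 endpoint through the AI Gateway REST API. The API path, payload fields, and endpoint name are assumptions to verify against the AI Gateway documentation for your workspace.

```python
# Sketch: configure AI Gateway policies on a serving endpoint via REST.
# Path, payload fields, and endpoint name are assumptions; verify in the docs.
import os
import requests

host = "https://<your-workspace>.cloud.databricks.com"
endpoint_name = "databricks-llama-4-maverick"  # assumed endpoint name

payload = {
    "usage_tracking_config": {"enabled": True},  # log requests for monitoring
    "rate_limits": [{"calls": 100, "key": "user", "renewal_period": "minute"}],
    "guardrails": {"input": {"pii": {"behavior": "BLOCK"}}},  # block requests containing PII
}

resp = requests.put(
    f"{host}/api/2.0/serving-endpoints/{endpoint_name}/ai-gateway",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json=payload,
)
resp.raise_for_status()
```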
We’re launching Llama 4 in phases, starting with Maverick on Azure, AWS, and GCP. Coming soon:
As we expand support, you'll be able to pick the best Llama model for your workload—whether it's ultra-long context, high-throughput jobs, or unified text-and-vision understanding.
Llama 4 will be rolling out to your Databricks workspaces over the next few days.