
Announcing GA of AI Model Sharing

Discover, evaluate, install, share and serve AI models within your organization or across clouds, platforms and regions
Special thanks to Daniel Benito (CTO, Bitext), Antonio Valderrábanos (CEO, Bitext), Chen Wang (Lead Solution Architect, AI21 Labs), Robbin Jang (Alliance Manager, AI21 Labs), and Alex Godfrey (Partner Marketing Lead, AI21 Labs) for their valuable insights and contributions to this blog.

 

We are pleased to share the General Availability of AI Model Sharing within Databricks Delta Sharing and the Databricks Marketplace. This milestone follows the Public Preview announcement in January 2024. Since the Public Preview launch, we have worked with new AI model sharing customers and providers such as Bitext, AI21 Labs, and Ripple to further simplify AI Model Sharing.

You can easily and securely share and serve AI models using Delta Sharing, whether within your organization or externally across clouds, platforms, and regions. In addition, Databricks Marketplace now offers more than 75 AI models, including new industry-specific models from John Snow Labs, OLA Krutrim, and Bitext, as well as foundation models such as Databricks DBRX, Llama 3, and models from AI21 Labs, Mistral, and several others. In this blog, we will review the business need for AI model sharing and take a deeper dive into use cases driven by AI21 Labs' Jamba 1.5 Mini foundation model and Bitext models.

AI models are also now readily available out of the box from Unity Catalog, streamlining how users access and deploy models. This not only simplifies the user experience but also makes AI models more accessible, supporting seamless integration and deployment across platforms and regions.

3 benefits of AI Model Sharing

Here are three benefits of AI Model Sharing with Databricks that we saw with early adopters and launch partners:

  1. Lower Cost:  AI model sharing with Delta Sharing reduces the total cost of ownership by minimizing acquisition, development, and infrastructure expenses. Organizations can access pre-built or third-party models, either Delta Shared or from Databricks Marketplace, cutting initial investment and development time. Sharing models with Delta Sharing across clouds and platforms optimizes infrastructure use, reducing redundancy and expenses while deploying models closer to end-users to minimize latency.
  2. Production Quality: Delta Sharing allows you to acquire models that fit customers’ use cases and augment them with a single platform for the entire AI lifecycle. By sharing models into the Databricks Mosaic AI platform, customers gain access to AI and governance features to productionize any model. This includes end-to-end model development capabilities, from model serving to fine-tuning, along with Unity Catalog's security and management features such as lineage and Lakehouse monitoring, ensuring high confidence in the models and associated data.
  3. Complete Control: When working with third-party models, AI model sharing gives customers full control over the models and the corresponding data sets. Because Delta Sharing allows customers to acquire the entire model package, both the model and the data remain in the customer's infrastructure, under their control. Customers don't need to send confidential data to a provider serving the model on their behalf.

 

So, how does AI Model Sharing work? 

AI Model Sharing is powered by Delta Sharing. Providers can share AI models with customers either directly using Delta Sharing or by listing them on the Databricks Marketplace, which also uses Delta Sharing. 

Delta Sharing makes it easy to use AI models wherever you need them. You can train a model anywhere and then serve it anywhere, without manually moving it around. The model weights (i.e., the parameters the AI model learned during training) are automatically pulled into the serving endpoint (i.e., the place where the model "lives"). This eliminates cumbersome model movement after each training or fine-tuning run, ensuring a single source of truth and streamlining the serving process. For example, customers can train models in the cloud and region with the cheapest training infrastructure, and then serve the model in another region closer to end users to minimize inference latency (i.e., the time it takes for the model to process data and return results).
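To make this concrete, here is a minimal sketch of serving a Delta Shared model from a recipient workspace using the Databricks SDK for Python. The catalog, schema, model, and endpoint names are hypothetical placeholders, not values from this post.

```python
# Minimal sketch: serve a Delta Shared model from a recipient workspace.
# All names below are hypothetical placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    EndpointCoreConfigInput,
    ServedEntityInput,
)

w = WorkspaceClient()  # authenticates via env vars or a Databricks config profile

w.serving_endpoints.create(
    name="shared-model-endpoint",  # hypothetical endpoint name
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                # Unity Catalog path of the shared model; the weights are
                # pulled into the endpoint automatically, with no manual copying.
                entity_name="shared_models.prod.recommender",
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
)
```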

Databricks Marketplace, powered by Delta Sharing, lets you easily find and use more than 75 AI models. You can install these models into your own Unity Catalog and work with them as if they were local, and Delta Sharing automatically keeps them up to date during deployment or upgrades. You can also customize models with your own data for tasks like managing a knowledge base. As a provider, you only need one copy of your model to share it with all your Databricks clients.
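Once a model has been installed from the Marketplace into a catalog, it can be loaded like any other Unity Catalog model. A hedged sketch with MLflow, assuming a hypothetical catalog and model name (the exact predict input format depends on the model's flavor):

```python
# Load a model installed from Databricks Marketplace into Unity Catalog.
# The catalog, schema, and model names are hypothetical placeholders.
import mlflow

mlflow.set_registry_uri("databricks-uc")  # point the MLflow registry at Unity Catalog

model = mlflow.pyfunc.load_model("models:/bitext_banking.models.assistant/1")

# Input format depends on the model flavor; a chat-style prompt is assumed here.
print(model.predict({"prompt": "How do I reset my online banking password?"}))
```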

What’s the business impact?

Since the Public Preview of AI Model Sharing was announced in January 2024, we've worked with several customers and partners to ensure that AI Model Sharing delivers significant cost savings for organizations.

 

 "We use Reinforcement learning (RL) models in some of our products. Compared to supervised learning models, RL models have longer training times and many sources of randomness in the training process. These RL models need to be deployed in 3 workspaces in separate AWS regions. With model sharing we can have one RL model available in multiple workspaces without having to retrain it again or without any cumbersome manual steps to move the model."    
— Mihir Mavalankar, Machine Learning Engineer, Ripple

AI21 Labs' Jamba 1.5 Mini: Bringing Large Context AI Models to Databricks Marketplace

 

AI21 Labs, a leader in generative AI and large language models, has published Jamba 1.5 Mini, part of the Jamba 1.5 Model Family, on the Databricks Marketplace. Jamba 1.5 Mini by AI21 Labs introduces a novel approach to AI language models for enterprise use. Its innovative hybrid Mamba-Transformer architecture enables a 256K token effective context window, along with exceptional speed and quality. With Mini’s optimization for efficient use of computing, it can handle context lengths of up to 140K tokens on a single GPU.

"AI21 Labs is pleased to announce that Jamba 1.5 Mini is now on the Databricks Marketplace. With Delta Sharing, enterprises can access our Mamba-Transformer architecture, featuring a 256K context window, ensuring exceptional speed and quality for transformative AI solutions"
— Pankaj Dugar, SVP & GM , AI21 Labs

A 256K token effective context window refers to the model's ability to process and consider 256,000 tokens of text at once. This is significant because it allows the model to handle large and complex data sets, making it particularly useful for tasks that require understanding and analyzing extensive information, such as lengthy documents or intricate data-heavy workflows, and for enhancing the retrieval stage of any RAG-based workflow. Jamba's hybrid architecture ensures the model's quality does not degrade as context grows, unlike what is typically seen with Transformer-based LLMs' claimed context windows.

AI21 Labs: Claimed vs Effective Context Window

Check out this video tutorial that demonstrates how to obtain the AI21 Jamba 1.5 Mini model from the Databricks Marketplace, fine-tune it, and serve it.

Use cases

Jamba 1.5 Mini's 256K context window means the model can efficiently handle the equivalent of roughly 800 pages of text in a single prompt (256,000 tokens correspond to on the order of 190,000 English words). Here are a few examples of how Databricks customers in different industries can use these models:

  1. Document Processing: Customers can use Jamba 1.5 Mini to quickly summarize long reports, contracts, or research papers. For financial institutions, the models can summarize earnings reports, analyze market trends from lengthy financial documents, or extract relevant information from regulatory filings (see the sketch following this list).
  2. Enhancing agentic workflows: For Healthcare providers, the model can assist in complex medical decision-making processes by analyzing multiple patient data sources and providing treatment recommendations.
  3. Improving retrieval-augmented generation (RAG) processes: In RAG systems for retail companies, the models can generate more accurate and contextually relevant responses to customer inquiries by considering a broader range of product information and customer history.
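As a concrete illustration of the document-processing use case above, here is a hedged sketch of summarizing a long filing against a served Jamba 1.5 Mini endpoint. Databricks model serving exposes an OpenAI-compatible API; the workspace URL, token, endpoint name, and file path below are hypothetical placeholders.

```python
# Hedged sketch: long-document summarization against a served model endpoint
# via the OpenAI-compatible API. All names and paths are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
    api_key="<DATABRICKS_TOKEN>",
)

with open("annual_report.txt") as f:  # a long filing, well within 256K tokens
    report = f.read()

resp = client.chat.completions.create(
    model="jamba-1-5-mini",  # hypothetical serving endpoint name
    messages=[
        {"role": "system", "content": "Summarize financial filings concisely."},
        {"role": "user", "content": report},
    ],
)
print(resp.choices[0].message.content)
```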

How Bitext's Verticalized AI Models on Databricks Marketplace Improve Customer Onboarding

 

Bitext offers pre-trained verticalized models on the Databricks Marketplace. These models are versions of the Mistral-7B-Instruct-v0.2 model fine-tuned for building chatbots, virtual assistants, and copilots for the retail banking domain, providing customers with fast and accurate answers about their banking needs. Bitext can produce these models for any family of foundation models, including GPT, Llama, Mistral, Jamba, and OpenELM.

 

Use Case: Improving Onboarding with AI

A leading social trading app was experiencing high dropout rates during user onboarding. It leveraged Bitext's pre-trained verticalized banking models to revamp its onboarding process, transforming static forms into a conversational, intuitive, and personalized user experience.

 

Bitext shared the verticalized AI model with the customer. Using that model as a base, a data scientist did the initial fine-tuning with customer-specific data, such as common FAQs. This step ensured that the model understood the unique requirements and language of the user base. This was followed by advanced fine-tuning with Databricks Mosaic AI.
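What such a fine-tuning step might look like in code: a hedged sketch using the Mosaic AI Model Training API from the databricks_genai Python package. The base model, training data path, registration target, and hyperparameters are hypothetical placeholders, not Bitext's actual pipeline.

```python
# Hedged sketch: fine-tune a shared base model with customer-specific data
# via the Mosaic AI Model Training API (databricks_genai package).
# All names and paths below are hypothetical placeholders.
from databricks.model_training import foundation_model as fm

run = fm.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",                 # base family used by Bitext
    train_data_path="/Volumes/main/onboarding/data/faqs.jsonl",  # customer FAQ pairs
    register_to="main.onboarding",                               # Unity Catalog destination
    task_type="CHAT_COMPLETION",                                 # chat-style fine-tuning
    training_duration="3ep",                                     # e.g., three epochs
)
print(run.name)  # track progress in the associated MLflow experiment
```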

 

Once the Bitext model was fine-tuned, it was deployed using Databricks AI Model Serving.

  1. The fine-tuned model was registered in Unity Catalog (see the sketch after this list).
  2. An endpoint was created.
  3. The model was deployed to the endpoint.
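A minimal sketch of the registration step, assuming hypothetical run and model names; creating the endpoint and deploying the registered version to it (steps 2 and 3) follow the same serving-endpoint pattern shown earlier in this post.

```python
# Step 1: register the fine-tuned model in Unity Catalog.
# The run ID and three-level model name are hypothetical placeholders.
import mlflow

mlflow.set_registry_uri("databricks-uc")

mv = mlflow.register_model(
    "runs:/<run_id>/model",               # artifact produced by the fine-tuning run
    "main.onboarding.banking_assistant",  # catalog.schema.model in Unity Catalog
)
print(mv.version)  # version to deploy to the serving endpoint (steps 2 and 3)
```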

The collaboration set a new standard in user interaction within the social finance sector, significantly improving customer engagement and retention. Thanks to the jump-start provided by the shared AI model, the entire implementation was completed within 2 weeks. 

Take a look at the demo that shows how to install and fine-tune a Bitext verticalized AI model from Databricks Marketplace here.

 

"Unlike generic models that need a lot of training data, starting with a specialized model for a specific industry reduces the data needed to customize it. This helps customers quickly deploy tailored AI models.  We're thrilled about AI Model Sharing. Our customers have experienced up to a 60% reduction in resource costs (fewer data scientists and lower computational requirements) and up to 50% savings in operational disruptions (quicker testing and deployment) with our specialized AI models available on the Databricks Marketplace."  
— Antonio S. Valderrábanos, Founder & CEO, Bitext

Cost Savings of Bitext's 2-Step Model Training Approach

| Cost Component | Generic LLM Approach | Bitext's Verticalized Model on Databricks Marketplace | Cost Savings (%) |
|---|---|---|---|
| Verticalization | High: extensive fine-tuning for sector & use case | Low: start with a pre-fine-tuned vertical LLM | 60% |
| Customization with Company Data | Medium: further fine-tuning required | Low: only specific customization needed | 30% |
| Total Training Time | 3-6 months | 1-2 months | 50-60% reduction |
| Resource Allocation | High: more data scientists and computational power | Low: less intensive | 40-50% |
| Operational Disruption | High: longer integration and testing phases | Low: faster deployment | 50% |

Call to Action

Now that AI Model Sharing is generally available (GA) through Delta Sharing, with new AI models on the Databricks Marketplace, we encourage you to explore the AI model listings on Databricks Marketplace and start sharing and serving models with Delta Sharing.

 

