
Mosaic AI Foundation Model Serving

Two ways to purchase

Access and query state-of-the-art open foundation models and use them to quickly and easily build applications that leverage a high-quality generative AI model without maintaining your own model deployment.


Foundation Model Serving DBU rates and Throughput

| Model | Pay-Per-Token Serving: DBU / 1M input tokens (Global) | Pay-Per-Token Serving: DBU / 1M output tokens (Global) | Provisioned Throughput Serving: DBU rate per hour (Global) | Provisioned Throughput Serving: Throughput band [1] (max tokens / sec) [2] |
|---|---|---|---|---|
| Llama 3 70B | 14.286 | 42.857 | 212.143 | 670 |
| DBRX | 32.143 | 96.429 | 212.143 | 600 |
| Llama 2 70B | 28.571 | 28.571 | 157.143 | 635 |
| Mixtral 8x7B | 21.429 | 21.429 | 290.857 | 1,700 |
| Llama 3 8B | 3.571 | 10.714 | 106.000 | 3,600 |
| MPT 30B | 14.286 | 14.286 | 112.000 | 580 |
| Llama 2 13B | 13.571 | 13.571 | 78.571 | 1,580 |
| MPT 7B | 7.143 | 7.143 | 20.000 | 2,450 |
| BGE Large | 1.429 | 1.429 | N/A | N/A |

1: The throughput band is the model-specific maximum throughput (tokens per second) provided at the per-hour price above. With Provisioned Throughput Serving, model throughput is provided in increments of the model's throughput band. To get higher throughput, the customer sets an appropriate multiple of the throughput band, which is then charged at the same multiple of the per-hour price above.

2: Shown for serving on Azure. Some numbers differ on AWS, where serving is charged at a different price.
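Footnote 1 can be illustrated with a short calculation. The sketch below is not an official calculator: the function names are hypothetical, the figures are copied from the Azure column of the table above, and it assumes that only whole multiples of a throughput band can be provisioned, so the band count is rounded up.

```python
import math

# (DBU rate per hour, throughput band in max tokens/sec), copied from the
# rates table above. Azure figures; footnote 2 notes AWS may differ.
PROVISIONED = {
    "Llama 3 70B": (212.143, 670),
    "DBRX": (212.143, 600),
    "Mixtral 8x7B": (290.857, 1700),
}


def bands_needed(model: str, target_tokens_per_sec: float) -> int:
    """Smallest whole number of throughput bands that covers the target.

    Assumption: bands are provisioned in whole multiples only.
    """
    _, band = PROVISIONED[model]
    return math.ceil(target_tokens_per_sec / band)


def hourly_dbus(model: str, target_tokens_per_sec: float) -> float:
    """DBUs charged per hour: band multiple times the per-band DBU rate."""
    rate, _ = PROVISIONED[model]
    return bands_needed(model, target_tokens_per_sec) * rate


# Example: a hypothetical target of 1,500 tokens/sec on DBRX.
print(bands_needed("DBRX", 1500))            # 3
print(f"{hourly_dbus('DBRX', 1500):.3f}")    # 636.429
```

Under these assumptions, 1,500 tokens per second on DBRX needs three bands of 600 tokens per second each, so the hourly charge is three times the single-band DBU rate.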

Pay-Per-Token Serving Pricing Examples

| Model | Input tokens | Output tokens | Region | Unit price ($ / DBU) | Total price |
|---|---|---|---|---|---|
| DBRX | 4,000,000 | 1,000,000 | US East | $0.070 | $15.75 |
| Llama 2 70B | 4,000,000 | 1,000,000 | US East | $0.070 | $10.00 |
| Mixtral 8x7B | 4,000,000 | 1,000,000 | AP (Sydney) | $0.088 | $9.43 |
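The totals in these examples can be reproduced from the DBU rates listed earlier. The following is a minimal sketch, assuming the total is simply (input tokens in millions × input DBU rate + output tokens in millions × output DBU rate) × unit price. The function name and dictionary are illustrative, not part of any Databricks API.

```python
# DBU per 1M input tokens and per 1M output tokens, copied from the
# rates table earlier on this page.
PAY_PER_TOKEN = {
    "DBRX": (32.143, 96.429),
    "Llama 2 70B": (28.571, 28.571),
    "Mixtral 8x7B": (21.429, 21.429),
}


def pay_per_token_price(model: str, input_tokens: int, output_tokens: int,
                        usd_per_dbu: float) -> float:
    """Estimated price in USD for a given token volume (hypothetical helper)."""
    in_rate, out_rate = PAY_PER_TOKEN[model]
    dbus = (input_tokens / 1_000_000) * in_rate \
        + (output_tokens / 1_000_000) * out_rate
    return dbus * usd_per_dbu


# The three examples from the pricing table.
print(f"{pay_per_token_price('DBRX', 4_000_000, 1_000_000, 0.070):.2f}")          # 15.75
print(f"{pay_per_token_price('Llama 2 70B', 4_000_000, 1_000_000, 0.070):.2f}")   # 10.00
print(f"{pay_per_token_price('Mixtral 8x7B', 4_000_000, 1_000_000, 0.088):.2f}")  # 9.43
```

For DBRX, this is 4 × 32.143 + 1 × 96.429 = 225.001 DBU, which at $0.070 per DBU comes to about $15.75.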

Provisioned Throughput Serving Pricing Examples

| Model | Hours / month | Region | Unit price ($ / DBU) | Monthly price [3] |
|---|---|---|---|---|
| DBRX | 720 | US East | $0.070 | $10,692 |
| Llama 2 70B | 720 | US East | $0.070 | $7,920 |
| Mixtral 8x7B | 720 | AP (Sydney) | $0.088 | $18,429 |

3: Per throughput band
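The monthly figures above are consistent with multiplying the per-hour DBU rate by the hours used and the regional unit price. The sketch below assumes that formula; the `bands` parameter and the function name are hypothetical additions to show how a multiple of the throughput band would scale the price.

```python
# DBU rate per hour for one throughput band, copied from the rates table
# earlier on this page.
DBU_RATE_PER_HOUR = {
    "DBRX": 212.143,
    "Llama 2 70B": 157.143,
    "Mixtral 8x7B": 290.857,
}


def monthly_price(model: str, hours: float, usd_per_dbu: float,
                  bands: int = 1) -> float:
    """Estimated monthly price in USD for `bands` throughput bands."""
    return DBU_RATE_PER_HOUR[model] * hours * usd_per_dbu * bands


# The three examples from the pricing table (one band, 720 hours).
print(round(monthly_price("DBRX", 720, 0.070)))          # 10692
print(round(monthly_price("Llama 2 70B", 720, 0.070)))   # 7920
print(round(monthly_price("Mixtral 8x7B", 720, 0.088)))  # 18429
```

Passing `bands=2` doubles the result, which matches the per-band pricing described in footnote 3.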

Pay as you go with a 14-day free trial, or contact us for committed-use discounts or custom requirements.

Mosaic AI Model Serving FAQ

Our regional prices are based on the regional cost of infrastructure supporting our serverless products.