
Mosaic AI Foundation Model Serving

Access and query state-of-the-art open foundation models, and use them to quickly build generative AI applications without deploying or maintaining the models yourself.


* For regional availability, see: AWS, Azure

Foundation Model Serving DBU Rates and Throughput

Model | Pay-Per-Token: DBU / 1M input tokens (global) | Pay-Per-Token: DBU / 1M output tokens (global) | Provisioned Throughput (1): DBU / hour (global) | Provisioned Throughput: throughput band (2) (max tokens / sec)
Current Models
Llama 3.1 405B | 35.714 | 142.857 | 600.000 | 3,400
Llama 3.3 70B | 7.143 | 21.429 | 342.857 | 9,500
Llama 3.1 70B | n/a | n/a | 342.857 | 9,500
Llama 3.1 8B | n/a | n/a | 106.000 | 19,000
Llama 3.2 3B | n/a | n/a | 92.857 | 22,000
Llama 3.2 1B | n/a | n/a | 85.714 | 35,000
DBRX | 10.714 | 32.143 | 171.429 | 650
GTE | 1.857 | n/a | 20.000 | 9,450
BGE Large | 1.429 | n/a | 24.000 | 11,800
Legacy Models
Llama 3 70B | n/a | n/a | 212.143 | 1,000
Llama 3 8B | n/a | n/a | 106.000 | 3,000
Llama 2 70B | n/a | n/a | 290.800 | 1,200
Llama 2 13B | n/a | n/a | 112.000 | 980
Mixtral 8x7B | 7.143 | 14.286 | 290.857 | 5,000
MPT 30B | n/a | n/a | 112.000 | 450
MPT 7B | n/a | n/a | 20.000 | 2,450

1: Throughput shown is an example based on a typical real-time use case with an input / output of 3,500 / 300 tokens per request. Actual throughput varies with the use case, query shape, and other factors. Input/output ratios do not apply to embedding models.

2: The throughput band is the model-specific maximum throughput (tokens per second) provided at the per-hour price above. With Provisioned Throughput Serving, throughput is provisioned in increments of the model's throughput band: to get higher throughput, the customer sets an appropriate multiple of the band and is charged the same multiple of the per-hour price.
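Footnote 2 implies a simple sizing rule: round the target throughput up to a whole number of bands, then multiply the per-hour DBU rate by that count. The sketch below assumes capacity can only be provisioned in whole-number multiples of the band and that the table's band figures hold for your workload; the function names are hypothetical and not part of any Databricks API.

```python
def bands_needed(target_tokens_per_sec: int, band_tokens_per_sec: int) -> int:
    """Smallest whole number of throughput bands covering the target rate.

    Assumption: capacity is provisioned only in whole multiples of the band.
    """
    if target_tokens_per_sec <= 0 or band_tokens_per_sec <= 0:
        raise ValueError("throughput values must be positive")
    # Integer ceiling division, so no floating-point rounding is involved.
    return -(-target_tokens_per_sec // band_tokens_per_sec)


def provisioned_dbu_per_hour(
    target_tokens_per_sec: int, band_tokens_per_sec: int, dbu_per_hour_per_band: float
) -> tuple[int, float]:
    """Return (bands, DBU/hour) for a target rate; cost scales linearly with bands."""
    bands = bands_needed(target_tokens_per_sec, band_tokens_per_sec)
    return bands, bands * dbu_per_hour_per_band


# Llama 3.3 70B from the rate table: 9,500 tokens/sec per band at 342.857 DBU/hour.
bands, dbu_per_hour = provisioned_dbu_per_hour(20_000, 9_500, 342.857)
print(bands, round(dbu_per_hour, 3))  # 3 1028.571
```

Two bands (19,000 tokens/sec) fall just short of a 20,000 tokens/sec target, so the sketch rounds up to three bands and triples the hourly rate.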

Pay-Per-Token Serving Pricing Examples

Model | Input tokens | Output tokens | Region | Unit price ($ / DBU) | Total price
Llama 3.1 405B | 4,000,000 | 1,000,000 | US East | $0.070 | $20.00
Llama 3.3 70B | 4,000,000 | 1,000,000 | US East | $0.070 | $3.50
DBRX | 4,000,000 | 1,000,000 | AP (Sydney) | $0.088 | $6.60
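Each pay-per-token total is (input tokens / 1M × input DBU rate + output tokens / 1M × output DBU rate) × unit price. The sketch below reproduces these three rows using the Pay-Per-Token DBU rates from the rate table; the $/DBU values are the example unit prices shown in this table, not a quoted rate, and the function name is hypothetical.

```python
def pay_per_token_price(
    input_tokens: int,
    output_tokens: int,
    dbu_per_m_input: float,
    dbu_per_m_output: float,
    usd_per_dbu: float,
) -> float:
    """Dollar cost of a pay-per-token workload (hypothetical helper)."""
    # Input and output tokens are billed at separate DBU rates per million tokens.
    dbus = (input_tokens / 1_000_000) * dbu_per_m_input + (
        output_tokens / 1_000_000
    ) * dbu_per_m_output
    return dbus * usd_per_dbu


# (model, input tokens, output tokens, DBU/1M input, DBU/1M output, $/DBU)
examples = [
    ("Llama 3.1 405B", 4_000_000, 1_000_000, 35.714, 142.857, 0.070),
    ("Llama 3.3 70B", 4_000_000, 1_000_000, 7.143, 21.429, 0.070),
    ("DBRX", 4_000_000, 1_000_000, 10.714, 32.143, 0.088),
]
for name, inp, out, dbu_in, dbu_out, usd in examples:
    print(f"{name}: ${pay_per_token_price(inp, out, dbu_in, dbu_out, usd):,.2f}")
# Llama 3.1 405B: $20.00
# Llama 3.3 70B: $3.50
# DBRX: $6.60
```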

Provisioned Throughput Serving Pricing Examples

Model | Throughput bands | Hours / month | Region | Unit price ($ / DBU) | Monthly price
Llama 3.1 405B | 1 | 720 | US East | $0.070 | $30,240
Llama 3.3 70B | 1 | 720 | US East | $0.070 | $17,280
Llama 3.1 8B | 1 | 720 | US East | $0.070 | $5,342
Llama 3.2 3B | 2 | 720 | Europe (Ireland) | $0.077 | $10,296
Llama 3.2 1B | 4 | 720 | AP (Sydney) | $0.088 | $21,723
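Each provisioned monthly price is throughput bands × hours × DBU/hour per band × unit price. The sketch below reproduces these rows, assuming an always-on endpoint billed for 720 hours (30 days) per month as in the examples; the function name is hypothetical, and the $/DBU figures are the example regional unit prices from this table.

```python
def provisioned_monthly_price(
    bands: int, hours: float, dbu_per_hour_per_band: float, usd_per_dbu: float
) -> float:
    """Dollar cost of a provisioned endpoint over `hours` (hypothetical helper)."""
    # Cost scales linearly with both the number of bands and the hours running.
    return bands * hours * dbu_per_hour_per_band * usd_per_dbu


# (model, bands, hours/month, DBU/hour per band from the rate table, $/DBU)
examples = [
    ("Llama 3.1 405B", 1, 720, 600.000, 0.070),
    ("Llama 3.3 70B", 1, 720, 342.857, 0.070),
    ("Llama 3.1 8B", 1, 720, 106.000, 0.070),
    ("Llama 3.2 3B", 2, 720, 92.857, 0.077),
    ("Llama 3.2 1B", 4, 720, 85.714, 0.088),
]
for name, bands, hours, dbu_rate, usd in examples:
    print(f"{name}: ${provisioned_monthly_price(bands, hours, dbu_rate, usd):,.0f}")
# Llama 3.1 405B: $30,240
# Llama 3.3 70B: $17,280
# Llama 3.1 8B: $5,342
# Llama 3.2 3B: $10,296
# Llama 3.2 1B: $21,723
```

Because the price is linear in hours, an endpoint that runs only part of the month would scale proportionally under the same assumption.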

Pay as you go with a 14-day free trial or contact us for committed-use discounts or custom requirements.

Mosaic AI Foundation Model Serving FAQ