OpenAI: gpt-oss-120b

openai/gpt-oss-120b

OpenAI · open-weight · 131K context · 19 providers · Intelligence 81.3

Cheapest provider

$0.04 / 1M

DeepInfra

Fastest provider (p95)

2152 tok/s

Cerebras

Intelligence (composite)

81.3

MMLU-Pro · HumanEval · math · GPQA

Per-provider performance

Latency, throughput, uptime, and price are measured per provider over the last 30 minutes of live traffic. These measurements are what back the “sourced cheapest” claim: Atlas mode consults them on every call to serve the cheapest path that holds quality.

Provider	Quant	Input $/1M	Output $/1M	Latency p50 / p95	Throughput p50 / p95	Uptime 30m	Success
DeepInfra	full · bf16	$0.0390	$0.1900	657ms / 10623ms	82 / 405.8 tok/s	98.66%	59.0%
DekaLLM	full · bf16	$0.0390	$0.1800	509ms / 1530ms	5 / 39.1 tok/s	96.09%	78.9%
Novita	q4 · fp4	$0.0500	$0.2500	450ms / 1027ms	97 / 198 tok/s	100.00%	100.0%
SiliconFlow	q8 · fp8	$0.0500	$0.4500	1416ms / 4011ms	17 / 38 tok/s	92.56%	92.2%
Google	full · unknown	$0.0900	$0.3600	314ms / 833ms	221 / 658 tok/s	100.00%	100.0%
BaseTen	q4 · fp4	$0.1000	$0.5000	185ms / 891ms	252 / 397 tok/s	100.00%	100.0%
DigitalOcean	full · unknown	$0.1000	$0.7000	513ms / 1506ms	127 / 162 tok/s	99.72%	99.7%
Parasail	q4 · fp4	$0.1000	$0.7500	319ms / 655ms	252 / 430.2 tok/s	100.00%	99.9%
SambaNova	full · unknown	$0.1400	$0.9500	486ms / 4269ms	177 / 430 tok/s	97.83%	85.2%
Amazon Bedrock	full · unknown	$0.1500	$0.6000	—	—	98.53%	—
Amazon Bedrock	full · unknown	$0.1500	$0.6000	—	—	—	—
DeepInfra	full · bf16	$0.1500	$0.6000	657ms / 10623ms	82 / 405.8 tok/s	98.18%	59.0%
Groq	full · unknown	$0.1500	$0.6000	104ms / 649ms	443 / 855.4 tok/s	99.99%	100.0%
Mara	full · unknown	$0.1500	$0.7500	943ms / 2413ms	89.5 / 460.9 tok/s	99.10%	78.6%
Nebius	q4 · fp4	$0.1500	$0.6000	207ms / 1221ms	308 / 489.3 tok/s	100.00%	100.0%
Phala	full · unknown	$0.1500	$0.6000	1156ms / 2222ms	91 / 122 tok/s	100.00%	100.0%
Together	full · unknown	$0.1500	$0.6000	307ms / 1501ms	62 / 112 tok/s	99.80%	99.9%
WandB	q4 · fp4	$0.1500	$0.6000	274ms / 565ms	50 / 114 tok/s	100.00%	100.0%
Cerebras	full · fp16	$0.3500	$0.7500	196ms / 652ms	694 / 2152 tok/s	99.97%	99.4%

“—” means live telemetry hasn’t accumulated enough recent traffic for that endpoint. “unknown” means the provider serves the model but doesn’t expose the quantization label (typically running fp8 / int8 internally).
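As a rough illustration of "cheapest path that holds quality", the sketch below filters a few rows from the table by success rate and p95 latency, then picks the lowest blended price. The actual Atlas selection logic is not described on this page; the thresholds, the 75/25 input/output token mix, and the names `Endpoint`, `blended_price`, and `cheapest_healthy` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Endpoint:
    provider: str
    input_usd_per_m: float           # Input $/1M tokens
    output_usd_per_m: float          # Output $/1M tokens
    p95_latency_ms: Optional[float]  # None where the table shows a dash
    success_pct: Optional[float]     # None where the table shows a dash


# A subset of rows copied from the per-provider table.
ENDPOINTS = [
    Endpoint("DeepInfra", 0.0390, 0.1900, 10623, 59.0),
    Endpoint("DekaLLM", 0.0390, 0.1800, 1530, 78.9),
    Endpoint("Novita", 0.0500, 0.2500, 1027, 100.0),
    Endpoint("Google", 0.0900, 0.3600, 833, 100.0),
    Endpoint("Groq", 0.1500, 0.6000, 649, 100.0),
    Endpoint("Amazon Bedrock", 0.1500, 0.6000, None, None),
    Endpoint("Cerebras", 0.3500, 0.7500, 652, 99.4),
]


def blended_price(e: Endpoint, input_share: float = 0.75) -> float:
    """Price per 1M tokens for an assumed input/output token mix."""
    return input_share * e.input_usd_per_m + (1.0 - input_share) * e.output_usd_per_m


def cheapest_healthy(
    endpoints: List[Endpoint],
    min_success_pct: float = 99.0,   # hypothetical quality floor
    max_p95_ms: float = 2000.0,      # hypothetical latency ceiling
) -> Optional[Endpoint]:
    """Return the cheapest endpoint that has telemetry and passes both thresholds."""
    healthy = [
        e for e in endpoints
        if e.success_pct is not None
        and e.p95_latency_ms is not None
        and e.success_pct >= min_success_pct
        and e.p95_latency_ms <= max_p95_ms
    ]
    return min(healthy, key=blended_price) if healthy else None


if __name__ == "__main__":
    choice = cheapest_healthy(ENDPOINTS)
    print(choice.provider if choice else "no healthy endpoint")
```

Under these assumed thresholds the two lowest-priced rows are excluded by their success rates, which shows why the lowest list price alone is not necessarily the path that gets served.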

Intelligence breakdown

Composite score is a weighted average of public benchmarks (30% MMLU-Pro, 25% code pass@1, 25% math, 20% GPQA). Numbers come from model cards and the Artificial Analysis intelligence harness; missing components are renormalised over what’s present.

MMLU-Pro

—

broad reasoning

Code

71.0

pass@1 (HumanEval / LiveCodeBench)

AIME 2025

92.5

math accuracy

GPQA Diamond

80.1

hard reasoning

Source: OpenAI gpt-oss model card (GPQA-Diamond 80.1, AIME25 92.5 no-tools, HumanEval ~71); arxiv.org/abs/2508.10925
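The renormalisation described above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes "renormalised over what's present" means dividing the weighted sum by the total weight of the available components; the names `WEIGHTS` and `composite` are illustrative, not part of any published harness.

```python
from typing import Dict, Optional

# Weights as stated in the breakdown: 30% MMLU-Pro, 25% code, 25% math, 20% GPQA.
WEIGHTS = {"mmlu_pro": 0.30, "code": 0.25, "math": 0.25, "gpqa": 0.20}


def composite(scores: Dict[str, Optional[float]]) -> float:
    """Weighted average over the components that have a score."""
    present = {k: v for k, v in scores.items() if v is not None}
    total_weight = sum(WEIGHTS[k] for k in present)
    if total_weight == 0:
        raise ValueError("no benchmark components available")
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight


# Scores listed for gpt-oss-120b; MMLU-Pro is missing on this page.
scores = {"mmlu_pro": None, "code": 71.0, "math": 92.5, "gpqa": 80.1}
print(round(composite(scores), 1))  # 81.3
```

With MMLU-Pro absent, the remaining weights sum to 0.70, so the score is (0.25 × 71.0 + 0.25 × 92.5 + 0.20 × 80.1) / 0.70 ≈ 81.3, matching the composite shown at the top of the page.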

How Atlas mode sources OpenAI: gpt-oss-120b

Strict mode: pin openai/gpt-oss-120b exactly and we pass it straight through, sourced from the cheapest provider above. Same model, no substitutions. Currently that is DeepInfra at $0.04/1M.
Atlas mode (the default): each call is auto-optimized for the cheapest path that holds quality, starting at least 5% cheaper than going direct on the first call, with savings growing as it ramps. You always see which model served the call and exactly what you saved, and you can thumbs-down anything you don’t like for a full refund.
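To put the "at least 5%" figure in concrete terms, the sketch below estimates what a single call would cost at a provider's listed rates and the corresponding ceiling under Atlas mode. It assumes that "going direct" means paying the table's per-token prices and that the 5% applies to the total cost of the call; the function names and token counts are hypothetical.

```python
def call_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_m: float,
    output_usd_per_m: float,
) -> float:
    """Cost of one call at per-1M-token list prices."""
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


def max_atlas_cost_usd(direct_cost_usd: float, min_discount: float = 0.05) -> float:
    """Upper bound on the call's cost if the stated minimum discount applies."""
    return direct_cost_usd * (1.0 - min_discount)


if __name__ == "__main__":
    # Example: 2,000 input and 500 output tokens at the Groq row's rates ($0.15 / $0.60).
    direct = call_cost_usd(2_000, 500, 0.15, 0.60)
    print(f"direct: ${direct:.6f}")
    print(f"Atlas ceiling: ${max_atlas_cost_usd(direct):.6f}")
```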
