NEWMEN
OpenAI: gpt-oss-120b

openai/gpt-oss-120b

OpenAI · open-weight · 131K context · 19 providers · Intelligence 81.3

Cheapest provider: DeepInfra, $0.04 / 1M

Fastest provider (p95): Cerebras, 2613 tok/s

Intelligence (composite): 81.3 (MMLU-Pro · HumanEval · math · GPQA)

Per-provider performance

Latency, throughput, and uptime are measured for each provider over the last 30 minutes of live traffic. Atlas’s router weighs these on every call, together with the eval-gate signal, when picking a variant for the Standard and Batch tiers.

Provider | Quant | Input $/1M | Output $/1M | Latency p50 / p95 | Throughput p50 / p95 (tok/s) | Uptime 30m | Success
DeepInfra | full · bf16 | $0.0390 | $0.1900 | 409ms / 10066ms | 220 / 445.7 | 100.00% | 42.7%
DekaLLM | full · bf16 | $0.0390 | $0.1800 | 588ms / 3369ms | 9 / 36 | 98.95% | 97.9%
Novita | q4 · fp4 | $0.0500 | $0.2500 | 554ms / 1613ms | 33 / 228 | 99.79% | 100.0%
SiliconFlow | q8 · fp8 | $0.0500 | $0.4500 | 1981ms / 10515ms | 9 / 20 | 58.43% | 100.0%
Google | full · unknown | $0.0900 | $0.3600 | 692ms / 2165ms | 97 / 154 | 99.93% | 100.0%
BaseTen | q4 · fp4 | $0.1000 | $0.5000 | 242ms / 844ms | 214 / 317 | 99.95% | 99.1%
Parasail | q4 · fp4 | $0.1000 | $0.7500 | 460ms / 1170ms | 76 / 95 | 100.00% | 93.8%
Phala | full · unknown | $0.1000 | $0.4900 | 983ms / 1831ms | 79 / 110 | 99.94% | 99.6%
SambaNova | full · unknown | $0.1400 | $0.9500 | 677ms / 3183ms | 136 / 677 | 99.58% | 70.2%
Amazon Bedrock | full · unknown | $0.1500 | $0.6000 | 1546ms / 7909ms | 157 / 241 | 100.00% | 100.0%
Amazon Bedrock | full · unknown | $0.1500 | $0.6000 | 1546ms / 7909ms | 157 / 241 | 25.60% | 100.0%
Ambient | full · unknown | $0.1500 | $0.6000 | 483ms / 1478ms | 72 / 138 | 99.96% | 97.9%
DeepInfra | full · bf16 | $0.1500 | $0.6000 | 409ms / 10066ms | 220 / 445.7 | 99.75% | 42.7%
Groq | full · unknown | $0.1500 | $0.6000 | 170ms / 1000ms | 382.5 / 756 | 99.97% | 98.5%
Mara | full · unknown | $0.1500 | $0.7500 | 640ms / 3004ms | 140 / 475.6 | 99.94% | 83.6%
Nebius | q4 · fp4 | $0.1500 | $0.6000 | 233ms / 10121ms | 107 / 221 | 100.00% | 95.5%
Together | full · unknown | $0.1500 | $0.6000 | 780ms / 2323ms | 65 / 126 | 99.10% | 99.8%
WandB | q4 · fp4 | $0.1500 | $0.6000 | 408ms / 1014ms | 118 / 134 | 100.00% | 100.0%
Cerebras | full · fp16 | $0.3500 | $0.7500 | 194ms / 711ms | 990 / 2612.9 | 100.00% | 96.4%

“—” means live telemetry hasn’t accumulated enough recent traffic for that endpoint. “unknown” means the provider serves the model but doesn’t expose the quantization label (typically running fp8 / int8 internally).
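To compare providers on a concrete request, the per-million prices in the table can be turned into a per-request estimate. The sketch below assumes billing is strictly linear in input and output tokens, with no minimums, caching discounts, or tier adjustments; the `PRICES` mapping and the `request_cost` helper are hypothetical names, and only three rows are copied in (for DeepInfra, the cheaper of its two listed rows).

```python
# Prices in USD per 1M tokens (input, output), copied from the table.
# Assumption: cost scales linearly with token counts and nothing else.
PRICES = {
    "DeepInfra": (0.0390, 0.1900),
    "Groq": (0.1500, 0.6000),
    "Cerebras": (0.3500, 0.7500),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request on the given provider."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example request: 2,000 prompt tokens and 500 completion tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 2_000, 500):.6f}")
```

Because output tokens are priced several times higher than input tokens on every row, generation-heavy workloads are likely to be more sensitive to the Output $/1M column than to the headline input price.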

Intelligence breakdown

Composite score is a weighted average of public benchmarks (30% MMLU-Pro, 25% code pass@1, 25% math, 20% GPQA). Numbers come from model cards and the Artificial Analysis intelligence harness; missing components are renormalised over what’s present.
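One plausible reading of the renormalisation is sketched below: weights of missing components are dropped and the remaining weights are rescaled to sum to 1. The dictionary keys and the `composite` function are hypothetical names, not Atlas's actual implementation. With the three scores this page lists (71.0 code, 92.5 math, 80.1 GPQA) and MMLU-Pro treated as missing, this reading reproduces the 81.3 composite.

```python
# Weights as stated on this page; the key names are illustrative.
WEIGHTS = {"mmlu_pro": 0.30, "code": 0.25, "math": 0.25, "gpqa": 0.20}


def composite(scores: dict) -> float:
    """Weighted average over the components that are present (not None)."""
    present = {k: v for k, v in scores.items() if v is not None}
    total_weight = sum(WEIGHTS[k] for k in present)
    if total_weight == 0:
        raise ValueError("no benchmark scores available")
    # Dividing by the weight actually used rescales the remaining weights to 1.
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight


if __name__ == "__main__":
    scores = {"mmlu_pro": None, "code": 71.0, "math": 92.5, "gpqa": 80.1}
    print(round(composite(scores), 1))  # 81.3
```

Worked by hand: (0.25 × 71.0 + 0.25 × 92.5 + 0.20 × 80.1) / 0.70 = 56.895 / 0.70 ≈ 81.3.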

  • MMLU-Pro: no score shown (broad reasoning)
  • Code: 71.0, pass@1 (HumanEval / LiveCodeBench)
  • AIME 2025: 92.5 (math accuracy)
  • GPQA Diamond: 80.1 (hard reasoning)

Source: OpenAI gpt-oss model card (GPQA-Diamond 80.1, AIME25 92.5 no-tools, HumanEval ~71); arxiv.org/abs/2508.10925

How Atlas routes OpenAI: gpt-oss-120b

  • Realtime tier — direct passthrough at the upstream’s native precision. Best for hard latency / quality guarantees.
  • Standard tier — Atlas picks the cheapest provider variant whose quantization has stayed green on your operation’s eval gates. For OpenAI: gpt-oss-120b that’s currently DeepInfra at $0.04/1M.
  • Batch tier — async, biggest discount. Roadmapped to use provider batch APIs (OpenAI / Anthropic) where available and queued spot capacity for open-weight workloads.
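The Standard-tier rule can be sketched as a filter followed by a minimum: keep only variants whose quantization is currently green on the eval gates, then take the cheapest. This is a simplified illustration, not Atlas's router; the `Variant` class, the `pick_standard` function, and the `green_quants` set are hypothetical, and the sketch ignores the latency, throughput, and uptime signals the router also weighs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Variant:
    provider: str
    quant: str          # quantization label, e.g. "bf16", "fp8", "fp4"
    input_price: float  # USD per 1M input tokens


def pick_standard(variants: Iterable[Variant], green_quants: Set[str]) -> Optional[Variant]:
    """Cheapest variant whose quantization is green; None if none qualify."""
    eligible = [v for v in variants if v.quant in green_quants]
    if not eligible:
        return None
    return min(eligible, key=lambda v: v.input_price)


# A few rows from the per-provider table, for illustration only.
VARIANTS = [
    Variant("DeepInfra", "bf16", 0.0390),
    Variant("Novita", "fp4", 0.0500),
    Variant("SiliconFlow", "fp8", 0.0500),
    Variant("BaseTen", "fp4", 0.1000),
]

if __name__ == "__main__":
    # All quantizations green: the cheapest row overall wins.
    print(pick_standard(VARIANTS, {"bf16", "fp8", "fp4"}).provider)  # DeepInfra
    # Only fp4 green on this operation's gates: the choice moves to Novita.
    print(pick_standard(VARIANTS, {"fp4"}).provider)  # Novita
```

The second call shows why the gate matters: the selected provider depends on which quantizations have held up for a given operation, so the cheapest listed price is not always the one used.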