
OpenAI: gpt-oss-20b

openai/gpt-oss-20b

OpenAI · open-weight · 131K context · 12 providers · Intelligence 62.0 (est.)

Cheapest provider: DeepInfra, $0.03 / 1M

Fastest provider (p95): Groq, 2464 tok/s

Intelligence (estimated): 62.0 (family + generation + popularity + price tier)

Per-provider performance

Latency / throughput / uptime measured across providers over the last 30 minutes of live traffic. Atlas’s router weighs these per call (with the eval-gate signal) when picking a variant for Standard and Batch tiers.

| Provider       | Quant          | Input $/1M | Output $/1M | Latency p50 / p95 | Throughput p50 / p95 | Uptime 30m | Success |
|----------------|----------------|------------|-------------|-------------------|----------------------|------------|---------|
| DeepInfra      | full · bf16    | $0.0300    | $0.1400     | 413ms / 10679ms   | 27 / 30 tok/s        | 99.99%     | 100.0%  |
| Novita         | q4 · fp4       | $0.0400    | $0.1500     | 616ms / 4099ms    | 89 / 110 tok/s       | 99.96%     | 99.9%   |
| Parasail       | q4 · fp4       | $0.0400    | $0.2000     | 409ms / 3563ms    | 117 / 159 tok/s      | 99.99%     | 99.9%   |
| SiliconFlow    | q8 · fp8       | $0.0400    | $0.1800     | 1933ms / 7760ms   | 34 / 95.4 tok/s      | 99.93%     | 95.6%   |
| Together       | full · unknown | $0.0500    | $0.2000     | 526ms / 3902ms    | 117 / 190 tok/s      | 100.00%    | 100.0%  |
| WandB          | q4 · fp4       | $0.0500    | $0.2000     | 237ms / 980ms     | 282 / 440 tok/s      | 100.00%    | 100.0%  |
| Amazon Bedrock | full · unknown | $0.0700    | $0.1500     | 1383ms / 5660ms   | 100 / 235 tok/s      | 44.03%     | 97.8%   |
| Amazon Bedrock | full · unknown | $0.0700    | $0.1500     | 1383ms / 5660ms   | 100 / 235 tok/s      | 99.47%     | 97.8%   |
| Fireworks      | full · unknown | $0.0700    | $0.3000     | 802ms / 8338ms    | 50 / 84 tok/s        | 94.54%     | 94.0%   |
| Google         | full · unknown | $0.0700    | $0.2500     | 526ms / 1679ms    | 22 / 87 tok/s        | 99.91%     | 99.9%   |
| Groq           | full · unknown | $0.0750    | $0.3000     | 166ms / 1845ms    | 447 / 2463.7 tok/s   | 99.95%     | 99.6%   |
| NextBit        | q8 · fp8       | $0.1000    | $0.4500     | 686ms / 2610ms    | 115 / 202 tok/s      | 99.37%     | 100.0%  |

“—” means live telemetry hasn’t accumulated enough recent traffic for that endpoint. “unknown” means the provider serves the model but doesn’t expose the quantization label (typically running fp8 / int8 internally).
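To make the per-1M-token prices concrete, here is a minimal sketch that turns the listed input and output prices into the cost of a single request. The provider names and prices are copied from the table; the function name `request_cost` and the 8,000-in / 1,000-out token counts are illustrative assumptions, not part of Atlas.

```python
# Listed prices in USD per 1M tokens, as (input, output), copied from the table.
PRICES = {
    "DeepInfra": (0.03, 0.14),
    "Parasail": (0.04, 0.20),
    "Groq": (0.075, 0.30),
}


def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token prices.

    Hypothetical helper: it ignores any tier discounts or minimum charges.
    """
    in_price, out_price = PRICES[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Illustrative request size: 8,000 prompt tokens and 1,000 completion tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 8_000, 1_000):.6f}")
```

At that request size the cheapest listed provider costs well under a tenth of a cent per call, so for low volumes the latency and throughput columns may matter more than price.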

Intelligence estimate

No public benchmark numbers indexed for this model yet, so the leaderboard score is derived from the catalogue data we sync: model family standing, generation, and live usage rank — applied identically to open- and closed-weight models. The heuristic ceiling sits below confirmed-benchmark frontier models so curated rankings stay clearly on top.

Want a curated score? File a benchmark report and we’ll add it to the next sync.

How Atlas routes OpenAI: gpt-oss-20b

  • Realtime tier — direct passthrough at the upstream’s native precision. Best for hard latency / quality guarantees.
  • Standard tier — Atlas picks the cheapest provider variant whose quantization has stayed green on your operation’s eval gates. For OpenAI: gpt-oss-20b that’s currently DeepInfra at $0.03/1M.
  • Batch tier — async, biggest discount. Roadmapped to use provider batch APIs (OpenAI / Anthropic) where available and queued spot capacity for open-weight workloads.
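
The Standard tier rule above ("cheapest provider variant whose quantization has stayed green") can be sketched roughly as follows. This is an assumption-laden illustration, not Atlas's implementation: the `Variant` type, the `pick_standard` function, and the idea of passing eval-gate results as a set of quantization labels are all hypothetical, and the sketch ranks by input price only, ignoring the latency, throughput, and uptime signals the router is said to weigh.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Variant:
    provider: str
    quant: str          # quantization label as listed, e.g. "bf16", "fp4", "unknown"
    input_price: float  # USD per 1M input tokens


# A few rows taken from the per-provider table.
VARIANTS = [
    Variant("DeepInfra", "bf16", 0.03),
    Variant("Novita", "fp4", 0.04),
    Variant("SiliconFlow", "fp8", 0.04),
    Variant("Groq", "unknown", 0.075),
]


def pick_standard(variants: Iterable[Variant], green_quants: Set[str]) -> Optional[Variant]:
    """Return the cheapest variant whose quantization passed the eval gates.

    `green_quants` is a hypothetical stand-in for the eval-gate signal:
    the set of quantization labels currently considered safe for an operation.
    Returns None when no variant qualifies.
    """
    eligible = [v for v in variants if v.quant in green_quants]
    if not eligible:
        return None
    return min(eligible, key=lambda v: v.input_price)


if __name__ == "__main__":
    choice = pick_standard(VARIANTS, {"bf16", "fp4", "fp8"})
    print(choice.provider if choice else "no eligible variant")
```

With all three labelled quantizations green, the sketch selects DeepInfra, which matches the provider named in the Standard tier bullet. If only fp4 were green, it would fall back to Novita.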