Blog

How Atlas picks a model

2026-02-01 · 3 min read

Atlas is the optimization layer, not a model. Here's what it conditions on per call to serve the cheapest path that holds quality — and how that sharpens as your operation accumulates traffic.

The usual way to cut an inference bill is to manually pick a cheaper model and hope it holds up — or to prompt-engineer one model into behaving on your operation. Both break on the long tail of production traffic, and both leave you guessing whether the cheaper path actually worked. Atlas is the other path: it's the optimization layer that serves each call the cheapest way that holds quality, and it proves the result on your own traffic.

When you send a request to model: "atlas-1", Atlas makes an optimization decision before the call hits any provider. That decision happens in milliseconds and conditions on several signals. This post explains what those signals are and how the decision sharpens over time. (atlas-1 is a selector for Atlas mode — not a trained model.)

What Atlas conditions on per call

Atlas weighs four signals per call:

Operation type. If you've tagged calls with metadata.operation_id, Atlas has a history of which paths hold quality on that class of task. A structured-extraction operation may be served differently from a general-purpose summarization operation.

Latency history. Atlas tracks p50 and p95 latency per provider per operation. For latency-sensitive operations, it weights lower-latency paths more heavily.

Cost. Where quality history is comparable across candidates, Atlas serves the cheaper one. As your token volume grows, that decision compounds into real savings.

Verified quality on your past traffic. This is the strongest signal when it exists. If your evaluators have scored past responses on this operation, Atlas uses that as a prior — a path that consistently clears your bar gets preference. This is the verification that makes "cheaper" safe.

What happens without signal

For a net-new operation with no history, Atlas starts cautious — at least 5% off from your first call — and serves more calls the cheaper way as it sees more of your traffic. This is the Trust Ramp. The faster you tag calls and run evaluators, the faster Atlas builds a per-operation quality profile and the more of your traffic it can serve cheaply with confidence.

Strict mode

If you pin a model in Strict mode — the same model, just sourced cheaper — Atlas honors it as a pure pass-through with no substitutions. The call is still recorded and still eligible for tagging and evaluation, so you keep full verification even when you're not auto-optimizing. You can mix modes within an operation: some calls auto-optimized, some pinned.

How the decision sharpens over time

As you tag corrections and run evaluators, Atlas updates its quality prior for your operation: paths that hold up on your corrected examples earn more of your traffic, and your verified savings climb. This is the compounding part — the optimization engine learns which route holds quality on your workload, never a language model trained on your data.

For teams that want to go further, per-tenant tuning is available as a separate, explicit opt-in (the Reliability Loop) with its own terms. It is never the default and never something applied to your data quietly.

What about provider outages

Atlas tracks provider availability in real time. If a provider is erroring or slow, Atlas serves the next-best path that still holds quality for that operation. In Strict mode, a pinned provider going down fails the call rather than silently substituting — by design.

The cost picture

Atlas mode doesn't add a per-call fee on top of provider costs; we make money on the spread, and only when you save. You never pay more than going direct, and every call shows the served model, the provider, and what you saved — reconcilable against your provider bill. Cheaper and verified are the same thing here: the cheapest path that your evaluators say still holds quality.

← All posts