NEWMEN

Blog

How Atlas picks a model

2026-02-01 · 3 min read

Routing is the alternative to prompt engineering. Here's what signals Atlas uses per call — and how those signals sharpen as your operation accumulates traffic.

The usual alternative to a routing layer is prompt engineering: crafting system prompts that steer one model toward the right behavior on your operation. It works on a curated eval set. It breaks on the long tail of production traffic, and it needs re-tuning with every model update. Atlas is the other path: route automatically, let the signal accumulate, and let the routing decisions improve without manual iteration.

When you send a request to model: "atlas", Atlas makes a routing decision before the call hits any provider. That decision happens in milliseconds and factors in several signals. This post explains what those signals are, how they interact, and how the routing improves over time.

The inputs to a routing decision

Atlas evaluates four signals per call:

Operation type. If you've tagged calls with metadata.operation_id, Atlas has a history of which models perform well on that class of task. A structured extraction operation that has accumulated enough signal may route differently from a general-purpose summarization operation.

Latency history. Atlas tracks p50 and p95 latency per provider per operation. For latency-sensitive operations — anything customer-facing with a tight response budget — Atlas weights lower-latency providers more heavily.

Cost per token. For operations where accuracy history is similar across candidates, Atlas will route to the cheaper option. As your token volume grows, this routing decision compounds.

Accuracy on your past traffic. This is the strongest signal when it exists. If your evaluators have scored past responses on this operation, Atlas uses that signal as a prior. A provider that consistently scores above your ship gate threshold gets routing preference.

What happens without signal

For a net-new operation with no history, Atlas routes based on its internal prior — a model of which provider tends to perform well on which task class. This prior is trained across aggregate anonymized signal from all operations on the platform. It is a reasonable default. It is not personalized to you.

The faster you start tagging calls and running evaluators, the faster Atlas builds a per-operation signal profile and the more confidently it can route for your workload specifically.

Passthrough routing

If you specify a model directly — model: "openai/chatgpt-5.5", model: "anthropic/claude-sonnet-4" — Atlas bypasses its routing layer and sends the call to that provider directly. The call is still recorded. The response is still eligible for tagging and dataset inclusion. The reliability loop runs regardless of which model handled the call.

This matters: you can mix routing modes within an operation. Some calls go to Atlas for routing decisions; some go directly to a provider you've specified. Both feed the same dataset.

How routing improves over time

As you tag corrections and promote datasets, Atlas does two things:

First, it updates its routing prior for your operation based on accuracy signal. Providers that perform well on your corrected examples gain routing weight.

Second — and this is the compounding part — it trains task-specific LoRA adapters on your golden dataset. Once an adapter exists for an operation, Atlas can route to it: smaller, cheaper, and more accurate for that specific task than the general model. The adapter becomes the default route for that operation.

The first adapter takes weeks of correction accumulation to train. Subsequent versions train faster because the prior dataset is already clean. The accuracy curve on a well-tagged operation is non-linear.

What about provider outages

Atlas tracks provider availability in real time. If a provider is returning errors or high latency, Atlas will route around it to the next-best option for that operation. You can configure fallback behavior per operation in the console. If you have specified a model directly (model: "openai/chatgpt-5.5") and that provider goes down, the call fails — there is no automatic fallback on direct passthrough, by design.

The cost picture

Routing to Atlas does not add a per-call fee on top of provider costs. The Newmen rate covers routing, recording, and the platform overhead. Third-party models at cost + 10% are transparent — you see the provider rate and the Newmen markup separately in the console.

As your adapters mature, the routed option becomes cheaper than the general model because adapters are smaller. The accuracy improvement and the cost improvement are the same thing: a smaller, better-targeted model doing less work to get the right answer.