Blog

Introducing Atlas

2026-05-16 · 3 min read

One env var. The models you already use. Atlas is the optimization layer that serves every call the cheapest way that holds quality — and proves it on your own traffic.

Today we are introducing Atlas and the platform that ships with it. Atlas is not a model. It is the optimization layer for AI inference: you point your existing coding agent at us with one env var, keep the model ids you already use, and every call is served the cheapest way that still holds quality. You pay less than going direct from the first call, and you can refund any answer you don't like.

Why we built an optimization layer, not a model

The model-capability gap is closing. Every major provider ships strong models now. What's missing is a layer that takes the same workload and serves it for less — without you having to wonder whether you got a quietly worse answer. That layer is what nobody was building, and it is what Newmen is.

We kept watching the same pattern: teams paying full provider rates for calls that could have been served cheaper, with no trustworthy way to know whether a cheaper path actually held up. So they overpaid for certainty. Atlas closes that gap by making the cheaper path verifiable instead of a leap of faith.

What Atlas is

Atlas has two modes, set by the model id in your request.

Atlas mode (the default). Pass model: "atlas-1" — a selector, not a trained model — and Atlas auto-optimizes the call for the lowest cost that holds quality on your operation. Even when you pass a normal provider model id, Atlas runs the Trust Ramp and applies an immediate discount. You don't write routing logic. You point and save.

Strict mode. Pin an exact model and endpoint and Atlas honors it as a pure pass-through — the same model, just sourced cheaper. No substitutions, ever. It's the trust anchor and the on-ramp for skeptics and automation.

The two guarantees, in writing

Price. Never more than going direct — and you start saving on call number one. Atlas takes at least 5% off from the first request and climbs as it ramps on your traffic.

Quality. Thumbs-down any call you don't like and we refund it in full — no ticket, no explanation, no argument. Quality is the guaranteed thing, not an asserted one.

There's a migration guarantee too: we migrate you in under a day, or we run free until you're live.

You see the receipts

Every call you make shows the model that served it, the provider, and the price — alongside what the same call would have cost going direct. When any optimization lever fires, the receipt grows into a full inference manifest you can reconcile against your provider bill. You check the math instead of taking our word.

What we do with your data

We train our optimization engine — which route holds quality at the lowest cost — not a language model. Your prompts and code are never absorbed into a model, sold, or shared. Your IP stays yours. By default we use usage to improve routing, cost, and quality; you can turn that off per key, and Enterprise gets full data isolation.

Per-tenant model tuning is a separate, explicit opt-in with its own terms — never the default, never something that happens to your data quietly.

Pricing

We make money on the spread, not a visible per-token markup. We only make money when you save money — if you don't save, we don't get paid. You never pay more than going direct, and you see your verified, quality-held-constant savings line by line in the console.

How to start

Change one env var. Point your base URL at https://api.newmen.ai/v1 and keep your key. Every model id you use today keeps working — OpenAI, Anthropic, and more. No SDK rewrite. The discount applies from your first call, and the savings climb as Atlas ramps.

The Newmen SDKs — @newmen-ai/sdk on npm and newmen-ai on PyPI — ship today, with response types generated from our OpenAPI spec so they can't drift from the wire format. If you already use the OpenAI SDKs, keep them and only swap the base URL.

What is next

Wider provider coverage, deeper per-operation optimization, and an audit-logging surface targeting SOC 2 readiness later this year. The opt-in tuning tier (the Reliability Loop) continues for teams that want it, on its own terms.

If you run AI in production and want a smaller bill you can verify, talk to us. sales@newmen.ai. We respond within one business day.

Atlas is here. Cut your AI bill — same quality, or it's free.

← All posts