Blog
Introducing Atlas
2026-05-16 · 4 min read
One API. Any model. The routing intelligence that continuously trains on your production traffic — so the only benchmark that matters is yours.
Today we are releasing Atlas and the platform that ships with it. Atlas is not a single static model. It is a routing layer that picks the optimal provider per call, then trains LoRA adapters on your production corrections so every operation gets cheaper and more accurate over time. The model is competent. The difference is the loop around it.
Why we built a platform, not a lab
The honest answer is that the model capability gap is closing. Every major provider ships capable models now. The gap that remains — and it is a wide one — is the infrastructure to observe, correct, and continuously improve model behavior on a specific production workload. That infrastructure is what nobody was building, and it is what Newmen is.
We kept watching the same pattern repeat at customer after customer: a pilot succeeds on a curated set, ships to production, and quietly drifts as real traffic exposes the shape of the workload. There was no clean way to observe the drift, no standardized way to correct it in place, and no way to bind those corrections to the next release — regardless of which AI provider they were using.
We built Atlas to close that loop. Across any model.
What Atlas is
Atlas has three roles, set by the model field in your request:
Routing layer. When you pass model: "atlas" or model: "atlas-1", Atlas picks the best underlying provider for that call based on operation type, latency history, cost, and accuracy on your past traffic. You do not write routing logic. Atlas routes.
Direct passthrough. Pass model: "openai/chatgpt-5.5" and the call routes to ChatGPT-5.5 directly, billed at provider cost + 10%. Same for any supported model — Anthropic, Google, Mistral, Meta, and more. One API key. Transparent pricing.
Continuously trained specialist. As tagged corrections accumulate per operation, Atlas trains task-specific LoRA adapters. The adapter is smaller, cheaper, and more accurate for that specific task than the general model — and it improves every week.
What is the loop
The loop is six stages. Define an operation — a named, observable unit of work like summarize_ticket or extract_invoice. Tag every production call with the operation's key. Review and correct calls directly in the console, no export and re-upload required. Build a versioned dataset from tagged calls. Run evaluators against the dataset — regex, JSON-schema, LLM-judge, embedding-match — and configure ship gates that block promotion until every gate passes. Once gated, request training, and a per-tenant adapter is registered against your organization.
This is not a workflow you bolt on to Atlas. It is what Atlas was built for. The console, the SDKs, the API surface, and the routing intelligence were designed in lockstep with the loop. Every primitive composes. The loop runs on any model in the table.
Three things you don't have to do
Prompt engineering. Atlas routes to the best model per call, and per-operation adapters train on your corrections. You don't iterate on prompts to coax better behavior — you tag what was wrong and the system closes the gap.
Training infrastructure. You don't stand up a training pipeline, manage compute, or version checkpoints. You promote a golden dataset through ship gates. Newmen handles the rest.
Inference operations. No serving layer to run. No model monitoring to wire up. No adapter deployment to orchestrate. The console shows you your per-operation pass rate and the direction it is moving.
What we don't publish
We don't publish benchmark scores. We show customers their improvement curve on their production operations. That's the only benchmark that matters.
Other providers will report SOTA on the same handful of public evals every quarter. We will show you your pass rate on extract_invoice over the last ninety days and the direction it is moving. Those are different measurements. Only one of them tells you whether the system is working for you.
Pricing
One API key. Two rate structures. Atlas inference runs at Newmen-controlled rates. Third-party models route at provider cost + 10% — you see exactly what the provider charges and exactly what Newmen adds. The full model table is at /pricing.
The $1,500/mo reliability loop is a platform fee on top of usage. It funds recording, evaluators, ship gates, and the training workflow. It applies equally whether you are running Atlas inference or third-party models.
How to start
Three steps. Sign up, create an organization, generate an API key. Replace your existing chat-completions base URL with https://api.newmen.ai/v1. Start adding metadata.operation_id to your calls. The loop activates on day one.
The Newmen SDKs — @newmen-ai/sdk on npm and newmen-ai on PyPI — ship today. Both are hand-written for ergonomics with response types generated from our OpenAPI spec, so they cannot drift from the wire format. If you already use the OpenAI SDKs, you can keep using them and only swap the base URL; everything works.
What is next
Per-tenant training pipeline (currently sales-gated). Vision (next release). Per-operation adapter routing — pointing one operation at the routing layer and another at a trained adapter. An audit-logging surface targeting SOC 2 readiness later this year.
We are a small team and that is on purpose. The reliability loop is the one thing.
If you run AI in production and you care about measurable performance on your own traffic, talk to us. sales@newmen.ai. We respond within one business day, and we send a solutions engineer to the first call.
Atlas is here. The loop is open.