Atlas mode vs Strict

Every Newmen call runs in one of two modes. Atlas mode (the default) auto-optimizes each call for the lowest cost that holds quality. Strict mode pins an exact model and passes it straight through: the same model, just sourced cheaper. Either way, you pay less than going direct starting with your first call, and you see the model that served every call.

Two ways to call Newmen

You point your existing client at Newmen with two env vars (see Quickstart) and every model id you use today keeps working. What changes is how each call is served:

Atlas mode (default) — auto-optimize every call for the lowest cost that holds quality. Savings climb as it ramps.
Strict mode — pin a call to an exact model and endpoint. The same model, just sourced cheaper. No substitutions, ever.

Note. Margin is only ever made on Atlas mode. Strict mode is pure pass-through, priced below direct; it’s the trust anchor and the on-ramp for skeptics and automation.

Atlas mode (default)

Atlas mode is the default. Even when you pass a normal provider model id with no explicit atlas selector, Newmen serves the call the cheapest way that holds quality and bills you below direct from request one.

typescript · Atlas mode (default)
// Pass any model id you already use. This is Atlas mode by
// default — every call is optimized for the cheapest path that
// holds quality, and billed below direct from call #1.
const res = await client.chat.completions.create({
  model: "openai/gpt-5.5",
  messages,
});

Passing model: "atlas-1" is the explicit selector for full auto-optimize. It is a selector, not a model you can pin — it tells Atlas to choose the path for you.

typescript · explicit atlas-1
// model: "atlas-1" is the selector for full auto-optimize.
// Same SDK shape, same response shape.
const res = await client.chat.completions.create({
  model: "atlas-1",
  messages,
});

Strict mode

When you need the exact same model every time — a contract names a specific model, or an automated pipeline must not vary — use Strict mode. It honors the model id you pass, sourced at the cheapest provider for that model, with no substitutions.

typescript · Strict mode
// Strict mode: pin an exact model, pure pass-through.
// The same model the provider would serve — just sourced
// cheaper. No substitutions, ever. Still at least 5% under direct.
const res = await client.chat.completions.create({
  model: "openai/gpt-5.5",
  strict: true,
  messages,
});

Note. Strict mode is the only place “the exact same model, just cheaper” is literally true. In Atlas mode, quality is backed by the refund guarantee (thumbs-down → full refund) rather than asserted, so we never claim “the same model” outside Strict.
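One way to check the no-substitution promise from your own code is to compare the model id you pinned with the served_model field reported in the response's delivery block (covered under "See the served model on every call"). The sketch below is a minimal illustration, not part of any Newmen SDK: `assertPinnedModel` is a hypothetical helper name, and it assumes only that the delivery block exposes `served_model` as shown on this page.

```typescript
// Hypothetical helper: throws if a Strict-mode call was answered by a
// model other than the one that was pinned. Only served_model is read.
function assertPinnedModel(
  pinned: string,
  delivery: { served_model: string },
): void {
  if (delivery.served_model !== pinned) {
    throw new Error(
      `Strict mode expected ${pinned} but was served by ${delivery.served_model}`,
    );
  }
}

// Usage with a stand-in object; in practice this would be res.delivery.
assertPinnedModel("openai/gpt-5.5", { served_model: "openai/gpt-5.5" });
```

A check like this fits naturally in an automated pipeline, where a substituted model should fail loudly rather than pass silently.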

The price guarantee

Both modes are covered by the same written guarantee: you never pay more than going direct, and you start saving on call number one.

At least 5% off from your first call — a real, contractual discount, even in Strict mode where it’s the same model.
Climbing as Atlas ramps — as it sees more of your traffic, more calls are served the cheaper way that holds quality, and your discount grows.

Savings are reported with quality held constant, measured against what you previously paid going direct, line by line in /console/usage.
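As a worked example of the 5% floor, the sketch below computes the discount on a single call from what you paid and what the call would have cost direct, then checks it against the floor. The function names are hypothetical and the prices are made up for illustration; the only figure taken from this page is the "at least 5% off" guarantee.

```typescript
// The floor from the guarantee: at least 5% below direct.
const MIN_DISCOUNT = 0.05;

// Fraction saved versus going direct, e.g. 0.05 means 5% off.
function discountVsDirect(yourPrice: number, directPrice: number): number {
  if (directPrice <= 0) {
    throw new Error("directPrice must be positive");
  }
  return (directPrice - yourPrice) / directPrice;
}

// True when a call's price honors the 5% floor.
// A tiny tolerance absorbs floating-point rounding at the boundary.
function meetsPriceFloor(yourPrice: number, directPrice: number): boolean {
  return discountVsDirect(yourPrice, directPrice) >= MIN_DISCOUNT - 1e-9;
}

// Illustrative numbers only: a call that costs 0.0100 direct
// must be billed at 0.0095 or less to honor the floor.
const honored = meetsPriceFloor(0.0095, 0.01);   // true: exactly 5% off
const violated = !meetsPriceFloor(0.0098, 0.01); // true: only 2% off
```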

The quality refund

We never claim quality is identical — we make it free if it isn’t. Thumbs-down any call you don’t like and we refund it in full, no ticket and no explanation needed. See Feedback & refunds for the contract.

See the served model on every call

Every response carries a Newmen-specific delivery block alongside the OpenAI- or Anthropic-shaped body. As a logged-in customer you always see the model that served the call, the provider, and your price versus what it would have cost going direct.

typescript · response.delivery
// res.delivery → {
//   served_model: "openai/gpt-5.5",  // what actually answered
//   provider:     "OpenAI",
//   your_price:   0.0041,           // what you paid
//   direct_price: 0.0072,           // what it costs going direct
//   saved:        "43%",
// }

That’s the headline: served model and cost versus direct, so you check the math instead of taking our word. The same line shows up per-call in /console/calls.
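To make "check the math" concrete, here is a minimal sketch that recomputes the savings percentage from your_price and direct_price and compares it with the reported saved value. The field names follow the delivery block above; the `Delivery` type name, the helper names, and the assumption that `saved` is rounded to a whole percent are illustrative guesses, not documented behavior.

```typescript
// Field names follow the delivery block shown on this page;
// the type name itself is hypothetical.
interface Delivery {
  served_model: string;
  provider: string;
  your_price: number;
  direct_price: number;
  saved: string; // e.g. "43%"
}

// Recompute the savings from the two prices.
// Assumption: the reported value is rounded to a whole percent.
function recomputeSavedPercent(d: Delivery): number {
  return Math.round((1 - d.your_price / d.direct_price) * 100);
}

// True when the reported "saved" string matches the recomputed percentage.
function savedMatches(d: Delivery): boolean {
  return d.saved === `${recomputeSavedPercent(d)}%`;
}

// The sample values from the delivery block above.
const sample: Delivery = {
  served_model: "openai/gpt-5.5",
  provider: "OpenAI",
  your_price: 0.0041,
  direct_price: 0.0072,
  saved: "43%",
};

const percent = recomputeSavedPercent(sample); // 43
const consistent = savedMatches(sample);       // true
```

With the sample values, 1 − 0.0041 / 0.0072 ≈ 0.4306, which rounds to the 43% shown in the block.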

Your data

Required. We train our optimization engine, not a language model. Your prompts and code are never absorbed into a trained model, sold, or shared; your IP stays yours. There is no model fine-tuning on your traffic by default; per-tenant tuning is a separate, explicit opt-in (the Reliability Loop, Pro). Turn off usage-based optimization per key any time, or get full data isolation on Enterprise.

FAQ

How do I guarantee the exact same model every time?

Use Strict mode — pin the model id and set strict: true. Newmen honors that exact model with no substitutions and still bills you below direct.

Do you train a model on my prompts?

No. We train the engine that decides which path is cheapest while holding quality — never a language model on your content. Per-tenant model tuning is a separate opt-in with its own terms.

When should I use Atlas mode vs Strict?

Default everything to Atlas mode; that’s where the savings climb. Reach for Strict only on the few calls where the exact model id is part of the product or a pipeline must not vary.
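The rule of thumb above can be sketched as a small request builder: every call goes through Atlas mode unless its call site is on an explicit pin list. The `strict: true` flag and the model id are the ones shown on this page; the call-site names, the `PINNED_CALL_SITES` set, and `buildParams` are hypothetical and exist only for illustration.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatParams {
  model: string;
  messages: ChatMessage[];
  strict?: boolean;
}

// Hypothetical list of call sites where the exact model id must not vary.
const PINNED_CALL_SITES = new Set<string>(["contract-review", "nightly-eval"]);

// Build params for one call: Strict only for pinned call sites,
// Atlas mode (the default, no strict flag) for everything else.
function buildParams(
  callSite: string,
  model: string,
  messages: ChatMessage[],
): ChatParams {
  const params: ChatParams = { model, messages };
  if (PINNED_CALL_SITES.has(callSite)) {
    params.strict = true;
  }
  return params;
}

const messages: ChatMessage[] = [
  { role: "user", content: "Summarize this clause." },
];
const pinned = buildParams("contract-review", "openai/gpt-5.5", messages); // strict: true
const optimized = buildParams("chat-widget", "openai/gpt-5.5", messages);  // no strict flag
```

The resulting object has the same shape as the params passed to client.chat.completions.create in the examples above. Keeping the pin list in one place makes it easy to audit which calls opt out of Atlas optimization.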