NEWMEN

Why default to atlas-1

atlas-1 is the smart-routing default. Change one string in your existing OpenAI-compatible code and Atlas picks the cheapest path that passes your operation's eval gates. Pin a specific model only when you need a hard guarantee.

The one-string migration

If your existing call looks like this on OpenAI / Anthropic:

typescript · before
const res = await client.chat.completions.create({
  model: "chatgpt-5.5",
  messages,
  metadata: { operation_id: "summarize_ticket" },
});

The Atlas equivalent is identical except for the model field:

typescript · after
const res = await client.chat.completions.create({
  model: "atlas-1",              // smart routing across providers
  messages,
  metadata: { operation_id: "summarize_ticket" },
});

// res.delivery → {
//   tier: "standard",
//   served_by: "provider",
//   provider: "Hyperbolic",
//   quantization: "q8",
//   upgraded: false,
//   cost_usd: 0.0041,
//   latency_ms: 1240,
//   evals: { gates: 1, scheduled: true },
//   fingerprints: {
//     prompt:   "fp_p_a1b2c3d4e5f6071829304a5b",
//     exchange: "fp_e_b2c3d4e5f6a1071829304a5c",
//     schema:   null,
//     cache:    "fp_c_c3d4e5f6a1b2071829304a5d",
//   },
// }

Same SDK shape, same response shape — plus a Newmen-specific delivery block on the response that tells you exactly what was served: which tier, which provider, which quantization, and whether the call was upgraded out of a cheaper tier.
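Because the delivery block is a Newmen-specific addition, it can help to give it a type before reading it. The following is a minimal sketch, assuming the field names and value types shown in the sample response above; the `Delivery` interface, `hasDelivery` guard, and `describeDelivery` helper are hypothetical names, not part of any published SDK.

```typescript
// Hypothetical types inferred from the sample delivery block above.
// The real SDK may name or type these fields differently.
interface DeliveryFingerprints {
  prompt: string | null;
  exchange: string | null;
  schema: string | null;
  cache: string | null;
}

interface Delivery {
  tier: string;
  served_by: string;
  provider: string;
  quantization: string;
  upgraded: boolean;
  cost_usd: number;
  latency_ms: number;
  evals: { gates: number; scheduled: boolean };
  fingerprints: DeliveryFingerprints;
}

// Type guard: checks that a response object carries a plausible delivery block.
// Only two fields are checked here; a stricter guard could validate all of them.
function hasDelivery(res: unknown): res is { delivery: Delivery } {
  if (typeof res !== "object" || res === null) return false;
  const d = (res as { delivery?: unknown }).delivery;
  if (typeof d !== "object" || d === null) return false;
  const rec = d as Record<string, unknown>;
  return typeof rec.tier === "string" && typeof rec.cost_usd === "number";
}

// One-line summary of what was served, suitable for logs.
function describeDelivery(d: Delivery): string {
  const upgrade = d.upgraded ? " (upgraded)" : "";
  return `${d.tier}/${d.quantization} via ${d.provider}${upgrade}: $${d.cost_usd.toFixed(4)} in ${d.latency_ms} ms`;
}
```

With the sample values above, `describeDelivery` would produce a line such as `standard/q8 via Hyperbolic: $0.0041 in 1240 ms`.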

Note. Atlas’s base URL is https://api.newmen.ai/v1. Set it as your OpenAI client’s baseURL and use a Newmen API key. See Quickstart for the full setup.
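A minimal sketch of that client configuration follows. The helper `atlasClientOptions` and the environment variable name `NEWMEN_API_KEY` are assumptions made for illustration; only the base URL comes from the note above. The helper is kept free of dependencies so it can be checked in isolation.

```typescript
// Base URL taken from the note above.
const ATLAS_BASE_URL = "https://api.newmen.ai/v1";

// Shape of the two options an OpenAI-compatible client needs here.
interface ClientOptions {
  apiKey: string;
  baseURL: string;
}

// Hypothetical helper: builds the options object for an OpenAI-compatible client.
function atlasClientOptions(apiKey: string): ClientOptions {
  if (apiKey.trim() === "") {
    throw new Error("A Newmen API key is required");
  }
  return { apiKey, baseURL: ATLAS_BASE_URL };
}

// Usage with the `openai` npm package (not imported here, to keep the sketch
// dependency-free). NEWMEN_API_KEY is an assumed variable name:
//   import OpenAI from "openai";
//   const client = new OpenAI(atlasClientOptions(process.env.NEWMEN_API_KEY ?? ""));
```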

What atlas-1 actually does on your call

For every request with model: "atlas-1", Atlas makes the following decision, in microseconds, before forwarding upstream:

  1. Resolve a candidate model from the operation’s eval-gate history. If gates have stayed green at a cheaper quantization for the recent past, prefer that.
  2. Pick a provider variant for the chosen quantization. For Standard / Batch, the cheapest passing provider wins (variants are pre-sorted at sync time).
  3. Forward upstream with provider.only and provider.sort: "price" pinned so the routing decision is deterministic.
  4. Surface the decision on response.delivery and persist it on the recorded call so your console (and your analytics) can reason about real cost-per-tier.
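The four steps above can be sketched as a small pure function. This is an illustration of the "cheapest passing variant wins" idea under stated assumptions, not Atlas's actual router: the `Variant` shape, the `pricePerMTok` unit, the single `recentScore` per variant, and the array form of `provider.only` are all hypothetical.

```typescript
// Hypothetical description of one provider/quantization option for an operation.
interface Variant {
  provider: string;
  quantization: string;
  pricePerMTok: number; // assumed unit: USD per million tokens
  recentScore: number;  // assumed: recent eval score for this operation, 0..1
}

interface RoutingDecision {
  provider: string;
  quantization: string;
  // Mirrors step 3: pin the upstream request to the chosen provider.
  upstream: { provider: { only: string[]; sort: "price" } };
}

// Picks the cheapest variant whose recent score still clears the gate.
// Returns null when nothing passes, so the caller can fall back or upgrade.
function route(variants: Variant[], minScore: number): RoutingDecision | null {
  const passing = variants
    .filter((v) => v.recentScore >= minScore)         // steps 1-2: keep green variants
    .sort((a, b) => a.pricePerMTok - b.pricePerMTok); // cheapest first
  const best = passing[0];
  if (best === undefined) return null;
  return {
    provider: best.provider,
    quantization: best.quantization,
    upstream: { provider: { only: [best.provider], sort: "price" } },
  };
}
```

Returning the decision as plain data keeps step 4 simple: the same object can be attached to the response and persisted with the recorded call.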

You can hint the tier per call:

typescript
// Cheaper still: async batch, ~24h SLA, ~50–70% off
const batchRes = await client.chat.completions.create({
  model: "atlas-1",
  tier: "batch",
  messages,
  metadata: { operation_id: "weekly_digest" },
});

// Hard guarantee: always a full-precision provider, no upgrade
const liveRes = await client.chat.completions.create({
  model: "atlas-1",
  tier: "realtime",
  tier_strict: true,            // never silently upgrade
  messages,
  metadata: { operation_id: "live_chat" },
});

How the eval loop protects quality

The cost claim is defensible only because the eval loop runs in the same call path. When you set metadata.operation_id, Atlas:

  • Looks up the operation’s bound evaluators (set via evaluators) and their min_score thresholds.
  • Avoids any quantization or provider whose recent score has slipped below threshold. Bad variants are quarantined per operation, not globally — a quantization that works for summarization might fail for code generation; Atlas tracks each separately.
  • Records the served quantization on the call. If you sweep your calls table in the console, the “avg eval score by quantization per operation” report writes itself.
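Per-operation quarantine can be pictured as a scoreboard keyed by operation and variant together. The sketch below is an assumption-laden illustration: the class name `VariantScoreboard`, the rolling-mean rule, and the window size are hypothetical, since the source only says that a variant is avoided when its "recent score" slips below the threshold.

```typescript
// Tracks recent eval scores per (operation, variant) pair, so a variant that
// fails one operation is not penalized for another.
class VariantScoreboard {
  private readonly scores = new Map<string, number[]>();
  private readonly windowSize: number;

  constructor(windowSize = 50) {
    this.windowSize = windowSize;
  }

  private key(operationId: string, variant: string): string {
    return JSON.stringify([operationId, variant]);
  }

  // Append a score and keep only the most recent `windowSize` entries.
  record(operationId: string, variant: string, score: number): void {
    const k = this.key(operationId, variant);
    const recent = this.scores.get(k) ?? [];
    recent.push(score);
    if (recent.length > this.windowSize) recent.shift();
    this.scores.set(k, recent);
  }

  // Assumed rule: quarantine when the rolling mean falls below min_score.
  // A pair with no history is not quarantined.
  isQuarantined(operationId: string, variant: string, minScore: number): boolean {
    const recent = this.scores.get(this.key(operationId, variant));
    if (recent === undefined || recent.length === 0) return false;
    const mean = recent.reduce((sum, s) => sum + s, 0) / recent.length;
    return mean < minScore;
  }
}
```

Keying on the pair rather than on the variant alone is what lets a quantization stay eligible for summarization while being avoided for code generation.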

The quality refund policy

Atlas’s eval-loop integration ships with a contract no other inference broker offers: if a call belongs to an operation with a bound evaluator and the post-call evaluator returns a score below the configured min_score, the call is not metered. Period.

typescript
await client.evaluators.create({
  id: "ev_pii_regex",
  kind: "regex",
  config: { pattern: "ssn|credit card", inverse: true },
});

await client.operations.create({
  key: "extract_invoice",
  ship_gates: [{ evaluator_id: "ev_pii_regex", min_score: 1.0 }],
});

// Calls under this operation whose output trips the evaluator are
// recorded but NOT metered. Your bill only counts the passing ones.

The mechanic in production: the chat-completions handler kicks off the evaluator in the same after() hook that meters usage. If the score lands below threshold, a compensating Stripe usage event is sent so the customer’s net metered quantity for that call is zero. The call still appears in your console (so your team can correct it via feedback), but it doesn’t show up on your invoice.
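The net-zero accounting described above can be modeled without touching any billing API. In this sketch the `UsageEvent` shape and the use of a negative quantity for the compensating event are assumptions made for illustration; the source does not specify how the compensating Stripe event is encoded.

```typescript
// Hypothetical record of one metering event for a call.
interface UsageEvent {
  callId: string;
  quantity: number; // metered units; negative for a compensating event
  reason: "usage" | "quality_refund";
}

// Produces the events for one call. `score` and `minScore` are null when the
// call has no operation_id or the operation has no bound evaluator, in which
// case the call is not eligible for a refund.
function meterCall(
  callId: string,
  quantity: number,
  score: number | null,
  minScore: number | null,
): UsageEvent[] {
  const events: UsageEvent[] = [{ callId, quantity, reason: "usage" }];
  if (score !== null && minScore !== null && score < minScore) {
    // Compensating event cancels the original usage for this call.
    events.push({ callId, quantity: -quantity, reason: "quality_refund" });
  }
  return events;
}

// Net metered quantity across a set of events.
function netQuantity(events: UsageEvent[]): number {
  return events.reduce((sum, e) => sum + e.quantity, 0);
}
```

Both events stay on record, which matches the behaviour described above: the call remains visible for review while contributing nothing to the invoice.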

Required. The quality refund is plan-agnostic: it triggers on every plan (PAYG, Reliability Loop, Strategic) the moment an operation has a bound evaluator with a numeric threshold. Operations without an evaluator, and calls without an operation_id, aren’t eligible.

Reading the delivery block

Every chat completion carries a Newmen-specific delivery block alongside the OpenAI-compatible response. It tells you exactly what Atlas chose, what it cost, how long it took, whether the eval pass is armed, and four content-addressable fingerprints you can use for caching, dedup, and similarity-based routing.

  • cost_usd: the call’s price in USD, computed at response time. Subject to refund-to-zero if the async eval pass scores below the operation’s min_score.
  • latency_ms: server-measured end-to-end latency in milliseconds.
  • evals.scheduled: true when the async eval pass will run on this call (i.e. the operation has at least one bound ship gate). evals.gates is the number of bound gates.
  • fingerprints: four content-addressable ids, each prefixed fp_<class>_:
    • prompt (fp_p_…): content only; no model, no params, no result. Use for “was this exact question asked before, anywhere?”
    • exchange (fp_e_…): prompt + assistant result, model-agnostic. Use for cross-model deduplication and for telling the router “this exact answer has already been produced for this prompt; route the next similar call to the same path.” null on calls that returned no content.
    • schema (fp_s_…): fingerprint of the response_format and tools[*].function.parameters schemas. Use to cluster structurally similar calls. null when the request has no JSON schema attached.
    • cache (fp_c_…): strict cache key. Prompt + model + every Atlas-relevant parameter (temperature, top_p, max_tokens, stop, seed, response_format, tools, tier, tier_strict). Two requests producing the same cache fingerprint are semantically equivalent and can share a cached response.

Note. On streaming completions the delivery block is carried on a synthetic chunk emitted between the last delta and [DONE]. The TypeScript SDK exposes it as await stream.finalDelivery() and the Python SDK as the stream.final_delivery property on the wrapper returned by stream_with_delivery.
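To make the difference between the prompt and cache fingerprints concrete, here is a sketch of how content-addressable ids of this shape could be derived. The hashing scheme is an assumption: SHA-256 over a key-sorted JSON serialization, truncated to 24 hex characters to match the length of the sample ids. Atlas's real algorithm is not documented here, so ids produced by this code will not match ids returned by the API.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Message = { role: string; content: string };

// Serialize with sorted object keys so equal values always hash equally,
// regardless of the order in which properties were written.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(value[k] ?? null)}`).join(",")}}`;
}

// Assumed scheme: fp_<class>_ followed by the first 24 hex chars of SHA-256.
function fingerprint(cls: "p" | "e" | "s" | "c", payload: Json): string {
  const digest = createHash("sha256").update(canonicalize(payload)).digest("hex");
  return `fp_${cls}_${digest.slice(0, 24)}`;
}

// Prompt fingerprint: content only, so it ignores model and parameters.
function promptFingerprint(messages: Message[]): string {
  return fingerprint("p", messages);
}

// Cache fingerprint: prompt + model + every parameter that affects the result.
function cacheFingerprint(
  messages: Message[],
  model: string,
  params: { [key: string]: Json },
): string {
  return fingerprint("c", { messages, model, params });
}
```

Under this scheme two calls with the same messages share a prompt fingerprint even when their temperature differs, while their cache fingerprints diverge, which is the distinction the list above draws.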

What you give up by pinning a specific model

You can absolutely pass model: "openai/chatgpt-5.5" directly. Atlas still threads tier and the quality refund through — pinning only locks the underlying model id. What you give up:

  • Cross-provider arbitrage. Atlas won’t consider llama-3.1 for your summarization workload if you asked for GPT-4o, even when the eval loop says it would pass.
  • Quantization variant choice. Atlas still picks the cheapest provider for the pinned model at the requested tier, but it can’t step down a level if the cheaper provider is also serving a higher quantization than your pin.
  • Per-operation auto-improvement. The router gets smarter as your eval-gate history grows; pinning short-circuits that signal.

The rule of thumb: pin only for the few operations where the model id is part of the product (for example, a legal requirement because the customer contracted for a specific model), or where tier: "realtime" combined with tier_strict: true isn’t enough of a guarantee. Default everything else to atlas-1.

FAQ

Will atlas-1 ever route a closed-weight prompt to a partner GPU?

No. Atlas Network only ever serves open-weight models (Llama, Qwen, Mistral, DeepSeek, Gemma, Phi). Closed-weight models from OpenAI / Anthropic / Google / xAI always go to managed providers, regardless of org settings or tier.

What if my operation has no eval history yet?

atlas-1 falls back to a tier-appropriate default — frontier full-precision on Realtime, a vetted open-weight model with q8 on Standard / Batch. Your first few hundred calls earn the history that future routing decisions read.

How do I opt out per call?

Set tier: "realtime" for the upstream-passthrough behaviour, plus tier_strict: true to disable silent upgrades. Set forbid_atlas_network: true to keep the call on managed providers even when partner-eligible.
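The opt-out settings from this answer can be bundled into one small helper and spread into a request. This is a sketch under assumptions: the `AtlasRoutingParams` type and `optOut` function are hypothetical names, and it assumes `forbid_atlas_network` is a top-level request field like `tier` and `tier_strict`.

```typescript
// Hypothetical type for the Atlas-specific routing fields named in this page.
interface AtlasRoutingParams {
  tier?: "realtime" | "standard" | "batch";
  tier_strict?: boolean;
  forbid_atlas_network?: boolean;
}

// Builds the per-call opt-out: upstream passthrough with no silent upgrades,
// optionally restricted to managed providers.
function optOut(options: { managedOnly?: boolean } = {}): AtlasRoutingParams {
  const params: AtlasRoutingParams = { tier: "realtime", tier_strict: true };
  if (options.managedOnly === true) {
    params.forbid_atlas_network = true;
  }
  return params;
}

// Usage (client and messages as in the examples above):
//   await client.chat.completions.create({
//     model: "atlas-1",
//     messages,
//     ...optOut({ managedOnly: true }),
//   });
```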