NEWMEN

About Newmen

Your partner, not your supplier.

Frontier labs sell you tokens and move on. They don’t fix the bad responses, they don’t take your corrections into training, and they make you wear every regression. Newmen exists because the contract should run the other direction. Your evals are the rule. Your corrections train your model. Bad calls don’t make it to your invoice. The platform is Newmen, the routing intelligence is Atlas, and the product is the loop between them — across any model you use.

Values

Three commitments

Your partner, not your supplier

Frontier labs treat API customers like numbers: they don't fix bad responses, they don't take your corrections into training, and they make you wear every regression. Newmen is built the opposite way. Your evals are the rule. Your corrections train your model. Bad calls never make it to your invoice.

Eval-gated everything

Internal Atlas releases pass through the same gates we ship to customers. Datasets cannot be promoted without their evaluators passing. Training is a request, not a self-serve button — because retrains shape user-facing surfaces and deserve a human in the loop. The eval-gated quality refund is the contract: if a call scores below your threshold, you don't pay for it.
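
As a rough illustration of the refund rule, here is a minimal Python sketch. It is not Newmen's billing code: the `ScoredCall` type, its field names, the 0-to-1 score scale, and the `billable_total` helper are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCall:
    """Hypothetical record of one production call (names are illustrative)."""
    call_id: str
    cost_usd: float    # what the call would cost if billed
    eval_score: float  # evaluator score, assumed to lie in [0, 1]


def billable_total(calls, threshold):
    """Sum the cost of calls that met the threshold; calls below it are free."""
    return sum(c.cost_usd for c in calls if c.eval_score >= threshold)


calls = [
    ScoredCall("a", 0.50, 0.91),  # passes, billed
    ScoredCall("b", 0.25, 0.42),  # below threshold, not billed
    ScoredCall("c", 0.25, 0.80),  # exactly at threshold, billed
]
invoice = billable_total(calls, threshold=0.8)
```

In this sketch a call that scores exactly at the threshold is billed, since only calls scoring below it are refunded.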

Transparency with customers

Every call is recorded, viewable, and exportable. Every dataset has a version. Every evaluator has its rubric on file. You see the provider cost and the Newmen markup, separately. The `delivery` block on every response tells you exactly which model, provider, and quantization served the call. If we will not explain it, we will not ship it.
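
To make that concrete, here is a hedged Python sketch of how a client might read the `delivery` block and the split cost from a response. Only the `delivery` name and the model, provider, and quantization fields come from the text above. The `cost` key, the exact field names, and the sample values are hypothetical, not the real response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    """Which model, provider, and quantization served the call."""
    model: str
    provider: str
    quantization: str


@dataclass(frozen=True)
class CostBreakdown:
    """Provider cost and Newmen markup, kept as separate line items."""
    provider_cost_usd: float
    markup_usd: float

    @property
    def total_usd(self) -> float:
        return self.provider_cost_usd + self.markup_usd


def parse_response(payload: dict) -> tuple[Delivery, CostBreakdown]:
    """Extract the delivery block and cost split from a response payload.

    The payload layout here is an assumption made for illustration.
    """
    d = payload["delivery"]
    c = payload["cost"]  # hypothetical key
    delivery = Delivery(d["model"], d["provider"], d["quantization"])
    cost = CostBreakdown(c["provider_cost_usd"], c["markup_usd"])
    return delivery, cost


# Sample payload with placeholder values, not real output.
sample = {
    "delivery": {
        "model": "example-model",
        "provider": "example-provider",
        "quantization": "fp8",
    },
    "cost": {"provider_cost_usd": 0.50, "markup_usd": 0.25},
}
delivery, cost = parse_response(sample)
```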

Why now

Intelligence is becoming electricity.

LLMs are commoditizing. Capability ceilings are converging across labs. The next axis is not which model you call — it's whether the call gets faster, cheaper, and more accurate every week on your specific production traffic. General models are general. Yours shouldn't be.

We started Newmen because we kept watching the same pattern repeat at customer after customer: a pilot succeeds on a curated set, ships to production, and quietly drifts as real traffic exposes the shape of the workload. There was no clean way to observe the drift, no standardized way to correct it in place, and no way to bind those corrections to the next release — regardless of which AI provider you were using.

Atlas was designed around that gap. It routes to the right model per call. It records every production call. Engineers correct outputs in place. Those corrections accumulate into per-customer golden datasets that train task-specific adapters — a version of Atlas tuned on your reality, not the average of the internet. The loop runs on any model in the table.

Team

Who is here

Placeholder roster for v1. The team is small on purpose.

Investors

Backing

Talk to a solutions engineer

Atlas is sold to teams who commit to meaningful production volume. That commitment unlocks the reliability loop.