Newmen · Smart Routing Case Study

Codex, env-only.~30% under rate.

We pointed Codex at Newmen with nothing but environment config — no proxy — and asked it to build a Web Audio synth studio. The gateway now speaks Codex natively, so it runs end-to-end. On the build, smart routing billed ~30% under what the default model would have charged for the identical generation. Real numbers, honest trade-offs.

under the default model's rate — for the identical build

$0.00

Default rate

$0.00

Smart routing

$0.00

Saved

24,078 tokens in the build·2 paths, both working·32,992 bytes shipped

It’s one line to switch

Keep your client, your prompts, and your model choices exactly as they are. Point the base URL at Newmen, drop in your key, and smart routing handles the rest.

~/.bashrc

export OPENAI_BASE_URL="https://api.newmen.ai/v1"
export OPENAI_API_KEY="nm_live_..."

That’s the entire integration. No SDK swaps, no prompt changes, no model config. Every call is logged in your console with full payloads for review.

The build · Codex, native through the gateway

Routed — 30% under the rate

Codex now runs entirely through Newmen with env config alone — no proxy. It built the Web Audio Synth Station, and smart routing billed that exact generation ~30% below what the default model's published rate would have charged. We also re-ran the default path direct (left) to confirm it now works end-to-end through the gateway.

The prompt

Build a single self-contained HTML “Synth Station” — a Web Audio browser music studio: a 16-step sequencer (kick, snare, hi-hat + a melodic synth), selectable waveforms and a filter, BPM / master-volume / per-track mute, play / stop / clear / randomize, localStorage save/load, keyboard play, an animated canvas visualizer, and a clean dark-neon UI. Self-review the code and fix anything broken before finishing.

DefaultSmart routing

Cost$0.21$0.14

Calls11

Tokens24,07824,078

Throughput (tok/s)5757

Build time353s353s

Cost vs default30% cheaper

Direct build (default model)Direct

▶ Play it live

Proof the direct path is fixed: Codex's default model now runs clean through the gateway (it used to 400 on a tool-schema mismatch). This separate direct build shipped a working 26 KB studio in 66s at ~193 tok/s — billed $0.13.

Smart-routing buildRouted

▶ Play it live

The routed build: a richer 33 KB studio. Smart routing billed it $0.14 — ~30% under the $0.21 the default model would have charged for the same 24,078-token generation — at ~57 tok/s over 353s.

The Default column is what this exact 24,078-token generation would have cost at the default model's published rate ($0.21); smart routing delivered the identical build for $0.14 — ~30% less. Throughput is the trade-off: ~57 tok/s, well under a direct call's ~193. The separate direct build (left) confirms Codex's default path now works through the gateway with config alone.

Reading the numbers

~30%

Under the default model's rate, same generation

$0.14

Routed build vs $0.21 default rate

env-only

No proxy — config alone

57 tok/s

Routed throughput — the trade-off

Env-only, no proxy. Codex now runs entirely through the gateway with config alone. The gateway was hardened to accept Codex's `developer` role, null content, and its hybrid tool schema — the direct path that used to 400 works again.
The routing discount is the headline. Smart routing billed the build at $0.14 — ~30% under the $0.21 the default model would have charged for the identical 24,078-token generation, logged call-by-call.
Both paths completed cleanly. The routed build and the newly-working direct build each shipped a complete Synth Station: sequencer, drums, synth, waveforms, filter, save/load, keyboard, visualizer.
Throughput is the trade-off. This is not near-parity. The routed build ran at ~57 tok/s over 353s — materially slower than a direct call (~193 tok/s) — and output size varies run to run. Cost-first, not speed-first; latency-critical calls can opt out.

See it in your console

None of these numbers are estimates — they’re pulled straight from your live traffic. Savings, spend, calls, tokens, and per-run breakdowns are all visible in the dashboard the moment a call lands.

**Usage.** Period savings, spend vs. direct, and token totals at a glance.

**Overview.** “Saved by routing” headline plus recent calls with per-call cost and savings.

**Calls.** Every call shows the struck direct price → what you paid, and the amount saved.

**Sessions.** Calls grouped by run, so you see what a whole task cost — and saved.

Play them yourself

Both builds in each run are embedded above — open “Play it live” on any card, draw a downhill track with the mouse, place the rider, and hit Play. The cost difference is real; judge the quality difference for yourself.

Start cutting your AI bill today.

Two env vars, your existing client, and every call logged with what it cost versus direct. Get $5 in credits and watch the savings land in your console on the first call.

Create a free account →Read the docs