Embeddings

OpenAI-compatible embeddings. Pass `model: "atlas-embed-1"` for smart routing across supported upstreams, or pin a specific embedding model id. Same auth, same operation tagging, same `delivery` block as `/chat/completions`.

Basic request

Send a single string and receive a single embedding vector. Newmen forwards to the chosen upstream and surfaces the response in OpenAI’s standard shape.

bash · curl
curl https://api.newmen.ai/v1/embeddings \
  -H "Authorization: Bearer $NEWMEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "atlas-embed-1",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

typescript · @newmen-ai/sdk
import Newmen from "@newmen-ai/sdk";

const client = new Newmen({ apiKey: process.env.NEWMEN_API_KEY });

const res = await client.embeddings.create({
  model: "atlas-embed-1",
  input: "The quick brown fox jumps over the lazy dog",
});

console.log(res.data[0].embedding); // number[]
console.log("dimensions:", res.data[0].embedding.length);

python · newmen-ai
from newmen import Newmen
import os

client = Newmen(api_key=os.environ["NEWMEN_API_KEY"])

res = client.embeddings.create(
    model="atlas-embed-1",
    input="The quick brown fox jumps over the lazy dog",
)

print(res.data[0].embedding[:8], "…")
print("dimensions:", len(res.data[0].embedding))
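
Because the response follows OpenAI's embeddings shape, the vectors can also be read from the parsed JSON body without the SDK. The sketch below assumes the body is already a dict; the helper name `extract_vectors` and the sample payload are illustrative, and the 3-dimensional vector is made up rather than real model output.

```python
from typing import Any


def extract_vectors(response: dict[str, Any]) -> list[list[float]]:
    """Return the embedding vectors from an OpenAI-shaped embeddings response."""
    # Sort by the per-row index so the output lines up with the request inputs.
    rows = sorted(response["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


# Illustrative payload in the OpenAI embeddings shape (values are made up).
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.1, -0.2, 0.3]},
    ],
    "model": "atlas-embed-1",
    "usage": {"prompt_tokens": 9, "total_tokens": 9},
}

vectors = extract_vectors(sample_response)
print(len(vectors), "vector(s) with", len(vectors[0]), "dimensions")
```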

Batch processing

Pass an array of strings and get a single response with one embedding per input. Batching is cheaper and faster than making N single-input calls.

typescript
const res = await client.embeddings.create({
  model: "atlas-embed-1",
  input: [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with multiple layers",
    "NLP enables computers to understand text",
  ],
});

res.data.forEach((row, i) => {
  console.log(`embedding ${i}: ${row.embedding.length} dims`);
});

Note. Per-input order is preserved in the response: `data[i]` corresponds to `input[i]`.
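
For corpora too large to send in one request, the same order guarantee makes it straightforward to split the inputs into batches and concatenate the results. The following is a minimal sketch, not part of the SDK: `embed_in_batches` and `fake_embed` are hypothetical names, and the default `batch_size` of 64 is an arbitrary choice, since this page does not state a maximum batch size. In real use, `embed_fn` would wrap `client.embeddings.create` and return `[row.embedding for row in res.data]`.

```python
from typing import Callable, Sequence

EmbedFn = Callable[[Sequence[str]], list[list[float]]]


def embed_in_batches(
    texts: Sequence[str], embed_fn: EmbedFn, batch_size: int = 64
) -> list[list[float]]:
    """Embed texts in fixed-size batches, keeping output aligned with input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        batch_vectors = embed_fn(batch)
        # Each batch must return exactly one embedding per input.
        if len(batch_vectors) != len(batch):
            raise RuntimeError("expected one embedding per input")
        vectors.extend(batch_vectors)
    return vectors


def fake_embed(batch: Sequence[str]) -> list[list[float]]:
    """Stand-in for a real API call: one toy 1-dimensional vector per input."""
    return [[float(len(text))] for text in batch]


texts = ["a", "bb", "ccc", "dddd", "eeeee"]
vectors = embed_in_batches(texts, fake_embed, batch_size=2)
print(vectors)
```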

Multimodal inputs

Some embedding models (e.g. voyageai/voyage-3-large, cohere/embed-multilingual-v3) accept text + image content blocks for joint embeddings. The format matches the standard multimodal embedding shape used by upstream embedding providers.

typescript
const res = await client.embeddings.create({
  model: "voyageai/voyage-3-large", // pin a multimodal-capable model
  input: [
    {
      content: [
        { type: "text", text: "A scenic boardwalk through a green meadow" },
        {
          type: "image_url",
          image_url: {
            url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/640px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    },
  ],
  encoding_format: "float",
});
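
When many text and image pairs need embedding, building the content blocks by hand gets repetitive. The helper below is a hypothetical convenience (the name `multimodal_input` is not part of any SDK) that produces one input item in the shape shown in the request above; the image URL is a placeholder.

```python
from typing import Any


def multimodal_input(text: str, image_url: str) -> dict[str, Any]:
    """Build one text + image input item using the content-block shape shown above."""
    return {
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    }


item = multimodal_input(
    "A scenic boardwalk through a green meadow",
    "https://example.com/boardwalk.jpg",  # placeholder URL, not a real asset
)

# Request body mirroring the fields used in the example above.
request_body = {
    "model": "voyageai/voyage-3-large",
    "input": [item],
    "encoding_format": "float",
}
print(request_body["input"][0]["content"][1]["type"])
```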

Quantization

Open-weight embedding models (Qwen3-Embedding, BGE, Arctic Embed, mxbai-embed-large) can be served at quantized precision on the Atlas Network. Q8 is near-lossless on most retrieval benchmarks; Q4 trades a small amount of recall for substantially higher throughput. Pass `quantization: "q4"` (or `"q8"`) to opt in.

typescript
// Q4-quantized embeddings on the Atlas Network: much higher
// throughput, small recall trade-off. Open-weight models only.
const res = await client.embeddings.create({
  model: "qwen/qwen3-embedding-0.6b",
  input: chunks,
  quantization: "q4",
  metadata: { operation_id: "rag_index_pages" },
});

// res.delivery.quantization === "q4"
// Closed-weight models (openai/cohere/voyage) ignore the field and
// always return res.delivery.quantization === null.

Note. Closed-weight upstreams (OpenAI / Cohere / Voyage / Gemini) ignore the field: they serve at their native precision and return `quantization: null` in the delivery block. The router picks a quantized variant only if you ask for one, or if `atlas-embed-1` has eval-gate history saying it is safe for the operation (Phase 2).
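
Since closed-weight models ignore the field, a client may prefer to send `quantization` only where it can take effect and then confirm what was actually served. The sketch below is one way to do that on the client side. The helper names and the prefix list are assumptions drawn from the model ids on this page, not an API contract, and the API ignores the field for closed-weight models regardless.

```python
from typing import Any, Optional

# Provider prefixes this page lists as closed-weight (an assumption, not an API contract).
CLOSED_WEIGHT_PREFIXES = ("openai/", "cohere/", "voyageai/", "google/")


def build_embedding_request(
    model: str, inputs: list[str], quantization: Optional[str] = None
) -> dict[str, Any]:
    """Build a request body, adding `quantization` only where it can take effect."""
    if quantization not in (None, "q4", "q8"):
        raise ValueError("quantization must be 'q4', 'q8', or None")
    body: dict[str, Any] = {"model": model, "input": inputs}
    if quantization and not model.startswith(CLOSED_WEIGHT_PREFIXES):
        body["quantization"] = quantization
    return body


def served_quantization(response: dict[str, Any]) -> Optional[str]:
    """Read the precision reported in the delivery block (None means native)."""
    return response.get("delivery", {}).get("quantization")


open_body = build_embedding_request("qwen/qwen3-embedding-0.6b", ["chunk"], "q4")
closed_body = build_embedding_request("openai/text-embedding-3-small", ["chunk"], "q4")
print("quantization" in open_body, "quantization" in closed_body)
```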

Tagged for the reliability loop

Embedding calls accept metadata.operation_id exactly like chat completions. Use it to group RAG-index calls together so the console can report on indexing latency and cost, and so the eval loop can flag bad chunks if you bind an evaluator to the operation.

typescript
const res = await client.embeddings.create({
  model: "atlas-embed-1",
  input: chunks,
  metadata: { operation_id: "rag_index_pages" },
});

// res.delivery → { tier: "realtime", served_by: "provider",
//                  provider: "openai", quantization: null, upgraded: false }
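
The console reports on tagged operations, but it can also be handy to tally delivery blocks locally while an indexing job runs. The following is a rough client-side sketch: `summarize_delivery` is a hypothetical helper, and the sample responses reuse only the delivery fields shown in the comment above.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_delivery(responses: Iterable[dict[str, Any]]) -> dict[str, Counter]:
    """Count provider, tier, and quantization values across delivery blocks."""
    summary = {"provider": Counter(), "tier": Counter(), "quantization": Counter()}
    for response in responses:
        delivery = response.get("delivery", {})
        for field in summary:
            # str() turns a null quantization (None) into a countable "None" key.
            summary[field][str(delivery.get(field))] += 1
    return summary


# Two illustrative responses carrying the delivery block shown above.
sample_delivery = {
    "tier": "realtime",
    "served_by": "provider",
    "provider": "openai",
    "quantization": None,
    "upgraded": False,
}
summary = summarize_delivery([{"delivery": sample_delivery}, {"delivery": sample_delivery}])
print(dict(summary["provider"]))
```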

Available models

`atlas-embed-1` currently resolves to openai/text-embedding-3-small, a choice calibrated against price, quality, and context length for a balanced general-purpose embedder. You can also pin any of:

Open-weight (partner-network eligible, supports Q8 / Q4 via the quantization param):

  • qwen/qwen3-embedding-0.6b — 1024 dims
  • qwen/qwen3-embedding-4b — 2560 dims
  • qwen/qwen3-embedding-8b — 4096 dims
  • baai/bge-large-en-v1.5 / baai/bge-m3
  • snowflake/snowflake-arctic-embed-l (full / Q8)
  • mixedbread-ai/mxbai-embed-large-v1 (full / Q8)

Closed-weight (served at the provider’s native precision):

  • openai/text-embedding-3-small — 1536 dims, $0.020 / 1M
  • openai/text-embedding-3-large — 3072 dims, $0.130 / 1M
  • voyageai/voyage-3 / voyageai/voyage-3-large — text + multimodal
  • cohere/embed-english-v3 / cohere/embed-multilingual-v3
  • google/gemini-embedding-001

Note. Embedding workloads on small Qwen / BGE models are partner-network eligible; see Atlas Network. Partner GPUs serve embedding traffic at very high throughput, while OpenAI / Cohere / Voyage models stay on managed providers.
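
Vector stores usually require a fixed dimension per index, so it can help to check returned vectors against the dimensions listed above before writing them. The sketch below covers only the models whose dimensions this page states; `EXPECTED_DIMS` and `check_dimensions` are hypothetical names. The `atlas-embed-1` alias is left out on purpose, since the model it resolves to may change.

```python
from typing import Optional, Sequence

# Dimensions as listed on this page; models without a stated size are omitted.
EXPECTED_DIMS = {
    "qwen/qwen3-embedding-0.6b": 1024,
    "qwen/qwen3-embedding-4b": 2560,
    "qwen/qwen3-embedding-8b": 4096,
    "openai/text-embedding-3-small": 1536,
    "openai/text-embedding-3-large": 3072,
}


def check_dimensions(model: str, vector: Sequence[float]) -> Optional[bool]:
    """Return True/False if the model's size is known, or None if it is not listed."""
    expected = EXPECTED_DIMS.get(model)
    if expected is None:
        return None
    return len(vector) == expected


print(check_dimensions("qwen/qwen3-embedding-0.6b", [0.0] * 1024))
```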