Embeddings
OpenAI-compatible embeddings. Pass `model: "atlas-embed-1"` for smart routing across supported upstreams, or pin a specific embedding model id. Same auth, same operation tagging, same `delivery` block as `/chat/completions`.
Basic request
Send a single string and receive a single embedding vector. Newmen forwards the request to the chosen upstream and returns the response in OpenAI’s standard shape.
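As a rough illustration of the "standard shape" mentioned above, the sketch below parses a hand-written sample response and pulls out the vectors. The field names (object, data, index, embedding, model, usage) follow OpenAI's embeddings response; the numeric values, the model string, and the helper name extract_vectors are made up for illustration, and Newmen-specific fields such as delivery are left out.

```python
import json

# Hand-written sample in the OpenAI embeddings response shape.
# The vector values, model string, and token counts are illustrative only.
SAMPLE_RESPONSE = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.12, -0.03, 0.25]}
  ],
  "model": "openai/text-embedding-3-small",
  "usage": {"prompt_tokens": 9, "total_tokens": 9}
}
"""


def extract_vectors(payload: dict) -> list[list[float]]:
    """Return the embedding vectors, ordered by each row's `index`."""
    rows = sorted(payload["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


payload = json.loads(SAMPLE_RESPONSE)
vectors = extract_vectors(payload)
print("vectors:", len(vectors), "dimensions:", len(vectors[0]))
```

Sorting by index is a defensive choice: it keeps the output aligned with the inputs even if rows were to arrive out of order.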
bash · curl
curl https://api.newmen.ai/v1/embeddings \
  -H "Authorization: Bearer $NEWMEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "atlas-embed-1",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

typescript · @newmen-ai/sdk
import Newmen from "@newmen-ai/sdk";

const client = new Newmen({ apiKey: process.env.NEWMEN_API_KEY });
const res = await client.embeddings.create({
  model: "atlas-embed-1",
  input: "The quick brown fox jumps over the lazy dog",
});
console.log(res.data[0].embedding); // number[]
console.log("dimensions:", res.data[0].embedding.length);

python · newmen-ai
from newmen import Newmen
import os

client = Newmen(api_key=os.environ["NEWMEN_API_KEY"])
res = client.embeddings.create(
    model="atlas-embed-1",
    input="The quick brown fox jumps over the lazy dog",
)
print(res.data[0].embedding[:8], "…")
print("dimensions:", len(res.data[0].embedding))

Batch processing
Pass an array of strings and get a single response with one embedding per input. One batched request is generally cheaper and faster than N single-input calls.
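For corpora too large for one request, a common pattern is to split the inputs into fixed-size batches and send one request per batch. The sketch below shows that splitting step under stated assumptions: the batch size of 2 in the usage lines is arbitrary (this page does not state a per-request input limit), and embed_batch is a hypothetical stand-in for a call to client.embeddings.create.

```python
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int,
) -> list[list[float]]:
    """Embed `texts` batch by batch, preserving input order.

    `embed_batch` is a hypothetical callable that takes a list of strings
    and returns one vector per string, in the same order.
    """
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise RuntimeError("expected one embedding per input")
        vectors.extend(result)
    return vectors


# Offline stand-in so the sketch runs without network access:
# it "embeds" each string as a one-element vector holding its length.
def fake_embed_batch(batch: list[str]) -> list[list[float]]:
    return [[float(len(text))] for text in batch]


texts = ["a", "bb", "ccc", "dddd", "eeeee"]
print(list(batched(texts, 2)))
print(embed_all(texts, fake_embed_batch, batch_size=2))
```

In real use, embed_batch would wrap the SDK call and return the embedding of each row in res.data.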
typescript
const res = await client.embeddings.create({
  model: "atlas-embed-1",
  input: [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with multiple layers",
    "NLP enables computers to understand text",
  ],
});
res.data.forEach((row, i) => {
  console.log(`embedding ${i}: ${row.embedding.length} dims`);
});

data[i] corresponds to input[i].

Multimodal inputs
Some embedding models (e.g. voyageai/voyage-3-large, cohere/embed-multilingual-v3) accept text + image content blocks for joint embeddings. The format matches the standard multimodal embedding shape used by upstream embedding providers.
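The content-block layout can be assembled with a small helper. This is only a sketch that mirrors the shape used in this section's TypeScript example (a content list holding a text block and an image_url block); the helper name multimodal_input and the placeholder URL are hypothetical, and whether a given model accepts the result depends on the upstream.

```python
def multimodal_input(text: str, image_url: str) -> dict:
    """Build one input item combining a text block and an image block."""
    if not text or not image_url:
        raise ValueError("text and image_url must both be non-empty")
    return {
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    }


item = multimodal_input(
    "A scenic boardwalk through a green meadow",
    "https://example.com/boardwalk.jpg",  # placeholder URL, not a real asset
)
print([block["type"] for block in item["content"]])
```

A list of such items would be passed as the input field of the request.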
typescript
const res = await client.embeddings.create({
  model: "voyageai/voyage-3-large", // pin a multimodal-capable model
  input: [
    {
      content: [
        { type: "text", text: "A scenic boardwalk through a green meadow" },
        {
          type: "image_url",
          image_url: {
            url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/640px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    },
  ],
  encoding_format: "float",
});

Quantization
Open-weight embedding models (Qwen3-Embedding, BGE, Arctic Embed, mxbai-embed-large) can be served at quantized precision on the Atlas Network. Q8 is near-lossless on most retrieval benchmarks, while Q4 trades a small amount of recall for substantially higher throughput. Pass quantization: "q8" or quantization: "q4" to opt in.
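One way to check the recall trade-off on your own data is to embed the same corpus at full precision and at a quantized setting, then measure how much of the full-precision top-k each query keeps. The sketch below is a minimal, dependency-free version of that comparison; the toy vectors stand in for real embeddings, and the function names are illustrative rather than part of any SDK.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: list[list[float]], k: int) -> list[int]:
    """Indices of the k documents most similar to `query`, best first."""
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine(query, docs[i]),
        reverse=True,
    )
    return ranked[:k]


def overlap_at_k(reference: list[int], candidate: list[int]) -> float:
    """Fraction of the reference top-k that also appears in the candidate top-k."""
    return len(set(reference) & set(candidate)) / len(reference)


# Toy data: `full_docs` plays the role of full-precision embeddings and
# `quant_docs` the same documents embedded at a quantized setting.
# For simplicity the same query vector is used against both sets.
query = [1.0, 0.0]
full_docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
quant_docs = [[1.0, 0.0], [0.4, 0.6], [0.0, 1.0], [0.5, 0.5]]

full_top = top_k(query, full_docs, k=2)
quant_top = top_k(query, quant_docs, k=2)
print(full_top, quant_top, overlap_at_k(full_top, quant_top))
```

Averaging overlap_at_k over a representative set of queries gives a rough recall figure to weigh against the throughput gain.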
typescript
// Q4-quantized embeddings on the Atlas Network: much higher
// throughput, small recall trade-off. Open-weight models only.
const res = await client.embeddings.create({
  model: "qwen/qwen3-embedding-0.6b",
  input: chunks,
  quantization: "q4",
  metadata: { operation_id: "rag_index_pages" },
});
// res.delivery.quantization === "q4"
// Closed-weight models (openai/cohere/voyage) ignore the field and
// always return res.delivery.quantization === null.

Closed-weight models report quantization: null on the delivery block. The router never silently picks a quantized variant unless you ask for one or unless atlas-embed-1 has eval-gate history saying it’s safe for the operation (Phase 2).

Tagged for the reliability loop
Embedding calls accept metadata.operation_id exactly like chat completions. Use it to group RAG-index calls together so the console can report on indexing latency and cost, and so the eval loop can flag bad chunks if you bind an evaluator to the operation.
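On the client side, the delivery block returned with each response can also be tallied per operation, for example to see which providers served an indexing job. The sketch below works on plain dictionaries; the delivery field names are taken from the comment in this section's TypeScript example, while the sample records, the "rag_query" operation id, and the summarize_delivery helper are hypothetical.

```python
from collections import Counter


def summarize_delivery(records: list[dict]) -> dict[str, Counter]:
    """Count (provider, quantization) pairs per operation_id.

    Each record is a dict holding the `operation_id` sent with a request
    and the `delivery` block of the matching response.
    """
    summary: dict[str, Counter] = {}
    for record in records:
        delivery = record["delivery"]
        key = (delivery["provider"], delivery["quantization"])
        summary.setdefault(record["operation_id"], Counter())[key] += 1
    return summary


# Hypothetical delivery block, shaped like the one shown in this section.
DELIVERY = {
    "tier": "realtime",
    "served_by": "provider",
    "provider": "openai",
    "quantization": None,
    "upgraded": False,
}

# Hypothetical records collected from three embedding calls.
records = [
    {"operation_id": "rag_index_pages", "delivery": DELIVERY},
    {"operation_id": "rag_index_pages", "delivery": DELIVERY},
    {"operation_id": "rag_query", "delivery": DELIVERY},
]

summary = summarize_delivery(records)
for operation_id, counts in summary.items():
    print(operation_id, dict(counts))
```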
typescript
const res = await client.embeddings.create({
  model: "atlas-embed-1",
  input: chunks,
  metadata: { operation_id: "rag_index_pages" },
});
// res.delivery → { tier: "realtime", served_by: "provider",
//                  provider: "openai", quantization: null, upgraded: false }

Available models
atlas-embed-1 currently resolves to openai/text-embedding-3-small, chosen by weighing price, quality, and context length to give a balanced general-purpose embedder. You can also pin any of:
Open-weight (partner-network eligible, supports Q8 / Q4 via the quantization param):
- qwen/qwen3-embedding-0.6b: 1024 dims
- qwen/qwen3-embedding-4b: 2560 dims
- qwen/qwen3-embedding-8b: 4096 dims
- baai/bge-large-en-v1.5 / baai/bge-m3
- snowflake/snowflake-arctic-embed-l (full / Q8)
- mixedbread-ai/mxbai-embed-large-v1 (full / Q8)
Closed-weight (served at the provider’s native precision):
- openai/text-embedding-3-small: 1536 dims, $0.020 / 1M tokens
- openai/text-embedding-3-large: 3072 dims, $0.130 / 1M tokens
- voyageai/voyage-3 / voyageai/voyage-3-large: text + multimodal
- cohere/embed-english-v3 / cohere/embed-multilingual-v3
- google/gemini-embedding-001
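The dimensions and prices listed above translate directly into rough storage and cost estimates. The sketch below does that arithmetic, assuming the listed prices are per 1M input tokens and that vectors are stored as 4-byte float32 values; the token and vector counts in the usage lines are arbitrary examples, and the function names are illustrative.

```python
def embedding_cost_usd(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Estimated cost of embedding `total_tokens` at a per-1M-token price."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


def storage_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of `num_vectors` vectors (float32 by default), without index overhead."""
    return num_vectors * dims * bytes_per_value


# Example workload: 50M tokens producing 100k stored vectors (arbitrary numbers).
tokens = 50_000_000
num_vectors = 100_000

small_cost = embedding_cost_usd(tokens, 0.020)  # openai/text-embedding-3-small
large_cost = embedding_cost_usd(tokens, 0.130)  # openai/text-embedding-3-large
print(f"3-small: ${small_cost:.2f}  3-large: ${large_cost:.2f}")

small_bytes = storage_bytes(num_vectors, 1536)
large_bytes = storage_bytes(num_vectors, 3072)
print(f"3-small: {small_bytes / 1e6:.1f} MB  3-large: {large_bytes / 1e6:.1f} MB")
```

Doubling the dimension count doubles raw vector storage, so a larger model costs more both per token and per stored vector.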