Base URL
api.getprometheus.org/v1
Agent-ready reference
Copy the base URL, send a prom_sk_ bearer key, choose a model alias, and use the standard OpenAI request shapes. This page is built for both engineers and agents.
curl -sS https://api.getprometheus.org/v1/chat/completions \
  -H "Authorization: Bearer $PROMETHEUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "prometheus-core",
    "messages": [
      { "role": "user", "content": "Give me a deployment checklist." }
    ],
    "temperature": 0.2,
    "max_tokens": 700
  }'
Auth: Bearer prom_sk_...
Default chat model: prometheus-core
Compatibility: OpenAI SDK + custom baseURL
The compact contract an autonomous agent needs before it calls the API: routing rules, auth, endpoint names, and model selection.
# Prometheus Agent Spec
baseURL: https://api.getprometheus.org/v1
auth: Authorization: Bearer prom_sk_...
endpoints:
  GET /v1/models
  POST /v1/chat/completions
  POST /v1/embeddings
  POST /v1/audio/transcriptions
models:
  prometheus-spark: chat; fast replies, routing, classification, short tasks; $0.15/1M in, $0.60/1M out
  prometheus-lite: chat; balanced cost, quality, and throughput; $0.10/1M in, $0.40/1M out
  prometheus-core: chat; complex reasoning, planning, code, multi-step agents; $1.10/1M in, $4.40/1M out
  prometheus-vision: chat; image understanding, screenshots, diagrams, homework photos; $0.40/1M in, $1.60/1M out
  prometheus-atlas: embeddings; RAG, memory, search, semantic matching; $0.02/1M in
  prometheus-echo: audio; speech-to-text transcription; $0.006/min
pricing:
  - USD, billed per actual usage; no minimum.
  - Chat and vision charge per input + output token.
  - Vision images count as input tokens; no separate per-image fee.
  - Embeddings charge per input token; audio charges per minute of source.
rules:
  - Use Prometheus aliases only; never send provider or upstream model ids.
  - OpenAI SDK clients must set baseURL to the /v1 URL above.
  - Send Authorization as a Bearer token with a prom_sk_ key.
  - Chat supports stream: true and returns OpenAI-style SSE chunks. Prefer stream: true for the lowest time-to-first-token.
  - Vision uses standard chat messages with image_url content parts.
  - Extra chat fields such as tools, response_format, top_p, and stream_options are forwarded.
  - Gateway errors use { error: { message, type, param, code } }.

Prometheus exposes public aliases only. Clients should never send provider names or upstream model ids.
| Alias | Endpoint | Capability | Use when |
|---|---|---|---|
| prometheus-spark | /v1/chat/completions | Fast chat | Routing, classification, short summaries, simple actions. |
| prometheus-lite | /v1/chat/completions | Balanced chat | High-volume assistants, production workflows, everyday agents. |
| prometheus-core | /v1/chat/completions | Deep reasoning | Planning, code, analysis, hard decisions, multi-step agents. |
| prometheus-vision | /v1/chat/completions | Vision chat | Homework photos, screenshots, diagrams, forms, visual QA. |
| prometheus-atlas | /v1/embeddings | Embeddings | RAG, memory, search, recommendations, semantic matching. |
| prometheus-echo | /v1/audio/transcriptions | Audio transcription | Meetings, calls, voice notes, subtitles, speech-to-text. |
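
The alias table can be encoded as a lookup so an agent resolves the endpoint from the alias instead of hard-coding paths. The following is a minimal sketch; `ALIAS_ENDPOINTS` and `endpointFor` are illustrative names, not part of the Prometheus API.

```typescript
// Alias-to-endpoint map transcribed from the table above.
const ALIAS_ENDPOINTS = {
  "prometheus-spark": "/v1/chat/completions",
  "prometheus-lite": "/v1/chat/completions",
  "prometheus-core": "/v1/chat/completions",
  "prometheus-vision": "/v1/chat/completions",
  "prometheus-atlas": "/v1/embeddings",
  "prometheus-echo": "/v1/audio/transcriptions",
} as const

type PrometheusAlias = keyof typeof ALIAS_ENDPOINTS

// Returns the endpoint path for a known alias, or undefined for any other id
// (for example a provider or upstream model id, which must not be sent).
function endpointFor(model: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(ALIAS_ENDPOINTS, model)
    ? ALIAS_ENDPOINTS[model as PrometheusAlias]
    : undefined
}
```

An agent could call `endpointFor(model)` before sending a request and refuse to proceed when it returns `undefined`.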
Billed per actual usage. Chat and vision charge per input and output token, embeddings per input token, and audio per minute. Vision images count as input tokens, so there is no separate per-image fee.
| Alias | Capability | Input | Output |
|---|---|---|---|
| prometheus-spark | Fast chat | $0.15 / 1M | $0.60 / 1M |
| prometheus-lite | Balanced chat | $0.10 / 1M | $0.40 / 1M |
| prometheus-core | Deep reasoning | $1.10 / 1M | $4.40 / 1M |
| prometheus-vision | Vision chat | $0.40 / 1M | $1.60 / 1M |
| prometheus-atlas | Embeddings | $0.02 / 1M | — |
| prometheus-echo | Audio transcription | $0.006 / minute | — |
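
To make the pricing arithmetic concrete, here is a rough cost estimator built from the table above. It is a sketch only: the function names are hypothetical, and actual billing is whatever the gateway records, not what this code computes.

```typescript
// USD per 1M tokens, transcribed from the pricing table above.
const TOKEN_PRICES: Record<string, { input: number; output: number }> = {
  "prometheus-spark": { input: 0.15, output: 0.6 },
  "prometheus-lite": { input: 0.1, output: 0.4 },
  "prometheus-core": { input: 1.1, output: 4.4 },
  "prometheus-vision": { input: 0.4, output: 1.6 },
  "prometheus-atlas": { input: 0.02, output: 0 }, // embeddings bill input tokens only
}

// USD per minute of source audio for prometheus-echo.
const AUDIO_PRICE_PER_MINUTE = 0.006

// Estimated cost in USD for a token-billed request.
function estimateTokenCostUsd(model: string, inputTokens: number, outputTokens = 0): number {
  const price = TOKEN_PRICES[model]
  if (!price) throw new Error(`No per-token price for ${model}`)
  return (inputTokens * price.input + outputTokens * price.output) / 1e6
}

// Estimated cost in USD for an audio transcription.
function estimateAudioCostUsd(minutes: number): number {
  return minutes * AUDIO_PRICE_PER_MINUTE
}
```

For example, a prometheus-core call with 2,000 input tokens and 700 output tokens comes to (2,000 × 1.10 + 700 × 4.40) / 1,000,000, or about $0.00528, and a 30-minute recording sent to prometheus-echo comes to about $0.18.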
Set baseURL to the Prometheus /v1 endpoint and pass a Prometheus API key as the SDK key.
// Client setup
import OpenAI from "openai"

export const prometheus = new OpenAI({
  apiKey: process.env.PROMETHEUS_API_KEY,
  baseURL: "https://api.getprometheus.org/v1",
})

// Chat completion
const completion = await prometheus.chat.completions.create({
  model: "prometheus-core",
  messages: [
    { role: "system", content: "You are a precise product engineering agent." },
    { role: "user", content: "Summarize this incident and list next actions." },
  ],
  temperature: 0.2,
  max_tokens: 800,
})
console.log(completion.choices[0]?.message?.content)

// Streaming chat
const stream = await prometheus.chat.completions.create({
  model: "prometheus-lite",
  messages: [{ role: "user", content: "Draft a release note." }],
  stream: true,
})
for await (const event of stream) {
  process.stdout.write(event.choices[0]?.delta?.content ?? "")
}

// Vision: standard chat messages with an image_url content part
const completion = await prometheus.chat.completions.create({
  model: "prometheus-vision",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Read this exercise and explain the next step." },
        {
          type: "image_url",
          image_url: { url: "data:image/png;base64,..." },
        },
      ],
    },
  ],
})
console.log(completion.choices[0]?.message?.content)

// Embeddings
const result = await prometheus.embeddings.create({
  model: "prometheus-atlas",
  input: [
    "Prometheus is an OpenAI-compatible model gateway.",
    "Agents should use stable prometheus-* aliases.",
  ],
})
console.log(result.data[0]?.embedding.length)

// Audio transcription
import fs from "node:fs"

const transcript = await prometheus.audio.transcriptions.create({
  model: "prometheus-echo",
  file: fs.createReadStream("meeting.mp3"),
  response_format: "json",
})
console.log(transcript.text)
Every /v1/* endpoint requires the bearer key. Usage is logged against the owning key for dashboard analytics.
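
For clients that build requests by hand, the auth rule reduces to one header. The sketch below assumes a hypothetical helper named `authHeaders`; the prefix check is a local convenience based on the prom_sk_ key format described on this page, and the gateway remains the authority on whether a key is valid.

```typescript
// Builds the Authorization header required by every /v1/* endpoint.
// Content-Type is left to the caller because JSON and multipart requests differ.
function authHeaders(apiKey: string | undefined): Record<string, string> {
  if (!apiKey || !apiKey.startsWith("prom_sk_")) {
    // Fail early instead of sending a request that will be rejected.
    throw new Error("Expected a Prometheus API key starting with prom_sk_")
  }
  return { Authorization: `Bearer ${apiKey}` }
}
```

A JSON request would then spread the result into its headers, for example `{ ...authHeaders(process.env.PROMETHEUS_API_KEY), "Content-Type": "application/json" }`.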
POST /v1/chat/completions
Supports prometheus-spark, prometheus-lite, prometheus-core, and prometheus-vision. Required fields are model and messages with at least one item. Optional fields include stream, temperature, max_tokens, top_p, tools, response_format, image content parts, and compatible extra OpenAI fields.
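
When stream is true and the client is not using the SDK, the response body has to be read as server-sent events. The sketch below shows one way to pull the text out of a captured stream. It assumes the OpenAI conventions of `data: {json}` lines and a final `data: [DONE]` sentinel; this page only states that chunks are OpenAI-style, so treat the sentinel as an assumption. `collectStreamText` is an illustrative name.

```typescript
// Only the fields this sketch reads; real chunks carry more.
interface ChatChunk {
  choices?: Array<{ delta?: { content?: string | null } }>
}

// Concatenates delta.content from a complete SSE payload.
function collectStreamText(sseText: string): string {
  let text = ""
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue // skip blank lines, comments, other SSE fields
    const payload = line.slice("data:".length).trim()
    if (payload === "") continue
    if (payload === "[DONE]") break // assumed end-of-stream sentinel
    const chunk = JSON.parse(payload) as ChatChunk
    text += chunk.choices?.[0]?.delta?.content ?? ""
  }
  return text
}
```

This version expects the whole stream as one string. A live reader would need to buffer partial lines across network reads before handing complete lines to the same logic.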
POST /v1/embeddings
Uses prometheus-atlas. Send input as a string, string array, token array, or token-array batch. Optional fields include encoding_format and dimensions.
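
Embeddings are usually compared rather than read directly. As an illustration of the search and semantic-matching use case, here is a plain cosine-similarity helper that works on the number arrays returned in `data[i].embedding`. The function is generic vector math and not a Prometheus feature.

```typescript
// Cosine similarity between two vectors of equal length.
// Returns a value in [-1, 1]; returns 0 if either vector has zero magnitude.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length")
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

With the SDK example on this page, `cosineSimilarity(result.data[0].embedding, result.data[1].embedding)` would score how closely the two input sentences match.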
POST /v1/audio/transcriptions
Uses prometheus-echo with multipart form data. Send file plus model. Optional fields are language, temperature, and response_format (json, text, verbose_json, srt, or vtt). Audio files are limited to 25 MB.
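
Because uploads are capped, an agent may want to check a file before sending it. The sketch below validates size and response_format locally. It assumes "25 MB" means 25,000,000 bytes, the more conservative reading, and `checkTranscriptionRequest` is a hypothetical helper name.

```typescript
// Assumption: 25 MB is read as 25,000,000 bytes (the stricter interpretation).
const MAX_AUDIO_BYTES = 25 * 1000 * 1000

// response_format values listed for the transcription endpoint.
const RESPONSE_FORMATS = ["json", "text", "verbose_json", "srt", "vtt"] as const

// Returns a list of problems; an empty list means the request looks sendable.
function checkTranscriptionRequest(sizeBytes: number, responseFormat: string = "json"): string[] {
  const problems: string[] = []
  if (sizeBytes > MAX_AUDIO_BYTES) {
    problems.push(`File is ${sizeBytes} bytes; the limit is ${MAX_AUDIO_BYTES}`)
  }
  if (!(RESPONSE_FORMATS as readonly string[]).includes(responseFormat)) {
    problems.push(`Unsupported response_format: ${responseFormat}`)
  }
  return problems
}
```

In Node, the size could come from `fs.statSync(path).size`; longer recordings would need to be split or compressed before upload.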
GET /v1/models
Returns the model aliases visible to the key in the standard OpenAI list shape: { object: "list", data: [...] }. Each model id is a Prometheus alias.
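
An agent can use this response to discover which aliases its key may call. A minimal sketch that checks the list shape and extracts the ids follows; `modelIds` is an illustrative name, and only the `object`, `data`, and `id` fields named above are assumed.

```typescript
// Minimal view of the list shape described above.
interface ModelList {
  object: "list"
  data: Array<{ id: string }>
}

// Extracts alias ids from a parsed /v1/models response body.
function modelIds(body: unknown): string[] {
  const list = body as Partial<ModelList> | null
  if (!list || list.object !== "list" || !Array.isArray(list.data)) {
    throw new Error("Unexpected /v1/models response shape")
  }
  return list.data.map((model) => model.id)
}
```

A caller could then verify that an alias such as prometheus-core appears in the result before routing work to it.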
Use these when an agent cannot load the OpenAI SDK or needs a deterministic shell command.
# List models
curl -sS https://api.getprometheus.org/v1/models \
  -H "Authorization: Bearer $PROMETHEUS_API_KEY"

# Chat completion
curl -sS https://api.getprometheus.org/v1/chat/completions \
  -H "Authorization: Bearer $PROMETHEUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "prometheus-core",
    "messages": [
      { "role": "user", "content": "Give me a deployment checklist." }
    ],
    "temperature": 0.2,
    "max_tokens": 700
  }'

# Vision
curl -sS https://api.getprometheus.org/v1/chat/completions \
  -H "Authorization: Bearer $PROMETHEUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "prometheus-vision",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is shown in this image?" },
          {
            "type": "image_url",
            "image_url": { "url": "data:image/png;base64,..." }
          }
        ]
      }
    ]
  }'

# Embeddings
curl -sS https://api.getprometheus.org/v1/embeddings \
  -H "Authorization: Bearer $PROMETHEUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "prometheus-atlas",
    "input": ["billing webhook retry policy", "invoice payment failed"]
  }'

# Audio transcription (multipart form upload)
curl -sS https://api.getprometheus.org/v1/audio/transcriptions \
  -H "Authorization: Bearer $PROMETHEUS_API_KEY" \
  -F model=prometheus-echo \
  -F response_format=json \
  -F file=@meeting.mp3

Streaming chat emits standard server-sent events, and the model field of every response is rewritten back to the Prometheus alias. Validation and auth failures use an OpenAI-compatible envelope.
{
  "error": {
    "message": "Missing API key. Provide it as Authorization: Bearer <key>.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
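
Because errors share one envelope, a client can detect them with a single check on the parsed body. The following is a sketch of such a parser; the `GatewayError` interface and `parseGatewayError` name are illustrative, and the nullability of `param` and `code` is inferred from the sample above and not from a published schema.

```typescript
// Fields of the error envelope shown above.
interface GatewayError {
  message: string
  type: string
  param: string | null
  code: string | null
}

// Returns the error details if the body is an error envelope, otherwise null.
function parseGatewayError(body: unknown): GatewayError | null {
  if (typeof body !== "object" || body === null) return null
  const error = (body as { error?: unknown }).error
  if (typeof error !== "object" || error === null) return null
  const { message, type, param, code } = error as Record<string, unknown>
  if (typeof message !== "string" || typeof type !== "string") return null
  return {
    message,
    type,
    param: typeof param === "string" ? param : null,
    code: typeof code === "string" ? code : null,
  }
}
```

An agent might branch on `code`, for example stopping and asking for credentials on invalid_api_key instead of retrying.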