OpenAI-compatible · text, vision, voice & embeddings

One endpoint for every model.

Prometheus is a drop-in gateway for the models your product runs on. Keep the OpenAI SDK, swap the base URL, and call chat, vision, reasoning, transcription, and embeddings through stable aliases — with predictable latency and usage-based billing.

No SDK rewrite · Pay per token · No seats, no minimums

example.ts
// drop-in: point the OpenAI SDK at Prometheus
import OpenAI from "openai"

const prometheus = new OpenAI({
  baseURL: "https://api.getprometheus.org/v1",
  apiKey: process.env.PROMETHEUS_API_KEY,
})

const res = await prometheus.chat.completions.create({
  model: "prometheus-core",
  messages: [{ role: "user", content: "Ship it." }],
})

The model line

One family, every capability

Eight stable aliases cover the work most products do. Choose by speed, depth, or modality; the names never change, so what you build on them keeps working.

Fast chat

Prometheus Spark

prometheus-spark

A compact model for routing, classification, and short tasks where time-to-first-token is everything.

Balanced

Prometheus Lite

prometheus-lite

The everyday workhorse — high-volume product responses and agents that run all day, at a low cost.

Reasoning

Prometheus Core

prometheus-core

The deep model for planning, code, analysis, and multi-step agents that need to think before they act.

Multimodal

Prometheus Vision

prometheus-vision

Image-aware chat for screenshots, diagrams, forms, and photos — billed like chat, no per-image fee.

Realtime

Prometheus Pulse

prometheus-pulse

Sub-second voice and realtime agents — fast time-to-first-token with reliable tool calling for phone and live conversation.

Realtime

Prometheus Surge

prometheus-surge

A smarter realtime tier for voice and complex agents that still answers fast enough for a live call.

Embeddings

Prometheus Atlas

prometheus-atlas

Vector representations for RAG, memory, search, and semantic matching across your knowledge.

Audio

Prometheus Echo

prometheus-echo

Speech-to-text transcription that turns meetings, calls, and voice notes into structured text.

US East · Virginia

Ultra-low latency for US East

Our servers run in Northern Virginia (US East), next to the densest stretch of North American internet. For voice and realtime agents that means a fast, steady time-to-first-token — close enough for a live phone call to feel instant.

Region: us-east · N. Virginia

Realtime tiers: Pulse & Surge

Tuned for: Voice & live agents

deployed in us-east
Northern Virginia (us-east)
Eastern & Central US
Eastern Canada
Low-latency North American routes

Built for production

One API for every model

Think in capabilities — speed, depth, vision, voice, or vectors. One OpenAI-compatible endpoint, stable names that never change, and usage you can actually see.

A model for every task

Match each call to the model that fits the task, then tune cost and quality without touching client code.

Stable aliases that never break

Your code targets prometheus-* names. They stay constant as the family improves, so an integration you ship today keeps working tomorrow.

Usage you can see

Every request is metered per key: tokens, latency, and spend, charted daily and weekly in the dashboard.

map your calls
assistant.reply → prometheus-lite
agent.plan → prometheus-core
image.inspect → prometheus-vision
audio.transcribe → prometheus-echo
context.embed → prometheus-atlas
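The mapping above could live in one small lookup table, so that changing a model is a one-line edit. This is a minimal sketch: the task names come from the list above, while the `Task` type, `MODEL_FOR_TASK`, and `modelFor` are hypothetical names chosen for illustration, not part of any Prometheus SDK.

```typescript
// Hypothetical task names, taken from the "map your calls" list.
type Task =
  | "assistant.reply"
  | "agent.plan"
  | "image.inspect"
  | "audio.transcribe"
  | "context.embed"

// One place to edit when you want to trade cost against quality.
const MODEL_FOR_TASK: Record<Task, string> = {
  "assistant.reply": "prometheus-lite",
  "agent.plan": "prometheus-core",
  "image.inspect": "prometheus-vision",
  "audio.transcribe": "prometheus-echo",
  "context.embed": "prometheus-atlas",
}

// Returns the alias to pass as `model` for a given task.
function modelFor(task: Task): string {
  return MODEL_FOR_TASK[task]
}
```

Call sites would then pass `modelFor("agent.plan")` as the `model` field instead of hard-coding an alias.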

Pricing

Pay for tokens, not complexity

Transparent, usage-based pricing per model. Chat and vision bill by input and output tokens, embeddings by input tokens, and audio by the minute. No seats, no minimums.

| Model | Alias | Capability | Input | Output |
| --- | --- | --- | --- | --- |
| Prometheus Spark | prometheus-spark | Chat, routing, classification | $0.15 / 1M | $0.60 / 1M |
| Prometheus Lite | prometheus-lite | High-volume product responses | $0.10 / 1M | $0.40 / 1M |
| Prometheus Core | prometheus-core | Planning, code, deep analysis | $1.10 / 1M | $4.40 / 1M |
| Prometheus Vision | prometheus-vision | Image-aware chat and visual QA | $0.40 / 1M | $1.60 / 1M |
| GPT-5.2 Instant | gpt-5.2-instant | Low-latency voice and tool calling | $0.15 / 1M | $0.60 / 1M |
| GPT-5.2 Mini | gpt-5.2-mini | Smarter realtime voice and agents | $0.40 / 1M | $1.60 / 1M |
| GPT-5.2 Nano | gpt-5.2-nano | High-volume chat and classification | $0.10 / 1M | $0.40 / 1M |
| Claude Haiku 4.5 | claude-haiku-4.5 | Balanced chat and everyday agents | $1.00 / 1M | $5.00 / 1M |
| Prometheus Atlas | prometheus-atlas | RAG, memory, semantic search | $0.02 / 1M | n/a |
| Prometheus Echo | prometheus-echo | Speech-to-text transcription | $0.006 / min | n/a |

Prices in USD, billed per actual usage. Vision images count as input tokens — no separate per-image fee.
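To make the per-token arithmetic concrete, here is a rough cost estimator built from the chat and vision prices listed above. It is a sketch only: `PRICE_PER_MILLION` and `estimateChatCostUsd` are hypothetical names, and the function assumes you already have input and output token counts for a request.

```typescript
// USD per 1M tokens, copied from the pricing table on this page.
const PRICE_PER_MILLION: Record<string, { input: number; output: number }> = {
  "prometheus-spark": { input: 0.15, output: 0.6 },
  "prometheus-lite": { input: 0.1, output: 0.4 },
  "prometheus-core": { input: 1.1, output: 4.4 },
  "prometheus-vision": { input: 0.4, output: 1.6 },
}

// Estimates the cost of one request from its token counts.
function estimateChatCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = PRICE_PER_MILLION[model]
  if (!price) throw new Error(`no price listed for ${model}`)
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000
}

// 2,000 input + 500 output tokens on prometheus-core:
// (2000 * 1.10 + 500 * 4.40) / 1,000,000 ≈ $0.0044
const example = estimateChatCostUsd("prometheus-core", 2000, 500)
```

In OpenAI-style responses the token counts usually arrive in the `usage` object (`prompt_tokens` and `completion_tokens`); whether Prometheus reports them the same way is an assumption based on its stated compatibility.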

Get an API key

FAQ

Questions, answered

Is it really OpenAI-compatible?

Yes. Point the OpenAI SDK (or any OpenAI-style client) at the Prometheus /v1 base URL, use a prom_sk_ key, and call chat completions, embeddings, and audio transcriptions with the request shapes you already know.

How do I choose a model?

Pick by capability. Use spark for fast, short tasks, lite for high-volume work, core for deep reasoning, vision for images, pulse and surge for realtime voice, atlas for embeddings, and echo for transcription. The aliases are the stable contract: names never change, so the integration you ship today keeps working.

How does billing work?

Usage-based, per key. Chat and vision bill by input and output tokens, embeddings by input tokens, and audio by the minute of source audio. No seats and no monthly minimums: you pay for what you call.

Do you support streaming and tools?

Yes. Streaming chat returns standard OpenAI-style SSE chunks, and extra fields like tools, response_format, top_p, and stream_options are supported. Prefer stream: true for the lowest time-to-first-token.
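If you read the stream without an SDK, the chunks can be decoded by hand. The sketch below assumes the usual OpenAI streaming layout: `data:` lines carrying JSON with the text in `choices[0].delta.content`, and a final `data: [DONE]` sentinel. `parseSseText` is a hypothetical helper, and it expects a buffer of complete lines rather than a partial network read.

```typescript
// Collects the text deltas from a buffer of OpenAI-style SSE lines.
function parseSseText(raw: string): { text: string; done: boolean } {
  let text = ""
  let done = false
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue // skip blank lines and comments
    const payload = line.slice("data:".length).trim()
    if (payload === "[DONE]") {
      done = true // end-of-stream sentinel
      break
    }
    const chunk = JSON.parse(payload)
    // Chunks without choices (for example a usage-only chunk) add no text.
    text += chunk.choices?.[0]?.delta?.content ?? ""
  }
  return { text, done }
}
```

A production reader would also buffer partial lines across network reads; that step is left out here to keep the sketch short.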

Can agents call it directly?

That's the point. The docs include an agent spec you can paste into an autonomous agent, plus raw HTTP examples for clients that can't load an SDK. Everything is keyed and metered for observability.
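For clients that can't load an SDK, a raw request might be assembled like this. It is a sketch under assumptions: the `/chat/completions` path and the `Authorization: Bearer` header follow the OpenAI convention that the page says Prometheus is compatible with, and `HttpRequest` and `buildChatRequest` are hypothetical names.

```typescript
const BASE_URL = "https://api.getprometheus.org/v1"

// Plain description of an HTTP request, independent of any client library.
interface HttpRequest {
  url: string
  method: "POST"
  headers: Record<string, string>
  body: string
}

// Builds (but does not send) a chat completion request.
function buildChatRequest(apiKey: string, prompt: string): HttpRequest {
  return {
    url: `${BASE_URL}/chat/completions`, // assumed OpenAI-style path
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`, // assumed OpenAI-style auth header
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "prometheus-lite",
      messages: [{ role: "user", content: prompt }],
    }),
  }
}

// To send it with fetch:
//   const { url, ...init } = buildChatRequest(key, "Ship it.")
//   const res = await fetch(url, init)
```

Keeping the builder separate from the network call makes the request easy to log or inspect before an agent sends it.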

What about image and audio limits?

Vision uses standard chat messages with image_url content parts and counts images as input tokens. Audio transcription takes multipart uploads up to 25 MB and returns json, text, verbose_json, srt, or vtt.
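As an illustration, a vision message and a pre-upload check for transcription could look like the following. The content-part shape follows the OpenAI chat format the answer refers to. `visionMessage` and `checkTranscriptionUpload` are hypothetical helpers, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption.

```typescript
// Builds a user message that pairs a question with an image URL.
function visionMessage(question: string, imageUrl: string) {
  return {
    role: "user" as const,
    content: [
      { type: "text" as const, text: question },
      { type: "image_url" as const, image_url: { url: imageUrl } },
    ],
  }
}

// Assumption: the 25 MB limit is interpreted as 25 MiB.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024
const RESPONSE_FORMATS = ["json", "text", "verbose_json", "srt", "vtt"]

// Returns an error message, or null when the upload looks acceptable.
function checkTranscriptionUpload(sizeBytes: number, format: string): string | null {
  if (sizeBytes > MAX_AUDIO_BYTES) return "file exceeds the 25 MB upload limit"
  if (!RESPONSE_FORMATS.some((f) => f === format)) {
    return `unsupported response_format: ${format}`
  }
  return null
}
```

Running the check client-side avoids uploading a file the service would reject anyway; the server remains the authority on the real limits.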

Bring your product to the fire

Create a key, point your OpenAI client at Prometheus, and ship. Pay only for the tokens you use. No rewrite, no minimums.