Prometheus is a drop-in gateway for the models your product runs on. Keep the OpenAI SDK, swap the base URL, and call chat, vision, reasoning, transcription, and embeddings through stable aliases — with predictable latency and usage-based billing.
No SDK rewrite · Pay per token · No seats, no minimums
// drop-in: point the OpenAI SDK at Prometheus
import OpenAI from "openai"

const prometheus = new OpenAI({
  baseURL: "https://api.getprometheus.org/v1",
  apiKey: process.env.PROMETHEUS_API_KEY,
})

const res = await prometheus.chat.completions.create({
  model: "prometheus-core",
  messages: [{ role: "user", content: "Ship it." }],
})

The model line
Eight stable aliases cover the work most products do. Names never change, so you can choose by speed, depth, or modality and build with confidence.
prometheus-spark A compact model for routing, classification, and short tasks where time-to-first-token is everything.
prometheus-lite The everyday workhorse — high-volume product responses and agents that run all day, at a low cost.
prometheus-core The deep model for planning, code, analysis, and multi-step agents that need to think before they act.
prometheus-vision Image-aware chat for screenshots, diagrams, forms, and photos — billed like chat, no per-image fee.
prometheus-pulse Sub-second voice and realtime agents — fast time-to-first-token with reliable tool calling for phone and live conversation.
prometheus-surge A smarter realtime tier for voice and complex agents that still answers fast enough for a live call.
prometheus-atlas Vector representations for RAG, memory, search, and semantic matching across your knowledge.
prometheus-echo Speech-to-text transcription that turns meetings, calls, and voice notes into structured text.
Built for production
Think in capabilities — speed, depth, vision, voice, or vectors. One OpenAI-compatible endpoint, stable names that never change, and usage you can actually see.
Match each call to the model that fits — speed, depth, vision, voice, or vectors — and tune cost and quality without touching client code.
Your code targets prometheus-* names. They stay constant as the family improves, so an integration you ship today keeps working tomorrow.
Every request is metered per key — tokens, latency, and spend on a daily and weekly chart in the dashboard.
assistant.reply prometheus-lite
agent.plan prometheus-core
image.inspect prometheus-vision
audio.transcribe prometheus-echo
context.embed prometheus-atlas

Pricing
Transparent, usage-based pricing per model. Chat and vision bill by input and output tokens, embeddings by input tokens, and audio by the minute. No seats, no minimums.
| Model | Capability | Input | Output |
|---|---|---|---|
| Prometheus Spark (prometheus-spark) | Chat, routing, classification | $0.15 / 1M | $0.60 / 1M |
| Prometheus Lite (prometheus-lite) | High-volume product responses | $0.10 / 1M | $0.40 / 1M |
| Prometheus Core (prometheus-core) | Planning, code, deep analysis | $1.10 / 1M | $4.40 / 1M |
| Prometheus Vision (prometheus-vision) | Image-aware chat and visual QA | $0.40 / 1M | $1.60 / 1M |
| Prometheus Pulse (prometheus-pulse) | Low-latency voice and tool calling | $0.15 / 1M | $0.60 / 1M |
| Prometheus Surge (prometheus-surge) | Smarter realtime voice and agents | $0.40 / 1M | $1.60 / 1M |
| Prometheus Atlas (prometheus-atlas) | RAG, memory, semantic search | $0.02 / 1M | n/a |
| Prometheus Echo (prometheus-echo) | Speech-to-text transcription | $0.006 / min | n/a |
Prices in USD, billed per actual usage. Vision images count as input tokens — no separate per-image fee.
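As a rough illustration of how the table translates into spend, here is a minimal cost-estimation sketch. The prices are copied from the table above; the function and constant names are hypothetical, and the sketch assumes no rounding or minimum billing increment, which this page does not specify.

```typescript
// Per-1M-token prices in USD, copied from the pricing table.
type TokenPrice = { inputPerM: number; outputPerM: number };

const TOKEN_PRICES: Record<string, TokenPrice> = {
  "prometheus-spark": { inputPerM: 0.15, outputPerM: 0.6 },
  "prometheus-lite": { inputPerM: 0.1, outputPerM: 0.4 },
  "prometheus-core": { inputPerM: 1.1, outputPerM: 4.4 },
  "prometheus-vision": { inputPerM: 0.4, outputPerM: 1.6 },
  "prometheus-pulse": { inputPerM: 0.15, outputPerM: 0.6 },
  "prometheus-surge": { inputPerM: 0.4, outputPerM: 1.6 },
  // Embeddings bill by input tokens only.
  "prometheus-atlas": { inputPerM: 0.02, outputPerM: 0 },
};

// prometheus-echo bills per minute of audio rather than per token.
const ECHO_PRICE_PER_MINUTE = 0.006;

// Estimated USD cost of one token-billed call (hypothetical helper).
function estimateTokenCost(model: string, inputTokens: number, outputTokens = 0): number {
  const price = TOKEN_PRICES[model];
  if (!price) throw new Error(`No token price listed for ${model}`);
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1e6;
}

// Estimated USD cost of transcribing `minutes` of audio (no rounding applied).
function estimateTranscriptionCost(minutes: number): number {
  return minutes * ECHO_PRICE_PER_MINUTE;
}
```

Under these assumptions, a prometheus-core call with 2,000 input tokens and 500 output tokens comes to about $0.0044, and a 30-minute recording sent to prometheus-echo comes to about $0.18.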
Get an API key

FAQ
Prometheus is OpenAI-compatible. Point the OpenAI SDK (or any OpenAI-style client) at the Prometheus /v1 base URL, use a prom_sk_ key, and call chat completions, embeddings, and audio transcriptions with the request shapes you already know.
Pick by capability. Use spark for fast, short tasks, lite for high-volume work, core for deep reasoning, vision for images, pulse and surge for realtime voice, atlas for embeddings, and echo for transcription. The aliases are the stable contract — names never change, so the model you ship today keeps working.
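One way to keep that choice out of call sites is a small lookup from the kind of work to the alias. The sketch below is only an illustration: the capability labels and the `pickModel` helper are hypothetical, while the aliases and their roles come from the guidance above.

```typescript
// Hypothetical labels for the kinds of work described above.
type Capability =
  | "fast"
  | "volume"
  | "deep"
  | "vision"
  | "realtime"
  | "realtime-smart"
  | "embeddings"
  | "transcription";

// Each label maps to the alias recommended for it.
const ALIAS_BY_CAPABILITY: Record<Capability, string> = {
  fast: "prometheus-spark",
  volume: "prometheus-lite",
  deep: "prometheus-core",
  vision: "prometheus-vision",
  realtime: "prometheus-pulse",
  "realtime-smart": "prometheus-surge",
  embeddings: "prometheus-atlas",
  transcription: "prometheus-echo",
};

// Resolve the alias to pass as `model` in a request.
function pickModel(capability: Capability): string {
  return ALIAS_BY_CAPABILITY[capability];
}
```

Call sites then ask for `pickModel("deep")` rather than hard-coding an alias, so switching a workload to a different tier is a one-line change in the table.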
Usage-based, per key. Chat and vision bill by input and output tokens, embeddings by input tokens, and audio by the minute of source audio. No seats and no monthly minimums: you pay for what you call.
Streaming chat returns standard OpenAI-style SSE chunks, and extra fields like tools, response_format, top_p, and stream_options are supported. Prefer stream: true for the lowest time-to-first-token.
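For clients that read the stream without an SDK, the following sketch shows how OpenAI-style SSE chunks can be folded into the final text. It assumes the usual OpenAI chunk shape (`choices[0].delta.content`) and the `data: [DONE]` terminator, and it takes a complete response body as input; a real client would also need to buffer partial lines as bytes arrive. The `collectStreamText` name is hypothetical.

```typescript
// Minimal shape of an OpenAI-style streaming chunk; only the fields used here.
interface ChatChunk {
  choices: { delta: { content?: string | null } }[];
}

// Concatenate the text deltas found in an SSE response body.
function collectStreamText(sseBody: string): string {
  let text = "";
  for (const line of sseBody.split(/\r?\n/)) {
    // Skip blank separator lines and anything that is not a data field.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    // The stream ends with a literal [DONE] marker instead of JSON.
    if (payload === "[DONE]") break;
    const chunk = JSON.parse(payload) as ChatChunk;
    // Usage-only chunks can carry an empty choices array, so guard the lookup.
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```

The guard on `choices[0]` matters when `stream_options` requests usage reporting, because the final chunk in that case typically has no choices.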
Prometheus is meant to be called by agents as well as by application code. The docs include an agent spec you can paste into an autonomous agent, plus raw HTTP examples for clients that can't load an SDK. Every request is keyed and metered, so agent traffic stays observable.
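For a client that can't load an SDK, a request can be assembled by hand. This is a sketch under assumptions: the `/chat/completions` path under the /v1 base URL and the `Authorization: Bearer` header follow the OpenAI convention and are inferred from the compatibility claim, not confirmed here. The `buildChatRequest` helper is hypothetical.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the URL and fetch-style options for a chat completion call.
// Assumed: OpenAI-style path and Bearer authentication.
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]) {
  return {
    url: "https://api.getprometheus.org/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)` or translated into any other HTTP client.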
Vision uses standard chat messages with image_url content parts and counts images as input tokens. Audio transcription takes multipart uploads up to 25 MB and returns json, text, verbose_json, srt, or vtt.
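The two request shapes can be sketched as follows. The vision message uses the standard OpenAI content-part layout; the pre-flight check mirrors the limits stated above. Helper names are hypothetical, and treating "25 MB" as 25 × 1024 × 1024 bytes is an assumption, since the exact byte limit is not given.

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Build a user message that pairs a text prompt with one image.
function visionMessage(prompt: string, imageUrl: string) {
  const content: ContentPart[] = [
    { type: "text", text: prompt },
    { type: "image_url", image_url: { url: imageUrl } },
  ];
  return { role: "user" as const, content };
}

// Assumption: "25 MB" is read as 25 MiB.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;
const RESPONSE_FORMATS: readonly string[] = ["json", "text", "verbose_json", "srt", "vtt"];

// Return a list of problems to fix before uploading; empty means OK to send.
function transcriptionProblems(sizeBytes: number, responseFormat: string): string[] {
  const problems: string[] = [];
  if (sizeBytes > MAX_AUDIO_BYTES) {
    problems.push("file exceeds the 25 MB upload limit");
  }
  if (RESPONSE_FORMATS.indexOf(responseFormat) === -1) {
    problems.push(`unsupported response format: ${responseFormat}`);
  }
  return problems;
}
```

A message from `visionMessage` goes in the `messages` array of a prometheus-vision chat request, and `transcriptionProblems` can run on the client before a multipart upload to prometheus-echo is attempted.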
Create a key, point your OpenAI client at Prometheus, and ship. Pay only for the tokens you call — no rewrite, no minimums.