Cerver is an API for AI sessions.

Think of it as the plumbing every AI product needs: sessions that keep their state across turns, reconnects, model swaps, compute swaps, and updates. We made the infra simple, transparent, reliable, and inexpensive, so humans and agents can understand it without accidentally breaking the foundation.

  • Save engineering time. Stop rebuilding sessions, retries, transcripts, and sandbox glue.
  • Save AI spend. Route easy work to cheaper paths and reserve premium models for hard turns.
  • Stay reliable. One API for session state, compute swaps, metrics, and auditability.
session_042 · live · routed
"Review this auth migration and tell us the safest path."
Same session, three possible routes. Easy parts go cheap. Risky reasoning goes deep. The transcript stays intact.
  model     premium for risk · switchable
  compute   Vercel → E2B mid-session
  bill      $0.18 actual · itemised
Install the tutorial in one line, then say "/cerver tutorial" to your agent.
curl -fsSL https://cerver.ai/skill | bash
Vercel · Cloudflare · E2B · your laptop
swap mid-session · same transcript · same API

Fast products still need stable foundations.

market — In a world where anyone can build quickly, the difference between a great product and a could-have-been-great product is often stability.
product — The parts have to fit together with precision: sessions, models, tools, compute, billing, and history.
session — A session is like the joint in precise joinery. You do not notice it when it fits. You notice immediately when it moves, leaks, or needs constant repair.
cerver — Your model can change, your CLI can change, your compute can change. The session should hold, so your team is not stuck maintaining improvised plumbing forever.
result — Reliable sessions save engineering time. Smart routing saves AI spend. One API keeps the pieces fitted together.
behind every session

A session is the front of a four-layer stack.

Most platforms ship one of the layers and lock the rest. Cerver gives you all four — model, tools, compute, billing — and lets you swap any of them without touching the session above.

session — what your user sees
model — the brain answering
tools — what it can call
compute — where it runs
billing — whose meter ticks
billing
  this turn      $0.0042
  this session   $0.18
  today          $2.41
  billed to      your Anthropic key
  markup         $0

compute
  provider   vercel
  sandbox    sb_4f2e1a
  region     iad1
  status     ● ready
  swap       ⇄ 1 POST

tools
  bash          12 calls
  edit          4 calls
  grep          3 calls
  web_fetch     1 call
  mcp servers   2 connected

model
  name         claude-opus-4-7
  context      1M
  in tokens    12,847
  out tokens   3,201
  swap         ⇄ haiku · gpt-5 · gemini

session
  "summarize this PR in one line"
  "Switches the auth middleware from session cookies to short-lived JWTs; drops the `legacy_sessions` table."
  "how risky is the migration?"

What your agent does when you say "/cerver tutorial".

Four phases. About two minutes start to finish. You watch.

01

Signs you up.

Hits POST /v2/auth/login with your email. You get an API key immediately.
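
A sketch of that exchange; the field names are illustrative, not a documented schema:

POST /v2/auth/login
{ "email": "you@example.com" }

→ { "api_key": "crv_…" }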

02

Asks for one credential.

"Want to run on your Vercel? Paste a token. Or skip and use ours." That's it.

03

Runs three demos.

Says hi · swaps compute mid-session · runs one intent through multiple brains and compares quality, speed, and cost.

04

Tells you the bill.

Token counts and dollar estimate. Offers cerver upgrade if you want a permanent key.

Switch the brain. Switch the body. Keep the memory.

Between an intent and its execution, every layer is swappable except the one that remembers.

user · user · user · … × thousands
    ↓ intent
Intelligence — Claude · GPT-5 · Gemini · Mock
    ↓ runs on
Compute — Vercel · Cloudflare · E2B · your laptop
    ↓ all wired through
cerver — history · auth · billing · per-session itemised bill

One bill. One log. One API. Zoom out — same stack, same Cerver.

Who picks Cerver.

Three shapes of pain we hear most often. If one sounds like you, the demo curl up top is one paste away.

App developers

You're shipping an AI feature, not building infra to host it.

Persistent sessions per user. Streaming. Prompt caching that actually caches. Per-user usage on every turn so billing reconciles with one query. Stop maintaining a messages table.

→ A 50K-user chatbot in a SaaS, not a side project
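
A sketch of that shape; the create route and its fields are assumptions (run-llm is the call shown later on this page):

// One persistent session per user (create route assumed)
POST /v2/sessions
{ "user": "u_4812" }

// Every turn runs against it; usage comes back per turn
POST /v2/sessions/:id/run-llm
{ "model": "claude-sonnet-4-5", "input": "where is my invoice?" }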
Agent builders

Your agent shouldn't have to ask you for compute.

Sandbox is a tool the agent calls: it requests, runs, and releases on its own (sketched below). Fan out across eight boxes mid-conversation without writing a worker pool. Same session before and after.

→ Agents that scale themselves
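
A sketch of that loop, reusing the compute call documented below; the release step is hypothetical, shown only to complete the request/run/release cycle:

// The agent requests a box mid-conversation…
POST /v2/sessions/:id/compute
{ "compute": { "provider": "e2b" } }

// …runs its tool calls, then hands the box back
// (release endpoint is an assumption, not documented here)
DELETE /v2/sessions/:id/compute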
Benchmarkers

Your team should not run on stale AI preferences.

Models, CLIs, and compute backends keep changing. Run the same intent through multiple runners and compare quality, speed, cost, and reliability before you standardize.

→ Tool choices backed by current results, not habit

Don't let yesterday's favorite AI run tomorrow's work.

What counts as the best model, CLI, and compute layer changes constantly. Most teams do not have time to follow every shift, so important work quietly runs through whatever people already prefer.

Live comparison

Same intent. Several runners. One measurable choice.

  • A key developer may love one CLI, even while another CLI has been better for your workload for months.
  • A premium model may be worth it for architecture decisions, but wasteful for formatting, summaries, and simple lookups.
  • A compute backend may be faster this week, cheaper next week, or less reliable under a specific workload.

Cerver turns AI choice from opinion into live evidence: run the same intent across brains and runtimes, compare the answers, then route future work to the current winner.

intent: "Review this auth migration and tell us the safest implementation path."
CLI X
Good summary, misses rollback risk and test coverage gaps.
fast · $
CLI Y
Finds the session-cookie edge case and proposes a safer migration order.
best · $$
Model A
Strong reasoning, higher cost, worth reserving for risky turns.
deep · $$$
Model B
Cheap and clean for summaries, release notes, and routine support replies.
cheap · $
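
A minimal sketch of that comparison loop, using the run-llm and metrics calls shown elsewhere on this page; the run-response shape (run.output) is an assumption:

const id = "session_042";   // an existing session
const intent = "Review this auth migration and tell us the safest implementation path.";

// Run one intent through several brains, collect answer + cost
for (const model of ["claude-sonnet-4-5", "gpt-5", "gemini"]) {
  const run = await fetch(`/v2/sessions/${id}/run-llm`, {
    method: "POST",
    body: JSON.stringify({ model, input: intent }),
  }).then(r => r.json());   // response shape assumed

  const { cost_estimate_usd } = await fetch(`/v2/sessions/${id}/metrics`)
    .then(r => r.json());   // field shown under "Prove the delta"

  console.log(model, cost_estimate_usd, run.output);
}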

One real bug, two ways to handle it.

A user closes the tab while a tool call is running. The next message Anthropic sees has an orphan tool_use, and every subsequent turn 400s until you reconcile the transcript. Here's what the fix actually looks like.

Without Cerver · ~30 lines · catch + parse + repair + retry
try {
  await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024, // required by the API
    messages, tools,
  });
} catch (e) {
  const msg = e?.error?.error?.message ?? "";
  if (!msg.includes("ids were found without")) throw e;

  // Parse the orphan ids out of Anthropic's error string
  const ids = [...msg.matchAll(/toolu_[A-Za-z0-9_]+/g)]
    .map(m => m[0]);

  // One synthetic tool_result per orphan
  const results = ids.map(id => ({
    type: "tool_result",
    tool_use_id: id,
    content: "aborted",
    is_error: true,
  }));

  // Splice them in directly after the assistant turn that
  // issued the orphans; appending at the end still 400s,
  // because results must lead the very next user message
  const i = messages.findIndex(m =>
    m.role === "assistant" && Array.isArray(m.content) &&
    m.content.some(b => b.type === "tool_use" && ids.includes(b.id)));
  messages.splice(i + 1, 0, { role: "user", content: results });

  // Retry against a now-valid transcript
  await client.messages.create({
    model: "claude-sonnet-4-5", max_tokens: 1024, messages, tools,
  });
}
With Cerver · 0 lines · already done server-side
// You don't write this code. At all.
// Cerver detects the orphan on its side,
// flushes synthetic tool_results, and your
// next /run-llm call just works.

await fetch(`/v2/sessions/${id}/run-llm`, {
  method: "POST",
  body: JSON.stringify({ input: "continue" }),
});

Same mechanism for sandbox lifecycle, prompt caching, transcript persistence, and per-session billing — all of which you'd otherwise also write yourself.

Already running in production at Kompany.dev

Where Cerver fits.

A few categories near us. Most of these solve a different problem — included so you can rule us out fast if we're not the fit.

Compared on four axes: sessions persisted · compute provisioned · provider-agnostic · free at hobby scale.

Cerver — all four.
Build it yourself — your DB, your code, your problem.
LangChain / LlamaIndex — a library, not a service.
Helicone / Portkey — a request log plus routing.
E2B / Vercel Sandbox direct — single-vendor; free tier.
Same work · different compute

Run the same task on a different box.

A/B Cloudflare against Vercel against your laptop. Same transcript, same agent state — only the runtime changes.

// Mid-session: switch the compute under it
POST /v2/sessions/:id/compute
{ "compute": { "provider": "vercel" } }

// Later: swap to Cloudflare. Transcript untouched.
POST /v2/sessions/:id/compute
{ "compute": { "provider": "cloudflare" } }
Same intent · different intelligence

Run the same prompt on a different brain.

Send the same input to Claude, GPT-5, or Gemini without rewriting your prompt or your client. Compare the answers in your dashboard.

// Same session. Try Claude.
POST /v2/sessions/:id/run-llm
{ "model": "claude-sonnet-4-5",
  "input": "summarize this PR" }

// Same input. Try GPT-5.
POST /v2/sessions/:id/run-llm
{ "model": "gpt-5",
  "input": "summarize this PR" }

Not every AI turn deserves the expensive path.

Most orgs pay premium model prices for everything because their app has no cheap way to tell easy work from hard work. Cerver makes that decision explicit: classify the turn, choose the right model and compute, keep the same session.

  • Easy work should be cheap. Routing, extraction, classification, formatting, and cacheable answers should not hit your strongest model or most expensive compute.
  • Hard work still gets the best. Planning, debugging, refactors, risky decisions, and long-context reasoning can automatically move to the smarter model and stronger sandbox.
  • The transcript does not move. The user stays in one session while Cerver swaps model, CLI, or compute underneath. You reduce spend without rebuilding your product flow.

The pitch for high-spend teams: stop buying one intelligence tier for every turn. Let the system decide how much intelligence each turn actually needs, then show the savings per session.

Classify the turn

First ask: how hard is this?

Cerver can treat each turn as easy, normal, or hard before choosing the path. That decision becomes part of the session record.

POST /gateway/recommend
	{ "workload": "classification",
	  "policy": { "mode": "cheapest" } }
Route the spend

Then use the cheapest good-enough path.

A cheap model on local compute for simple work. A balanced model on Vercel for normal work. A premium model and persistent sandbox only when needed.

POST /v2/sessions/:id/run-llm
	{ "model": "small-fast-model",
	  "input": "summarize this ticket" }
Prove the delta

Show what you avoided paying.

Every session can show the chosen route, estimated cost, and the baseline cost if every turn had used your premium default.

GET /v2/sessions/:id/metrics
	{ "cost_estimate_usd": "0.18",
	  "provider": "vercel" }

This is how AI cost becomes engineerable.

A typical AI product run is a mix of cheap, normal, and hard turns. If every turn goes through the premium path, your bill is priced like the hardest moment. If Cerver routes each turn, your bill follows the actual work.

Easy turn — classify · route · extract
  model: small / fast · compute: your laptop · cost: $ (pennies)
  why: structured output, low risk, no big context

Normal turn — edit · summarize · review
  model: balanced · compute: Vercel sandbox · cost: $$ (middle)
  why: balanced quality, ephemeral compute is fine

Hard turn — plan · refactor · debug
  model: premium reasoning · compute: E2B + persistent FS · cost: $$$ (premium)
  why: quality matters, state needs to survive

A 30-turn session with 2 hard turns should not be priced like 30 hard turns. Same transcript. Same agent. Different routing.
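
With illustrative prices (assumptions, not Cerver rates): if a hard turn costs $0.03 end to end and an easy turn $0.001, and the other 28 turns route cheap, the routed session costs 28 × $0.001 + 2 × $0.03 = $0.088, while the all-premium path costs 30 × $0.03 = $0.90. Same transcript, roughly a 10× difference.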

Half a cent per session.

A session is one create + any number of runs + close. Roughly: ~30 turns of chat, or one agent run that spawns and closes a sandbox. You see the count in your dashboard.

First 200 are on us. After that, top up $49+ in credits and draw down at half a cent per session minimum — slightly more if your session uses a bigger model. Bring your own compute keys (Vercel, Cloudflare, E2B) and your own model keys (Claude, GPT, Gemini). You always see the itemised bill.

Top up
$49
Minimum credit purchase. ~10,000 sessions in the tank.
  • Drop $49+ of credits in once, run sessions against it
  • Same keys, same API, same dashboard
  • Itemised by provider, by session
  • More for Opus / long context — you see the math
$0.005 per session minimum, drawn from credits
Top up — $49
Teams
Talk to us
When your CISO asks where the data lives.
  • Everything above
  • SSO & multi-user roles
  • Audit logs, retention windows
  • Federated secrets (Infisical / Vault)
  • Savings-backed pilot pricing
  • Dedicated support & SLA
Contact