Think of it as the plumbing every AI product needs: sessions that keep their state across turns, reconnects, model swaps, compute swaps, and updates. We made the infra simple, transparent, reliable, and inexpensive, so humans and agents can understand it without accidentally breaking the foundation.
curl -fsSL https://cerver.ai/skill | bash
Most platforms ship one of the layers and lock the rest. Cerver gives you all four — model, tools, compute, billing — and lets you swap any of them without touching the session above.
Four phases. About two minutes start to finish. You watch.
Hits POST /v2/auth/login with your email. You get an API key immediately.
"Want to run on your Vercel? Paste a token. Or skip and use ours." That's it.
Says hi · swaps compute mid-session · runs one intent through multiple brains and compares quality, speed, and cost.
Token counts and dollar estimate. Offers cerver upgrade if you want a permanent key.
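Prefer to replay the first phase by hand? The login step is one call. A minimal sketch, assuming the endpoint takes a JSON body with your email and returns the key in an api_key field (the field name is an assumption; check what the script actually prints):

// Phase one, by hand. The response field name is assumed.
const res = await fetch("/v2/auth/login", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ email: "you@example.com" }),
});
const { api_key } = await res.json(); // assumed field name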
Between an intent and its execution, every layer is swappable except the one that remembers.
One bill. One log. One API. Zoom out — same stack, same Cerver.
Three shapes of pain we hear most often. If one sounds like you, the demo curl up top is one paste away.
Persistent sessions per user. Streaming. Prompt caching that actually caches. Per-user usage on every turn so billing reconciles with one query. Stop maintaining a messages table. Sketch below.
The sandbox is a tool the agent calls. The agent requests it, runs in it, and releases it on its own. Fan out across eight boxes mid-conversation without writing a worker pool. Same session before and after.
Models, CLIs, and compute backends keep changing. Run the same intent through multiple runners and compare quality, speed, cost, and reliability before you standardize.
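For the first of those shapes, here is roughly the surface you stop maintaining, as a minimal sketch. It assumes sessions are created with POST /v2/sessions and that the response carries an id; that create route and response shape are assumptions, while run-llm and metrics appear in the snippets further down this page.

// One session per user; Cerver keeps the transcript, not your messages table.
// POST /v2/sessions and its response shape are assumptions in this sketch.
const { id } = await (await fetch("/v2/sessions", { method: "POST" })).json();

// Every turn runs through the session, so usage is recorded per turn.
await fetch(`/v2/sessions/${id}/run-llm`, {
  method: "POST",
  body: JSON.stringify({ input: "summarize this ticket" }),
});

// One query reconciles that user's billing.
const usage = await (await fetch(`/v2/sessions/${id}/metrics`)).json();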
Which model, CLI, and compute layer is best changes constantly. Most teams do not have time to follow every shift, so important work quietly runs through whatever people already prefer.
Cerver turns AI choice from opinion into live evidence: run the same intent across brains and runtimes, compare the answers, then route future work to the current winner.
A user closes the tab while a tool call is running. The next message Anthropic sees has an orphan tool_use, and every subsequent turn 400s until you reconcile the transcript. Here's what the fix actually looks like.
try {
  await client.messages.create({ model: "claude-sonnet-4-5", messages, tools });
} catch (e) {
  const msg = e?.error?.error?.message ?? "";
  if (!msg.includes("ids were found without")) throw e;

  // Parse the orphan ids out of Anthropic's error string
  const ids = [...msg.matchAll(/toolu_[A-Za-z0-9_]+/g)].map(m => m[0]);

  // Build one synthetic tool_result per orphan
  const synthetic = {
    role: "user",
    content: ids.map(id => ({
      type: "tool_result",
      tool_use_id: id,
      content: "aborted",
      is_error: true,
    })),
  };

  // tool_results must sit in the message immediately after the tool_use,
  // so splice the synthetic turn in right there, not at the end
  const at = messages.findIndex(m =>
    Array.isArray(m.content) &&
    m.content.some(b => b.type === "tool_use" && ids.includes(b.id))
  );
  messages = [...messages.slice(0, at + 1), synthetic, ...messages.slice(at + 1)];

  // Retry against a now-valid transcript
  await client.messages.create({ model: "claude-sonnet-4-5", messages, tools });
}
// You don't write this code. At all.
// Cerver detects the orphan on its side,
// flushes synthetic tool_results, and your
// next /run-llm call just works.
await fetch(`/v2/sessions/${id}/run-llm`, {
  method: "POST",
  body: JSON.stringify({ input: "continue" }),
});
Same mechanism for sandbox lifecycle, prompt caching, transcript persistence, and per-session billing — all of which you'd otherwise also write yourself.
A few categories near us. Most of these solve a different problem — included so you can rule us out fast if we're not the fit.
A/B Cloudflare against Vercel against your laptop. Same transcript, same agent state — only the runtime changes.
// Mid-session: switch the compute under it
POST /v2/sessions/:id/compute
{ "compute": { "provider": "vercel" } }

// Later: swap to Cloudflare. Transcript untouched.
POST /v2/sessions/:id/compute
{ "compute": { "provider": "cloudflare" } }
Send the same input to Claude, GPT-5, or Gemini without rewriting your prompt or your client. Compare the answers in your dashboard.
// Same session. Try Claude.
POST /v2/sessions/:id/run-llm
{ "model": "claude-sonnet-4-5", "input": "summarize this PR" }

// Same input. Try GPT-5.
POST /v2/sessions/:id/run-llm
{ "model": "gpt-5", "input": "summarize this PR" }
Most orgs pay premium model prices for everything because their app has no cheap way to tell easy work from hard work. Cerver makes that decision explicit: classify the turn, choose the right model and compute, keep the same session.
The pitch for high-spend teams: stop buying one intelligence tier for every turn. Let the system decide how much intelligence each turn actually needs, then show the savings per session.
Cerver can treat each turn as easy, normal, or hard before choosing the path. That decision becomes part of the session record.
POST /gateway/recommend
{ "workload": "classification",
"policy": { "mode": "cheapest" } }
A cheap model on local compute for simple work. A balanced model on Vercel for normal work. A premium model and persistent sandbox only when needed.
POST /v2/sessions/:id/run-llm
{ "model": "small-fast-model",
"input": "summarize this ticket" }
Every session can show the chosen route, estimated cost, and the baseline cost if every turn had used your premium default.
GET /v2/sessions/:id/metrics
{ "cost_estimate_usd": "0.18",
"provider": "vercel" }
A typical AI product run is a mix of cheap, normal, and hard turns. If every turn goes through the premium path, your bill is priced like the hardest moment. If Cerver routes each turn, your bill follows the actual work.
A 30-turn session with 2 hard turns should not be priced like 30 hard turns. Same transcript. Same agent. Different routing.
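To make that concrete, here is the arithmetic under made-up per-turn prices (illustrative numbers, not Cerver's rates):

// Illustrative prices, not Cerver's rates:
// 28 easy turns routed cheap, 2 hard turns routed premium.
const easy = 0.002, hard = 0.06;       // $ per turn, hypothetical
const routed  = 28 * easy + 2 * hard;  // $0.176
const premium = 30 * hard;             // $1.80
// Same transcript, same agent; routing is the only difference in the bill.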
A session is one create + any number of runs + close. Roughly: ~30 turns of chat, or one agent run that spawns and closes a sandbox. You see the count in your dashboard.
First 200 are on us. After that, top up $49+ in credits and draw down at half a cent per session minimum — slightly more if your session uses a bigger model. Bring your own compute keys (Vercel, Cloudflare, E2B) and your own model keys (Claude, GPT, Gemini). You always see the itemised bill.
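At the half-cent minimum, the arithmetic is simple: $49 in credits covers up to 9,800 sessions beyond the 200 free ones, and sessions that use a bigger model draw the balance down faster.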