You're two, three, five people with a low budget and an outsized AI app to ship. Cerver is the backend layer your competitors are paying ten engineers to maintain — already done. Persistent sessions per user, streaming, prompt caching, per-turn usage receipts. You ship the AI app on top; the boring stays underneath.
The boilerplate that quietly eats two engineers for a quarter — gone.
Every session keeps its full transcript on Cerver. Append-only, queryable by metadata, exportable as text. Resume a session by id and the history is already there.
SSE on every turn. Anthropic prompt caching wired in by default. You get the latency win and the caching discount without instrumenting either.
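Consuming the stream client-side takes only a few lines. A minimal sketch in Python, assuming a hypothetical `data: {"delta": ...}` event shape with a `[DONE]` sentinel — an illustrative assumption, not Cerver's documented wire format:

```python
import json

def parse_sse_chunks(raw: str) -> str:
    """Collect the text deltas out of a raw SSE response body.

    Assumes each event is a `data:` line carrying {"delta": "..."},
    terminated by `data: [DONE]` — an illustrative shape, not the
    documented one.
    """
    text = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        text.append(json.loads(payload).get("delta", ""))
    return "".join(text)

sample = (
    'data: {"delta": "Three bullets"}\n\n'
    'data: {"delta": " coming up."}\n\n'
    "data: [DONE]\n"
)
print(parse_sse_chunks(sample))  # Three bullets coming up.
```

In production you would read the response body incrementally and flush each delta to the UI as it arrives rather than buffering the whole stream.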
Every turn returns input/output tokens, model used, cache hit ratio. Reconcile billing with one query. No more "where did the spend go" Slack threads.
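Because every turn carries those fields, pricing a turn is pure arithmetic. A sketch, assuming hypothetical field names and illustrative per-million-token rates (not published Cerver or Anthropic pricing):

```python
def turn_cost(receipt: dict, prices: dict) -> float:
    """Price one turn from its usage receipt.

    Field names (model, input_tokens, output_tokens, cache_hit_ratio)
    and the rates in `prices` are illustrative assumptions.
    """
    p = prices[receipt["model"]]
    cached = receipt["input_tokens"] * receipt["cache_hit_ratio"]
    fresh = receipt["input_tokens"] - cached
    return (
        fresh * p["input"]
        + cached * p["cached_input"]   # cached reads billed at a discount
        + receipt["output_tokens"] * p["output"]
    ) / 1_000_000  # rates are per million tokens

# Hypothetical rates, USD per million tokens.
prices = {"sonnet": {"input": 3.0, "cached_input": 0.3, "output": 15.0}}
receipt = {"model": "sonnet", "input_tokens": 10_000,
           "output_tokens": 500, "cache_hit_ratio": 0.8}
print(round(turn_cost(receipt, prices), 4))  # 0.0159
```

Summing `turn_cost` over a session's receipts is the "one query" reconciliation: no extra metering layer to build.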
Most teams pile every task into one harness, then context-switch through ten unrelated half-fixes — slower output, more mistakes. Cerver flips it: one task at a time, several providers in parallel, the best output wins.
Context-switching across unrelated problems on the same model leaves no signal about which task is the bottleneck. Mistakes pile up; throughput drops.
Same prompt, side-by-side outputs from Sonnet + Opus + Haiku — or Claude Code vs Codex CLI. Pick by quality, latency, and cost, on receipts rather than preferences.
Your usage dashboard already shows tokens per turn. Run the same prompt through Haiku + Sonnet + Opus, pick by quality and cost. No A/B-test infrastructure to build, no benchmarking pipeline to maintain.
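The "pick by quality and cost" step is a one-liner once each run has a receipt. A sketch, assuming a hypothetical candidate shape that pairs your own quality score (from a rubric or eval) with the cost read off the turn's usage receipt:

```python
def pick_winner(candidates: list[dict]) -> dict:
    """Rank side-by-side runs of the same prompt.

    Highest quality wins; lower cost breaks ties. The dict shape
    (model/quality/cost_usd) is an illustrative assumption.
    """
    return max(candidates, key=lambda c: (c["quality"], -c["cost_usd"]))

runs = [
    {"model": "haiku",  "quality": 3, "cost_usd": 0.0004},
    {"model": "sonnet", "quality": 4, "cost_usd": 0.0060},
    {"model": "opus",   "quality": 4, "cost_usd": 0.0300},
]
print(pick_winner(runs)["model"])  # sonnet
```

Here Sonnet and Opus tie on quality, so the cheaper run wins — the kind of call receipts settle and preferences don't.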
Three calls. Create, run, read. No SDK required — the API is the SDK.
# 1. Create a session for this user (transcript-only — caller drives the LLM).
curl -X POST https://gateway.cerver.ai/v2/sessions \
  -H "Authorization: Bearer $CERVER_API_TOKEN" \
  -d '{ "session_name": "user_123", "compute": null, "harness": "claude" }'

# 2. Run a turn. Cerver streams the reply, caches the prompt, persists the transcript.
curl -X POST https://gateway.cerver.ai/v2/sessions/$ID/run-llm/stream \
  -H "Authorization: Bearer $CERVER_API_TOKEN" \
  -d '{ "input": "Summarise this thread in 3 bullets" }'

# 3. Read the transcript any time. Pass it to your UI, archive it, export it.
curl https://gateway.cerver.ai/v2/sessions/$ID \
  -H "Authorization: Bearer $CERVER_API_TOKEN"
The shared backend every team builds on — every session, every compute, every secret. Auditable, portable, governed.
Agents
Memory, compute, and secrets on one API — including a secret_fetch the agent can call directly.