You're two, three, five people with a low budget and an outsized AI app to ship. Cerver is the backend layer your competitors are paying ten engineers to maintain — already done. Persistent sessions per user, streaming, prompt caching, per-turn usage receipts. You ship the AI app on top; the boring stays underneath.
The boilerplate that quietly eats two engineers for a quarter — gone.
Every session keeps its full transcript on Cerver. Append-only, queryable by metadata, exportable as text. Resume a session by id and the history is already there.
SSE on every turn. Anthropic prompt caching wired in by default. You get the latency win and the caching discount without instrumenting either.
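Consuming the stream client-side takes only a few lines. A minimal sketch in Python, assuming a hypothetical `data: {"delta": ...}` event shape with a `[DONE]` sentinel — an illustrative assumption, not Cerver's documented wire format:

```python
import json

def parse_sse_chunks(raw: str) -> str:
    """Collect the text deltas out of a raw SSE response body.

    Assumes each event is a `data:` line carrying {"delta": "..."},
    terminated by `data: [DONE]` — an illustrative shape, not the
    documented one.
    """
    text = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        text.append(json.loads(payload).get("delta", ""))
    return "".join(text)

sample = (
    'data: {"delta": "Three bullets"}\n\n'
    'data: {"delta": " coming up."}\n\n'
    "data: [DONE]\n"
)
print(parse_sse_chunks(sample))  # Three bullets coming up.
```

In production you would read the response body incrementally and flush each delta to the UI as it arrives rather than buffering the whole stream.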
Every turn returns input/output tokens, model used, cache hit ratio. Reconcile billing with one query. No more "where did the spend go" Slack threads.
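Because every turn carries those fields, pricing a turn is pure arithmetic. A sketch, assuming hypothetical field names and illustrative per-million-token rates (not published Cerver or Anthropic pricing):

```python
def turn_cost(receipt: dict, prices: dict) -> float:
    """Price one turn from its usage receipt.

    Field names (model, input_tokens, output_tokens, cache_hit_ratio)
    and the rates in `prices` are illustrative assumptions.
    """
    p = prices[receipt["model"]]
    cached = receipt["input_tokens"] * receipt["cache_hit_ratio"]
    fresh = receipt["input_tokens"] - cached
    return (
        fresh * p["input"]
        + cached * p["cached_input"]   # cached reads billed at a discount
        + receipt["output_tokens"] * p["output"]
    ) / 1_000_000  # rates are per million tokens

# Hypothetical rates, USD per million tokens.
prices = {"sonnet": {"input": 3.0, "cached_input": 0.3, "output": 15.0}}
receipt = {"model": "sonnet", "input_tokens": 10_000,
           "output_tokens": 500, "cache_hit_ratio": 0.8}
print(round(turn_cost(receipt, prices), 4))  # 0.0159
```

Summing `turn_cost` over a session's receipts is the "one query" reconciliation: no extra metering layer to build.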
Most teams pile every task into one harness, then context-switch through ten unrelated half-fixes — slower output, more mistakes. Cerver flips it: one task at a time, several providers in parallel, the best output wins.
Context-switching across unrelated problems on the same model leaves no signal about which task is the bottleneck. Mistakes pile up; throughput drops.
Same prompt, side-by-side outputs from Sonnet + Opus + Haiku — or Claude Code vs Codex CLI. Pick by quality, latency, and cost, on receipts rather than preferences.
Your usage dashboard already shows tokens per turn. Run the same prompt through Haiku + Sonnet + Opus, pick by quality and cost. No A/B-test infrastructure to build, no benchmarking pipeline to maintain.
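The "pick by quality and cost" step is a one-liner once each run has a receipt. A sketch, assuming a hypothetical candidate shape that pairs your own quality score (from a rubric or eval) with the cost read off the turn's usage receipt:

```python
def pick_winner(candidates: list[dict]) -> dict:
    """Rank side-by-side runs of the same prompt.

    Highest quality wins; lower cost breaks ties. The dict shape
    (model/quality/cost_usd) is an illustrative assumption.
    """
    return max(candidates, key=lambda c: (c["quality"], -c["cost_usd"]))

runs = [
    {"model": "haiku",  "quality": 3, "cost_usd": 0.0004},
    {"model": "sonnet", "quality": 4, "cost_usd": 0.0060},
    {"model": "opus",   "quality": 4, "cost_usd": 0.0300},
]
print(pick_winner(runs)["model"])  # sonnet
```

Here Sonnet and Opus tie on quality, so the cheaper run wins — the kind of call receipts settle and preferences don't.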
Three calls. Create, run, read. No SDK required — the API is the SDK.
# 1. Create a session for this user (transcript-only — caller drives the LLM).
curl -X POST https://gateway.cerver.ai/v2/sessions \
  -H "Authorization: Bearer $CERVER_API_TOKEN" \
  -d '{ "session_name": "user_123", "compute": null, "harness": "claude" }'

# 2. Run a turn. Cerver streams the reply, caches the prompt, persists the transcript.
curl -X POST https://gateway.cerver.ai/v2/sessions/$ID/run-llm/stream \
  -H "Authorization: Bearer $CERVER_API_TOKEN" \
  -d '{ "input": "Summarise this thread in 3 bullets" }'

# 3. Read the transcript any time. Pass it to your UI, archive it, export it.
curl https://gateway.cerver.ai/v2/sessions/$ID \
  -H "Authorization: Bearer $CERVER_API_TOKEN"
The shared backend every team builds on — every session, every compute, every secret. Auditable, portable, governed.
Agents
Memory, compute, and secrets on one API — including a secret_fetch the agent can call directly.