Cerver
Docs

Apps use sessions. Providers supply compute.

That is the whole mental model. Your product should usually use the session API. Local bridges and remote providers should usually implement the compute interface.

Start with sessions, not vendors.

The normal Cerver flow is: ask what compute is available, create a logical session, run work through that session, then read metrics and close it. The session layer is the product surface.

curl -X GET https://your-cerver.example.com/gateway/providers

curl -X POST https://your-cerver.example.com/gateway/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Run a quick shell command",
    "workload": "general",
    "requirements": {
      "runtime": "shell",
      "timeout_minutes": 5
    },
    "policy": {
      "mode": "pinned",
      "pinned_provider": "vercel",
      "allowed_providers": ["vercel"]
    }
  }'

curl -X POST https://your-cerver.example.com/gateway/sessions/SESSION_ID/run \
  -H "Content-Type: application/json" \
  -d '{
    "code": "echo hello from cerver && node -v",
    "timeout": 30
  }'

curl -X GET https://your-cerver.example.com/gateway/sessions/SESSION_ID/metrics

Base URL: Use your deployed Cerver gateway URL.
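The curl flow above can be wrapped in a small typed client. A minimal sketch, assuming only the endpoints shown above; the `CerverClient` class and `FetchLike` type are illustrative, not a shipped SDK, and the fetch function is injected so the wrapper stays testable:

```typescript
// Illustrative wrapper over the session flow: list providers, create a
// session, run code, read metrics. Paths come from the curl examples above.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ json(): Promise<any> }>;

class CerverClient {
  constructor(private baseUrl: string, private fetchFn: FetchLike) {}

  private post(path: string, body: unknown): Promise<any> {
    return this.fetchFn(this.baseUrl + path, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    }).then((r) => r.json());
  }

  listProviders(): Promise<any> {
    return this.fetchFn(this.baseUrl + "/gateway/providers").then((r) => r.json());
  }
  createSession(request: unknown): Promise<any> {
    return this.post("/gateway/sessions", request);
  }
  run(sessionId: string, code: string, timeout = 30): Promise<any> {
    return this.post(`/gateway/sessions/${sessionId}/run`, { code, timeout });
  }
  metrics(sessionId: string): Promise<any> {
    return this.fetchFn(`${this.baseUrl}/gateway/sessions/${sessionId}/metrics`).then((r) => r.json());
  }
}
```

Because the fetch function is a constructor argument, the same wrapper works against a deployed gateway or a stub in tests.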

The simple model.

Session layer: The app-facing layer for sessions, input, runs, routing, policy, and metrics.
Compute layer: The provider-facing layer for actual computers: local machines, remote sandboxes, streams, and workspaces.
Requirements: What the work needs: runtime, package install, preview, browser, persistence, timeout.
Policy: How Cerver should choose: balanced, fastest, cheapest, resilient, or pinned.
  • If you are building an app, start with the session layer.
  • If you are adding a backend, implement the compute layer.
  • A session binds app work to one chosen computer.
  • requirements and policy tell Cerver how to choose that computer.
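The two fields that drive that choice can be sketched as TypeScript shapes. Field names are taken from the examples in this doc; treat the unions and optionality as assumptions, not an exhaustive schema:

```typescript
// Illustrative shapes for the session-create fields that drive routing.
interface Requirements {
  runtime?: "shell" | "node" | string; // runtimes seen in this doc's examples
  package_install?: boolean;
  public_preview?: boolean;
  persistence_level?: "low" | "medium" | "high"; // assumed levels around "medium"
  timeout_minutes?: number;
}

interface Policy {
  mode: "balanced" | "fastest" | "cheapest" | "resilient" | "pinned";
  pinned_provider?: string;       // required in practice when mode is "pinned"
  allowed_providers?: string[];
  max_startup_ms?: number;
}

// A pinned policy forces one provider; a balanced policy lets Cerver choose.
const pinned: Policy = { mode: "pinned", pinned_provider: "vercel", allowed_providers: ["vercel"] };
const balanced: Policy = { mode: "balanced", allowed_providers: ["vercel", "e2b"], max_startup_ms: 2000 };
```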

Sessions need a compute. Attach one first.

A new account has no compute attached, so POST /v2/sessions will return 409 with a recommendation report. Two ways to fix that:

Option A — Local relay (your laptop, mac mini, server)

One command, on the machine you want to use:

curl -fsSL https://kompany.dev/install-cerver.sh | bash

Installs uv if missing, runs the relay, opens a browser to log you in, then registers the host as a private compute on your account. Self-updates by polling GitHub for new commits — leave it running on an always-on machine and it stays current.

Option B — BYO cloud provider (Vercel, e2b)

Enable a provider with your own credentials in the dashboard at cerver.ai/dashboard#providers, or via API:

curl -X POST https://gateway.cerver.ai/v2/account/providers \
  -H "Authorization: Bearer $CERVER_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"provider": "vercel", "credentials": {"vercel_token": "..."}}'

After enabling, sessions can target the provider via policy.allowed_providers or target_compute_id: "provider_vercel".

Verify

curl https://gateway.cerver.ai/v2/computes \
  -H "Authorization: Bearer $CERVER_API_TOKEN"
# should list at least one compute
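Until a compute is attached, session creation fails. A minimal sketch of handling that, assuming the 409 body carries the recommendation report mentioned above (the exact report shape is not pinned down by this doc):

```typescript
// Sketch: distinguish a created session from the no-compute 409.
interface HttpResponse {
  status: number;
  json(): Promise<any>;
}

async function createSessionOrExplain(
  resp: HttpResponse
): Promise<{ session?: any; recommendation?: any }> {
  if (resp.status === 409) {
    // No compute attached yet: surface the report so the caller can
    // attach a relay or enable a provider, then retry.
    return { recommendation: await resp.json() };
  }
  return { session: await resp.json() };
}
```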

Bring your own secrets store.

Cerver intentionally does not store user app-secrets (Buffer, Slack, OpenAI keys, etc.) — those belong in a tool built for secrets. Use Infisical, 1Password, AWS Secrets Manager, your shell env, whatever fits.

The cerver-mcp package ships a secret_fetch(name) tool that gives the agent one uniform interface regardless of backend.

Default: env backend

export BUFFER_API_KEY=...
uvx cerver-mcp
# agent: secret_fetch("BUFFER_API_KEY") → reads from process env
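The backend-agnostic idea behind secret_fetch can be sketched as a function plus swappable backends. This is illustrative of the pattern only, not the cerver-mcp source; `makeEnvBackend` and `secretFetch` are hypothetical names, and the environment map is injected (e.g. `process.env`) to keep the sketch self-contained:

```typescript
// A backend is anything that maps a secret name to a value.
type SecretBackend = (name: string) => string | undefined;

// env backend over an injected environment map (pass process.env in practice)
function makeEnvBackend(env: Record<string, string | undefined>): SecretBackend {
  return (name) => env[name];
}

// One uniform fetch interface regardless of backend; throws on a miss so
// agents fail loudly instead of running with an empty credential.
function secretFetch(name: string, backend: SecretBackend): string {
  const value = backend(name);
  if (value === undefined) throw new Error(`secret not found: ${name}`);
  return value;
}
```

Swapping Infisical or another store in means replacing only the backend function; callers never change.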

Production: Infisical backend

{
  "mcpServers": {
    "cerver": {
      "command": "uvx",
      "args": ["cerver-mcp"],
      "env": {
        "CERVER_API_TOKEN": "ck_...",
        "CERVER_SECRETS_BACKEND": "infisical",
        "INFISICAL_TOKEN": "st...",
        "INFISICAL_PROJECT_ID": "...",
        "INFISICAL_ENVIRONMENT": "prod"
      }
    }
  }
}

Audited, rotatable, never stored on the relay disk. Future backends (1Password, AWS, GCP) plug in the same way.

The app-facing session API.

These are the canonical endpoints for products that want one stable doorway into execution. The lower-level compute API still exists, but the session API is the intended integration path for apps.

GET /gateway/providers
  List compute providers, capability, readiness, and integration status.
POST /gateway/recommend
  Score compute providers and return a recommendation report without creating a session.
POST /gateway/sessions
  Create a logical session and provision or bind the backing compute if needed.
GET /gateway/sessions/:id
  Read the current session record, transcript, routing decision, compute binding, and metrics snapshot.
POST /gateway/sessions/:id/input
  Append user, assistant, or system input to the session transcript.
POST /gateway/sessions/:id/run
  Run code through the session’s backing compute provider and return a normalized response.
POST /gateway/sessions/:id/run/stream
  Run with streaming output and include Cerver latency headers on the response.
GET /gateway/sessions/:id/metrics
  Read latency, engagement, uptime, startup, and estimated cost fields for the session.
DELETE /gateway/sessions/:id
  Stop the backing compute resource and terminate the logical session.

Create a session, then let it bind to compute.

Cerver chooses the compute provider at session creation time. You can let it decide, prefer one backend, or pin the request to a single provider.

Example create request:

{
  "task": "Boot a preview environment for a Next.js repo",
  "workload": "preview",
  "repo": {
    "name": "branch-monkey",
    "framework": "nextjs",
    "languages": ["typescript"],
    "signals": ["needs-preview", "short-lived"]
  },
  "requirements": {
    "runtime": "node",
    "package_install": true,
    "public_preview": true,
    "persistence_level": "medium",
    "timeout_minutes": 20
  },
  "policy": {
    "mode": "balanced",
    "allowed_providers": ["vercel", "e2b"],
    "max_startup_ms": 2000
  },
  "session_name": "preview-session"
}
The response includes the routing decision and a metrics snapshot:

{
  "session_id": "sess_123",
  "session_name": "preview-session",
  "status": "ready",
  "provider": "vercel",
  "compute_id": "cmp_123",
  "sandbox_id": "sbx_local_123",
  "metrics": {
    "provision_time_ms": 812,
    "time_to_first_exec_ms": null,
    "last_exec_latency_ms": null,
    "average_exec_latency_ms": null,
    "average_stream_open_latency_ms": null,
    "total_exec_count": 0,
    "total_stream_count": 0,
    "interaction_count": 0,
    "session_length_ms": 0,
    "cost_estimate_usd": 0.01,
    "uptime_percent": 99.3,
    "predicted_startup_ms": 820,
    "engagement_score": 0,
    "engagement_label": "warming"
  },
  "routing": {
    "recommended_provider": "vercel",
    "confidence": "high",
    "primary_reason": "Best fit for preview workloads",
    "secondary_reasons": ["Startup within target", "Public preview supported"],
    "fallback_order": ["e2b"],
    "canary_run": false
  }
}
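The routing block above is directly consumable by a client: try the recommended provider first, then walk fallback_order. A minimal sketch (the `providerAttemptOrder` helper is illustrative, not part of any SDK):

```typescript
// Build the order in which a client would attempt providers, from the
// routing block returned at session creation.
interface Routing {
  recommended_provider: string;
  fallback_order: string[];
}

function providerAttemptOrder(routing: Routing): string[] {
  return [routing.recommended_provider, ...routing.fallback_order];
}
```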

Sessions are also transcripts.

Every Cerver session keeps the full turn-by-turn conversation on its transcript[] field — user messages, assistant replies, tool calls, tool results. That makes the same primitive a shared memory layer: any agent on the account can read what any other agent on the account did, just by listing sessions and reading transcripts. No vector DB, no separate retrieval service.

Read from plain HTTP

curl https://gateway.cerver.ai/v2/sessions?limit=20 \
  -H "Authorization: Bearer $CERVER_API_TOKEN"

curl https://gateway.cerver.ai/v2/sessions/SESSION_ID \
  -H "Authorization: Bearer $CERVER_API_TOKEN"
# returns the full session record including transcript[]
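The shared-memory idea is just iteration over those records. A sketch of scanning listed sessions for entries that mention a term; the entry shape (role plus content) is an assumption based on the roles named above, and `findMentions` is a hypothetical helper:

```typescript
// Treat transcripts as a shared memory layer: scan every session's
// transcript[] for entries containing a search term.
interface TranscriptEntry {
  role: "user" | "assistant" | "system" | "tool";
  content: string;
}
interface SessionRecord {
  session_id: string;
  transcript: TranscriptEntry[];
}

function findMentions(
  sessions: SessionRecord[],
  term: string
): { session_id: string; content: string }[] {
  const hits: { session_id: string; content: string }[] = [];
  for (const s of sessions) {
    for (const e of s.transcript) {
      if (e.content.includes(term)) hits.push({ session_id: s.session_id, content: e.content });
    }
  }
  return hits;
}
```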

Read from an MCP-aware agent

Drop the API key into your agent's MCP config once. The cerver-mcp package surfaces three tools the agent can call directly: cerver_session_list, cerver_session_peek, and cerver_session_export.

{
  "mcpServers": {
    "cerver": {
      "command": "uvx",
      "args": ["cerver-mcp"],
      "env": { "CERVER_API_TOKEN": "ck_..." }
    }
  }
}

Same data the dashboard at cerver.ai/dashboard#sessions shows you — humans and agents see identical content, scoped to whichever account owns the API token.

Run code, stream output, then read visibility back.

Session execution responses stay provider-aware internally, but Cerver adds its own session-level metadata around them. Streaming responses include extra Cerver headers so your app can observe the gateway path directly.

  • X-Cerver-Session-Id identifies the logical session.
  • X-Cerver-Provider tells you which backend actually executed the run.
  • X-Cerver-Stream-Latency-Ms measures how long it took to open the stream.
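Reading those headers off a streaming response is a one-liner per field. A sketch that works against anything with a Headers-style get(); the header names come from the list above, and `readCerverMeta` is an illustrative helper:

```typescript
// Pull Cerver's session-level metadata off a streaming response.
interface HeadersLike {
  get(name: string): string | null;
}

function readCerverMeta(headers: HeadersLike) {
  return {
    sessionId: headers.get("X-Cerver-Session-Id"),
    provider: headers.get("X-Cerver-Provider"),
    streamLatencyMs: Number(headers.get("X-Cerver-Stream-Latency-Ms") ?? NaN),
  };
}
```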
provision_time_ms: How long the initial compute provisioning took.
time_to_first_exec_ms: How long until the first execution happened after session creation.
last_exec_latency_ms: Latency of the latest non-stream execution.
average_stream_open_latency_ms: Average latency to begin stream delivery.
cost_estimate_usd: Cerver’s estimated session spend so far.
engagement_label: One of idle, warming, engaged, or deep.

Ask Cerver for a comparison before a real run.

Stress tests are the comparison layer. They let Cerver score compute providers for a representative workload and return a structured report your app or agent can use before placing real traffic.

curl -X POST https://your-cerver.example.com/gateway/stress-tests \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Compare preview launch backends",
    "kind": "preview_launch",
    "workload": "preview",
    "requirements": {
      "runtime": "node",
      "public_preview": true,
      "package_install": true,
      "timeout_minutes": 20
    },
    "providers": ["vercel", "e2b"],
    "sample_size": 5
  }'

Today these reports are still simulated from provider profiles and routing logic. The next step is live canary execution.
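Consuming such a report typically means ranking providers by score. A sketch under a loud assumption: the doc does not pin down the report shape, so the `ProviderResult` type and its `score` field are invented for illustration:

```typescript
// Hypothetical per-provider result within a stress-test report.
interface ProviderResult {
  provider: string;
  score: number; // assumed: higher is better
}

// Pick the highest-scoring provider, or null for an empty report.
function bestProvider(results: ProviderResult[]): string | null {
  let best: ProviderResult | null = null;
  for (const r of results) {
    if (best === null || r.score > best.score) best = r;
  }
  return best ? best.provider : null;
}
```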

The lower-level compute API still exists.

If you want to work directly with raw compute instead of a logical session, the lower-level API remains available. The URLs still say /sandbox for compatibility, but this is the compute layer.

POST /sandbox
  Create raw compute directly on a provider.
GET /sandbox/:id
  Read compute metadata.
POST /sandbox/:id/run
  Run code on raw compute.
POST /sandbox/:id/run/stream
  Stream raw compute execution.
POST /sandbox/:id/install
  Install a package on raw compute.
POST /sandbox/:id/files
  Write a file.
GET /sandbox/:id/files
  Read or list files.
GET /sandbox/:id/state
  Read provider-specific state.
POST /sandbox/:id/state
  Write provider-specific state.
DELETE /sandbox/:id
  Terminate the raw compute resource.
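A typical raw-compute sequence chains these endpoints: write a file, then run it. A sketch with an injected POST function so it stands alone; the file-body fields (`path`, `content`) are assumed, since this doc only lists the endpoints:

```typescript
// Illustrative raw-compute sequence against the /sandbox endpoints above.
type PostFn = (path: string, body: unknown) => Promise<any>;

async function writeThenRun(post: PostFn, sandboxId: string): Promise<any> {
  // Assumed request shape for the files endpoint.
  await post(`/sandbox/${sandboxId}/files`, {
    path: "hello.js",
    content: "console.log('hi')",
  });
  return post(`/sandbox/${sandboxId}/run`, { code: "node hello.js" });
}
```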

Current provider picture inside Cerver.

This is the honest state of the current codebase. Cerver can advise on more providers than it can execute today.

p69 Local Computer: working

Local compute adapter wired into Cerver. Once P69_BASE_URL points at a running local server, Cerver can treat your machine like another execution backend.

Vercel Sandbox: working

Live adapter verified through create, run, and stop. Best current execution path.

Cloudflare: partial

Execution path exists, but the current implementation is still container-backed and not the cleanest local path.

E2B: working

Live compute adapter verified through create, run, and stop. Uses bring-your-own E2B credentials.

Daytona: planned

Modeled in the advisor and catalog. Not yet wired as a live execution adapter.

If a service wants to appear in Cerver, it implements the compute interface.

A provider becomes runnable by implementing the shared compute contract. It becomes selectable by being registered in the provider registry and catalog. Apps do not use this directly; the session layer does.

export interface CerverInterface {
  readonly providerName: "cloudflare" | "vercel" | "e2b" | "p69";

  createSandbox(request, env): Promise<SandboxRecord>;
  runSandbox(record, request, env): Promise<Response>;
  runSandboxStream(record, request, env): Promise<Response>;
  installPackage(record, request, env): Promise<Response>;
  writeFile(record, request, env): Promise<Response>;
  readFile(record, path, encoding, env): Promise<Response>;
  getState(record, env): Promise<Response>;
  setState(record, state, env): Promise<Response>;
  deleteSandbox(record, env): Promise<Response>;
}
  • Implement the compute contract to make the provider runnable.
  • Register it in the provider registry to make it live inside Cerver.
  • Add a provider profile to the gateway catalog so the router can score it.
  • Then the session layer can recommend, pin, or fall back to it without being rewritten.
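To show the shape of an implementation, here is a minimal stub against a trimmed version of the contract above: three methods, simplified parameter and return types, and a hypothetical provider name. Real adapters take request and env objects and return HTTP Response objects; this sketch echoes instead of executing:

```typescript
// Simplified record and contract, trimmed from the interface above.
interface SandboxRecord {
  id: string;
  provider: string;
}

interface ComputeAdapter {
  readonly providerName: string;
  createSandbox(request: { task?: string }): Promise<SandboxRecord>;
  runSandbox(record: SandboxRecord, request: { code: string }): Promise<{ stdout: string }>;
  deleteSandbox(record: SandboxRecord): Promise<void>;
}

// Demo adapter: "echo-demo" is a hypothetical provider name, not one of
// Cerver's registered backends.
class EchoAdapter implements ComputeAdapter {
  readonly providerName = "echo-demo";

  async createSandbox(): Promise<SandboxRecord> {
    return { id: "sbx_demo_1", provider: this.providerName };
  }
  async runSandbox(_record: SandboxRecord, request: { code: string }): Promise<{ stdout: string }> {
    return { stdout: request.code }; // echoes the code instead of executing it
  }
  async deleteSandbox(): Promise<void> {}
}
```

Registering such an adapter in the provider registry and catalog is what would make it visible to the router; the stub alone only satisfies the contract.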