Cerver gateway protocol v0.3.0

Cerver Agent Protocol

This page describes the current execution gateway surface in api/cerver-api. The field names and envelopes below match the live contract.

Three calls are enough.

Open a session, stream the run, then read the metrics. The gateway owns provider choice and execution visibility.

POST /gateway/sessions
POST /gateway/sessions/:id/run/stream
GET  /gateway/sessions/:id/metrics
One session surface: the agent opens one logical session, even if Cerver changes providers underneath.
One stream surface: the output comes back through one stream, not through four vendor SDKs.
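The three calls above can be sketched as a minimal TypeScript client. The paths match this page; the base URL and the run-stream body shape are assumptions for illustration, not part of the contract.

```typescript
// Hypothetical base URL; real deployments will differ.
const BASE = "https://cerver.example.com";

const sessionsPath = () => "/gateway/sessions";
const runStreamPath = (id: string) => `/gateway/sessions/${id}/run/stream`;
const metricsPath = (id: string) => `/gateway/sessions/${id}/metrics`;

// Open a session, stream one run, then read the metrics.
async function runOnce(createPayload: unknown): Promise<unknown> {
  const session = await fetch(BASE + sessionsPath(), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(createPayload),
  }).then((r) => r.json());

  // The run-stream request body shape is an assumption, not documented here.
  await fetch(BASE + runStreamPath(session.session_id), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ input: "run the smoke check" }),
  });

  return fetch(BASE + metricsPath(session.session_id)).then((r) => r.json());
}
```

The agent never picks a provider in this flow; it only names the session and reads back what the gateway decided.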

Create a session

{
  "task": "boot a preview and run a smoke check",
  "workload": "preview",
  "repo": {
    "name": "branch-monkey",
    "framework": "nextjs",
    "languages": ["typescript"]
  },
  "requirements": {
    "runtime": "node",
    "package_install": true,
    "public_preview": true,
    "persistence_level": "medium",
    "timeout_minutes": 15
  },
  "policy": {
    "mode": "balanced",
    "allowed_providers": ["vercel", "cloudflare", "e2b"]
  }
}

Session response

{
  "session_id": "ses_123",
  "session_name": "branch-monkey",
  "status": "ready",
  "provider": "vercel",
  "sandbox_id": "sbx_123",
  "metrics": {
    "provision_time_ms": 820,
    "predicted_startup_ms": 820,
    "cost_estimate_usd": 0.123
  },
  "routing": {
    "recommended_provider": "vercel",
    "confidence": "high",
    "fallback_order": ["cloudflare", "e2b"],
    "canary_run": false
  }
}
Important: POST /gateway/sessions calls the real routing engine with requireExecutable: true. If no executable provider qualifies, the API returns 409 with the recommendation report.
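An agent therefore needs two branches when creating a session. A sketch of interpreting the response, assuming the 409 body is the recommendation report itself (the exact 409 envelope is not shown on this page):

```typescript
type SessionOutcome =
  | { kind: "ready"; sessionId: string; provider: string }
  | { kind: "no_executable_provider"; report: unknown };

// Interpret a session-creation response. Success field names match the
// session response above; the 409 body shape is an assumption.
function interpretSessionResponse(status: number, body: any): SessionOutcome {
  if (status === 409) {
    // No executable provider survived filtering; the report explains why.
    return { kind: "no_executable_provider", report: body };
  }
  return { kind: "ready", sessionId: body.session_id, provider: body.provider };
}
```

On the 409 branch the agent can loosen its policy (for example, raise `max_startup_ms` or widen `allowed_providers`) and retry.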

What the agent sends

Agents should describe the work, requirements, and policy. Vendor choice remains a gateway concern.

Task: what the agent is trying to do (coding, preview, browser work, a long job, or validation).
Requirements: runtime, package install, public preview, browser need, timeout, desktop need, or persistence.
Policy: balanced, fastest, cheapest, resilient, or pinned, with optional provider allowlists and ceilings.
Context: repo hints, stack signals, and why the run matters.
{
  "task": "boot a preview and run a smoke check",
  "workload": "preview",
  "repo": {
    "name": "branch-monkey",
    "framework": "nextjs",
    "languages": ["typescript"],
    "signals": ["public preview expected", "package install likely"]
  },
  "requirements": {
    "runtime": "node",
    "package_install": true,
    "public_preview": true,
    "persistence_level": "medium",
    "timeout_minutes": 15
  },
  "policy": {
    "mode": "balanced",
    "allowed_providers": ["vercel", "cloudflare", "e2b"],
    "max_cost_usd": 0.2,
    "max_startup_ms": 2500
  },
  "metadata": {
    "importance": "high"
  }
}
Use the real field names: the current gateway expects public_preview and timeout_minutes, not ad-hoc names like public_port or max_duration_minutes.
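A small pre-flight check can catch those ad-hoc names before the request leaves the agent. The mapping below covers only the examples named above; it is illustrative, not an exhaustive contract check:

```typescript
// Map known wrong field names to the names the gateway actually expects.
const FIELD_FIXES: Record<string, string> = {
  public_port: "public_preview",
  max_duration_minutes: "timeout_minutes",
};

// Return one warning string per ad-hoc field found in a requirements object.
function checkRequirementFields(requirements: Record<string, unknown>): string[] {
  const warnings: string[] = [];
  for (const key of Object.keys(requirements)) {
    if (key in FIELD_FIXES) {
      warnings.push(`use "${FIELD_FIXES[key]}" instead of "${key}"`);
    }
  }
  return warnings;
}
```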

What Cerver can route today

This section reflects the current implementation, not the long-term roadmap.

Cloudflare: integrated, ready, and always executable in the current gateway.
Vercel Sandbox: integrated and executable when Vercel credentials are configured; otherwise it stays visible as planned.
E2B: modeled in the advisor and scoring engine, but not wired for live execution yet.
Daytona: modeled in the advisor and scoring engine, but not wired for live execution yet.
Practical meaning: recommendation endpoints can score all four providers, but session creation only succeeds when at least one executable provider survives filtering.
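The current executability rules reduce to a few lines. This sketch encodes the provider list above; the function name and credential flag are assumptions, not gateway internals:

```typescript
// Cloudflare is always executable; Vercel only when credentials are
// configured; E2B and Daytona are scored by the advisor but never executable.
function executableProviders(hasVercelCredentials: boolean): string[] {
  const out = ["cloudflare"];
  if (hasVercelCredentials) out.push("vercel");
  return out;
}
```

Session creation succeeds only when at least one name from this list also survives the hard filters.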

How Cerver decides

  • Filter providers that cannot satisfy the hard requirements.
  • Score the remaining providers for cost, speed, reliability, and fit.
  • Run a canary only when confidence is low or the run matters enough to compare.
  • Return one winner, one fallback path, and the reasons behind the choice.

Hard filters

Cerver filters a provider out before scoring if any of these are true:

  • The provider is blocked by allowed_providers.
  • The request needs live execution and the provider is not executable in Cerver yet.
  • The runtime, browser, desktop, preview, timeout, or cost ceilings cannot be met.
  • The provider startup estimate exceeds max_startup_ms.
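The hard filters above can be read as a single predicate. The shapes below are illustrative, not the gateway's internal types, and the runtime/browser/desktop checks are omitted for brevity:

```typescript
interface Candidate {
  name: string;
  executable: boolean;
  startupMs: number;
  estimatedCostUsd: number;
}

interface Policy {
  allowed_providers?: string[];
  max_cost_usd?: number;
  max_startup_ms?: number;
}

// A provider is dropped before scoring if any hard filter fails.
function passesHardFilters(c: Candidate, policy: Policy, needsExecution: boolean): boolean {
  if (policy.allowed_providers && !policy.allowed_providers.includes(c.name)) return false;
  if (needsExecution && !c.executable) return false;
  if (policy.max_cost_usd !== undefined && c.estimatedCostUsd > policy.max_cost_usd) return false;
  if (policy.max_startup_ms !== undefined && c.startupMs > policy.max_startup_ms) return false;
  return true;
}
```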

Routing modes

balanced: capability fit leads, with latency, cost, reliability, persistence, health, and readiness all contributing.
fastest: startup time dominates, but capability fit and readiness still matter.
cheapest: cost dominates, but capability fit and readiness still matter.
resilient: reliability and persistence carry more weight, and Cerver always asks for a canary when two executable providers exist.
pinned: acts like balanced scoring, but the pinned provider gets a large score boost if it still passes the hard filters.
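One way to read the modes is as weight profiles over the scoring dimensions. The numbers below are invented for illustration; only the relative emphasis matches the descriptions above, and `pinned` is modeled as balanced weights plus a boost rather than its own profile:

```typescript
type Weights = { fit: number; latency: number; cost: number; reliability: number };

// Invented weights: each mode shifts emphasis toward its headline dimension
// while keeping capability fit in play.
const MODE_WEIGHTS: Record<string, Weights> = {
  balanced:  { fit: 0.4,  latency: 0.2,  cost: 0.2,  reliability: 0.2 },
  fastest:   { fit: 0.25, latency: 0.55, cost: 0.1,  reliability: 0.1 },
  cheapest:  { fit: 0.25, latency: 0.1,  cost: 0.55, reliability: 0.1 },
  resilient: { fit: 0.25, latency: 0.1,  cost: 0.1,  reliability: 0.55 },
};
```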

Confidence and canary rules

  • Confidence is high when the top executable score is strong and the gap to second place is large.
  • Confidence drops to medium or low as the score or the gap shrinks.
  • Cerver sets canary_run when the top two executable providers are close, or when the workload is long_running.
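The rules above can be sketched with concrete thresholds. The thresholds here are invented for illustration; only the shape of the logic (score plus gap, canary on a close race or a long run) comes from this page:

```typescript
// Confidence from the top executable score and its gap to second place.
// The 0.8 / 0.6 / 0.1 / 0.05 cutoffs are assumptions, not the real ones.
function confidence(topScore: number, gap: number): "high" | "medium" | "low" {
  if (topScore >= 0.8 && gap >= 0.1) return "high";
  if (topScore >= 0.6 && gap >= 0.05) return "medium";
  return "low";
}

// Canary when the top two are close, or when the workload is long_running.
function wantsCanary(gap: number, workload: string): boolean {
  return gap < 0.05 || workload === "long_running";
}
```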

Session lifecycle and metrics

A gateway session is the canonical record for routing, transcript, events, and metrics, even though execution happens inside a provider sandbox.

provisioning: reserved for sandbox boot or setup.
ready: the sandbox exists and the session can accept work.
running: used conceptually while execution is in progress.
idle: the last input or execution completed and the session is waiting.
failed: terminal failure state for session management.
terminated: the session has been closed or its sandbox is gone.

Metrics tracked today

provision_time_ms: how long session creation took, including sandbox provisioning.
time_to_first_exec_ms: elapsed time from session creation to the first code execution event.
average_exec_latency_ms: average duration across non-stream execution events.
average_stream_open_latency_ms: average measured latency for stream-open requests.
session_length_ms: derived from the current time and createdAt.
engagement_score: computed from session length, interaction count, and execution count.
Streaming note: POST /gateway/sessions/:id/run/stream adds X-Cerver-Session-Id, X-Cerver-Provider, and X-Cerver-Stream-Latency-Ms response headers.
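The page says engagement_score combines session length, interaction count, and execution count, but not how. One plausible composition, with the caps and weights entirely invented for illustration:

```typescript
// Hypothetical engagement score: each signal is normalized to [0, 1]
// and the weighted sum is rounded to two decimals. The 10-minute,
// 10-interaction, and 5-execution caps are assumptions.
function engagementScore(sessionLengthMs: number, interactions: number, execs: number): number {
  const lengthPart = Math.min(sessionLengthMs / 600_000, 1);
  const interactionPart = Math.min(interactions / 10, 1);
  const execPart = Math.min(execs / 5, 1);
  return +(0.4 * lengthPart + 0.3 * interactionPart + 0.3 * execPart).toFixed(2);
}
```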

What comes back to the agent

The response envelope should be actionable by an agent and readable by a human without translation.

{
  "request_id": "req_123",
  "task": "boot a preview and run a smoke check",
  "workload": "preview",
  "providers": [
    {
      "name": "vercel",
      "status": "viable",
      "score": 0.862,
      "estimated_cost_usd": 0.123
    }
  ],
  "decision": {
    "recommended_provider": "vercel",
    "confidence": "high",
    "primary_reason": "Vercel Sandbox is the best fit for short preview-style execution right now.",
    "secondary_reasons": [
      "Supports preview-style execution cleanly",
      "Available for live routing through Cerver today"
    ],
    "fallback_order": ["cloudflare"],
    "canary_run": false
  },
  "human_summary": {
    "headline": "Route this task through Vercel Sandbox.",
    "next_action": "Start a session on Vercel Sandbox and keep Cloudflare available."
  }
}
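The decision block is built to be consumed directly: the winner plus fallback_order gives the agent one try-order list. A minimal sketch, assuming the field names shown in the envelope above:

```typescript
interface Decision {
  recommended_provider: string;
  fallback_order: string[];
}

// Flatten the decision into the order in which the agent should try providers.
function tryOrder(decision: Decision): string[] {
  return [decision.recommended_provider, ...decision.fallback_order];
}
```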

Session metrics response

{
  "session_id": "ses_123",
  "provider": "vercel",
  "status": "idle",
  "routing": {
    "recommended_provider": "vercel",
    "confidence": "high"
  },
  "metrics": {
    "provision_time_ms": 820,
    "time_to_first_exec_ms": 1110,
    "last_exec_latency_ms": 920,
    "average_exec_latency_ms": 980,
    "average_stream_open_latency_ms": 350,
    "total_exec_count": 3,
    "total_stream_count": 1,
    "interaction_count": 4,
    "session_length_ms": 440000,
    "cost_estimate_usd": 0.071,
    "uptime_percent": 99.3,
    "predicted_startup_ms": 820,
    "engagement_score": 0.61,
    "engagement_label": "engaged"
  }
}

Gateway endpoints

POST /gateway/recommend: ask Cerver to pick a provider. Use when the agent wants a recommendation before opening a session.
POST /gateway/sessions: open one logical execution session. Use when the agent is ready to start work through the gateway.
POST /gateway/sessions/:id/input: store chat or task input. Use to append user, assistant, or system messages into the gateway transcript.
POST /gateway/sessions/:id/run: execute without streaming. Use when the agent wants a JSON result wrapper with duration and metrics.
POST /gateway/sessions/:id/run/stream: execute and stream output. Use when the agent wants one stream regardless of which provider wins.
GET /gateway/sessions/:id/metrics: read provider and latency data. Use to explain what happened and decide whether policy should change.
GET /gateway/sessions/:id: read the session record. Use to inspect routing, transcript, events, and fresh metrics together.
DELETE /gateway/sessions/:id: terminate the logical session. Use to clean up both the gateway record and the backing sandbox.
POST /gateway/stress-tests: compare providers under the same workload. Use to benchmark cold starts, streaming, long jobs, and recovery behavior.

When to benchmark instead of guessing

  • When a repo is new and Cerver has little history for it.
  • When the run is expensive or important enough to justify a canary.
  • When providers have changed in speed, health, or cost.
  • When the team wants a repeatable comparison report across vendors.

Kinds supported now

The current gateway understands cold_start, stream_response, package_install, preview_launch, and long_session.

Stress-test request

{
  "kind": "cold_start",
  "workload": "preview",
  "providers": ["vercel", "cloudflare", "e2b"],
  "sample_size": 12,
  "requirements": {
    "runtime": "node",
    "public_preview": true,
    "timeout_minutes": 15
  },
  "policy": {
    "mode": "balanced"
  }
}

Stress-test result

{
  "testId": "test_123",
  "kind": "cold_start",
  "workload": "preview",
  "sampleSize": 12,
  "winner": "vercel",
  "summary": "Vercel Sandbox is the strongest cold start candidate.",
  "report": {
    "headline": "Vercel Sandbox is the strongest cold start candidate.",
    "guidance": "Across 12 simulated runs, Vercel Sandbox shows the best mix of startup time, cost, and availability.",
    "next_policy": "Route preview runs to vercel first, then fall back to cloudflare, e2b."
  }
}
Current implementation: stress tests are simulated from provider baselines and recommendation scores today. They are not yet live multi-provider canaries.
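An agent can fold a stress-test result back into its routing policy for that workload, using the winner and the tested provider list. The policy shape matches the request examples on this page; the function itself is an illustrative sketch, not a gateway feature:

```typescript
interface StressResult {
  winner: string;
  kind: string;
}

// Build an allowlist that puts the stress-test winner first and keeps
// the remaining tested providers as fallbacks, in their original order.
function policyFromStressTest(result: StressResult, tested: string[]) {
  return {
    mode: "balanced",
    allowed_providers: [result.winner, ...tested.filter((p) => p !== result.winner)],
  };
}
```

Since today's stress tests are simulated from baselines, treat the resulting policy as a starting point and revisit it once live multi-provider canaries land.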