Pattern · Chain

Route each turn. Not each customer.

Different tasks deserve different (harness, compute) pairs. Creative work goes to one. Analytical work to another. Cheap polish to a third. Cerver lets you compose a session chain where each step picks its own configuration — and hands state down to the next step on the same session id.

A real chain: research → draft → polish.

Three steps. Three reasons for three configurations. Same session id binding them, same transcript propagating context forward.

Step 1 · Analytical
Research the latest on agent infra at scale
harnessclaude-code
computeaws (your VPC)
Long context, careful citation. Worth the time and the secured network egress.
Step 2 · Creative
Draft a 400-word post using the research
harnesscodex
computevercel
Punchier voice when the harness gets to riff. Cheap, fast, ephemeral.
Step 3 · Polish
Tighten, remove jargon, ship
harnessclaude-code (haiku)
computelocal
Mechanical work. Smallest model, your own machine, costs basically nothing.

One session id · three configurations · state propagates · every step queryable forever

When you'd use it

Multi-step product flows where each step has a different shape. Research → write → critique. Triage → solve → verify. Read PR → propose → run tests. Anywhere the right tool changes between steps.

Why this beats one big agent

One agent with one model on one compute eats your costs and your latency. A chain pays the premium where it matters (the slow careful step) and the discount where it doesn't (polish). The cost story routes per-turn, not per-customer.

What stays consistent

The session. One id binds the steps. The transcript carries forward. From your app's perspective there's one user-facing conversation; under the hood, three different configurations did the work.

How it works in cerver

Each step is a fresh session create with its own harness and compute, linked to the previous one with parent_session_id. The child inherits the parent's transcript — Step 2 reads Step 1's research without you having to thread state by hand.

// Step 1 — analytical. Claude on your own AWS VPC.
const research = await cerver.sessions.create({
  harness: { type: 'claude-code' },
  compute: { provider: 'aws', credentials: env.AWS },
});
await cerver.sessions.input(research.session_id, {
  role: 'user',
  content: 'Research the latest on agent infra at scale. Cite sources.',
});
const facts = await cerver.sessions.waitForAssistant(research.session_id);

// Step 2 — creative. Codex on Vercel. Inherits Step 1's transcript.
const draft = await cerver.sessions.create({
  harness: { type: 'codex' },
  compute: { provider: 'vercel', credentials: env.VERCEL },
  parent_session_id: research.session_id,
});
await cerver.sessions.input(draft.session_id, {
  role: 'user',
  content: 'Draft a 400-word post using the research above.',
});
const text = await cerver.sessions.waitForAssistant(draft.session_id);

// Step 3 — polish. Haiku on your local machine. Cheap and fast.
const polish = await cerver.sessions.create({
  harness: { type: 'claude-code', model: 'haiku' },
  compute: { compute_id: 'comp_local' },
  parent_session_id: draft.session_id,
});
await cerver.sessions.input(polish.session_id, {
  role: 'user',
  content: 'Tighten this. Remove jargon. Keep the voice.',
});
const final = await cerver.sessions.waitForAssistant(polish.session_id);

Each step's session is independently queryable. If you want to know why Step 3 said what it did, the transcript is there. If you want to A/B the chain against a single-model baseline next quarter, both runs are in the same table.

Other pattern

Curation — one intent, many answers →

When you can't predict which model wins, run them all and pick after.

Start free