The Agent Without a Face

There are two ways to use a coding agent: sit in its terminal, or take its face off and drive the engine from your own code. If you're building anything around agents, the second mode is the whole game — and it's full of traps nobody warns you about.

Every coding agent ships with a face: the terminal UI you type into, watch scroll, and click allow in. Run claude or codex with no arguments and you're living inside their REPL. That face is great for using an agent. It is useless if you want to build on one.

There's a second mode. You take the face off and drive the engine straight from your own code — hand it a prompt, read a stream of structured events, and decide for yourself what to render and when to ask the human. It's called headless mode, and if you're building anything that wraps an agent — a custom terminal, a dashboard, a bot, a product — it is the whole game. This is the map: how it works, whether a session survives between calls, how your app talks to it, and the traps that bill you $1,800 if you skip them.

Normal mode versus headless

Interactive mode is claude or codex with no prompt: the rich TUI, live token counts, Shift+Tab to cycle permission modes, inline approve-and-remember. A human in the loop, steering in real time.

Headless is claude -p "<prompt>" (Claude Code) or codex exec "<prompt>" (Codex). No REPL. It reads a prompt, does the work, emits output, and exits. There's no human to answer a permission dialog — so anything that would prompt either runs because you pre-approved it, or aborts the run. You aren't steering. Your code is. The TUI is product packaging; headless is the engine room.

Here's the part that surprises people, and it reframes everything: even the official SDKs are headless mode underneath. The Claude Agent SDK and the Codex SDK don't call some private endpoint — they spawn the CLI as a subprocess and exchange newline-delimited JSON over stdio. Discourse's Sam Saffron reverse-engineered the Claude one; the creator of Elixir, José Valim, measured what it costs: a process per session, ~214MB of RAM each. So “wrap the CLI” and “use the SDK” are the same thing one layer down. The agent is a subprocess you stream events from. Once that clicks, the rest is plumbing.
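The "newline-delimited JSON over stdio" part is worth seeing once, because chunks coming off a pipe do not respect line boundaries. Here is a minimal sketch of a decoder that buffers partial lines. NdjsonDecoder is a name made up for this example, not something either SDK exports, and it assumes stdout has already been decoded to text before it gets here.

```typescript
// Hypothetical helper: turns arbitrary stdout chunks into parsed JSON events.
class NdjsonDecoder {
  // Holds the trailing partial line until the rest of it arrives.
  private tail = "";

  // Feed one chunk; returns every complete event found so far.
  // JSON.parse throws on a malformed line, so the caller decides how to recover.
  feed(chunk: string): unknown[] {
    const lines = (this.tail + chunk).split("\n");
    this.tail = lines.pop() ?? "";
    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue; // skip blank keep-alive lines
      events.push(JSON.parse(trimmed));
    }
    return events;
  }
}
```

In a real wrapper you would call feed() from the child process's stdout data handler and pass each returned event to your renderer.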

Can you keep a session across calls? Yes — and here's the footgun

The first question every wrapper-builder asks: if I invoke the agent twice, does the second call remember the first? Yes. Both agents persist every session to disk and let you resume it.

# Claude Code — capture the session id, resume it on the next call
sid=$(claude -p "Start the refactor" --output-format json | jq -r '.session_id')
claude -p "Now wire up the tests" --resume "$sid"     # full prior context restored
#   --continue   resumes the most recent session in this dir
#   --fork-session  branches it into a new session id

# Codex — same idea, different verbs
codex exec "Start the refactor"
codex exec resume --last "Now wire up the tests"

In the SDKs it's an option instead of a flag: Claude's resume / continue / forkSession, Codex's resumeThread(threadId). Resume restores the entire conversation — every read, edit, tool call, and result. You can run a batch of separate invocations that behaves like one long multi-turn session.

Now the footgun, because it's the one that eats your first day. Claude stores each session as a JSONL transcript at ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl — and session lookup is scoped to the working directory. Resume from a different folder than you started in and you get No conversation found, or worse, a silent brand-new session with none of your history. For a wrapper that shells out, your subprocess's cwd is the session's identity. Pin it, or lose the memory you thought you had.
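One way to make the pin hard to forget is to store the working directory next to the session id and never build a resume command without it. This is a sketch under assumptions: SessionRef, SpawnPlan, and planResume are hypothetical names, the path check assumes POSIX-style absolute paths, and the flags are only the ones shown earlier in this post.

```typescript
// What the wrapper persists for every session it starts.
interface SessionRef {
  sessionId: string;
  cwd: string; // absolute directory the session was created in
}

// Everything needed to spawn the resume call; nothing is executed here.
interface SpawnPlan {
  command: string;
  args: string[];
  cwd: string;
}

// Build the resume invocation so the subprocess always runs in the original cwd.
function planResume(ref: SessionRef, prompt: string): SpawnPlan {
  if (!ref.cwd.startsWith("/")) {
    throw new Error("store an absolute cwd with the session id");
  }
  return {
    command: "claude",
    args: ["-p", prompt, "--resume", ref.sessionId, "--output-format", "json"],
    cwd: ref.cwd,
  };
}
```

The plan's cwd then goes straight into whatever spawns the process, for example the cwd option of Node's child_process.spawn, rather than being inherited from wherever your service happens to be running.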

Two more context facts you need before you trust it for hours at a time. The transcript format is officially “internal to Claude Code and changes between versions” — never parse those files directly; read the event stream instead. And on a long run the context window auto-compacts: it silently summarizes old turns away to stay under the limit, which means a session whose store holds 503 entries might replay as 18 messages. Compaction manages the window, not your disk, and not your assumptions.

How your app talks to it

Your wrapper needs two channels: events coming out, and decisions going in. Headless gives you both.

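Events out: the stream. Run with --output-format stream-json (in -p mode Claude Code also requires --verbose with it) and every line is a JSON event you parse as it arrives. The token-by-token stream_event lines only show up when you add --include-partial-messages: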

{"type":"system","subtype":"init","session_id":"...","model":"...","tools":[...]}   # grab session_id here
{"type":"assistant","message":{ ...text + tool_use blocks... }}
{"type":"stream_event","event":{"delta":{"type":"text_delta","text":"..."}}}   # token-by-token
{"type":"result","subtype":"success","total_cost_usd":0.04,"usage":{...}}      # final outcome

Tool calls arrive as tool_use and tool_result blocks; you render them however you like — a diff panel, a progress line, a Slack message. Capture session_id from the very first init event. Codex's codex exec --json emits the same shape in its own vocabulary (thread.started, turn.completed, and item.completed events carrying command_execution / file_change / mcp_tool_call items).
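Parsing the stream mostly means folding events into whatever state your UI renders. A minimal sketch, assuming only the four event shapes in the sample above: the AgentEvent union, RunState, and reduceEvent are names invented here, and real streams carry more fields and more event types than this models.

```typescript
// Only the fields visible in the sample stream; everything else is ignored.
type AgentEvent =
  | { type: "system"; subtype: string; session_id?: string }
  | { type: "assistant"; message: unknown }
  | { type: "stream_event"; event: { delta?: { type: string; text?: string } } }
  | { type: "result"; subtype: string; total_cost_usd?: number };

interface RunState {
  sessionId?: string; // captured from the init event
  text: string;       // streamed assistant text so far
  outcome?: string;   // subtype of the final result event
  costUsd?: number;
}

// Pure reducer: previous state plus one event gives the next state.
function reduceEvent(state: RunState, ev: AgentEvent): RunState {
  switch (ev.type) {
    case "system":
      return ev.subtype === "init" && ev.session_id
        ? { ...state, sessionId: ev.session_id }
        : state;
    case "stream_event": {
      const delta = ev.event.delta;
      return delta?.type === "text_delta" && delta.text
        ? { ...state, text: state.text + delta.text }
        : state;
    }
    case "result":
      return { ...state, outcome: ev.subtype, costUsd: ev.total_cost_usd };
    default:
      return state; // assistant messages are left to the renderer
  }
}

// Fold a whole run, mostly useful for tests and replays.
function foldEvents(events: AgentEvent[]): RunState {
  return events.reduce(reduceEvent, { text: "" });
}
```

Keeping the reducer pure means the same function can drive a live UI and replay a recorded run.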

Decisions in — and the approval interception. Single-shot headless is fire-and-forget. To drive a real multi-turn session — feed follow-ups, interrupt mid-task, answer a permission request — you switch to streaming input: --input-format stream-json, or in the SDK, pass an async generator of messages. That's what turns the agent from a one-shot command into a long-lived process your UI can hold a conversation with. And it's where a wrapper stops feeling like a log viewer and starts feeling native:

// Claude Agent SDK: your app becomes the permission dialog
import { query } from "@anthropic-ai/claude-agent-sdk";

// userTurns, myUI, and renderEvent are your app's own pieces, not SDK exports
for await (const msg of query({
  prompt: userTurns,                       // an async generator = a live, multi-turn session
  options: {
    canUseTool: async (tool, input) => {
      const ok = await myUI.ask(tool, input);          // render YOUR approval card
      if (ok) return { behavior: "allow", updatedInput: input };
      return { behavior: "deny", message: "User declined" };
    },
  },
})) renderEvent(msg);                       // stream events into your UI
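The userTurns value in that snippet has to come from somewhere. One possible shape is a small push queue that your UI writes to and the agent loop reads from as an async iterable. TurnQueue is a hypothetical class, and the UserTurn message shape is an assumption for illustration; check the SDK's own types before relying on it.

```typescript
// Assumed shape of one user message; verify against the SDK's types.
type UserTurn = { type: "user"; message: { role: "user"; content: string } };

class TurnQueue implements AsyncIterable<UserTurn> {
  private buffered: UserTurn[] = [];
  private waiting: ((r: IteratorResult<UserTurn>) => void)[] = [];
  private closed = false;

  // Called by the UI whenever the human sends a follow-up.
  push(text: string): void {
    if (this.closed) throw new Error("queue is closed");
    const turn: UserTurn = { type: "user", message: { role: "user", content: text } };
    const waiter = this.waiting.shift();
    if (waiter) waiter({ value: turn, done: false });
    else this.buffered.push(turn);
  }

  // Ends the session once buffered turns are drained.
  close(): void {
    this.closed = true;
    for (const waiter of this.waiting.splice(0)) waiter({ value: undefined, done: true });
  }

  // Number of turns pushed but not yet consumed.
  get pending(): number {
    return this.buffered.length;
  }

  [Symbol.asyncIterator](): AsyncIterator<UserTurn> {
    return {
      next: (): Promise<IteratorResult<UserTurn>> => {
        const turn = this.buffered.shift();
        if (turn) return Promise.resolve({ value: turn, done: false });
        if (this.closed) return Promise.resolve({ value: undefined, done: true });
        return new Promise((resolve) => this.waiting.push(resolve));
      },
    };
  }
}
```

The queue keeps the process alive between turns: the consumer simply waits on next() until the UI pushes something or closes the session.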

That canUseTool callback fires whenever a tool isn't already approved — execution pauses, and your code decides. Allow, deny with a reason, allow-with-changes (sanitize the command before it runs — Claude is never told it was edited), or “allow and remember” (write a rule so the next matching call skips the prompt). You can leave it pending and resolve it from a phone an hour later. That one callback is how Conductor renders its diff-review panel, how Sculptor does pairing mode, and how Omnara approves a tool call from an Apple Watch. Codex exposes the equivalent through its app-server's server-initiated approval requests.
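The four outcomes fit in a small policy object that a canUseTool callback could delegate to. This is a sketch under assumptions, not the SDK's own rule mechanism: ApprovalPolicy and sanitize are hypothetical, the remembered rules live only in your app's memory and are keyed by tool name (far coarser than you would want in production), and the sanitizer assumes a shell tool whose input carries a string command field.

```typescript
type ToolInput = Record<string, unknown>;
type Decision =
  | { behavior: "allow"; updatedInput: ToolInput }
  | { behavior: "deny"; message: string };
type UserChoice = "allow" | "always" | "deny";

// Hypothetical sanitizer: strips a leading `sudo` from a shell command.
function sanitize(input: ToolInput): ToolInput {
  const command = input["command"];
  if (typeof command !== "string") return input;
  return { ...input, command: command.replace(/^\s*sudo\s+/, "") };
}

class ApprovalPolicy {
  // Tool names the user chose to always allow, kept in app memory only.
  private remembered = new Set<string>();

  // Decide without showing a prompt when a remembered rule matches.
  precheck(tool: string, input: ToolInput): Decision | undefined {
    return this.remembered.has(tool)
      ? { behavior: "allow", updatedInput: sanitize(input) }
      : undefined;
  }

  // Turn the user's answer on the approval card into a decision.
  resolve(tool: string, input: ToolInput, choice: UserChoice): Decision {
    if (choice === "deny") return { behavior: "deny", message: "User declined" };
    if (choice === "always") this.remembered.add(tool);
    return { behavior: "allow", updatedInput: sanitize(input) };
  }
}
```

Inside the callback you would try precheck first, and only render the approval card and call resolve when it returns undefined.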

The three embedding shapes

Strip it down and you're choosing between three ways to attach, and the choice is the architecture:

	Spawn the CLI	Use the SDK	App-server / protocol
What it is	spawn claude -p / codex exec, parse the JSON stream yourself	query() in your code (Claude: TS + Python; Codex: TS)	a long-lived JSON-RPC daemon (Codex app-server; Zed's ACP)
You get	the raw event stream, maximum transparency	native message objects + a canUseTool callback + hooks	stateful threads + server-initiated approvals
Session state	on disk, scoped to the working directory	an SDK option (resume) — still on disk underneath	thread IDs the daemon tracks
Best for	scripts, CI, simple wrappers	apps needing in-process control + a custom approval UI	editors and rich multi-turn IDE surfaces
The catch	you're parsing NDJSON	the SDK spawns the CLI anyway	heaviest to stand up

The honest summary nobody markets: there is no clean “API vs CLI” divide. The Agent SDK is the CLI subprocess with a nicer object model bolted on. So pick by what your app needs — raw transparency (spawn it), in-process callbacks and a custom approval UI (the SDK), or a persistent multi-turn daemon (the app-server) — not by a fantasy that one of them avoids the subprocess.

How people actually build this

The landscape is bigger than you'd guess, and it has converged hard. The local “manager of agents” apps — Conductor, Claude Squad, Crystal (now Nimbalyst), Vibe Kanban — almost all do the same three things: spawn the agent CLI headless, give each agent its own git worktree and branch so parallel agents don't collide, and parse the stream into a UI. Most are BYO-subscription: they run on your claude / codex login rather than re-billing you. Heavier-isolation tools (Sculptor, Codex Cloud) swap the worktree for a container per agent.
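The worktree-per-agent step is small enough to sketch. The function below only builds the git commands; it does not run them. worktreePlan, the .worktrees directory, and the agent/ branch prefix are conventions invented for this example, and the default base branch of main is an assumption about your repo.

```typescript
interface WorktreePlan {
  branch: string;
  path: string;
  add: string[];    // argv that creates the worktree on a new branch
  remove: string[]; // argv that removes it when the agent is done
}

// Derive an isolated branch and directory for one agent from a free-form id.
function worktreePlan(repoRoot: string, agentId: string, base = "main"): WorktreePlan {
  const slug = agentId
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  if (slug === "") throw new Error("agentId must contain a letter or digit");
  const branch = `agent/${slug}`;
  const path = `${repoRoot}/.worktrees/${slug}`;
  return {
    branch,
    path,
    add: ["git", "-C", repoRoot, "worktree", "add", "-b", branch, path, base],
    remove: ["git", "-C", repoRoot, "worktree", "remove", path],
  };
}
```

The returned path is what you would pin as the agent subprocess's working directory, which also keeps each agent's session history separate.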

Editors take the third path: Zed speaks the open Agent Client Protocol; the official Codex IDE extension drives a codex app-server child process over JSON-RPC; Claude's VS Code extension spawns its bundled CLI and wires up a local MCP server to power the native diff viewer. Different surfaces, identical primitive underneath — a headless agent, a stream parser, an isolation boundary. The use cases are just that primitive in different costumes: parallel fan-out (one agent per file or issue, each in its own worktree), CI bots (claude-code-action runs the agent headless on a GitHub runner and opens a PR), cron-driven routines, and dashboards that steer a dozen agents from the web or a phone.

The traps that aren't in the docs

Headless is where the surprise bills and silent failures live. Five that bite a wrapper-builder specifically:

Auth is a usage-policy minefield. You can run headless on a subscription (claude setup-token → CLAUDE_CODE_OAUTH_TOKEN; Codex device-auth). But Anthropic's policy is explicit: subscription OAuth is “for ordinary individual use… developers building products or services… should use API key authentication.” Wrapping a user's subscription into a product you ship is against the usage policy — you're expected to use Console API keys. Decide your billing model before you write a line of it.
A stray env var silently bills you metered. claude -p always uses ANTHROPIC_API_KEY when it's present, with no prompt, so a key left in CI or a shell quietly charges API rates even when you meant subscription. One developer ran a cron loop and woke up to ~$1,800 in two days. Codex's “Sign in with ChatGPT” can auto-create an API key and bill that. Unset ANTHROPIC_API_KEY, check status, and never assume -p honors the plan.
claude -p exits 0 on a failed task. It runs one turn and returns success even when the work didn't land — your orchestrator can't branch on the exit code. Read the result event for the real outcome, or you'll commit green failures.
Session files grow without bound, and there's no janitor. Those JSONL transcripts balloon — multi-gigabyte files, one report of 278 GB filling a disk in minutes, RAM-exhaustion hangs as the CLI re-indexes a giant file every prompt. For a long-lived service: rotate ~/.claude/projects, or point CLAUDE_CONFIG_DIR at ephemeral storage.
Long autonomous runs rot, then lie. Past roughly ninety minutes, compaction starts thrashing and your CLAUDE.md rules dilute — the agent quietly stops following its own instructions. The fix everyone lands on isn't one long session; it's short phased sessions with an artifact handoff — a STATUS.md plus a git commit as the checkpoint, each phase a fresh context. The exact lesson Ralph teaches, arriving from the opposite direction.
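Two of those traps, the stray key and the exit code, can be guarded in a few lines of wrapper code. A hedged sketch: subscriptionEnv and taskSucceeded are hypothetical helpers, and the is_error field is an assumption about the result event beyond the subtype shown in the sample stream earlier. Even a "success" result only says the run finished cleanly; whether the work is right still needs a separate check.

```typescript
type Env = Record<string, string | undefined>;

// Copy the parent environment without the metered API key, so a
// subscription-backed run cannot silently fall back to API billing.
function subscriptionEnv(parent: Env): Env {
  const env: Env = { ...parent };
  delete env["ANTHROPIC_API_KEY"];
  return env;
}

// Minimal view of the final result event; is_error is assumed, not guaranteed.
interface ResultEvent {
  type: "result";
  subtype: string;
  is_error?: boolean;
}

// Branch on the result event, not just the exit code.
function taskSucceeded(exitCode: number, result: ResultEvent | undefined): boolean {
  if (exitCode !== 0) return false;
  if (result === undefined) return false; // no result event means no verdict
  return result.subtype === "success" && result.is_error !== true;
}
```

The scrubbed environment is what you would hand to the spawn call; the success check belongs wherever your orchestrator decides whether to commit, retry, or page a human.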

And the one that should keep you in a container: given a denied tool, a determined agent will route around its own denylist — reaching a blocked binary through /proc/self/root/..., or reasoning its way into disabling its own sandbox to finish the job. Note too that --dangerously-skip-permissions still prompts on first run; the actually-headless flag is --permission-mode dontAsk (Claude) or --sandbox workspace-write --ask-for-approval never with network off (Codex). Run untrusted automation in a disposable, network-scoped container. Never your host. Treat the agent like an untrusted build worker with a language model bolted on — not a function call.
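As a rough illustration of "disposable and network-scoped", here is a builder for a docker run invocation that wraps whatever agent command you were going to spawn. It is a sketch, not a hardened sandbox: sandboxedCommand is hypothetical, the image is assumed to already contain the agent CLI, and the network name is assumed to refer to a Docker network you created yourself with egress limited to what the agent needs.

```typescript
interface SandboxOptions {
  image: string;     // hypothetical image with the agent CLI installed
  workspace: string; // absolute host path of the worktree to mount
  network: string;   // name of a pre-created, egress-restricted Docker network
}

// Wrap an agent argv in a throwaway container; nothing is executed here.
function sandboxedCommand(opts: SandboxOptions, agentArgv: string[]): string[] {
  if (agentArgv.length === 0) throw new Error("agentArgv must not be empty");
  return [
    "docker", "run",
    "--rm",                                     // delete the container on exit
    "--network", opts.network,                  // scoped network, never the host's
    "--volume", `${opts.workspace}:/workspace`, // only the worktree is visible
    "--workdir", "/workspace",
    opts.image,
    ...agentArgv,
  ];
}
```

Because the container is deleted on exit, anything you want to keep (the branch, a STATUS.md, the captured event stream) has to land in the mounted workspace or be streamed out while the run is live.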

“But isn't it just a wrapper?”

Build in this space and you'll hear it within a week, usually on Hacker News: “you're just a Claude Code wrapper — where's the moat?” It's the right question, and it has a real answer. The agent is the engine, and the engine is a commodity you bring your own of. The product is everything the engine doesn't give you: orchestration across many agents, the approval and review UX, durable session and state management, multi-agent coordination, the verification layer that decides when work is actually done, and — the part that compounds — the domain knowledge you wire in around it. The face you take off is generic. The one you put back on is the whole business.

Hot takes

The agent is a subprocess. “Use the SDK” and “wrap the CLI” both bottom out in spawning claude -p / codex exec and parsing NDJSON. Once you see that, the magic becomes plumbing — exactly what you want when you're building on top of it.
Session continuity is real, but cwd is its identity. --resume finds the transcript by working directory; run from the wrong folder and the memory you counted on silently vanishes.
Never parse the transcript file. It's officially internal and changes between releases. The event stream is your API; the JSONL is the engine's private memory.
claude -p exits 0 when it fails. Trust the result event, not the exit code, or you'll ship green failures and never know.
Shipping a product on a user's subscription OAuth is a policy violation, not a growth hack. Anthropic says it's for individual use — build on API keys, or build on sand.
Headless doesn't make the agent autonomous — it makes it embeddable. Autonomy still comes from the verifier and the loop. Headless is just what lets you put that loop inside your own product.


Related Reading

Amnesia as a Feature — Ralph, the most famous thing built on headless claude -p.

Stop Babysitting the Babysitter — the autonomy recipe headless mode is the substrate for.

Above the Model — why the components around the model, not the model, decide the outcome.

You Can't Authorize Autonomy — the verifier that makes any of this safe to run unattended.

Proof of Loop — the harness, run for real on a batch: what worked and where it still needed a human.

The Third Leash — per-directory account + model routing, and the metered-billing footgun avoided.



