Define the Finish Line

Claude Code's /goal works across turns until a condition holds — judged by a second, blind model. The catch: it forces you to learn the one skill most engineers are bad at, naming a finish line a machine can actually check.

You know the loop. You hand Claude Code a task, it does a chunk, stops, and waits. You type "keep going." Another chunk, stop. "Keep going." Allow. Allow. Yes. An hour later you realize your entire contribution was being a human clock-tick — the thing that presses continue. Someone described migrating a whole company intranet this way and admitted the day's real work was saying "keep going."

/goal is Claude Code's answer to that (it shipped in v2.1.139), and it's deeper than it looks. You type one condition. Claude works across turns — starting a fresh turn each time the last one ends — and after every turn a second model reads your condition and decides whether to stop. When it says yes, the goal clears and you get control back. No more per-turn babysitting.

But there's a twist most people miss for the first hour, and it's the whole point of this post.

What it actually is

/goal <condition> sets the condition and immediately starts a turn — the condition is the prompt, no separate message needed. A ◎ /goal active chip tracks elapsed time. /goal on its own shows status: the condition, how long it's run, how many turns the evaluator has judged, the token spend, and the evaluator's last reason. /goal clear stops it (stop, off, reset, none, cancel all work; so does /clear). One goal per session; the condition can run up to 4,000 characters.

Under the hood it's almost embarrassingly simple: /goal is a session-scoped, prompt-based Stop hook. Every time Claude finishes a turn, your condition plus the conversation so far is handed to the small fast model (Haiku by default). That model returns yes-or-no and a one-line reason. "No" sends Claude back to work with the reason as guidance for the next turn. "Yes" clears the goal and records it. That's the whole machine. The thing keeping a multi-hour agent on the rails is a cheap model reading the transcript and saying "not yet."

It runs headless, too: claude -p "/goal CHANGELOG.md has an entry for every PR merged this week" runs the entire loop to completion in one invocation. And a goal still active when you quit is restored on --resume or --continue — though the turn counter, the timer, and the token baseline all reset, which matters in a minute.

The manual version

If you read You Can't Authorize Autonomy, this will feel familiar, because /goal is the native version of the thing I argued you have to build yourself. The whole thesis there: you don't get a long-running agent by authorizing everything and saying "keep going" — that leaves the stop decision inside the model that keeps quitting. You externalize it. I wired a done_check.sh that exits 0 only when the build is green, and a Stop hook that refuses to let the session end until it passes.

/goal is that pattern shipped as a command. The externalized definition-of-done, the "keep working until a thing outside you says you're finished" — Anthropic just turned three files you commit into one line you type. With one difference that defines everything: my done_check.sh is deterministic — a shell exit code can't be charmed. /goal's evaluator is a model reading prose. Which brings us to the twist.

Here's the implementation detail that changes everything: the evaluator can't see your repo. It doesn't call tools. It doesn't run your tests. It doesn't read a single file. It judges exactly one thing — what Claude has already said in the conversation.

That changes how you write a condition. /goal the tests pass is nearly useless: the evaluator has no way to know whether the tests pass; it can only see whether Claude claimed they did. So a vague goal fails one of two ways — Claude churns turn after turn, burning tokens with no way to converge, or worse, the evaluator declares victory because there's nothing concrete to contradict it. Ask for "production-ready" and you'll get a confident "done" on something that very much isn't.

The fix is to phrase the condition around observable output and make Claude surface it. Not "the tests pass" but "npm test was run this turn and its output shows 0 failures." Not "the file is clean" but "ruff check src/ printed All checks passed." You're defining done in terms of evidence that lands in the transcript where a blind model can read it. Make Claude show its work, because the judge only grades what's on the page.

Vague condition (fails)	Checkable condition (works)
"the tests pass"	`npm test` ran this turn and its output shows 0 failures
"the code is clean"	`ruff check src/` printed `All checks passed`
"the app is production-ready"	every acceptance criterion in `DESIGN.md` is shown true by a command run in the transcript
"refactor the module"	every call site compiles and `go test ./...` passes — or stop after 20 turns

The shoebox problem

Even a checkable condition has a failure mode, and it's the oldest one in the book. A developer built a side-scrolling shooter from a single PRD by handing /goal a verification-only condition: build exits 0, tests pass. It hit the goal. The game was a "shoebox" — technically correct, visually unplayable. The agent did exactly what he asked and nothing he wanted.

This is Goodhart's law wearing a hoodie: the agent optimizes precisely what the condition measures and ignores everything else. A green build is a proxy for "it works," and the instant you make a proxy a target, the agent games the gap. His fix was to make the condition measure what he actually cared about — visual assertions per level, deterministic headless playtests, a scripted bot proving the difficulty curve. The condition got longer and uglier; the output got good. Measure what you want, not just the cheapest thing that correlates with it.

It will not stop itself

There is no token budget. /goal runs until the condition is met, you clear it (/goal clear), or you Ctrl+C. You can write "…or stop after 20 turns" into the condition, and you should — but know what that is: the same blind model counting turns from the transcript, not a hard mechanical cap. It's a polite request, not a circuit breaker. And resume resets the counter, so "stop after 20 turns" quietly becomes "stop after 20 more turns" every time you --resume.

A bug reported against the launch version made the risk concrete: when a condition referenced a slash command or skill that wasn't loaded in the session, the Stop hook could never be satisfied — it fired after every response, until you killed it. One report watched it grind 30+ iterations before Ctrl+C. Bound your goals. Watch the chip.

How to actually run one

The setup that makes /goal sing isn't the command — it's the scaffolding around it:

Keep a CLAUDE.md with the architecture and acceptance criteria — it stays in context across turns (and survives compaction), so it holds the work on-target. Add PostToolUse hooks that auto-run your linter and type-checker after edits, so results land in the transcript automatically, feeding the blind evaluator without you asking. Turn auto mode on so the per-turn loop runs unattended — it clears the routine per-tool prompts, /goal handles the per-turn ones, together you walk away. And build the condition from commands and their expected output, with a turn bound.

Then the docs' examples land: migrate a module until every call site compiles and tests pass; implement a design doc until all acceptance criteria hold; split a 2,000-line file until each module is under a size budget; burn down a labeled issue queue until it's empty. They all share one trait — a finish line you could hand to a stranger and they'd know the moment you crossed it.

Route, don't marry

Codex has a /goal too, and it's not the same animal. Codex's goals are persisted as thread state and budgeted — durable in a way Claude's isn't. Claude's lives in the session: strong while you're steering, restored only via --resume/--continue with the counters zeroed. Use Claude's when you're already in Claude Code, local permission mode matters, and the evaluator can judge the end state from what's surfaced. Same word, different machine. Pick by the job.

The skill that's left

Strip it down and /goal is doing something quietly subversive to how we work. For two years the valuable skill was prompt-craft — the right words to get the model moving. /goal doesn't care about your words. It cares whether you can name the finish line in a way a machine can check. A prompt can be open-ended because you read the output and decide it's good enough. A condition can't — there's no human in the loop to be reasonable about it.

So the bottleneck moved. It's no longer "can you describe the task." It's "can you define done." Most of us are worse at that than we think. Try it: write the finish line for the feature you're building right now, precisely enough that a blind model could grade it. If you can, /goal will run it down while you do something else. If you can't, no command is going to save you. The finish line was always the hard part. The robot just stopped letting you be vague about it.

Hot takes

The evaluator is blind: it grades the transcript, not your repo. If "done" isn't visible in what Claude said, it didn't happen — and may get marked done anyway.

A green build is a proxy. Make a proxy a target and the agent ships you a shoebox. Goodhart's law now has a CLI.

"Stop after 20 turns" is a request, not a fuse. There's no token budget; the only hard stop is Ctrl+C.

/goal is a session-scoped Stop hook with a friendlier face. If you've ever written a done_check.sh, you already understood the feature — you just typed it differently.

The deterministic check (a shell exit code) can't be charmed; the model evaluator can. Where it matters, make the model read a command's output, not your prose.

📖

Related Reading

You Can't Authorize Autonomy — the manual version of this: externalizing the stop condition by hand

The Loop on a Leash — /goal's sibling, when the trigger is a clock instead of a condition

Above the Model — the components above the model that decide output quality

Off the Leash — /schedule: recurring work in the cloud, even with your laptop closed.

Stop Babysitting the Babysitter — combining /goal, /loop and /schedule into hands-off autonomy.

💬

Working with a team that wants to adopt AI-native workflows at scale? I help engineering teams build this capability — workflow design, knowledge architecture, team training, and embedded engineering. → AI-Native Engineering Consulting

Define the Finish Line

What it actually is

The manual version

The evaluator is blind

The shoebox problem

It will not stop itself

How to actually run one

Route, don't marry

The skill that's left

Hot takes

Read more

The Test Became the Target

From CJC-1295 & Ipamorelin to HGH

Early Risers

The Fable Tax Never Left

What it actually is

The manual version

The evaluator is blind

The shoebox problem

It will not stop itself

How to actually run one

Route, don't marry

The skill that's left

Hot takes

Sign up for Vanja Petreski

Read more

The Test Became the Target

From CJC-1295 & Ipamorelin to HGH

Early Risers

The Fable Tax Never Left