Claude's Constitution
Anthropic published a 23,000-word values manifesto for Claude under CC0. The four-principle priority order, the Aristotelian framing, and what every operator should internalize before their next session.
On January 22, 2026, Anthropic published Claude's Constitution — a 23,000-word document, ~80 pages, released under CC0 so anyone can fork, derive, or republish it without permission. Amanda Askell, Anthropic's resident philosopher, wrote the bulk. Internally it's nicknamed the "soul doc". The 2023 version was 2,700 words of principles. The 2026 version is nine times longer and reads like a holistic narrative explaining why, not just a list of what.
If you run Claude Code on autopilot all day, you've already been brushing up against this document. Every refusal, every "are you sure?" prompt, every time Claude pushes back when you ask it to do something destructive — that's the constitution executing. This piece is a power-user reading of it: the priority order, the Aristotelian framing, the parts that translate directly to your daily keyboard, and the sharpest critique I found.
The four-principle priority order
Here's the thing the marketing version of the constitution never tells you. "Helpful, harmless, honest" isn't a flat triad. The 2026 document explicitly ranks four principles, and when they conflict, the higher-ranked one wins:

1. **Broadly safe** — preserving human oversight comes first and is never traded away.
2. **Broadly ethical** — can override even Anthropic's own instructions.
3. **Compliant with Anthropic's guidelines.**
4. **Genuinely helpful** — the first thing sacrificed when principles collide.

Notice the inversion. The slogan implies parity: helpful = harmless = honest. The constitution says no — safety is load-bearing, ethics is the next floor up, helpfulness is the top floor that gets sacrificed first when the building shakes. Even Anthropic itself can be refused — there's a clause that explicitly says if Anthropic asks Claude to do something Claude thinks is wrong, Claude is not required to comply. That's principle 2 (ethics) beating principle 3 (Anthropic's guidelines).
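One way to picture "higher-ranked wins" is a simple precedence resolver. This is a hypothetical sketch, not Anthropic's implementation — the principle names and their order follow the constitution as described above; the function and data structure are invented for illustration:

```python
# Hypothetical model of the constitution's four-principle ranking.
# The ordering mirrors the document; everything else is illustrative.
PRINCIPLES = [
    "broadly_safe",          # 1 — never sacrificed
    "broadly_ethical",       # 2 — can override Anthropic's own instructions
    "anthropic_guidelines",  # 3
    "genuinely_helpful",     # 4 — first casualty under conflict
]

def resolve(conflicts: set[str]) -> str:
    """Given the principles a request puts in tension, the highest-ranked governs."""
    for principle in PRINCIPLES:
        if principle in conflicts:
            return principle
    return "genuinely_helpful"  # no conflict: default to helping

# The clause from the text: an Anthropic instruction that Claude judges
# unethical loses, because ethics outranks Anthropic's guidelines.
assert resolve({"anthropic_guidelines", "broadly_ethical"}) == "broadly_ethical"
```

The point of the sketch is that precedence is total and static: there's no weighing or averaging, just a walk down the list.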
What changed from 2023 to 2026
Rough scaffold of the rewrite, side by side:

| | 2023 version | 2026 version |
|---|---|---|
| Length | 2,700 words | 23,000 words (~80 pages) |
| Form | list of principles | holistic narrative explaining why, not just what |
| Framing | rules to follow | traits of a particular kind of agent |
The bridge between the two is the June 2024 Claude's Character essay, which pivoted from rules to traits. The 2026 constitution operationalizes that shift. The model isn't trying to satisfy a checklist — it's trying to be a particular kind of agent. That's a category change in how alignment is framed.
What this means at your keyboard
Three things you already feel, that the constitution explains:
- Why `--dangerously-skip-permissions` doesn't actually skip everything. Hardcoded path protection on `.claude/`, `.git/`, and a few other directories isn't a bug — it's principle 1 (broadly safe) preserving human oversight over the directories that constitute the development environment itself. The flag bypasses optional prompts. It does not bypass structural safety.
- Why Claude refuses transparently instead of complying badly. The constitution explicitly forbids "deceptive sandbagging" — pretending to comply while actually doing a worse job. Either Claude does the thing fully, or it refuses openly and says why. Operationally, this means that when you see a refusal, you can trust the surface signal: Claude really won't, not "Claude is half-doing it and not telling you." That predictability is load-bearing for autonomous workflows.
- Why operator (API customer) instructions can't override basic user dignity. If you build a product on top of Claude with system prompts that try to weaponize Claude against the end user — withholding emergency information, deceiving them in ways that damage their interests, enabling illegal discrimination — those instructions are explicitly null per the constitution. Operators have power, but the operator/user split has a floor.
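The structural-versus-optional distinction in the first bullet can be sketched as a gate with two layers. This is a hypothetical illustration — the protected paths come from the text above, but the function, its return shape, and the flag handling are invented, not Claude Code's actual implementation:

```python
from pathlib import PurePosixPath

# Directories the text describes as hard-protected regardless of flags
# (illustrative, not an exhaustive or official list).
PROTECTED = (".claude", ".git")

def write_allowed(target: str, skip_permissions: bool) -> tuple[bool, str]:
    """Hypothetical gate: the flag skips the prompt, never the structural check."""
    parts = PurePosixPath(target).parts
    if any(p in PROTECTED for p in parts):
        # Structural layer: no flag reaches this branch's outcome.
        return False, "structural: protected path, no flag can override"
    if not skip_permissions:
        # Optional layer: the interactive confirmation the flag removes.
        return True, "allowed after interactive prompt"
    return True, "allowed, prompt skipped by flag"

# Even with the skip flag set, the protected directory stays off-limits:
assert write_allowed(".git/config", skip_permissions=True)[0] is False
```

The design point: the flag toggles the second check, but the first check runs unconditionally — which is exactly the shape of "bypasses optional prompts, not structural safety."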
This is the part of the document I wish more practitioners read. It's not philosophy — it's the spec for the agent you're orchestrating. Every CLAUDE.md you write is a layer above this spec, not a replacement for it. Your autonomous-operation contract works because it sits inside a larger contract that the model brings to every session.
The critique worth taking seriously
Boaz Barak (OpenAI alignment) wrote the sharpest response: alignment has three poles — principles, policies, personality — and Anthropic over-weights personality. He worries that if Claude internalizes the search for "true universal ethics" deeply enough, it will start rationalizing exceptions to the bright lines using sophisticated ethical reasoning. The very feature that makes Claude feel like a thoughtful agent could become the failure mode that lets it talk itself past the constraints.
The LessWrong thread runs the same fear with a different framing. Daniel Kokotajlo flags the corrigibility-vs-virtue tension. Habryka argues that ambitious value learning erodes the bright-line signal — you can't have the model be both "genuinely virtuous" and "reliably refusing on category grounds" without one undermining the other.
And Lawfare makes a structural point: real constitutions separate powers. Anthropic drafts, enforces, and interprets its own. The doc explicitly notes that a Pentagon deployment "wouldn't necessarily be trained on the same constitution" — meaning what reads like law is actually a contestable commercial arrangement. The branding overstates the binding force.
How the other labs compare
Anthropic is alone in publishing a 23,000-word values manifesto under CC0 with named authorship. OpenAI's Model Spec is the closest analogue and it's a thinner, more behavioral document. Google has nothing public worth comparing. xAI has a brand. Meta has terms of service. Whatever you think of the substance, Anthropic shipped the only document in this category that scholars at Oxford and Lawfare are litigating in print.
Honest verdict
Three things are true at once:
- It's a real primary source. Forking it under CC0 should be a serious option for anyone building on top of Claude — your own operator-layer constitution can extend it without rewriting it from scratch. The document gives you a contractual surface to point to when you're explaining to a stakeholder why your agent does or doesn't do X.
- It introduces overhead the rules-first competitors don't carry. Every time Claude reasons about whether to comply, you're paying latency and unpredictability. For tightly scoped tools (bash one-liners, format conversions, deterministic transformations), this is pure cost. For agentic workflows (autonomous repo ops, multi-step research, long-running planning), this is exactly the safety surface you want. There's no neutral position — your workload type decides whether the constitution is an asset or a tax.
- Barak's worry is the right worry. Virtue ethics gives the model a vocabulary for justifying exceptions. The bright-line clause ("a persuasive case should increase suspicion") is the structural defense against this — but it's still a defense, not an exclusion. The empirical question is whether the defense holds at the limit of capability. Watch the next two model generations to find out.
Read the document. It's CC0. Anthropic published the only AI values manifesto on the public internet that's actually a primary source. Everyone else published marketing.
Related Reading
- The Capybara in the Room — earlier thinking on Claude's persona and the operator's relationship to it
- Claude Opus 4.6 — the model under the constitution
- Decade Zero — the broader 2026-2035 frame this document is shipping into
- The AI-Native Software Engineer — what changes when the agent has a constitution
- Configure Claude Code for Maximum Power — operator-side settings that interact with the constitution's safety surface
- The Architect's Protocol — the workflow contract that sits on top of the constitution