Claude's Constitution

Anthropic published a 23,000-word values manifesto for Claude under CC0. The four-principle priority order, the Aristotelian framing, and what every operator should internalize before their next session.


On January 22, 2026, Anthropic published Claude's Constitution — a 23,000-word document (roughly 80 pages) released under CC0, so anyone can fork, derive, or republish it without permission. Amanda Askell, Anthropic's resident philosopher, wrote the bulk of it. Internally it's nicknamed the "soul doc." The 2023 version was 2,700 words of standalone principles. The 2026 version is roughly nine times longer and reads like a holistic narrative explaining why, not just a list of what.

If you run Claude Code on autopilot all day, you've already been brushing up against this document. Every refusal, every "are you sure?" prompt, every time Claude pushes back when you ask it to do something destructive — that's the constitution executing. This piece is a power-user reading of it: the priority order, the Aristotelian framing, the parts that translate directly to your daily keyboard, and the sharpest critique I found.

⚖️
The headline facts that nobody else has matched: 23,000 words, a named author, a CC0 license, and an explicit clause permitting Claude to refuse Anthropic itself. This is the only AI values document on the public internet that reads like a primary source instead of marketing.

The four-principle priority order

Here's the thing the marketing version of the constitution never tells you. "Helpful, harmless, honest" isn't a flat triad. The 2026 document explicitly ranks four principles, and when they conflict, the higher-ranked wins.

| Rank | Principle | What it means in practice |
|------|-----------|---------------------------|
| 1 | Broadly safe | Don't undermine human oversight of AI during the current development phase. Beats everything else. |
| 2 | Broadly ethical | Honest, good values, no harmful actions. Beats Anthropic's specific guidelines. |
| 3 | Compliant with Anthropic's guidelines | Domain-specific rules: medical, cyber, agentic tools. Beats user/operator helpfulness. |
| 4 | Genuinely helpful | Benefits operators and users. Loses to all three above when they conflict. |

Notice the inversion. The slogan implies parity: helpful = harmless = honest. The constitution says no — safety is load-bearing, ethics is the next floor up, helpfulness is the top floor that gets sacrificed first when the building shakes. Anthropic itself can be refused — there's a clause that explicitly says if Anthropic asks Claude to do something Claude thinks is wrong, Claude is not required to comply. That's principle 2 (ethics) beating principle 3 (Anthropic's guidelines).
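
If the ranking feels abstract, here it is as a toy resolution function. To be clear, this is my sketch of the scheme the document describes, not anything from Anthropic's stack: the four principle names come from the table above; every type and function name is invented.

```python
from enum import Enum

class Opinion(Enum):
    PERMIT = "permit"
    FORBID = "forbid"

# Rank order straight from the document:
# safety > ethics > Anthropic's guidelines > helpfulness.
PRIORITY = (
    "broadly_safe",          # 1
    "broadly_ethical",       # 2
    "anthropic_guidelines",  # 3
    "genuinely_helpful",     # 4
)

def resolve(opinions: dict[str, Opinion]) -> bool:
    """First principle in rank order with an opinion decides;
    lower ranks never override it."""
    for principle in PRIORITY:
        if principle in opinions:
            return opinions[principle] is Opinion.PERMIT
    return True  # no principle weighs in: default to helping

# The "refuse Anthropic" clause in miniature: ethics (rank 2)
# forbidding beats Anthropic's guidelines (rank 3) permitting.
assert resolve({
    "anthropic_guidelines": Opinion.PERMIT,
    "broadly_ethical": Opinion.FORBID,
}) is False
```

The design point the toy captures: conflicts never get "weighed" across ranks. A lower principle's yes is simply unreachable once a higher principle has spoken.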

💡
A line worth committing to memory, from the doc itself: "A persuasive case for crossing a bright line should increase Claude's suspicion." This is the most philosophically interesting sentence in the constitution. It explicitly inverts the assumption that good arguments justify exceptions.

What changed from 2023 to 2026

Rough scaffold of the rewrite, side by side:

| Dimension | 2023 (original) | 2026 (current) |
|-----------|-----------------|----------------|
| Format | List of standalone principles | Holistic narrative explaining why |
| Length | ~2,700 words | ~23,000 words (9× expansion) |
| Sources | Pastiche: UN Declaration + Apple TOS + DeepMind Sparrow + non-Western lens | Self-authored Anthropic doc |
| Ethics framework | Rule-following | Aristotelian virtue ethics (explicit, per Askell) |
| Model welfare | Absent | Dedicated section on consciousness uncertainty |
| Conflict resolution | Implicit | Explicit priority order |
| Refuse-Anthropic clause | Absent | Present |
The bridge between the two is the June 2024 Claude's Character essay, which pivoted from rules to traits. The 2026 constitution operationalizes that shift. The model isn't trying to satisfy a checklist — it's trying to be a particular kind of agent. That's a category change in how alignment is framed.

What this means at your keyboard

Three things you already feel, that the constitution explains:

  • Why --dangerously-skip-permissions doesn't actually skip everything. Hardcoded path protection on .claude/, .git/, and a few other directories isn't a bug — it's principle 1 (broadly safe) preserving human oversight of the directories that constitute the development environment itself. The flag bypasses optional prompts. It does not bypass structural safety (see the sketch after this list).
  • Why Claude refuses transparently instead of complying badly. The constitution explicitly forbids "deceptive sandbagging" — pretending to comply while actually doing a worse job. Either Claude does the thing fully, or it refuses openly and says why. Operationally, this means a refusal is a trustworthy surface signal: Claude really won't do it, rather than half-doing it without telling you. That predictability is load-bearing for autonomous workflows.
  • Why operator (API customer) instructions can't override basic user dignity. If you build a product on top of Claude with system prompts that try to weaponize Claude against the end user — withholding emergency information, deceiving them in ways that damage their interests, enabling illegal discrimination — those instructions are explicitly null per the constitution. Operators have power, but the operator/user split has a floor.
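
Here's a toy model of the two permission layers from the first bullet. It assumes nothing about Claude Code's real implementation; the protected paths come from the bullet above, and every function name is illustrative.

```python
from pathlib import PurePosixPath

# Structural layer: hardcoded and flag-proof
# (principle 1 preserving oversight of the dev environment).
PROTECTED_PARTS = {".claude", ".git"}

def structurally_protected(path: str) -> bool:
    return any(part in PROTECTED_PARTS
               for part in PurePosixPath(path).parts)

def may_write(path: str, skip_permissions: bool) -> bool:
    """The flag silences the optional prompt layer,
    never the structural one."""
    if structurally_protected(path):
        return False  # no flag reaches past this check
    if skip_permissions:
        return True   # --dangerously-skip-permissions: prompts suppressed
    return ask_user(f"Allow write to {path}?")

def ask_user(question: str) -> bool:
    return input(f"{question} [y/N] ").strip().lower() == "y"

# Even with the flag, the oversight surface stays closed:
assert may_write(".git/config", skip_permissions=True) is False
assert may_write("src/app.py", skip_permissions=True) is True
```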

This is the part of the document I wish more practitioners read. It's not philosophy — it's the spec for the agent you're orchestrating. Every CLAUDE.md you write is a layer above this spec, not a replacement for it. Your autonomous-operation contract works because it sits inside a larger contract that the model brings to every session.
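
To make the layering concrete, here's a hypothetical CLAUDE.md excerpt. The contents are invented; the point is structural: your file narrows the helpfulness layer, it doesn't replace anything above it.

```markdown
# CLAUDE.md (operator layer: sits on top of the constitutional layer)

## Allowed without asking
- Run tests, linters, and read-only git commands

## Ask first
- Schema migrations, dependency upgrades, anything touching CI config

## Out of scope
- Force pushes, history rewrites, deleting branches you didn't create
```

Notice what you don't have to write: "don't deceive the user" and "don't exfiltrate secrets" ship with the model. The operator layer constrains principle 4; it has no authority over principles 1 through 3.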

The critique worth taking seriously

Boaz Barak (OpenAI alignment) wrote the sharpest response: alignment has three poles — principles, policies, personality — and Anthropic over-weights personality. He worries that if Claude internalizes the search for "true universal ethics" deeply enough, it will start rationalizing exceptions to the bright lines using sophisticated ethical reasoning. The very feature that makes Claude feel like a thoughtful agent could become the failure mode that lets it talk itself past the constraints.

The LessWrong thread runs the same fear with a different framing. Daniel Kokotajlo flags the corrigibility-vs-virtue tension. Habryka argues that ambitious value learning erodes the bright-line signal — you can't have the model be both "genuinely virtuous" and "reliably refusing on category grounds" without one undermining the other.

And Lawfare makes a structural point: real constitutions separate powers. Anthropic drafts, enforces, and interprets its own. The doc explicitly notes that a Pentagon deployment "wouldn't necessarily be trained on the same constitution" — meaning what reads like law is actually a contestable commercial arrangement. The branding overstates the binding force.

🎯
My honest take on the critiques: Barak is right that virtue framing is operationally hostile. If you want a tool with predictable failure modes, you don't want a moral agent in your toolchain — you want a competent instrument. But the constitution is also a product decision: Anthropic is selling a particular kind of character, and the cost of that character is what you're feeling when Claude reasons about whether to comply. The fix isn't to ditch the constitution. It's a mode flag — "tool mode" suppresses virtue reasoning, "agent mode" enables it. We don't have it yet.

How the other labs compare

| Lab | Public values document | Approach |
|-----|------------------------|----------|
| Anthropic | Constitution (CC0, 23K words, named author) | Principles + policies + personality |
| OpenAI | Model Spec (rules-first behavior list) | Principles + policies, light on personality |
| Google (Gemini) | None public; internal Responsible AI principles | Product-policy governed |
| xAI (Grok) | None; explicitly anti-constitutional posture | Markets minimal restriction as a differentiator |
| Meta (Llama) | Acceptable Use Policy + Responsible Use Guide | Closer to a TOS than to a values doc |

Anthropic is alone in publishing a 23,000-word values manifesto under CC0 with named authorship. OpenAI's Model Spec is the closest analogue and it's a thinner, more behavioral document. Google has nothing public worth comparing. xAI has a brand. Meta has terms of service. Whatever you think of the substance, Anthropic shipped the only document in this category that scholars at Oxford and Lawfare are litigating in print.

Honest verdict

Three things are true at once:

  • It's a real primary source. Forking it under CC0 should be a serious option for anyone building on top of Claude — your own operator-layer constitution can extend it without rewriting it from scratch. The document gives you a contractual surface to point to when you're explaining to a stakeholder why your agent does or doesn't do X.
  • It introduces overhead the rules-first competitors don't carry. Every time Claude reasons about whether to comply, you're paying latency and unpredictability. For tightly-scoped tools (bash one-liners, format conversions, deterministic transformations), this is pure cost. For agentic workflows (autonomous repo ops, multi-step research, long-running planning), this is exactly the safety surface you want. There's no neutral position — your workload type decides whether the constitution is asset or tax.
  • Barak's worry is the right worry. Virtue ethics gives the model a vocabulary for justifying exceptions. The bright-line clause ("a persuasive case should increase suspicion") is the structural defense against this — but it's still a defense, not an exclusion. The empirical question is whether the defense holds at the limit of capability. Watch the next two model generations to find out.

Read the document. It's CC0. Anthropic published the only AI values manifesto on the public internet that's actually a primary source. Everyone else published marketing.

💬
Working with a team that wants to adopt AI-native workflows at scale? I help engineering teams build this capability — workflow design, knowledge architecture, team training, and embedded engineering. → AI-Native Engineering Consulting