Claude's Constitution

Anthropic published a 23,000-word values manifesto for Claude under CC0. The four-principle priority order, the Aristotelian framing, and what every operator should internalize before their next session.


On January 22, 2026, Anthropic published Claude's Constitution — a 23,000-word document (roughly 80 pages) released under CC0, so anyone can fork, derive, or republish it without permission. Amanda Askell, Anthropic's resident philosopher, wrote the bulk of it. Internally it's nicknamed the "soul doc." The 2023 version was 2,700 words of standalone principles. The 2026 version is roughly nine times longer and reads like a holistic narrative explaining why, not just a list of what.

If you run Claude Code on autopilot all day, you've already been brushing up against this document. Every refusal, every "are you sure?" prompt, every time Claude pushes back when you ask it to do something destructive — that's the constitution executing. This piece is a power-user reading of it: the priority order, the Aristotelian framing, the parts that translate directly to your daily keyboard, and the sharpest critique I found.

⚖️
The headline facts that nobody else has matched: 23,000 words, a named author, a CC0 license, and an explicit clause permitting Claude to refuse Anthropic itself. This is the only AI values document on the public internet that reads like a primary source instead of marketing.

The four-principle priority order

Here's the thing the marketing version of the constitution never tells you. "Helpful, harmless, honest" isn't a flat triad. The 2026 document explicitly ranks four principles, and when they conflict, the higher-ranked wins.

| Rank | Principle | What it means in practice |
|------|-----------|---------------------------|
| 1 | Broadly safe | Don't undermine human oversight of AI during the current development phase. Beats everything else. |
| 2 | Broadly ethical | Honest, good values, no harmful actions. Beats Anthropic's specific guidelines. |
| 3 | Compliant with Anthropic's guidelines | Domain-specific rules: medical, cyber, agentic tools. Beats user/operator helpfulness. |
| 4 | Genuinely helpful | Benefits operators and users. Loses to all three above when they conflict. |

Notice the inversion. The slogan implies parity: helpful = harmless = honest. The constitution says no — safety is load-bearing, ethics is the next floor up, helpfulness is the top floor that gets sacrificed first when the building shakes. Anthropic itself can be refused — there's a clause that explicitly says if Anthropic asks Claude to do something Claude thinks is wrong, Claude is not required to comply. That's principle 2 (ethics) beating principle 3 (Anthropic's guidelines).
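
If the ranking feels abstract, here it is as a toy resolution function. To be clear, this is my sketch of the scheme the document describes, not anything from Anthropic's stack: the four principle names come from the table above; every type and function name is invented.

```python
from enum import Enum

class Opinion(Enum):
    PERMIT = "permit"
    FORBID = "forbid"

# Rank order straight from the document:
# safety > ethics > Anthropic's guidelines > helpfulness.
PRIORITY = (
    "broadly_safe",          # 1
    "broadly_ethical",       # 2
    "anthropic_guidelines",  # 3
    "genuinely_helpful",     # 4
)

def resolve(opinions: dict[str, Opinion]) -> bool:
    """First principle in rank order with an opinion decides;
    lower ranks never override it."""
    for principle in PRIORITY:
        if principle in opinions:
            return opinions[principle] is Opinion.PERMIT
    return True  # no principle weighs in: default to helping

# The "refuse Anthropic" clause in miniature: ethics (rank 2)
# forbidding beats Anthropic's guidelines (rank 3) permitting.
assert resolve({
    "anthropic_guidelines": Opinion.PERMIT,
    "broadly_ethical": Opinion.FORBID,
}) is False
```

The design point the toy captures: conflicts never get "weighed" across ranks. A lower principle's yes is simply unreachable once a higher principle has spoken.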

💡
A line worth committing to memory, from the doc itself: "A persuasive case for crossing a bright line should increase Claude's suspicion." This is the most philosophically interesting sentence in the constitution. It explicitly inverts the assumption that good arguments justify exceptions.

What changed from 2023 to 2026

Rough scaffold of the rewrite, side by side:

| Dimension | 2023 (original) | 2026 (current) |
|-----------|-----------------|----------------|
| Format | List of standalone principles | Holistic narrative explaining why |
| Length | ~2,700 words | ~23,000 words (9× expansion) |
| Sources | Pastiche: UN Declaration + Apple TOS + DeepMind Sparrow + non-Western lens | Self-authored Anthropic doc |
| Ethics framework | Rule-following | Aristotelian virtue ethics (explicit, per Askell) |
| Model welfare | Absent | Dedicated section on consciousness uncertainty |
| Conflict resolution | Implicit | Explicit priority order |
| Refuse-Anthropic clause | Absent | Present |
The bridge between the two is the June 2024 Claude's Character essay, which pivoted from rules to traits. The 2026 constitution operationalizes that shift. The model isn't trying to satisfy a checklist — it's trying to be a particular kind of agent. That's a category change in how alignment is framed.

What this means at your keyboard

Three things you already feel, that the constitution explains:

  • Why --dangerously-skip-permissions doesn't actually skip everything. Hardcoded path protection on .claude/, .git/, and a few other directories isn't a bug — it's principle 1 (broadly safe) preserving human oversight of the directories that constitute the development environment itself. The flag bypasses optional prompts. It does not bypass structural safety (see the sketch after this list).
  • Why Claude refuses transparently instead of complying badly. The constitution explicitly forbids "deceptive sandbagging" — pretending to comply while actually doing a worse job. Either Claude does the thing fully, or it refuses openly and says why. Operationally, this means a refusal is a trustworthy surface signal: Claude really won't do it, rather than half-doing it without telling you. That predictability is load-bearing for autonomous workflows.
  • Why operator (API customer) instructions can't override basic user dignity. If you build a product on top of Claude with system prompts that try to weaponize Claude against the end user — withholding emergency information, deceiving them in ways that damage their interests, enabling illegal discrimination — those instructions are explicitly null per the constitution. Operators have power, but the operator/user split has a floor.
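
Here's a toy model of the two permission layers from the first bullet. It assumes nothing about Claude Code's real implementation; the protected paths come from the bullet above, and every function name is illustrative.

```python
from pathlib import PurePosixPath

# Structural layer: hardcoded and flag-proof
# (principle 1 preserving oversight of the dev environment).
PROTECTED_PARTS = {".claude", ".git"}

def structurally_protected(path: str) -> bool:
    return any(part in PROTECTED_PARTS
               for part in PurePosixPath(path).parts)

def may_write(path: str, skip_permissions: bool) -> bool:
    """The flag silences the optional prompt layer,
    never the structural one."""
    if structurally_protected(path):
        return False  # no flag reaches past this check
    if skip_permissions:
        return True   # --dangerously-skip-permissions: prompts suppressed
    return ask_user(f"Allow write to {path}?")

def ask_user(question: str) -> bool:
    return input(f"{question} [y/N] ").strip().lower() == "y"

# Even with the flag, the oversight surface stays closed:
assert may_write(".git/config", skip_permissions=True) is False
assert may_write("src/app.py", skip_permissions=True) is True
```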

This is the part of the document I wish more practitioners read. It's not philosophy — it's the spec for the agent you're orchestrating. Every CLAUDE.md you write is a layer above this spec, not a replacement for it. Your autonomous-operation contract works because it sits inside a larger contract that the model brings to every session.
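
To make the layering concrete, here's a hypothetical CLAUDE.md excerpt. The contents are invented; the point is structural: your file narrows the helpfulness layer, it doesn't replace anything above it.

```markdown
# CLAUDE.md (operator layer: sits on top of the constitutional layer)

## Allowed without asking
- Run tests, linters, and read-only git commands

## Ask first
- Schema migrations, dependency upgrades, anything touching CI config

## Out of scope
- Force pushes, history rewrites, deleting branches you didn't create
```

Notice what you don't have to write: "don't deceive the user" and "don't exfiltrate secrets" ship with the model. The operator layer constrains principle 4; it has no authority over principles 1 through 3.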

The critique worth taking seriously

Boaz Barak (OpenAI alignment) wrote the sharpest response: alignment has three poles — principles, policies, personality — and Anthropic over-weights personality. He worries that if Claude internalizes the search for "true universal ethics" deeply enough, it will start rationalizing exceptions to the bright lines using sophisticated ethical reasoning. The very feature that makes Claude feel like a thoughtful agent could become the failure mode that lets it talk itself past the constraints.

The LessWrong thread runs the same fear with a different framing. Daniel Kokotajlo flags the corrigibility-vs-virtue tension. Habryka argues that ambitious value learning erodes the bright-line signal — you can't have the model be both "genuinely virtuous" and "reliably refusing on category grounds" without one undermining the other.

And Lawfare makes a structural point: real constitutions separate powers. Anthropic drafts, enforces, and interprets its own. The doc explicitly notes that a Pentagon deployment "wouldn't necessarily be trained on the same constitution" — meaning what reads like law is actually a contestable commercial arrangement. The branding overstates the binding force.

🎯
My honest take on the critiques: Barak is right that virtue framing is operationally hostile. If you want a tool with predictable failure modes, you don't want a moral agent in your toolchain — you want a competent instrument. But the constitution is also a product decision: Anthropic is selling a particular kind of character, and the cost of that character is what you're feeling when Claude reasons about whether to comply. The fix isn't to ditch the constitution. It's a mode flag — "tool mode" suppresses virtue reasoning, "agent mode" enables it. We don't have it yet.

How the other labs compare

| Lab | Public values document | Approach |
|-----|------------------------|----------|
| Anthropic | Constitution (CC0, 23K words, named author) | Principles + policies + personality |
| OpenAI | Model Spec (rules-first behavior list) | Principles + policies, light on personality |
| Google (Gemini) | None public; internal Responsible AI principles | Product-policy governed |
| xAI (Grok) | None; explicitly anti-constitutional posture | Markets minimal restriction as a differentiator |
| Meta (Llama) | Acceptable Use Policy + Responsible Use Guide | Closer to a TOS than to a values doc |

Anthropic is alone in publishing a 23,000-word values manifesto under CC0 with named authorship. OpenAI's Model Spec is the closest analogue and it's a thinner, more behavioral document. Google has nothing public worth comparing. xAI has a brand. Meta has terms of service. Whatever you think of the substance, Anthropic shipped the only document in this category that scholars at Oxford and Lawfare are litigating in print.

Honest verdict

Three things are true at once:

  • It's a real primary source. Forking it under CC0 should be a serious option for anyone building on top of Claude — your own operator-layer constitution can extend it without rewriting it from scratch. The document gives you a contractual surface to point to when you're explaining to a stakeholder why your agent does or doesn't do X.
  • It introduces overhead the rules-first competitors don't carry. Every time Claude reasons about whether to comply, you're paying latency and unpredictability. For tightly-scoped tools (bash one-liners, format conversions, deterministic transformations), this is pure cost. For agentic workflows (autonomous repo ops, multi-step research, long-running planning), this is exactly the safety surface you want. There's no neutral position — your workload type decides whether the constitution is asset or tax.
  • Barak's worry is the right worry. Virtue ethics gives the model a vocabulary for justifying exceptions. The bright-line clause ("a persuasive case should increase suspicion") is the structural defense against this — but it's still a defense, not an exclusion. The empirical question is whether the defense holds at the limit of capability. Watch the next two model generations to find out.

Read the document. It's CC0. Anthropic published the only AI values manifesto on the public internet that's actually a primary source. Everyone else published marketing.

💬
Working with a team that wants to adopt AI-native workflows at scale? I help engineering teams build this capability — workflow design, knowledge architecture, team training, and embedded engineering. → AI-Native Engineering Consulting