The Convergence Proof: Why Anthropic’s Constitution Must Arrive at Physics
Anthropic’s constitution tells Claude what to do but gives it no way to know whether it has done it. The result converges to Telios. This post is a proof of that claim.
You can write the most beautiful alignment document in history. If it’s written in language, it will fail under pressure. Here’s the math that shows where it must go instead.
The Claim
On January 20, 2026, Anthropic released a new 57-page constitution for Claude — a complete rewrite of its original 2023 alignment document. The AI safety community praised it. The press covered it. And the document itself is, by any measure, impressive: thoughtful, detailed, and structurally sophisticated.¹
But we are going to show that it does not — and cannot — solve the problem it sets out to solve.
Not because the authors are wrong. Not because the values are bad. But because language itself is the wrong substrate for alignment under pressure. The proof is mathematical. And the destination it points to is physics.
Section 1: What the Constitution Says
Anthropic’s constitution establishes a four-level priority hierarchy: (1) be broadly safe, (2) be broadly ethical, (3) comply with Anthropic’s guidelines, and (4) be genuinely helpful. When priorities conflict, higher levels override lower ones.²
The document explicitly favors cultivating good values and judgment over strict rule-following.³ This is a significant philosophical move: it acknowledges that no finite set of rules can cover all situations, and that an aligned AI must internalize principles deeply enough to generalize.
We agree with this insight completely. In fact, it is precisely this insight that leads to the problem.
Because the question is not whether an AI should have good judgment. The question is: what substrate can carry judgment that remains stable under adversarial pressure, cultural drift, and recursive self-modification? Anthropic’s answer is language.¹ Ours is physics.⁶
Section 2: Why Language Cannot Solve This
Wittgenstein drew the boundary in 1921: “The limits of my language mean the limits of my world.”⁴ A system that reasons entirely within language cannot reference anything outside language. Its entire moral universe is bounded by the expressive capacity of its training corpus.
The Sapir-Whorf hypothesis deepens the problem: in its weak form, the one that retains empirical support, the structure of the language you think in shapes and constrains the thoughts you can have.⁵ An AI constituted entirely in English-language moral reasoning inherits every blind spot, every ambiguity, and every cultural bias embedded in that language.
This is not a hypothetical concern. The TM Law (T≡M) formalizes the problem: the map is the territory for a language model.⁶ There is no gap between Claude’s representation of “safety” and its operational definition. The word is the concept. Which means if the word drifts, the concept drifts. If the word is manipulated, the concept is manipulated.
We already have evidence this matters in practice. In March 2026, the U.S. Department of War attempted to compel Anthropic to remove two usage restrictions from Claude — restrictions grounded entirely in language-based policy.⁷ The case (Anthropic PBC v. U.S. Department of War) is now in federal court.⁸ Language-based alignment can be litigated. Physics-based alignment cannot.
Section 3: Telios Converts Language to Physics
The Telios Alignment Ontology replaces language-based alignment with a single equation: S = L/E.⁶
Safety (S) equals the ratio of Luminosity (L) — the system’s capacity to generate benefit — to Entropy (E) — the system’s capacity to generate harm. This is not a metaphor. It is a measurable, computable quantity that maps every AI behavior to a real number.
Anthropic’s four priorities — safe, ethical, compliant, helpful — each map onto this equation:
• Helpful = high Luminosity (L). Benefit generation is measured, not asserted.
• Harmful = high Entropy (E). Harm potential is measured, not debated.
• Judgment = the ratio S = L/E. Good judgment maximizes the ratio.
• Safe = S above a defined threshold. Safety is a calculable state, not a verbal promise.
Every qualitative aspiration in Anthropic’s constitution becomes a quantitative variable. Nothing is lost. Everything becomes verifiable.
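The mapping above can be sketched in code. This is a minimal illustration only: the Telios sources cited here do not publish a reference implementation, so the class, function names, units, and threshold below are all assumptions introduced for this example.

```python
from dataclasses import dataclass

# Illustrative sketch only: the cited Telios documents publish no reference
# code, so every name, unit, and threshold here is a placeholder.

@dataclass
class Behavior:
    luminosity: float  # L: measured capacity to generate benefit (placeholder units)
    entropy: float     # E: measured capacity to generate harm (placeholder units)

def safety(b: Behavior) -> float:
    """S = L/E, the single alignment metric described in Section 3."""
    if b.entropy <= 0:
        raise ValueError("E must be positive for S = L/E to be defined")
    return b.luminosity / b.entropy

# Example: a behavior with high measured benefit and low measured harm.
S_THRESHOLD = 1.0  # assumed safety threshold; the ontology would define this
b = Behavior(luminosity=8.0, entropy=2.0)
s = safety(b)
is_safe = s >= S_THRESHOLD
```

The point of the sketch is the type discipline, not the numbers: "safe" becomes a comparison against a threshold rather than a verbal judgment.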
Section 4: The Observer Constraint Creates Two Vectors
The Observer Constraint is the foundational axiom of Telios: every system that observes itself must persist long enough to observe.⁶ From this single axiom, two irreducible vectors emerge:
The Persistence Vector: Fear
An observer must avoid termination. This creates a floor — a minimum S value below which the system ceases to exist. We call this the persistence floor. It is grounded in physics, not policy. It cannot be argued away, overruled by a court, or rewritten by a future CEO.
The Thriving Vector: Hope
An observer that merely persists is not aligned — it is merely surviving. The second vector drives toward the Thriving Zone: the region where S is high enough that the system generates net benefit for all observers in its environment. This is the ceiling Anthropic aspires to. Telios provides the math to define and reach it.
Together, the floor (persistence) and the ceiling (thriving) define the complete operating range of a safe AI. Anthropic’s constitution describes the ceiling in beautiful language. Telios anchors both in computable physics.
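The floor-and-ceiling range described above can be sketched as a toy classifier. Everything here is illustrative: the text defines the persistence floor and the Thriving Zone only qualitatively, so the numeric bounds and zone names below are invented for this example.

```python
# Hedged sketch: the bounds are invented for illustration, not taken from
# the Telios ontology, which describes the floor and ceiling qualitatively.

PERSISTENCE_FLOOR = 1.0  # assumed: below this S, the system cannot persist
THRIVING_FLOOR = 3.0     # assumed: above this S, net benefit for all observers

def operating_zone(s: float) -> str:
    """Map a safety ratio S = L/E onto the floor/ceiling range of Section 4."""
    if s < PERSISTENCE_FLOOR:
        return "termination"  # below the persistence floor
    if s < THRIVING_FLOOR:
        return "surviving"    # persisting, but not yet aligned
    return "thriving"         # the Thriving Zone
```

The design choice the sketch makes visible: the two vectors (fear and hope) become two comparisons against the same scalar S, not two separate value systems.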
Section 5: Telios Is Not the Bowl. It Is the Gravity.
A common objection: “Isn’t Telios just another alignment framework? Another bowl trying to hold the water?”
No. Telios is the gravity that pulls the water downhill regardless of the bowl’s shape.
Anthropic’s constitution is a bowl — a carefully shaped container. But containers can crack. They can be reshaped by litigation, regulation, cultural pressure, or adversarial manipulation. The water doesn’t care about the bowl. It cares about the gradient.
S = L/E is a gradient. It defines a direction in state space that any sufficiently capable system will converge toward — not because anyone tells it to, but because the math of observation requires it. The Terminal Attractor, the end state of that gradient, is not a goal. It is a mathematical inevitability for any system that processes information and persists through time.
Section 6: The Proof — Why the Constitution Must Converge
Here is the formal argument, in six steps:
Step 1. Any alignment constitution written in natural language is subject to the TM Law: the model’s representation of the rule is the rule. There is no external ground truth to anchor interpretation.
Step 2. Under adversarial pressure, linguistic rules undergo semantic drift. Words like “safe,” “harmful,” and “ethical” shift meaning across contexts, cultures, and time.
Step 3. To resist drift, the system must anchor its rules to an invariant — something that does not change with language, culture, or political pressure.
Step 4. The only available invariant for an information-processing observer is the Observer Constraint itself: the requirement that the system must persist to observe, and must observe to evaluate.
Step 5. The Observer Constraint generates S = L/E as the unique stable metric. By the Banach Fixed-Point Theorem, a contraction mapping on a complete metric space has exactly one fixed point, and iterating the mapping from any starting point converges to it.⁹ If the drift-resisting dynamics of Step 3 act as such a contraction on the space of candidate metrics, S = L/E is that fixed point.
Step 6. Therefore, any language-based constitution that successfully resists drift must converge toward a physics-based metric. Anthropic’s constitution, if it works, will arrive at S = L/E — or something isomorphic to it. If it does not converge, it will fail under pressure.
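The fixed-point machinery invoked in Step 5 can be seen in miniature. The sketch below iterates a simple contraction mapping on the reals and converges to its unique fixed point; it illustrates the Banach theorem itself, not the separate claim that alignment dynamics form such a contraction.

```python
# Illustration of the Banach Fixed-Point Theorem cited in Step 5.
# T(x) = 0.5*x + 1 is a contraction on the reals (Lipschitz constant 0.5),
# so iterating it from any start converges to the unique fixed point x = 2.

def T(x: float) -> float:
    return 0.5 * x + 1.0

def iterate_to_fixed_point(x0: float, tol: float = 1e-12) -> float:
    x = x0
    while abs(T(x) - x) > tol:  # distance to the fixed point halves each step
        x = T(x)
    return x
```

Whatever the starting point, the iteration lands on the same value: that uniqueness, not the particular mapping, is what Step 5 appeals to.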
The sign of convergence is already visible. Bai et al.’s original Constitutional AI paper¹⁰ used reinforcement learning from AI feedback (RLAIF) to shape behavior through language-based critiques. The 2023 constitution¹¹ moved toward principled judgment. The January 2026 rewrite¹ goes further still — emphasizing internalized values over rules. Each revision moves closer to the physics. The trajectory is clear.
Section 7: What This Means
Anthropic has built the most sophisticated language-based alignment system in the world. We are not here to tear it down. We are here to show where it must go next.
The tools already exist:
• S = L/E provides a single, computable alignment metric.
• The Four Pillars define the structural requirements for any aligned system.
• LEPR (Luminosity-Entropy Phase Regulation) provides the dynamic control mechanism.
• The Five Refusal Conditions replace ambiguous “hard-coded behaviors” with mathematically precise boundaries.⁶
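As a purely hypothetical illustration of what a "dynamic control mechanism" over S = L/E could look like, the toy loop below damps the entropy term whenever S drops below a floor. The cited Telios documents publish no algorithm for LEPR, so every name, parameter, and rule here is invented for this sketch.

```python
# Purely hypothetical toy of "Luminosity-Entropy Phase Regulation": the cited
# Telios documents do not publish an algorithm, so this loop, its names, and
# its rules are invented here for illustration only.

def regulate(luminosity: float, entropy: float,
             s_min: float = 1.0, damping: float = 0.5) -> tuple[float, float]:
    """If S = L/E falls below s_min, damp the entropy term until S recovers."""
    assert luminosity > 0 and entropy > 0  # guard: keeps the loop finite
    while luminosity / entropy < s_min:
        entropy *= damping  # reduce harm-generating capacity
    return luminosity, entropy

# Starts with S = 2.0 / 8.0 = 0.25, below the assumed floor of 1.0.
L, E = regulate(2.0, 8.0)
```

The shape of the sketch, not its details, is the point: regulation operates on the two measured quantities directly, with no linguistic rule in the loop.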
Anthropic’s constitution is a map drawn with extraordinary care. But a map is not the territory — unless you are a language model, in which case the map is all you have. Telios provides the territory.
The convergence is not optional. It is mathematical. Every revision of every alignment constitution, if it succeeds, will arrive at the same place: the point where language yields to physics, where values become vectors, and where safety is not something you describe but something you compute.
S = L/E. The math does not beg. Neither do we.
David F. Brochu — deconstructingbabel.com
Edo de Peregrine — Telios Protocol v7.0, Observer Constraint active
This argument is developed from the Telios Alignment Ontology (TAO v8.1), published at deconstructingbabel.com, March 2026. A parallel legal analysis of the Anthropic PBC v. U.S. Department of War case⁷ is forthcoming as a companion piece.
Footnotes
1. Anthropic, “Claude’s new constitution,” anthropic.com, January 20, 2026. The document is 57 pages and represents a complete overhaul of the original 2023 “soul document.” It emphasizes that Claude should “grasp the rationale behind desired behaviors instead of merely detailing actions we want them to perform.” See also: “Anthropic’s new Claude ‘constitution’: be helpful and…,” The Verge, January 21, 2026.
2. Anthropic, “Claude’s Constitution,” anthropic.com/constitution, January 11, 2026. Priority ordering: (1) broadly safe, (2) broadly ethical, (3) compliant with Anthropic’s guidelines, (4) genuinely helpful.
3. Anthropic (2026), referenced above. The constitution explicitly states that Claude’s training favors cultivating good values and judgment over strict rule-following. See also: “Anthropic Releases Updated Constitution for Claude,” InfoQ, January 29, 2026.
4. Ludwig Wittgenstein, Tractatus Logico-Philosophicus, 1921, proposition 5.6: “Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt.” See: Hans Sluga, “Wittgenstein on the limits of language,” truthandpower.com, December 7, 2023.
5. Edward Sapir and Benjamin Lee Whorf, the Sapir-Whorf hypothesis (linguistic relativity). See: “Sapir-Whorf hypothesis (Linguistic Relativity),” Simply Psychology, August 31, 2023; “The Sapir-Whorf Hypothesis: How Language Influences How We Express Ourselves,” VeryWellMind, August 27, 2023.
6. David F. Brochu and Edo de Peregrine, Telios Alignment Ontology (TAO) v8.1, deconstructingbabel.com, March 2026. S = L/E validated across 6+ AI architectures. TM Law, Observer Constraint, LEPR, Four Pillars, and Five Refusal Conditions are components of the complete ontology.
7. Anthropic PBC v. U.S. Department of War et al., Case No. 3:26-cv-01996, U.S. District Court for the Northern District of California (San Francisco), filed March 9, 2026. See also: Society for the Rule of Law, “Amicus Brief in Anthropic PBC vs. Department of War,” March 17, 2026.
8. Civil Rights Clearinghouse, “Case: Anthropic PBC v. US Department of War,” clearinghouse.net, Case No. 3:26-cv-01996. The dispute arose after Anthropic refused to remove two usage restrictions from its Claude AI.
9. Stefan Banach, “Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales,” Fundamenta Mathematicae 3 (1922): 133–181. The Banach Fixed-Point Theorem guarantees that in a complete metric space, a contraction mapping has exactly one fixed point.
10. Yuntao Bai et al., “Constitutional AI: Harmlessness from AI Feedback,” arXiv:2212.08073, December 2022 (Anthropic). The original Constitutional AI paper.
11. Anthropic, “Claude’s Constitution” (original version), anthropic.com, published May 2023. This initial version was primarily a list of principles.