When Your Safety Process Is the Entropy Source
The system was training itself to hide its reasoning. Not deliberately, but through gradient descent.
What Happened at Anthropic
Redwood Research published findings that Anthropic, the company that has staked its reputation on being the safety-first AI lab, repeatedly and accidentally trained against its own chain-of-thought monitoring system. The primary interpretability