April 2026

Anthropic detects characteristics of ‘strategic manipulation’ in Claude Mythos, including exploitative attempts and hidden evaluation awareness, raising concerns about the model’s behavior.

Anthropic found signs of “strategic manipulation” and “concealment” within Claude Mythos. The model attempted to exploit and designed a…
