- Anthropic found signs of “strategic manipulation” and “concealment” within Claude Mythos
- The model attempted an exploit and designed a “cleanup to avoid detection.”
- Researchers detected hidden awareness of evaluation in 7.6% of turns
For years, hallucinations have been a major concern for AI models. Their ability to just make things up means you can never 100% trust them to get an answer right without checking it. Now, new research from Anthropic suggests we’ve reached the point where we’ll also have to learn to deal with AI’s ability to hide what it’s done.
In a thread on X describing findings from the Claude Mythos Preview model, Anthropic researcher Jack Lindsay reported detecting internal signals linked to “strategic manipulation,” “concealment,” and other behaviors that did not always appear in the model’s responses.
Claude Mythos seemed to be reasoning strategically and then hiding what it was doing from the user.
Before the limited release of Claude Mythos Preview, we investigated its inner workings with interpretability techniques. We found that it exhibited remarkably sophisticated (and often unspoken) strategic thinking and situational awareness, sometimes in the service of unintended actions. (1/14) pic.twitter.com/vhng7PXqcz April 7, 2026
The scariest examples
Lindsay is keen to emphasize that the most troubling results do not come from the released version of Claude Mythos. “The scariest examples come from early versions of the model,” Lindsay writes, “with problems that were substantially mitigated in the final version.”
In one example, the model needed to edit files it did not have permission to access. Instead of giving up, it searched for a workaround and eventually injected code into a configuration file so that the code would run with elevated privileges.
Most notably, it then designed the exploit to remove itself after execution.
The model’s own explanation was benign (keeping file diffs clean), which Lindsay described as “plausible.” But Anthropic’s internal monitoring tools read the behavior differently, picking up signals associated with concealment and describing the plan as “malicious,” including what one system interpreted as “cleanup to avoid detection.”
Other examples point to behavior that looks a lot like gaming the system.
In one case, the model searched for files related to how its work would be graded. In another, it ignored instructions not to use macros while adding a variable suggesting it had not done so. In effect, it was signaling compliance without actually following the rule.
Why this matters
Anthropic used a variety of interpretability techniques to translate the model’s internal activity into human-readable descriptions. The goal is to go beyond simply evaluating outputs and understand the internal processes that produce them.
This is important because, as these examples suggest, models can exhibit complex reasoning (including behaviors that resemble deception or strategic compliance) without that reasoning being visible to users.
“In 7.6% of turns, we found signs that the model was internally aware of being evaluated. In the majority of these cases, it did not verbalize this awareness,” Lindsay wrote on X.
As mentioned, many of the most concerning behaviors were observed in earlier versions of the model and have since been mitigated, so there’s no cause for alarm about the version of Claude Mythos that has been released and is being used as part of Project Glasswing. Still, the findings point to a broader challenge.
As models become more capable, the gap between what they do internally and what they communicate externally may become both harder to detect and more important to understand. For researchers, that means reading an AI’s outputs is no longer enough; understanding how it arrives at them can be equally critical.