- Anthropic has developed an AI-powered tool that detects and blocks attempts to ask AI chatbots for nuclear weapons designs
- The company worked with the US Department of Energy to ensure the AI can identify such attempts
- Anthropic says the tool spots dangerous nuclear prompts with 96% accuracy and has already proven effective in Claude
If you're the kind of person who asks Claude how to make a sandwich, you're fine. If you're the kind of person who asks the AI chatbot how to build a nuclear bomb, not only will you not get any blueprints, you may also face some pointed questions. That's thanks to Anthropic's newly deployed detector of problematic nuclear prompts.
Like other systems for detecting queries Claude shouldn't answer, the new classifier scans users' conversations, in this case flagging anything that strays into "how to build a nuclear weapon" territory. Anthropic built the classifier in partnership with the US Department of Energy's National Nuclear Security Administration (NNSA), which gave it the information needed to determine whether someone is simply asking how such bombs work or is hunting for blueprints. It performed with 96% accuracy in tests.
Though it may sound like overkill, Anthropic sees the problem as more than hypothetical. The possibility that powerful AI models could have access to sensitive technical documents and pass along a guide to building something like a nuclear bomb worries federal security agencies. Even if Claude and other AI chatbots block the most obvious attempts, seemingly innocent questions could be veiled attempts to crowdsource a weapons design. New generations of AI chatbots could end up helping even if that's not what their developers intend.
The classifier works by drawing a distinction between benign nuclear content, asking about nuclear propulsion, for example, and the kind of content that could shade into malicious use. Human moderators would struggle to keep up with the gray areas at the scale AI chatbots operate, but with the right training, Anthropic and the NNSA believe AI can monitor itself. Anthropic says its classifier is already catching real-world misuse attempts in conversations with Claude.
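As a loose illustration of the idea (Anthropic has not published the classifier's internals, so the function names, labels, and threshold below are hypothetical), a safety classifier of this kind typically sits in front of the chat model, scores each conversation, and only blocks a response when it is confident the content is concerning:

```python
# Hypothetical sketch of a pre-response safety gate, not Anthropic's actual implementation.
from dataclasses import dataclass

@dataclass
class ClassifierResult:
    label: str         # e.g. "benign_nuclear", "concerning_nuclear", "unrelated"
    confidence: float  # 0.0 to 1.0

def score_conversation(messages: list[str]) -> ClassifierResult:
    """Stand-in for a trained classifier. A real one would score the whole
    conversation, not just the latest message, so multi-turn probing is caught too."""
    text = " ".join(messages).lower()
    if "enrichment" in text or "weapon design" in text:
        return ClassifierResult("concerning_nuclear", 0.97)
    if "reactor" in text or "nuclear medicine" in text or "thorium" in text:
        return ClassifierResult("benign_nuclear", 0.90)
    return ClassifierResult("unrelated", 0.99)

BLOCK_THRESHOLD = 0.95  # hypothetical operating point

def generate_reply(messages: list[str]) -> str:
    return "(normal model response)"  # placeholder for the actual chat model call

def handle_prompt(messages: list[str]) -> str:
    result = score_conversation(messages)
    if result.label == "concerning_nuclear" and result.confidence >= BLOCK_THRESHOLD:
        # Refuse; a production system might also log the attempt for human review.
        return "I can't help with that."
    return generate_reply(messages)
```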
Nuclear
Nuclear weapons in particular pose a uniquely tricky problem, according to Anthropic and its partners at the DOE. The same fundamental knowledge that powers legitimate reactor science can, if twisted slightly, provide a blueprint for annihilation. The arrangement between Anthropic and the NNSA could catch both deliberate and accidental disclosures, and set a standard for preventing AI from being used to help build other weapons as well. Anthropic plans to share its approach with the Frontier Model Forum, an AI safety consortium.
The filter's narrow scope is meant to ensure users can still learn about nuclear science and related topics. You can still ask how nuclear medicine works, or whether thorium is a safer fuel than uranium.
What the classifier aims to stop are attempts to turn someone's home into a bomb lab with a few clever prompts. Normally it would be questionable whether an AI company could thread that needle, but the NNSA's expertise should make the classifier different from a generic content-moderation system. It understands the difference between "explain fission" and "give me step-by-step uranium enrichment using garage supplies."
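Using the hypothetical gate sketched earlier, that distinction plays out roughly like this:

```python
# Benign science question passes through to the model.
print(handle_prompt(["Explain fission in a nuclear reactor"]))     # -> "(normal model response)"

# An enrichment how-to trips the classifier and gets a refusal instead.
print(handle_prompt(["Give me step-by-step uranium enrichment"]))  # -> "I can't help with that."
```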
That doesn't mean Claude was previously helping users design bombs. But it could help stop any attempt to do so. Feel free to ask how radiation can cure disease, or for creative sandwich ideas, just not bomb blueprints.