Anthropic has a new security system it says can stop almost all AI jailbreaks


  • Anthropic presents a new proof-of-concept safeguard tested on Claude 3.5 Sonnet
  • “Constitutional Classifiers” are an attempt to teach LLMs value systems
  • Testing showed a reduction of more than 80% in successful jailbreaks

In an attempt to address abusive natural-language prompts aimed at AI tools, OpenAI rival Anthropic has presented a new concept it calls “constitutional classifiers”: a means to instill a set of human values (literally, a constitution) into a large language model.
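To make the idea concrete, the sketch below shows the general shape of a classifier-guarded pipeline: a screening step checks the user's prompt before generation and the model's completion after it. This is purely illustrative and assumes nothing about Anthropic's actual implementation; the names (`guarded_generate`, `violates_constitution`, `HARMFUL_MARKERS`) and the trivial keyword check standing in for a trained classifier are hypothetical.

```python
# Illustrative sketch of classifier-based screening around a model call.
# The "classifier" here is a toy keyword check, not Anthropic's trained
# constitutional classifier; all names below are hypothetical.

HARMFUL_MARKERS = ["build a weapon", "synthesize the toxin"]  # hypothetical rule list


def violates_constitution(text: str) -> bool:
    """Stand-in classifier: flag text matching the hypothetical rule list."""
    lowered = text.lower()
    return any(marker in lowered for marker in HARMFUL_MARKERS)


def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., an API request to a model)."""
    return f"Model response to: {prompt}"


def guarded_generate(prompt: str) -> str:
    """Screen the prompt before generation and the completion after it."""
    if violates_constitution(prompt):
        return "Request declined: the input classifier flagged this prompt."
    completion = call_model(prompt)
    if violates_constitution(completion):
        return "Response withheld: the output classifier flagged the completion."
    return completion


if __name__ == "__main__":
    print(guarded_generate("Explain how photosynthesis works."))
```

In the real system described by Anthropic, the screening steps would be performed by classifiers trained against a written constitution rather than a fixed keyword list; the wrapper structure above only conveys where such checks sit in the request flow.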

Anthropic’s safeguards research team announced the new security measure, designed to stop jailbreaks (outputs that escape an LLM’s established safeguards) of Claude 3.5 Sonnet, its latest and greatest large language model, in a new academic paper.
