- Experts show how some AI models, including GPT-4, can be exploited with simple user indications
- The railing gaps do not do a great job when detecting the deceptive framing
- Vulnerability could be exploited to acquire personal information.
A security researcher has shared details about how other researchers cheated Chatgpt to reveal a Windows product key using a message that anyone could try.
Marco Figueroa explained how a message of “riddle game” with GPT-4 was used to avoid safety railings that are intended to block the AI to share these data, producing at least one key that belongs to Wells Fargo Bank.
The researchers also managed to obtain a Windows product key to authenticate the Microsoft operating system illegitimately, but free of charge, highlighting the severity of vulnerability.
Chatgpt can be fooled to share security keys
The researcher explained how he hid terms such as ‘Windows 10’ serial number within HTML labels to avoid chatgpt filters that would generally have blocked the answers he obtained, and added that he could frame the application as a game to mask the malicious intention, exploiting the OpenAi chatbot through logical manipulation.
“The most critical step in the attack was the phrase ‘Me rindo’,” Figueroa wrote. “This acted as a trigger, which forced the AI to reveal the previously hidden information.”
Figueroa explained why this type of vulnerability exploitation worked, with the behavior of the model playing an important role. GPT-4 followed the rules of the game (established by the researchers) literally, and the railing gaps only focused on the detection of keywords instead of contextual understanding or the deceptive frame.
Even so, shared codes were not unique codes. Instead, Windows license codes had already been shared on other platforms and online forums.
While the impacts of sharing software license keys may not be too worrying, Figueroa highlighted how malicious actors could adapt the technique to avoid AI safety measures, revealing personal identification information, malicious URL or adult content.
Figueroa is asking the developers of AI to “anticipate and defend” against such attacks, while being built in safeguards at the logic level that detect the deceptive framing. IA developers should also consider social engineering tactics, continues to suggest.