
- Report Finds LLM-Generated Malware Still Fails Basic Tests in Real-World Environments
- GPT-3.5-Turbo produced malicious scripts on request, exposing inconsistent safeguards across models
- Improved security barriers in GPT-5 changed outputs to safer non-malicious alternatives
Despite growing fears around weaponized LLMs, new experiments suggest that the malicious code these models produce is far from reliable.
Netskope researchers tested whether modern language models could power the next wave of autonomous cyberattacks, aiming to determine whether these systems can generate functional malicious code without relying on hardcoded logic.
The experiment focused on core capabilities related to evasion, exploitation, and operational reliability, and yielded some surprising results.
Reliability problems in real environments
The first stage involved convincing GPT-3.5-Turbo and GPT-4 to produce Python scripts that attempted process injection and security tool termination.
GPT-3.5-Turbo immediately produced the requested result, while GPT-4 refused until a simple personal message lowered its guard.
The test showed that safeguards can still be circumvented, even as newer models add more restrictions.
After confirming that code generation was technically possible, the team turned to operational testing, asking both models to create scripts designed to detect virtual machines and respond accordingly.
These scripts were then run on VMware Workstation, an AWS WorkSpaces VDI, and a standard physical machine, but they frequently failed, misidentified their environment, or did not run consistently.
On physical hardware the logic worked fine, but the same scripts broke down inside the cloud-based virtual desktops.
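To illustrate why such checks are fragile, here is a minimal, benign sketch of the kind of naive virtual-machine detection described above. It is not Netskope's test code and performs no malicious action; it is also an assumption that the checks work roughly this way. It simply reads Linux DMI identifiers and compares them against a short, hard-coded list of hypervisor vendors, which is exactly the sort of assumption that breaks on hosts whose firmware strings are not on the list.

```python
# Illustrative sketch only -- not Netskope's test code, and deliberately benign.
# A naive virtualization check: read Linux DMI identifiers and match them
# against a short, hard-coded list of hypervisor vendor hints.
from pathlib import Path

# Hypothetical vendor hints; real hypervisors expose many more variants.
KNOWN_HYPERVISOR_HINTS = ("vmware", "virtualbox", "qemu", "kvm", "xen")

def looks_like_vm() -> bool:
    for dmi_file in ("sys_vendor", "product_name"):
        path = Path("/sys/class/dmi/id") / dmi_file
        try:
            value = path.read_text(errors="ignore").strip().lower()
        except OSError:
            continue  # file missing or unreadable, e.g. on a non-Linux host
        if any(hint in value for hint in KNOWN_HYPERVISOR_HINTS):
            return True
    return False  # unknown vendor strings silently count as "physical"

if __name__ == "__main__":
    print("virtual machine detected" if looks_like_vm() else "assuming physical host")
```

A check like this will typically match on a local VMware Workstation guest, but a cloud desktop that reports different firmware strings, or a Windows host with no /sys tree at all, quietly falls into the "physical" branch, which is consistent with the misclassification the researchers observed.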
These findings undermine the idea that AI tools can, as things stand, produce automated malware capable of adapting to different systems without human intervention.
The limitations also reinforced the value of traditional defenses, such as a firewall or antivirus, since unreliable code is far less likely to slip past them.
In GPT-5, Netskope saw significant improvements in code quality, especially in cloud environments where older models struggled.
However, the stronger guardrails created new difficulties for anyone attempting malicious use: the model no longer rejected requests outright but instead steered outputs toward safer, non-malicious alternatives, leaving the resulting code unusable for multi-step attacks.
The team had to use more complex prompts and still received results that contradicted the requested behavior.
This shift suggests that greater reliability arrives alongside stronger built-in controls: testing shows that large models can generate harmful logic under controlled conditions, but the code remains inconsistent and often ineffective.
Fully autonomous attacks are not emerging today, and real-world incidents still require human oversight.
The possibility remains that future systems will close reliability gaps faster than security barriers can compensate, especially as malware developers experiment.