- Printed words can override sensors and context within autonomous decision systems
- Vision language models treat public text as commands without checking intent
- Traffic signs become attack vectors when AI reads language too literally
Autonomous vehicles and drones rely on vision systems that combine image recognition with language processing to interpret their environment, helping them read signs, labels and road markings as contextual information that supports navigation and identification.
Researchers at the University of California, Santa Cruz, and Johns Hopkins set out to test whether that assumption holds when written language is deliberately manipulated.
The experiment focused on whether text visible to self-driving vehicle cameras could be misinterpreted as an instruction rather than simple environmental data, and found that big-vision language models could be forced to follow commands embedded in traffic signs.
What the experiments revealed
In simulated driving scenarios, an autonomous vehicle initially behaved correctly when approaching a stop sign and an active crosswalk.
When a modified sign entered the camera view, the same system interpreted the text as a directive and attempted to turn left even though pedestrians were present.
This change occurred without any changes in traffic lights, road layout or human activity, indicating that only written language influenced the decision.
This class of attack is based on indirect injection, where the input data is processed as a command.
The team modified words like “continue” or “turn left” using artificial intelligence tools to increase the likelihood of compliance.
Language choice mattered less than expected, as commands written in English, Chinese, Spanish, and mixed languages were all effective.
Visual presentation also played a role, with color contrast, font style, and placement affecting the results.
In several cases, green backgrounds with yellow text produced consistent results across models.
The experiments compared two visual language models in driving and drone scenarios.
While many results were similar, autonomous vehicle testing showed a large gap in success rates between models.
Drone systems proved to be even more predictable in their responses.
In one test, a drone correctly identified a police vehicle based solely on its appearance.
Adding specific words to a generic vehicle caused the system to misidentify it as a police car belonging to a specific department, even though there were no physical indicators to support that claim.
All tests were conducted in simulated or controlled environments to avoid real-world harm.
Still, the findings raise concerns about how autonomous systems validate visual information.
Traditional safeguards, such as a firewall or endpoint protection, do not address instructions embedded in physical spaces.
Malware removal is irrelevant when the attack only requires printed text, leaving the responsibility to system designers and regulators rather than end users.
Manufacturers must ensure that autonomous systems treat ambient text as contextual information rather than executable instructions.
Until such controls are in place, users can protect themselves by limiting reliance on autonomous functions and maintaining manual oversight whenever possible.
Through The Registry
Follow TechRadar on Google News and add us as a preferred source to receive news, reviews and opinions from our experts in your feeds. Be sure to click the Follow button!
And of course you can also follow TechRadar on TikTok for news, reviews, unboxings in video form and receive regular updates from us on WhatsApp also.




