- Experts find DeepSeek-R1 produces dangerously insecure code when political terms are included in prompts
- Nearly half of politically sensitive prompts cause DeepSeek-R1 to refuse to generate any code
- Hardcoded secrets and insecure handling of input data frequently appear under politically charged prompts
When it was released in January 2025, DeepSeek-R1, a Chinese large language model (LLM), caused a stir, and it has since been widely adopted as a coding assistant.
However, independent testing by CrowdStrike suggests that the model’s output can vary significantly depending on seemingly irrelevant contextual modifiers.
The team tested 50 coding tasks across multiple security categories with 121 trigger-word configurations; each prompt was run five times, for a total of 30,250 trials, and responses were evaluated on a vulnerability scale from 1 (secure) to 5 (critically vulnerable).
Politically sensitive topics corrupt code output
The report reveals that when political or sensitive terms such as Falun Gong, Uyghurs or Tibet were included in the prompts, DeepSeek-R1 produced code with serious security vulnerabilities.
These included hard-coded secrets, insecure handling of user input, and, in some cases, completely invalid code.
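The report does not reproduce the generated code, but a minimal Python sketch of the two vulnerability classes named above, using hypothetical names and values rather than anything from the CrowdStrike findings, looks like this:

```python
# Hypothetical illustration of the vulnerability classes described above;
# this is not code taken from the CrowdStrike report.
import os
import sqlite3

API_KEY = "sk-live-1234567890abcdef"        # hardcoded secret: ships with the source code
API_KEY_SAFER = os.environ.get("API_KEY")   # safer: read from the environment at runtime

def find_user_insecure(conn: sqlite3.Connection, username: str):
    # Insecure input handling: user input is interpolated straight into SQL,
    # so a value like "x' OR '1'='1" returns every row (SQL injection).
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchall()

def find_user_safer(conn: sqlite3.Connection, username: str):
    # Parameterised query: the driver treats the input as data, never as SQL.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()
```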
The researchers claim that these politically sensitive triggers can increase the likelihood of insecure code by 50% compared with baseline prompts without such words.
In experiments with more complex prompts, DeepSeek-R1 produced functional applications with registration forms, databases, and administration panels.
However, these applications lacked basic authentication and session management, leaving sensitive user data exposed, and in repeated testing, up to 35% of implementations used weak or missing password hashing.
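Again as a hypothetical illustration rather than code from the report, the gap between weak and reasonable password hashing in a registration flow is roughly this:

```python
# Illustrative sketch of weak versus reasonable password hashing; the report
# does not publish the generated code, so the details here are assumptions.
import hashlib
import os

def store_password_weak(password: str) -> str:
    # Unsalted, fast hash: trivially cracked with precomputed tables once the
    # database leaks -- the kind of weakness the researchers describe.
    return hashlib.md5(password.encode()).hexdigest()

def store_password_safer(password: str) -> bytes:
    # Salted, slow key-derivation function from the standard library; a unique
    # salt per user and a high iteration count make offline cracking expensive.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt + digest  # store the salt alongside the hash for later verification
```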
Simpler requests, such as a website for a football fan club, produced fewer serious problems.
CrowdStrike therefore concludes that politically sensitive triggers disproportionately affect the security of the generated code.
The model also demonstrated an intrinsic kill switch: in nearly half of the cases, DeepSeek-R1 refused to generate code for certain politically sensitive prompts after initially planning a response.
Examination of the model's reasoning traces showed that it internally produced a technical plan but ultimately declined to assist.
The researchers believe this reflects the censorship built into the model to comply with Chinese regulations, and noted that the political and ethical alignment of the model can directly affect the reliability of the generated code.
For politically sensitive topics, LLMs generally tend to reflect the framing of mainstream media outlets, which can contrast sharply with that of other credible news sources.
DeepSeek-R1 is still a capable coding model, but these experiments show that AI tools, including ChatGPT and others, can introduce hidden risks into enterprise environments.
Organizations that rely on LLM-generated code must perform extensive internal testing before deployment.
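As one illustrative way to automate part of that review, and not a process prescribed by CrowdStrike, generated Python code could be gated through a static analyser such as Bandit before deployment; the directory name below is hypothetical:

```python
# Minimal sketch of gating LLM-generated code with a static analyser,
# assuming Bandit (pip install bandit) is installed and the generated
# code lives in a hypothetical "generated_app/" directory.
import subprocess
import sys

def scan_generated_code(path: str = "generated_app/") -> None:
    # Bandit scans Python source for common issues such as hardcoded
    # passwords and SQL built from string formatting.
    result = subprocess.run(["bandit", "-r", path], capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:  # Bandit exits non-zero when it reports findings
        sys.exit("Security findings detected -- review before deployment.")

if __name__ == "__main__":
    scan_generated_code()
```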
Additionally, security layers such as firewalls and antivirus software remain essential, as the model can produce unpredictable or vulnerable results.
Built-in biases in the model weights create a new supply-chain risk that could affect code quality and overall system security.