- Microsoft created an AI red team in 2018 as it anticipated the rise of AI.
- A red team represents the enemy; and adopts the personality of an adversary.
- The team’s latest whitepaper hopes to address common vulnerabilities in AI systems and LLMs
For the past seven years, Microsoft has been addressing the risks of artificial intelligence systems through its dedicated AI ‘red team’.
Established to anticipate and counter the growing challenges posed by advanced AI systems, this team takes the role of threat actors and ultimately aims to identify vulnerabilities before they can be exploited in the real world.
Now, after years of work, Microsoft has published a white paper from the team that shows some of the most important findings from their work.
Microsoft Red Team White Paper Findings
Over the years, Microsoft’s red team focus has expanded beyond traditional vulnerabilities to address novel risks unique to AI, working on Microsoft’s own Copilot as well as open source AI models.
The whitepaper emphasizes the importance of combining human expertise with automation to effectively detect and mitigate risks.
An important lesson learned is that the integration of generative AI into modern applications has not only expanded the surface of cyber attacks, but has also posed unique challenges.
Techniques such as fast injections exploit the models’ inability to differentiate between system-level instructions and user input, allowing attackers to manipulate results.
Meanwhile, traditional risks, such as outdated software dependencies or inadequate security engineering, remain important, and Microsoft believes that human expertise is indispensable to counteract them.
The team found that effective understanding of the risks surrounding automation often requires subject matter experts who can evaluate content in specialized fields such as medicine or cybersecurity.
In addition, he highlighted cultural competence and emotional intelligence as vital skills in cybersecurity.
Microsoft also emphasized the need for continuous testing, updated practices, and “debug fix” cycles, a process for identifying vulnerabilities and implementing fixes in addition to additional testing.