- AI promises a major revolution for developers, but is it only good for writing code?
- Popular models from Anthropic and OpenAI are not great at debugging
- Microsoft researchers are open-sourcing their tools to facilitate research
Although generative AI is increasingly being integrated into programming workflows, new research from Microsoft reveals that large language models are not yet up to the task of debugging.
The research suggests that even advanced models still struggle with debugging tasks that are fairly simple for experienced developers, highlighting the continued importance of human programmers.
Even so, AI has a strong foothold in coding: Google now claims that about 25% of its new code is generated by AI, and Meta has also reported widespread use of AI for coding.
AI is good at writing code, but not at debugging
The report describes how 11 Microsoft researchers tested nine AI models on SWE-bench Lite, a popular debugging benchmark. Claude 3.7 Sonnet achieved the highest success rate at 48.4%. OpenAI’s o1 and o3-mini registered lower success rates of 30.2% and 22.1% respectively.
“Even with debugging tools, our simple prompt-based agent rarely solves more than half of the SWE-bench Lite issues,” the researchers wrote, attributing the suboptimal performance to a lack of data that represents sequential decision-making behavior.
However, all hope is not lost. “We believe that training or fine-tuning LLMs can improve their interactive debugging abilities,” they added. The researchers intend to fine-tune a model that specializes in gathering the information needed to resolve bugs. In the meantime, they are open-sourcing debug-gym so that others can carry out similar research.
Debug-gym is described as an “environment that allows code-repairing agents to access tools for active information-seeking behavior.”
For now, however, artificial intelligence may not be adding as much value to developers’ lives as AI companies suggest. “Most developers spend the majority of their time debugging code,” the researchers wrote, indicating that even if developers benefit from code generation, they may not be saving much time.