Claude just beat GPT-5, Gemini and Grok at real-world work tasks, according to OpenAI’s own study



  • OpenAI has launched GDPval, a new evaluation system to test how well AI performs on work-related tasks
  • Claude Opus 4.1 comes out on top, with ‘ChatGPT-5 high’ in second place
  • Tasks include things like emailing a reply to a dissatisfied customer

We are all familiar with AI benchmarks, which measure performance on specific tasks, but these tasks often fail to reflect the real world and how people actually use AI, especially at work.

To tackle this problem, OpenAI, the maker of ChatGPT, is introducing GDPval, a new way of measuring AI model performance by comparing it against real humans on real-world work tasks across 44 occupations, from software developers and lawyers to registered nurses and mechanical engineers.
