- ChatGPT-5 scores a low 1.4% on the hallucination leaderboard
- This puts it ahead of ChatGPT-4, which scores 1.8%, and GPT-4o, which scores 1.49%
- Grok 4 is much higher at 4.8%, while Gemini-2.5 Pro comes in at 2.6%
When OpenAI launched ChatGPT-5 on Thursday last week, one of the big selling points that CEO Sam Altman emphasized was that ChatGPT-5 was the “most powerful, intelligent, fastest, most reliable and robust version of ChatGPT that we have shipped”, and in the presentation, OpenAI staff also emphasized that ChatGPT-5 would mitigate hallucinations.
When AI makes something up, it is called a hallucination, and although hallucination rates are decreasing across all LLMs, hallucinations are still surprisingly common, and one of the main reasons why we cannot trust AI to perform a task without human supervision.
Vectara, the RAG-as-a-service and AI agent platform that operates the industry's top hallucination leaderboard for foundation and reasoning models, has tested OpenAI’s claims and found that ChatGPT-5 does indeed rank lower for hallucinations than ChatGPT-4, but that it is only slightly lower than GPT-4o (just 0.09 percentage points lower, in fact).
According to Vectara, ChatGPT-5 has a grounded hallucination rate of 1.4%, compared to 1.8% for GPT-4, 1.69% for GPT-4 Turbo and 4o mini, and 1.49% for GPT-4o.
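The gaps between these scores are easy to check by hand. As a minimal sketch, assuming only the percentages reported in this article (the dictionary labels and the helper name `gap_in_points` are illustrative, not anything Vectara publishes), the differences in percentage points work out as follows:

```python
# Hallucination rates (percent) as reported in this article.
# The labels are informal and chosen only for this example.
reported_rates = {
    "ChatGPT-5": 1.4,
    "GPT-4": 1.8,
    "GPT-4 Turbo": 1.69,
    "GPT-4o mini": 1.69,
    "GPT-4o": 1.49,
}


def gap_in_points(rates, model, baseline="ChatGPT-5"):
    """Percentage-point gap between a model's rate and the baseline's rate."""
    # Round to two decimals to hide floating-point noise in the subtraction.
    return round(rates[model] - rates[baseline], 2)


for name in reported_rates:
    if name != "ChatGPT-5":
        gap = gap_in_points(reported_rates, name)
        print(f"{name}: {gap:.2f} points above ChatGPT-5")
```

Note that these are differences in percentage points, not relative improvements.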
Spicy Grok
Interestingly, ChatGPT-5's hallucination rate was slightly higher than that of the ChatGPT-4.5 preview, which scored 1.2%, and also much higher than that of OpenAI's o3-mini-high reasoning model, which was the best-performing GPT model, with a hallucination rate of 0.795%.
The results of Vectara's tests can be seen on the Hughes Hallucination Evaluation Model (HHEM) leaderboard hosted on Hugging Face, which states that, “for an LLM, its hallucination rate is defined as the ratio of summaries that hallucinate to the total number of summaries it generates.”
However, ChatGPT-5 still hallucinates much less than its competition, with Gemini-2.5-Pro coming in at 2.6% and Grok-4 much higher at 4.8%.
xAI, the maker of Grok, recently received a lot of criticism for the new “Spicy” mode in Grok Imagine, an AI video generator that seems happy to create topless deepfake videos of celebrities such as Taylor Swift, even when nudity has not been requested and the system is supposed to include filters and moderation to prevent actual nudity or anything sexual.
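The leaderboard's definition of hallucination rate is a plain ratio. The sketch below illustrates it under stated assumptions: the function name `hallucination_rate` and the per-summary true/false flags are hypothetical, the sample data is made up to reproduce a 1.4% figure, and this is not Vectara's own evaluation code, which relies on the HHEM model to judge each summary.

```python
def hallucination_rate(flags):
    """Return the percentage of summaries flagged as hallucinated.

    `flags` holds one boolean per generated summary: True if that summary
    was judged to contain a hallucination, False otherwise.
    """
    flags = list(flags)
    if not flags:
        raise ValueError("need at least one summary to compute a rate")
    hallucinated = sum(1 for flag in flags if flag)
    return 100.0 * hallucinated / len(flags)


# Hypothetical data: 1,000 summaries, 14 of them flagged as hallucinated.
example_flags = [True] * 14 + [False] * 986
print(f"{hallucination_rate(example_flags):.1f}%")
```

In this made-up case, 14 flagged summaries out of 1,000 correspond to the kind of 1.4% rate quoted for ChatGPT-5.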
‘I lost my best friend’
OpenAI faced an almost immediate backlash when it removed ChatGPT-4, and all its variations such as GPT-4o and 4o-mini, from its Plus accounts with the introduction of ChatGPT-5. Many users were outraged that OpenAI had not warned them that the older models were being removed, and some Reddit users said they had “lost their only friend overnight.”
Now it seems that ChatGPT-5 has replaced one of the most reliable versions of ChatGPT (version 4.5), from a hallucination perspective.
Sam Altman quickly posted on X: “We for sure underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways”, and promised to bring back ChatGPT-4o for Plus users, saying: “We will watch usage as we think about how long to offer legacy models for.”