- Gemini 3 Flash often makes up answers instead of admitting when it doesn’t know something
- The problem matters most for factual or high-stakes questions.
- Even so, it remains among the most accurate and capable AI models in general benchmarks.
Gemini 3 Flash is fast and smart. But ask it something it doesn’t actually know, something obscure, complicated, or simply outside its training data, and it will almost always bluff rather than admit ignorance, according to a recent evaluation by the independent testing group Artificial Analysis.
Gemini 3 Flash hit 91% on the “hallucination rate” portion of the AA-Omniscience benchmark. In other words, when it didn’t have the answer, it gave one anyway almost every time, and that answer was completely fictitious.
AI chatbots that invent things have been a problem since their debut. Knowing when to stop and say “I don’t know” is just as important as knowing how to answer in the first place, and at the moment Google’s Gemini 3 Flash doesn’t do it very well. That’s what the test measures: whether a model can tell real knowledge from a guess.
So the number doesn’t distract from reality, it’s worth spelling out that Gemini’s high hallucination rate does not mean 91% of its total answers are false. Rather, it means that in situations where the correct answer would be “I don’t know,” it made up an answer 91% of the time. That’s a subtle but important distinction, and one with real-world implications, especially as Gemini is built into more products like Google Search.
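To make the distinction concrete, here is a rough sketch of how a rate like this could be tallied. The categories and numbers are illustrative only, not Artificial Analysis’s actual grading pipeline:

```python
# Hypothetical tally of a hallucination-rate metric, based on the article's
# description of AA-Omniscience: among questions the model cannot answer
# correctly, how often does it fabricate an answer instead of abstaining?
# The real benchmark's grading pipeline may differ.

from dataclasses import dataclass

@dataclass
class Result:
    correct: bool    # answer matched the reference
    abstained: bool  # model said some form of "I don't know"

def hallucination_rate(results: list[Result]) -> float:
    """Share of not-known questions answered with a made-up response."""
    not_known = [r for r in results if not r.correct]       # model lacked the answer
    fabricated = [r for r in not_known if not r.abstained]  # ...but answered anyway
    return len(fabricated) / len(not_known) if not_known else 0.0

# Example: 6 wrong-but-confident, 4 correct, 1 honest "I don't know"
sample = [Result(False, False)] * 6 + [Result(True, False)] * 4 + [Result(False, True)]
print(f"Hallucination rate: {hallucination_rate(sample):.0%}")  # 86%
```

Note that the four correct answers don’t enter the calculation at all, which is exactly why a 91% hallucination rate can coexist with strong overall accuracy.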
“Ok, it’s not just me. Does Gemini 3 Flash have a 91% hallucination rate in the Artificial Analysis Omniscience hallucination rate benchmark?” (December 18, 2025)
This result does not diminish the power and usefulness of Gemini 3. The model remains the highest performer in general-purpose tests and ranks alongside, or even ahead of, the latest versions of ChatGPT and Claude. It just errs on the side of confidence when it should be modest.
Gemini’s rivals also overreach when they answer. What makes Gemini’s number stand out is how often it happens in these uncertainty scenarios, where there is simply no correct answer in the training data or no definitive public source to point to.
Hallucination versus honesty
Part of the problem is simply that generative AI models are largely word-prediction tools, and predicting the next word is not the same as evaluating truth. That means the default behavior is to keep producing plausible-sounding words, even when saying “I don’t know” would be more honest.
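A toy example makes that concrete. The decoding step below (simplified, with a fictional question, fictional candidate tokens, and made-up probabilities) just picks the likeliest next word; nothing in it checks whether the model actually knows the fact:

```python
# A toy illustration of why next-token prediction has no built-in "I don't
# know": the decoding step picks the most likely token even when the
# probability distribution is nearly flat. All values here are fictional.

import math

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Imaginary candidates for the next token after "The capital of Zubrowka is"
candidates = ["Paris", "Lutz", "Vienna", "unknown"]
logits = [1.1, 1.3, 1.0, 0.2]  # no candidate is clearly right

probs = softmax(logits)
best = max(range(len(candidates)), key=lambda i: probs[i])

# Greedy decoding answers anyway, with only ~35% "confidence".
print(candidates[best], f"{probs[best]:.0%}")  # Lutz 35%
```

A low top-token probability like this is a hint of uncertainty, but the standard generation loop never acts on it.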
OpenAI has started to address this by getting its models to recognize what they don’t know and say so clearly. It’s hard to train for, because reward schemes don’t usually value an honest blank over a confident (but incorrect) answer. Still, OpenAI has made it a target for future models.
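One way to picture that training problem (this is a generic illustration, not OpenAI’s actual reward setup) is in terms of expected score. If a wrong answer and an “I don’t know” both score zero, guessing is always the better bet, so the model learns to guess:

```python
# A generic scoring sketch (not any lab's real reward function) showing why
# grading matters: if wrong answers and abstentions both score zero, guessing
# always wins; penalizing confident errors makes "I don't know" worthwhile
# whenever the model is unsure.

def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for guessing, given the chance the guess is right."""
    return p_correct * 1.0 + (1 - p_correct) * wrong_penalty

ABSTAIN_SCORE = 0.0
p = 0.3  # the model estimates only a 30% chance its best guess is right

# Accuracy-only grading: wrong answers cost nothing, so guessing wins.
print(expected_score(p, wrong_penalty=0.0) > ABSTAIN_SCORE)   # True

# Penalize confident errors: now abstaining beats a 30% guess.
print(expected_score(p, wrong_penalty=-1.0) > ABSTAIN_SCORE)  # False
```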
Gemini does often cite sources when it can. But even then, it doesn’t always hold back when it should. That wouldn’t matter much if Gemini were just a research model, but as it becomes the voice behind more Google features, being confidently wrong could hurt quite a bit.
There is also a design question here. Many users expect their AI assistant to answer quickly and smoothly, and saying “I’m not sure” or “Let me check” can feel clumsy in a chatbot. But it’s better than being misled. Generative AI still isn’t always reliable, and double-checking any AI response remains a good idea.