- ChatGPT passes “strawberry” test but fails when changed to “blueberry”
- AI still struggles to count letters despite broader improvements
- Reasoning tests like “car wash” still expose gaps in AI logic
There are a number of viral posts from people amazed that chatbots like ChatGPT and Claude can solve complex equations but struggle with something as simple as counting the number of “r”s in the word “strawberry.” Well, those days might finally be over.
Opening with the word “Finally,” the official ChatGPTapp account boasted that ChatGPT can now count letters correctly.
However, users quickly discovered that it could still make a mistake by changing “strawberry” to “blueberry.”
“Not so fast,” users replied.
To corroborate the result, I quickly tried the same thing with my own ChatGPT running GPT-5.5. It passed the “strawberry” test, correctly saying there were three “r”s, but then claimed there were only two in “blueberry”: a different result, but still incorrect. To its credit, ChatGPT admitted its error when I questioned it, attributing it to a simple “counting error.”
Why does the strawberry problem exist?
There are some very simple questions that chatbots are very bad at answering, one of which is “how many ‘r’s are there in the word ‘strawberry’?”
This is a simple counting task for humans, but surprisingly difficult for artificial intelligence systems. The reason comes down to how they process language. Large Language Models (LLMs) are based on transformers, which convert words like “strawberry” into numerical token representations. Those representations capture meaning and context, but do not inherently preserve a clear sense of the individual letters that make up the word.
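The effect is easy to sketch. The toy tokenizer below is not a real LLM tokenizer (real systems use learned byte-pair encodings with vocabularies of tens of thousands of pieces), but it illustrates the point: the model sees subword chunks like “straw” and “berry,” not individual characters, so a letter count is never directly visible to it, even though it is trivial at the character level.

```python
# Toy subword vocabulary, purely for illustration — not a real tokenizer.
TOY_VOCAB = {"straw": 1, "berry": 2, "blue": 3}

def toy_tokenize(word: str) -> list[str]:
    """Greedy longest-match split, mimicking how BPE-style
    tokenizers break words into subword pieces."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest piece first; fall back to a single character.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in TOY_VOCAB or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

print(toy_tokenize("strawberry"))   # ['straw', 'berry'] — no letters in sight
print("strawberry".count("r"))      # 3 — trivial once you work with characters
```

The model reasons over the token IDs, not the spelled-out string, which is why a question that takes a human a glance can trip up a system that otherwise handles far harder problems.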
The fact that ChatGPT still stumbles over “blueberry” suggests that the fix may have been hard-coded for specific cases, rather than reflecting a broader improvement in how LLMs handle these types of questions.
The car wash problem
The second boast in ChatGPTapp’s post is that ChatGPT can now solve the car wash problem. This exploits a contextual gap in how LLMs reason: you ask whether it would be faster to walk to a car wash or drive if it is “only 50 meters away.” Most models will tell you that it’s faster to walk, ignoring the obvious problem that you need to bring your car to wash it.
ChatGPTapp claims that ChatGPT will now detect this error and flag it. But when I tested it using the latest GPT-5.5 model, it still recommended walking, as did Claude running Sonnet 4.6. However, when I tried it on Gemini, it pointed out that while walking would be faster, you would need to bring the car if the goal was to wash it.
Grok did even better. Not only did it point out the issue of not bringing the car, but it added that “this question has become a popular test to determine whether someone (or an AI) understands the actual goal rather than giving generic ‘walking is healthier/shorter/greener’ advice that ignores context.”
So, at least for now, it’s a win for Gemini and Grok. But if fixing “strawberry” doesn’t fix “blueberry,” a bigger question arises: are these models really getting smarter, or are they just getting better at passing the tests we keep throwing at them?