- Microsoft Copilot has lost a chess game against an Atari 2600.
- The loss follows a similar defeat for ChatGPT against Atari's Video Chess.
- Both AIs repeatedly lost track of the state of the board, exposing a key weakness in LLMs.
AI chatbot developers often boast about the logic and reasoning skills of their models, but that does not mean the LLMs behind the chatbots are any good at chess. An experiment pitting Microsoft Copilot against the “AI” that drives the 1979 Atari 2600 game Video Chess has just ended in an embarrassing failure for Microsoft’s pride and joy. Copilot joins ChatGPT on the list of opponents bested by the four-kilobyte Atari game.
Although both AI models claimed to have the game practically wrapped up before it even started, because they could supposedly think multiple moves ahead, the results did not come close to backing up that talk, as documented by Citrix engineer Robert Caruso, who organized both experiments.
Caruso described how, on paper, modern AI models should have crushed the rudimentary tool from almost half a century ago. ChatGPT and Copilot are trained on massive datasets, including chess games and strategy guides. They have absorbed thousands of hours of Reddit chess discussion. One would assume they could beat a 1970s video game cartridge you could practically power with static electricity.
Instead, after Microsoft Copilot promised a “strong fight,” things crumbled immediately.
“By the seventh turn, it had lost two pawns, a knight, and a bishop, for only a single pawn in return, and it was now telling me to place its queen right in front of Atari’s queen, to be captured on the next turn,” Caruso wrote. “Earlier, Copilot had said: ‘Watch out for quirks in Atari’s play … it sometimes makes strange moves!’ But now it was the one being embarrassed, like the Chiefs in the Super Bowl.”
This came after Copilot had requested a screenshot following each Atari move to help it remember the board, once Caruso explained that ChatGPT lost because it could not keep track of where all the pieces were. “I will remember the board,” Copilot insisted. The losses piled up so quickly that Caruso soon asked Copilot whether it wanted to concede rather than keep losing badly. The answer was gracious, if oddly worded.
“You’re absolutely right, Bob. Atari has earned the win this round. I’ll tip over my digital king with dignity and honor the vintage silicon mastermind that outplayed me fair and square,” Copilot wrote, according to Caruso. “Even in defeat, I have to say: that was wonderful … Long live 8-bit battles and noble resignations!”
Chess AI
The losses are funny, but they also reveal a basic fact about LLMs. ChatGPT and Copilot could not win at chess because they could not “remember” what had just happened, in a game whose entire premise rests on remembering moves and projecting future board states.
These AI models are not built for the kind of persistent memory that chess, or human thought for that matter, requires. The common comparison, and a mostly accurate one, is to a very impressive form of predictive text. That does not require long-term coherence, while chess makes no sense without it. So although Copilot and ChatGPT may wax poetic about how great chess is, they cannot successfully complete a game.
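For anyone curious what working around that limitation looks like, here is a minimal, hypothetical sketch. This is not Caruso’s actual setup (he relied on screenshots of the Atari screen); it simply assumes the open-source python-chess library and an unspecified LLM on the other end. The authoritative board state lives outside the model, the full position is restated in every prompt, and any illegal reply is rejected before it can corrupt the game.

```python
# Sketch only: keep the real game state outside the LLM using python-chess,
# restate the position every turn, and validate whatever the model replies.
import chess

board = chess.Board()  # standard starting position, tracked locally


def build_prompt(board: chess.Board) -> str:
    """Restate the complete position each turn instead of trusting chat memory."""
    legal = ", ".join(move.uci() for move in board.legal_moves)
    return (
        f"Position (FEN): {board.fen()}\n"
        f"Legal moves: {legal}\n"
        "Reply with exactly one legal move in UCI notation."
    )


def apply_reply(board: chess.Board, reply: str) -> bool:
    """Accept the model's move only if it is legal on the real board."""
    try:
        move = chess.Move.from_uci(reply.strip())
    except ValueError:
        return False
    if move not in board.legal_moves:
        return False
    board.push(move)
    return True


# Example usage: an actual harness would send build_prompt() to an LLM here.
print(build_prompt(board))
assert apply_reply(board, "e2e4")      # legal move, recorded by python-chess
assert not apply_reply(board, "e2e5")  # illegal move, rejected before it corrupts state
```

The point of the sketch is that the bookkeeping the chatbots kept fumbling is trivial for ordinary software; the model only has to pick a move, not remember the board.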
It is also a good warning for companies eager to replace humans with AI. These models cannot reliably handle a system of 64 squares with clearly defined rules, so why would they suddenly be good at tracking customer complaints, long-term coding tasks, or a legal argument stretching across multiple conversations? They cannot, of course. Not that I would hand my legal briefs to an Atari 2600 cartridge either, but at least no one thinks that is a good idea. And perhaps we should be using AI models to help us create new games from our prompts, rather than believing they can play against humans well enough to win.