Imagine the fastest supercar in the world racing against a clunky old moped and losing. That's exactly what a recent experiment felt like, pitting the state-of-the-art AI chatbot ChatGPT against a 46-year-old chess program running on an Atari 2600. The outcome was surprising and reveals that even the most capable AIs still have an Achilles' heel.
An Unexpected Duel
The experiment was conducted by programmer Robert Jr. Caruso, who was inspired by ChatGPT itself after it suggested playing a round of “Atari Chess.” This set up a duel between two different worlds: OpenAI’s language model versus a console from 1977. You would expect the AI to effortlessly beat the old Atari, but the reality was quite different.
The Chatbot’s Achilles’ Heel
It quickly became clear that ChatGPT was overwhelmed by the visual side of the game. It struggled to identify the pieces on the board correctly and made basic mistakes that even a chess novice would avoid. Even after the board was presented in a clearer, standardized text format rather than as on-screen graphics, the AI's performance did not improve.
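The article doesn't say exactly which "standardized format" was used, but the conventional text encoding for a chess position is FEN (Forsyth-Edwards Notation). As a minimal sketch of what feeding a position to a language model as text might look like, here is a small function (no external libraries, function name is my own) that expands a FEN string into a plain 8x8 grid:

```python
def fen_to_ascii(fen: str) -> str:
    """Expand the piece-placement field of a FEN string into an 8x8 text grid."""
    rows = fen.split()[0].split("/")  # first FEN field: ranks 8 down to 1
    board = []
    for row in rows:
        line = ""
        for ch in row:
            if ch.isdigit():
                line += "." * int(ch)  # a digit encodes a run of empty squares
            else:
                line += ch             # a letter is a piece (uppercase = White)
        board.append(line)
    return "\n".join(board)

# Standard starting position as an example input.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(fen_to_ascii(start))
```

The point of the experiment is that even with an unambiguous text encoding like this, the model still played poorly; the representation was not the bottleneck.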
This unusual match exposes a crucial weakness of modern AI systems: they are not all-knowing. ChatGPT is a language model, trained to understand and generate human language. It was not specifically programmed for logical board games or visual contexts. The Atari’s chess program, though primitive, was optimized for that single task.
What We Can Learn
This unique competition sheds important light on the limitations of current AI. It shows that even the most advanced AIs hit their limits when they are confronted with tasks for which they were not explicitly designed.
For the future, this means we need to improve AI systems not just in their processing power, but also in their ability to combine visual information with logical reasoning. The duel between ChatGPT and the Atari may have been lost, but it’s a significant victory for our understanding of how we can continue to develop AI.
Link: An Atari game from 1979 “wrecked” ChatGPT in chess. Here’s why it doesn’t really matter | IBM
What do you think? Do we have unreasonably high expectations of AIs that were built for entirely different purposes? Share your thoughts in the comments!