Popular AI chatbots like ChatGPT and Gemini were recently given a math test designed for 8th-grade students. Surprisingly, they all struggled with one particular question.
Chatbots are computer programs that use artificial intelligence (AI) to understand and respond to questions and commands. They're trained on huge amounts of text data, allowing them to generate text, answer questions, and even hold conversations that feel somewhat human-like. ChatGPT, created by OpenAI, was one of the first to become widely known. Now, many companies have their own AI models, including Google (Gemini), Anthropic (Claude), DeepSeek, and Perplexity.
A user on Reddit decided to test these chatbots by giving them a math test meant for 8th graders. The AI models tested were OpenAI's o3, Google's Gemini 2.5 Pro, and Anthropic's Claude Sonnet 4. They had to answer 15 questions without any extra help or hints. The user also made sure the questions were new, so they could not have been part of the models' training data. The Gemini version used was an older one.
OpenAI's model and Gemini each got 14 out of 15 questions right, and both missed the same one: question 12. Claude Sonnet 4 did somewhat worse, answering 12 questions correctly. The Reddit user noted that they didn't have access to Claude's most powerful model, which might have performed better.
The tricky question involved a number line with points A, B, and C marked on it. The distance between points A and C was divided into 6 equal parts. The number line also showed the coordinates 56 and 83. The students (or in this case, the AI) had to decide if these two statements were true or false:
To solve the problem, you first need to figure out the length of each section on the number line. The distance between the coordinates 56 and 83 covers three sections. The total distance between 56 and 83 is 27 units (83 - 56 = 27). So, each section is 9 units long (27 / 3 = 9). Knowing this, you can find the coordinates of point C. The correct answers are:
A screenshot showed that ChatGPT incorrectly assumed that point B was exactly at coordinate 74. Because of this, it wrongly concluded that point B was equal to 74 rather than less than it. When the test was repeated with Gemini, it made the exact same mistake.
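For readers who want to check the arithmetic in the solution, here is a minimal Python sketch. It covers only what the text states: the labeled coordinates 56 and 83 are three sections apart, and the stretch from A to C is made up of 6 equal sections. The function name `section_length` is a hypothetical helper chosen for illustration, and the sketch does not try to place points A, B, or C, since their exact positions come from the figure.

```python
def section_length(left: float, right: float, sections_between: int) -> float:
    """Return the length of one equal section, given two labeled
    coordinates and the number of sections that separate them."""
    if sections_between <= 0:
        raise ValueError("sections_between must be positive")
    return (right - left) / sections_between


# From the text: 56 and 83 are three sections apart.
step = section_length(56, 83, 3)  # (83 - 56) / 3

# From the text: the distance from A to C is split into 6 equal sections.
span_a_to_c = 6 * step

print(step)         # 9.0
print(span_a_to_c)  # 54.0
```

The key point is that the section length comes from the two labeled coordinates, not from assuming where an unlabeled point sits, which is the kind of assumption the chatbots got wrong.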
This shows that even though AI chatbots are very advanced, they can still struggle with certain types of problems, especially those involving visual information and spatial reasoning.