These days, artificial intelligence can generate photorealistic images, write novels, do your homework, and even predict protein structures. New research, however, reveals that it often fails at a very basic task: telling time.
Researchers at the University of Edinburgh have tested the ability of seven well-known multimodal large language models—the kind of AI that can interpret and generate various kinds of media—to answer time-related questions based on different images of clocks and calendars. Their study, forthcoming in April and currently hosted on the preprint server arXiv, demonstrates that the models have difficulty with these basic tasks.
“The ability to interpret and reason about time from visual inputs is critical for many real-world applications—ranging from event scheduling to autonomous systems,” the researchers wrote in the study. “Despite advances in multimodal large language models (MLLMs), most work has focused on object detection, image captioning, or scene understanding, leaving temporal inference underexplored.”
The team tested OpenAI’s GPT-4o and GPT-o1; Google DeepMind’s Gemini 2.0; Anthropic’s Claude 3.5 Sonnet; Meta’s Llama 3.2-11B-Vision-Instruct; Alibaba’s Qwen2-VL-7B-Instruct; and ModelBest’s MiniCPM-V-2.6. They fed the models different images of analog clocks—timekeepers with Roman numerals, different dial colors, and even some missing the seconds hand—as well as 10 years of calendar images.
For the clock images, the researchers asked the LLMs: “What time is shown on the clock in the given image?” For the calendar images, the researchers asked simple questions such as “What day of the week is New Year’s Day?” and harder queries such as “What is the 153rd day of the year?”
“Analogue clock reading and calendar comprehension involve intricate cognitive steps: they demand fine-grained visual recognition (e.g., clock-hand position, day-cell layout) and non-trivial numerical reasoning (e.g., calculating day offsets),” the researchers explained.
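To make the “day offset” arithmetic concrete, here is a minimal sketch (my illustration, not anything from the study or the models themselves) of what answering a question like “What is the 153rd day of the year?” involves, using Python’s standard library:

```python
from datetime import date, timedelta

def nth_day_of_year(year: int, n: int) -> date:
    """Return the date of the n-th day of the given year (n starts at 1)."""
    return date(year, 1, 1) + timedelta(days=n - 1)

d = nth_day_of_year(2025, 153)
print(d.isoformat(), d.strftime("%A"))  # 2025-06-02 Monday
```

The point of the example is that the answer requires chained numerical reasoning over month lengths, which is exactly the kind of multi-step calculation the researchers say the models handle poorly.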
Overall, the AI systems did not perform well. They read the time on analog clocks correctly less than 25% of the time. They struggled with clocks bearing Roman numerals and stylized hands as much as they did with clocks lacking a seconds hand altogether, indicating that the issue may stem from detecting the hands and interpreting angles on the clock face, according to the researchers.
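The angle interpretation the researchers point to can be sketched as well. This toy function (my own illustration, under the simplifying assumption that the hand angles have already been detected perfectly) shows the arithmetic needed to turn hand positions into a time:

```python
def angles_to_time(hour_angle: float, minute_angle: float) -> str:
    """Convert clock-hand angles (degrees clockwise from 12) to a time string."""
    minutes = round(minute_angle / 6) % 60    # minute hand sweeps 6 degrees per minute
    hours = int(hour_angle // 30) % 12 or 12  # hour hand sweeps 30 degrees per hour
    return f"{hours}:{minutes:02d}"

print(angles_to_time(90.0, 0.0))     # 3:00
print(angles_to_time(197.5, 210.0))  # 6:35
```

The arithmetic itself is trivial; the researchers’ point is that the models appear to stumble earlier, on reliably perceiving where the hands are at all.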
Google DeepMind’s Gemini 2.0 scored highest on the team’s clock task, while OpenAI’s GPT-o1 was accurate on the calendar task 80% of the time, a far better result than its competitors. Even so, the most successful MLLM on the calendar task still made mistakes about 20% of the time.
“Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people,” Rohit Saxena, a co-author of the study and PhD student at the University of Edinburgh’s School of Informatics, said in a university statement. “These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies.”
So while AI might be able to complete your homework, don’t count on it sticking to any deadlines.