AI Is Bad at Reading Clocks

Today, artificial intelligence can produce realistic images, write novels, do homework, and even predict protein structures. However, new research shows that it often fumbles a very basic task: telling time.

Researchers at the University of Edinburgh tested the ability of seven well-known multimodal models (AI systems that can interpret and generate content across different media) to answer time-related questions based on various images of clocks and calendars. Their study, set to be presented in April and currently hosted on the preprint server arXiv, suggests that these models struggle with such basic tasks.

“The ability to interpret and reason about time from visual input is crucial for many real-world applications – from event scheduling to autonomous systems,” the researchers wrote in the study. “Despite advances in multimodal large language models (MLLMs), most work has focused on object detection, image captioning, or scene understanding, leaving temporal reasoning underexplored.”

The team tested OpenAI’s GPT-4o and GPT-o1; Google DeepMind’s Gemini 2.0; Anthropic’s Claude 3.5 Sonnet; Meta’s Llama 3.2-11B-Vision-Instruct; Alibaba’s Qwen2-VL-7B-Instruct; and ModelBest’s MiniCPM-V-2.6. They showed the models various images of analog clocks – timepieces with Roman numerals, different dial colors, and even some lacking a second hand – as well as 10 years of calendar images.

For the clock images, the researchers asked the models: What time is displayed on the clock in the given image? For the calendar images, they asked simple questions, such as What day of the week is New Year’s Day?, as well as harder queries, such as What is the 153rd day of the year?
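The article does not detail the study’s exact prompting setup, but as a rough sketch of how such a question can be posed to one of the tested models, the snippet below sends a clock image and a question to GPT-4o through the OpenAI Python SDK. The image URL is a hypothetical placeholder, and the prompt wording is only an approximation of the study’s.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask a multimodal model to read an analog clock from an image.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What time is displayed on the clock in the given image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/analog_clock.png"}},  # placeholder image
        ],
    }],
)
print(response.choices[0].message.content)
```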

“Analog clock reading and calendar comprehension involve complex cognitive steps: they require fine-grained visual recognition (e.g., clock-hand position, day layout) and non-trivial numerical reasoning (e.g., calculating date offsets),” the researchers explained.
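As a concrete illustration of the date-offset arithmetic the researchers describe, the short sketch below answers the two example calendar questions with Python’s standard datetime module. The year 2025 is an assumed example; the study used a decade of calendar images.

```python
from datetime import date, timedelta

year = 2025  # assumed example year

# "What day of the week is New Year's Day?"
print(date(year, 1, 1).strftime("%A"))         # Wednesday

# "What is the 153rd day of the year?" -> an offset of 152 days from January 1
print(date(year, 1, 1) + timedelta(days=152))  # 2025-06-02
```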

Overall, the AI systems performed poorly: they read the time on analog clocks correctly less than 25% of the time. They struggled with clocks bearing Roman numerals and stylized hands as much as with clocks lacking a second hand, the researchers said, suggesting that the problem may stem from detecting the hands and interpreting the angles on the clock face.

Google’s Gemini 2.0 scored highest on the team’s clock task, while GPT-o1 was accurate on the calendar task 80% of the time – a far better result than its competitors. Even so, the most successful MLLM on the calendar task still made mistakes about 20% of the time.

“Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people,” the researchers said. “These gaps must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation, and assistive technologies.”

So while AI may be able to complete your assignment, don’t expect it to stick to any deadlines.
