OpenAI’s o3 Reasoning Model Is Very Expensive to Run

Admin April 4, 2025



OpenAI’s o3 model emphasizes advanced reasoning. Photo by Dima Solomin on Unsplash

Ironically, measuring the intelligence of artificial intelligence is a difficult task. That’s why the tech industry has come up with benchmarks such as ARC-AGI, which tests new systems on a series of visual puzzles that are particularly challenging for AI models. Last December, OpenAI’s o3 reasoning model became the first AI system to pass the test, with a score of 87.5%.

But this came at a price. At the time, the ARC Prize Foundation, which administers the ARC-AGI benchmark, estimated that testing OpenAI’s model cost about $3,400 per task. For a more efficient version of o3, which scored 75.7% on the test, the figure was about $20 per task.

It turns out that the actual cost may be significantly higher, possibly around ten times higher. The ARC Prize Foundation originally based its o3 pricing on the cost of OpenAI’s o1, the reasoning model that preceded o3, but the nonprofit has now pegged it to OpenAI’s newly released o1-pro. When o1-pro was announced, it was ten times more expensive to run than o1, making it OpenAI’s most expensive model to date.

Based on o1-pro’s pricing, o3 could cost as much as $30,000 per task. The more efficient o3 configuration is now listed at $200 per task.

“We believe, though it has not been verified by OpenAI, that o3 pricing will be closer to o1-pro pricing than to the o1 pricing we were told in December,” ARC Prize Foundation president Greg Kamradt told Observer. “Given that, we have updated our metrics.”

The ARC Prize Foundation has also edited its ARC-AGI leaderboard to exclude the compute-intensive version of o3, noting that the leaderboard only displays systems that cost less than $10,000 to run.

What is ARC-AGI?

Created in 2019 by AI researcher François Chollet, the ARC-AGI benchmark uses a series of puzzles to gauge how far AI systems are from human-level intelligence. Rather than testing a model’s ability to draw on what it learned from its training data, it examines whether the model can adapt to new problems and acquire new task-specific skills. “Think of it as a test of the ability to learn new things,” Kamradt said.

OpenAI’s o3 was particularly successful on the test because the model is able to pause and consider many possible approaches before responding with the most accurate answer. Since OpenAI has not confirmed o3’s pricing, the ARC Prize Foundation will keep its estimates pegged to o1-pro’s costs until official pricing is released. “It might be higher, but we’re not sure,” Kamradt said. “We just tried our best to use the information available to us.” OpenAI did not respond to Observer’s request for comment.

Although recent AI releases have been getting closer to 100% on ARC-AGI, they have been largely stumped by a new version of the benchmark released last month. The test, called ARC-AGI-2, contains tasks designed to be more difficult for AI systems, particularly reasoning models. So far, no model has managed to reach the 5% mark.
