Metaverse Media Group

OpenAI’s math gold hints that AI may soon tackle even longer and harder tasks

by The Decoder
21 July 2025
An unreleased AI model from OpenAI has reportedly solved five out of six problems from the International Mathematical Olympiad (IMO) under competition conditions. But the real story is not what it solved, but how it did it.

This article first appeared on THE DECODER.

OpenAI says an experimental language model scored 35 out of 42 possible points in an internal IMO-style test – enough for a gold medal. Each of the six IMO problems is worth seven points, so five complete solutions account for all 35. Three former IMO winners independently graded the model’s natural language proofs, which were evaluated just like submissions from human contestants. According to the company, the test mirrored real IMO rules: two four-and-a-half-hour sessions, no internet, no external tools or code – just text.

OpenAI claims the model wasn’t specifically trained on IMO tasks. Instead, it was developed as a general-purpose reasoning model, drawing on recent advances in reinforcement learning and using substantial compute during inference. Researcher Alexander Wei emphasized in an X post that this was not a task-specific system, but one capable of autonomously generating complex, multi-page proofs. There are hints it might even be a multi-agent system.

Sustained reasoning without tools

What makes this achievement stand out is that the model reasoned consistently for hours at a time without any symbolic tools like code interpreters or mathematical software. That sets it apart from other high-performing systems, such as DeepMind’s AlphaProof, that rely on hybrid neuro-symbolic approaches.

Until recently, it was widely believed that language models couldn’t sustain consistent mathematical reasoning over long sessions. As recently as June, mathematician Terence Tao said on the Lex Fridman Podcast that IMO-level problems were too difficult for AI to solve in real time. “You can’t hire enough humans to grade those,” Tao said, referring to the labor-intensive verification of long proofs in reinforcement learning training.

The result came as a surprise, even to prediction markets, which put the odds of an AI winning IMO gold before the end of 2025 at under 20 percent. (These forecasts used slightly stricter criteria.)

Both the markets and Tao seemed to assume that a reasoning model like o3 would need to be trained explicitly for IMO proofs, receiving expert feedback at every step. OpenAI, however, appears to have found a more general method for eliciting this behavior. Wei also highlighted that the model wasn’t tailored for the task, but instead was a generalist reasoning system.

OpenAI researcher Jerry Tworek says the reinforcement learning system used here also helped train ChatGPT Agent and the model that recently took second place in the Heuristic division of the AtCoder World Tour Finals, where it generated code non-stop for nearly ten hours.

Transparency questions

As usual, OpenAI’s claims have sparked criticism. Gary Marcus called the achievement impressive but raised a list of questions in an X post: How is the model architecturally different from its predecessors? What were the costs per problem? Was the model trained on raw text or preprocessed data? And how transferable are these results to other scientific domains? So far, OpenAI has kept all those details under wraps.

OpenAI has faced similar criticism before, notably for a lack of transparency around the ARC-AGI benchmark test. The ARC Prize Foundation found that the final o3 model performed worse than a previously tested preview version. It also emerged only after the fact that OpenAI had funded the supposedly independent FrontierMath benchmark, shortly after an OpenAI model posted a record result there.

A scalable approach to reasoning?

In a recent essay, “How o3 and Grok 4 accidentally vindicated neurosymbolic AI,” Marcus argued that modern AI models are increasingly relying on symbolic tools like code interpreters to overcome the limits of pure language models.

OpenAI’s IMO system, on the other hand, worked entirely in text – no tools – which, if the results hold up, would be a notable exception. If the model’s ability to generalize is confirmed, it could call Marcus’s thesis into question, at least in part. Still, his main criticism remains: without methodological transparency, it’s hard to interpret these achievements.

For now, OpenAI seems to have built a language model that can reason consistently for hours – without any external tools. That would have been hard to imagine just a short time ago. The generalist reasoning approach appears to scale. According to OpenAI, the next step is reasoning sessions that last several days.

Read the full article on The-Decoder.com