OpenAI’s new agent moves its 2017 vision for AI closer to reality


by The Decoder
22 July 2025
OpenAI has been working on a versatile AI agent for years. With the new ChatGPT agent, the company relies on massive computing power, targeted reinforcement learning, and a strong pretrained foundation, pursuing a vision that goes back to 2017. This article first appeared on THE DECODER.


OpenAI’s 2017 research paper “World of Bits” ended with a clear-eyed assessment: “We showed that while standard supervised and reinforcement learning techniques can be applied to achieve adequate results across these environments, the gap between agents and humans remains large, and welcomes additional modeling advances.”

That paper outlined a long-term vision for the company, one that’s now inching closer to reality with the new ChatGPT agent. Casey Chu, a member of the development team, confirmed in a recent interview that this goal never faded: “This project has a very long lineage, dating back to around 2017. Our codename is ‘World of Bits 2’ for the computer use part.” The lineage stretches back even further – in 2016, OpenAI released a blog post about the related training environment Universe.

But the way OpenAI tries to close that “large gap” has fundamentally changed. The biggest shift is the starting point: instead of beginning from scratch, the new agent is built on top of a massive, unsupervised, pretrained foundation model. That baseline competence is now a prerequisite for everything that follows. “Before we apply Reinforcement Learning, the model must be good enough to achieve a basic completion of tasks,” says Issa Fulford.

According to OpenAI, reinforcement learning is very data-efficient

OpenAI now relies on reinforcement learning (RL) for crucial fine-tuning, calling the process extremely data-efficient: “The scale of the data is minuscule compared to the scale of pre-training data. We are able to teach the model new capabilities by curating these much smaller, high-quality datasets,” Fulford explains. These datasets are made up of dynamic collections of difficult, targeted tasks. The team starts by defining what they want the agent to accomplish, then designs training scenarios accordingly. “We work backwards from the use cases we want to solve to train the model and build the product,” Fulford adds.


During hands-on training, the agent faces these tasks and has to figure out solutions without being told how. As Chu puts it, “We essentially give the model all these tools, lock it in a room, and it experiments. We don’t tell it when to use what tool, it figures that out by itself.” The mechanism driving this experimental learning is simple but effective: a reward system based on the outcome. Edward Sun explains: “As long as you can grade the task—judge whether the model’s performance on the result was good or not—you can reliably train the model to become even better at it.”
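To make the idea of outcome-based grading concrete, here is a minimal toy sketch. It is not OpenAI's training setup: the two tools, the arithmetic task, and the tabular epsilon-greedy learner are all hypothetical stand-ins chosen for illustration. The point it shows is that the learner is never told which tool to use at which step; only the final result is graded, and that single reward is enough to shape the tool choices.

```python
import random

# Hypothetical toy tools; a real agent's tools (browser, terminal, ...) are not modelled.
TOOLS = {
    "add_one": lambda x: x + 1,
    "double": lambda x: x * 2,
}

def grade(result: int, target: int) -> float:
    """Outcome-only grader: 1.0 if the final result is correct, else 0.0."""
    return 1.0 if result == target else 0.0

def run_episode(prefs, rng, start, steps, epsilon):
    """Pick one tool per step (epsilon-greedy); return the choices and the final value."""
    value, choices = start, []
    for step in range(steps):
        if rng.random() < epsilon:
            name = rng.choice(sorted(TOOLS))  # explore: try a random tool
        else:
            name = max(sorted(TOOLS), key=lambda t: prefs[step][t])  # exploit
        value = TOOLS[name](value)
        choices.append(name)
    return choices, value

def train(start=1, target=6, steps=3, episodes=500, epsilon=0.3, lr=0.1, seed=0):
    """Learn per-step tool preferences from the final outcome alone."""
    rng = random.Random(seed)
    prefs = [{t: 0.0 for t in TOOLS} for _ in range(steps)]
    for _ in range(episodes):
        choices, value = run_episode(prefs, rng, start, steps, epsilon)
        reward = grade(value, target)  # no per-step feedback, only the result
        for step, name in enumerate(choices):
            # Move the preference of each tool that was used toward the episode reward.
            prefs[step][name] += lr * (reward - prefs[step][name])
    return prefs

if __name__ == "__main__":
    learned = train()
    # epsilon=0.0 disables exploration, so this rollout follows the learned preferences.
    print(run_episode(learned, random.Random(1), start=1, steps=3, epsilon=0.0))
```

The learner here is a simple running-average value table, which is far cruder than the reinforcement learning methods used for large models, but the interface is the same: a task, a set of tools, and a grader that looks only at the result.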

Massive scaling of computing power

This approach, where only the final result needs to be evaluated, is far more efficient than collecting thousands of human demonstrations for every mouse click and keystroke. It lets OpenAI train agents across hundreds of thousands of virtual machines at once, allowing them to independently discover the best solutions to complex problems.

The “additional modeling advances” called for in the 2017 paper didn’t come from a new algorithm, but from scaling up on every level. “Essentially, the scale of the training has changed,” Chu says. “I don’t know the exact multiplier, but it must be something like 100,000x in terms of compute.”

For now, OpenAI says the agent still shouldn’t be used for critical tasks.


Read the full article on The-Decoder.com