OpenAI’s new agent moves its 2017 vision for AI closer to reality

OpenAI’s 2017 research paper “World of Bits” ended with a clear-eyed assessment: “We showed that while standard supervised and reinforcement learning techniques can be applied to achieve adequate results across these environments, the gap between agents and humans remains large, and welcomes additional modeling advances.”

That paper outlined a long-term vision for the company, one that’s now inching closer to reality with the new ChatGPT agent. Casey Chu, a member of the development team, confirmed in a recent interview that this goal never faded: “This project has a very long lineage, dating back to around 2017. Our codename is ‘World of Bits 2’ for the computer use part.” The lineage stretches back even further – in 2016, OpenAI released a blog post about the related training environment Universe.

But the way OpenAI tries to close that “large gap” has fundamentally changed. The biggest shift is the starting point: instead of beginning from scratch, the new agent is built on top of a massive, unsupervised, pretrained foundation model. That baseline competence is now required for everything that follows. “Before we apply Reinforcement Learning, the model must be good enough to achieve a basic completion of tasks,” says Issa Fulford.

According to OpenAI, reinforcement learning is very data-efficient

OpenAI now relies on reinforcement learning (RL) for crucial fine-tuning, calling the process extremely data-efficient: “The scale of the data is minuscule compared to the scale of pre-training data. We are able to teach the model new capabilities by curating these much smaller, high-quality datasets,” Fulford explains. These datasets are made up of dynamic collections of difficult, targeted tasks. The team starts by defining what they want the agent to accomplish, then designs training scenarios accordingly. “We work backwards from the use cases we want to solve to train the model and build the product,” Fulford adds.

When it comes to hands-on training, the agent faces these tasks and has to figure out solutions without being told how. As Chu puts it, “We essentially give the model all these tools, lock it in a room, and it experiments. We don’t tell it when to use what tool, it figures that out by itself.” The mechanism driving this experimental learning is simple but effective: a reward system based on the outcome. Edward Sun explains: “As long as you can grade the task – judge whether the model’s performance on the result was good or not – you can reliably train the model to become even better at it.”
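The outcome-based reward idea can be illustrated with a toy sketch. This is not OpenAI's training code: the tool names, the task list, and the `grade` function are hypothetical, and a simple REINFORCE-style update over tool preferences stands in for whatever method the team actually uses. The point it shows is only that the learner is never told which tool to pick; it receives a score for the final result and adjusts accordingly.

```python
import math
import random

# Hypothetical tools and tasks, invented purely for illustration.
TOOLS = ["browser", "terminal", "calculator"]
TASKS = [
    {"kind": "book_flight", "solved_by": "browser"},
    {"kind": "run_script", "solved_by": "terminal"},
    {"kind": "sum_invoice", "solved_by": "calculator"},
]

def grade(task, tool):
    """Outcome-only grader: 1.0 if the result is acceptable, else 0.0.
    Assumption: each toy task is solvable by exactly one tool."""
    return 1.0 if tool == task["solved_by"] else 0.0

def softmax(prefs):
    """Turn raw preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train(tasks, steps=2000, lr=0.5, seed=0):
    """Learn per-task tool preferences from graded outcomes alone."""
    rng = random.Random(seed)
    # All tools start equally likely for every kind of task.
    prefs = {t["kind"]: [0.0] * len(TOOLS) for t in tasks}
    for _ in range(steps):
        task = rng.choice(tasks)
        probs = softmax(prefs[task["kind"]])
        choice = rng.choices(range(len(TOOLS)), weights=probs)[0]
        reward = grade(task, TOOLS[choice])  # only the outcome is scored
        # REINFORCE-style step: raise the log-probability of the chosen
        # tool in proportion to the reward it earned.
        for i in range(len(TOOLS)):
            indicator = 1.0 if i == choice else 0.0
            prefs[task["kind"]][i] += lr * reward * (indicator - probs[i])
    return prefs

def best_tool(prefs, kind):
    """Return the tool with the highest learned preference for a task kind."""
    p = prefs[kind]
    return TOOLS[p.index(max(p))]

if __name__ == "__main__":
    learned = train(TASKS)
    for task in TASKS:
        print(task["kind"], "->", best_tool(learned, task["kind"]))
```

Because the update is scaled by the reward, failed attempts change nothing and successful ones reinforce the tool that produced them, which is the "grade the result" loop described above reduced to its smallest form.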

Massive scaling of computing power

This approach, where only the final result needs to be evaluated, is far more efficient than collecting thousands of human demonstrations for every mouse click and keystroke. It lets OpenAI train agents across hundreds of thousands of virtual machines at once, allowing them to independently discover the best solutions to complex problems.

The “additional modeling advances” the 2017 paper called for didn’t come from a new algorithm, but from scaling up on every level. “Essentially, the scale of the training has changed,” Chu says. “I don’t know the exact multiplier, but it must be something like 100,000x in terms of compute.”

For now, OpenAI says the agent still shouldn’t be used for critical tasks.

Read the full article on The-Decoder.com
