
A new kind of artificial intelligence agent, trained to understand how software is built by gorging on a company’s data and learning how this leads to an end product, could be both a more capable software assistant and a small step towards much smarter AI.
The new agent, called Asimov, was developed by Reflection, a small but ambitious startup cofounded by top AI researchers from Google. Asimov reads code as well as emails, Slack messages, project updates, and other documentation, with the goal of learning how all of it comes together to produce a finished piece of software.
Reflection’s ultimate goal is building superintelligent AI—something that other leading AI labs say they are working towards. Meta recently created a new Superintelligence Lab, promising huge sums to researchers interested in joining its new effort.
I visited Reflection’s headquarters in the Brooklyn neighborhood of Williamsburg, New York, just across the road from a swanky-looking pickleball club, to see how Reflection plans to reach superintelligence ahead of the competition.
The company’s CEO, Misha Laskin, says the ideal way to build supersmart AI agents is to have them truly master coding, since this is the simplest, most natural way for them to interact with the world. While other companies are building agents that use human user interfaces and browse the web, Laskin, who previously worked on Gemini and agents at Google DeepMind, says this hardly comes naturally to a large language model. Laskin adds that teaching AI to make sense of software development will also produce much more useful coding assistants.
Laskin says Asimov is designed to spend more time reading code rather than writing it. “Everyone is really focusing on code generation,” he told me. “But how to make agents useful in a team setting is really not solved. We are in kind of this semi-autonomous phase where agents are just starting to work.”
Asimov actually consists of several smaller agents inside a trench coat. The agents all work together to understand code and answer users’ queries about it. The smaller agents retrieve information, and one larger reasoning agent synthesizes this information into a coherent answer to a query.
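The retrieve-then-synthesize pattern described above can be sketched in a few lines. This is a hypothetical illustration, not Reflection's code: the retriever functions, the `Finding` type, and the query are all invented stand-ins for what would, in a real system, be smaller models searching a codebase and its surrounding documentation before a larger model composes the answer.

```python
# Minimal sketch of a multi-agent pattern: small "retriever" agents
# each pull context from one source, and a single "reasoner" merges
# their findings into one answer. All names here are hypothetical.

from dataclasses import dataclass


@dataclass
class Finding:
    source: str
    content: str


def code_retriever(query: str) -> Finding:
    # Stand-in for an agent that searches the codebase itself.
    return Finding("codebase", f"functions matching '{query}'")


def docs_retriever(query: str) -> Finding:
    # Stand-in for an agent that searches design docs, Slack, etc.
    return Finding("docs", f"design notes mentioning '{query}'")


def reasoner(query: str, findings: list[Finding]) -> str:
    # Stand-in for the larger reasoning model that synthesizes context.
    cited = "; ".join(f"[{f.source}] {f.content}" for f in findings)
    return f"Answer to '{query}', based on: {cited}"


def answer(query: str) -> str:
    findings = [retrieve(query) for retrieve in (code_retriever, docs_retriever)]
    return reasoner(query, findings)


print(answer("auth middleware"))
```

The design point is the division of labor: cheap, specialized retrievers fan out in parallel across sources, so the expensive reasoning model only runs once, over already-filtered context.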
Reflection claims that Asimov already outperforms some leading AI tools by certain measures. In a survey the company conducted, developers working on large open source projects who asked questions favored answers from Asimov 82 percent of the time, versus 63 percent for Anthropic's Claude Code running its Sonnet 4 model.
Daniel Jackson, a computer scientist at Massachusetts Institute of Technology, says Reflection’s approach seems promising given the broader scope of its information gathering. Jackson adds, however, that the benefits of the approach remain to be seen, and the company’s survey is not enough to convince him of broad benefits. He notes that the approach could also increase computation costs and potentially create new security issues. “It would be reading all these private messages,” he says.
Reflection says the multiagent approach mitigates computation costs and that it makes use of a secure environment that provides more security than some conventional SaaS tools.
In New York, I met with the startup's CTO, Ioannis Antonoglou. His expertise in training AI models to reason and play games is now being applied to having them write code and do other useful chores.
A founding engineer at Google DeepMind, Antonoglou did groundbreaking research on a technique known as reinforcement learning, which was most famously used to build AlphaGo, a program that learned to play the ancient board game Go at a superhuman level.
Reinforcement learning, which involves training an AI model through practice combined with positive and negative feedback, has come to the fore in the past few years because it provides a way to train a large language model to produce better outputs. Combined with human training, reinforcement learning can teach an LLM to provide more coherent and pleasing answers to queries. With additional training, it helps a model learn to perform a kind of simulated reasoning, whereby tricky problems are broken into steps so that they can be tackled more effectively.

Asimov currently uses open source models, but Reflection is using reinforcement learning to post-train custom models that it says perform even better. Rather than learning to win at a game like Go, the model learns how to build a finished piece of software, and tapping into more data from across a company should make that training more effective. Reflection uses data from human annotators and also generates its own synthetic data. It does not train on data from customers.
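The core loop of reinforcement learning, learning through practice plus positive and negative feedback, can be shown with a toy example. The sketch below is purely illustrative and unrelated to Reflection's systems: an epsilon-greedy agent repeatedly tries three actions with hidden payoff rates and gradually discovers which one earns the most reward.

```python
# Toy reinforcement learning: an agent improves through trial and
# feedback. It mostly exploits its best-known action and occasionally
# explores, updating a running average of reward per action.

import random

random.seed(0)
TRUE_REWARDS = [0.2, 0.5, 0.8]  # hidden payoff probability per action


def pull(action: int) -> int:
    # Environment feedback: 1 for a good outcome, 0 for a bad one.
    return 1 if random.random() < TRUE_REWARDS[action] else 0


def train(steps: int = 5000, epsilon: float = 0.1) -> list[float]:
    estimates = [0.0] * 3  # learned value of each action
    counts = [0] * 3
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(3)  # explore a random action
        else:
            a = max(range(3), key=lambda i: estimates[i])  # exploit
        reward = pull(a)
        counts[a] += 1
        # Incremental running average of observed reward.
        estimates[a] += (reward - estimates[a]) / counts[a]
    return estimates


est = train()
print("learned best action:", max(range(3), key=lambda i: est[i]))
```

Post-training an LLM works on the same principle at vastly larger scale: the "actions" are generated outputs, and the feedback signal comes from human raters, verifiers, or, in a coding setting, whether the software actually works.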
Big AI companies are already using reinforcement learning to tune agents. An OpenAI tool called Deep Research, for instance, uses feedback from expert humans as a reinforcement learning signal that teaches an agent to comb through websites, hunting for information on a topic, before generating a detailed report.
“We’ve actually built something like Deep Research but for your engineering systems,” Antonoglou says, noting that training on more than just code provides an edge. “We’ve seen that in big engineering teams, a lot of the knowledge is actually stored outside of the codebase.”
Stephanie Zhan, a partner at the investment firm Sequoia, which is backing Reflection, says the startup “punches at the same level as the frontier labs.”
With the AI industry now shooting for superintelligence, and deep-pocketed companies like Meta pouring huge sums into hiring and building infrastructure, startups like Reflection may find it more challenging to compete.
I asked Reflection's leaders what the path to more advanced AI might actually look like. They believe an increasingly intelligent agent would go on to become an oracle for a company's institutional and organizational knowledge. It should learn to build and repair software autonomously. Eventually it would invent new algorithms, hardware, and products on its own.
The most immediate next step might be less grand. “We’ve actually been talking to customers who’ve started asking, can our technical sales staff, or our technical support team use this?” Laskin says.