Metaverse Media Group
Yet another study finds that overloading LLMs with information leads to worse results

by The Decoder
21 July 2025

Summary

Large language models are supposed to handle millions of tokens – the fragments of words and characters that make up their inputs – at once. But the longer the context, the worse their performance gets.

That’s the takeaway from a new study by Chroma Research. Chroma, which makes a vector database for AI applications, stands to benefit when models need help pulling in information from outside sources. Still, the scale and methodology of the study make it noteworthy: the researchers tested 18 leading AI models, including GPT, Claude, Gemini, and Qwen, across four task types, including semantic search, repetition challenges, and question-answering over lengthy documents.

Beyond word matching

The research builds on the familiar “needle in a haystack” benchmark, where a model must pick out a specific sentence hidden inside a long block of irrelevant text. The Chroma team criticized this test for only measuring literal string matching, so they modified the test to require true semantic understanding.

Specifically, they moved beyond simple keyword recognition in two key ways. First, instead of asking a question that used the same words as the hidden sentence, they posed questions that were only semantically related. For example, in a setup inspired by the NoLiMa benchmark, a model might be asked “Which character has been to Helsinki?” when the text only states that “Yuki lives next to the Kiasma museum.” To answer, the model must make an inference based on world knowledge, not just keyword matching.
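In code, the modified setup might look something like the following sketch. This is an illustrative harness, not Chroma's actual test code; the filler text, the `depth` parameter, and the model call are placeholders:

```python
def build_haystack(needle: str, filler: list[str], depth: float) -> str:
    """Embed a 'needle' sentence at a relative depth inside filler text."""
    idx = int(len(filler) * depth)
    return " ".join(filler[:idx] + [needle] + filler[idx:])

needle = "Yuki lives next to the Kiasma museum."

# Lexical variant: the question reuses the needle's exact words.
lexical_question = "Who lives next to the Kiasma museum?"

# Semantic variant: answering requires world knowledge (Kiasma is in
# Helsinki), so literal string matching is not enough.
semantic_question = "Which character has been to Helsinki?"

filler = [f"This is unrelated filler sentence number {i}." for i in range(1000)]
prompt = build_haystack(needle, filler, depth=0.5) + "\n\nQuestion: " + semantic_question
# Each model under test would receive `prompt` and be scored on whether its
# answer names Yuki; context length scales with the amount of filler.
```

Scaling the filler up and down is what lets the researchers measure how accuracy changes with context length.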


The models found this much more difficult; performance dropped sharply on these semantic questions, and the problem grew worse as the context got longer.

Second, the study looked at distractors: statements similar in content but incorrect. Adding even a single distractor noticeably reduced success rates, with different impacts depending on the distractor. With four distractors, the effect was even stronger. Claude models often refused to answer, while GPT models tended to give wrong but plausible-sounding responses.
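The distractor condition can be sketched the same way. The distractor sentences below are invented examples in the spirit of the study, not the actual test items:

```python
import random

def build_context(filler: list[str], needle: str,
                  distractors: list[str], n: int, seed: int = 0) -> str:
    """Scatter the needle and n distractor sentences through the filler."""
    rng = random.Random(seed)
    sentences = filler + [needle] + distractors[:n]
    rng.shuffle(sentences)
    return " ".join(sentences)

needle = "Yuki lives next to the Kiasma museum."
# Topically similar statements that do not answer the question.
distractors = [
    "Yuki thought about visiting the Kiasma museum, but never went.",
    "Kenji lives next to a famous museum in Oslo.",
    "Yuki's sister once lived near a gallery in Paris.",
    "A museum near Yuki's old school closed years ago.",
]
filler = [f"Unrelated sentence number {i}." for i in range(200)]

# Conditions analogous to the study's: no distractors, one, then four.
for n in (0, 1, 4):
    context = build_context(filler, needle, distractors, n)
    # context + question goes to each model; accuracy falls as n grows.
```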

Image: Hong et al.

Structure matters (but not how you’d expect)

Structure also played a surprising role. Models actually did better when the sentences in a text were randomly mixed, compared to texts organized in a logical order. The reasons aren’t clear, but the study found that context structure, not just content, is a major factor for model performance.

The researchers also tested more practical scenarios using LongMemEval, a benchmark with chat histories over 100,000 tokens long. The same pattern appeared: performance fell when models had to work with the full conversation history, compared to when they were given only the relevant sections.

The study’s recommendation: use targeted “context engineering” – picking and arranging the most relevant information in a prompt – to help large language models stay reliable in real-world scenarios. Full results are available on Chroma Research, and a toolkit for replicating the results is available for download on GitHub.
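As a rough illustration of the idea, the sketch below uses a toy word-overlap score standing in for real embedding search; the names and chat snippets are made up:

```python
def relevance(query: str, chunk: str) -> float:
    """Toy score: word overlap. A real system would use embeddings."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def engineer_context(query: str, chunks: list[str], top_k: int = 2) -> str:
    """Keep only the top_k most relevant chunks instead of everything."""
    ranked = sorted(chunks, key=lambda ch: relevance(query, ch), reverse=True)
    return "\n".join(ranked[:top_k])

history = [
    "User prefers window seats on long flights.",
    "User discussed a sourdough starter recipe at length.",
    "User is traveling to Helsinki in March.",
    "User asked about Python decorators last week.",
]
context = engineer_context("plan the trip to Helsinki", history, top_k=2)
# Only the most relevant chunks reach the prompt; the rest is trimmed,
# keeping the model's attention focused.
```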


Other labs find similar problems

Chroma’s results line up with findings from other research groups. In May 2025, Nikolay Savinov of Google DeepMind explained that when a model receives a large number of tokens, it has to divide its attention across the entire input. Trimming irrelevant content and keeping the context focused is therefore always worthwhile: concentrating attention on what matters helps the model perform better.

A study from LMU Munich and Adobe Research found much the same thing. On the NoLiMa benchmark, which avoids literal keyword matches, even reasoning-focused models suffered major performance drops as context length increased.

Microsoft and Salesforce reported similar instability in longer conversations. In multi-turn dialogues where users spell out their requirements step by step, accuracy rates fell from 90 percent all the way down to 51 percent.

One of the most striking examples is Meta’s Llama 4 Maverick. While Maverick can technically handle up to ten million tokens, it struggles to make meaningful use of that capacity. In a benchmark designed to reflect real-world scenarios, Maverick achieved just 28.1 percent accuracy with 128,000 tokens – far below its technical maximum and well under the average for current models. In these tests, OpenAI’s o3 and Gemini 2.5 currently deliver the strongest results.

Read the full article on The-Decoder.com


© 2022 Metaverse Media Group – The Metaverse Mecca