Yet another study finds that overloading LLMs with information leads to worse results

by The Decoder
21 July 2025


Large language models are supposed to handle millions of tokens – the fragments of words and characters that make up their inputs – at once. But the longer the context, the worse their performance gets.

That’s the takeaway from a new study by Chroma Research. Chroma, which makes a vector database for AI applications, has a commercial interest here: its product benefits when models need help pulling in information from outside sources. Still, the scale and methodology of the study make it noteworthy: researchers tested 18 leading AI models, including GPT, Claude, Gemini, and Qwen, on four types of tasks, including semantic search, repetition challenges, and question answering over lengthy documents.

Beyond word matching

The research builds on the familiar “needle in a haystack” benchmark, where a model must pick out a specific sentence hidden inside a long block of irrelevant text. The Chroma team criticized this test for only measuring literal string matching, so they modified the test to require true semantic understanding.

Specifically, they moved beyond simple keyword recognition in two key ways. First, instead of asking a question that used the same words as the hidden sentence, they posed questions that were only semantically related. For example, in a setup inspired by the NoLiMa benchmark, a model might be asked “Which character has been to Helsinki?” when the text only states that “Yuki lives next to the Kiasma museum.” To answer, the model must make an inference based on world knowledge, not just keyword matching.
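As an illustration, a semantic needle test of this kind might be set up like the sketch below. This is not Chroma’s actual harness: the filler text, needle, and question are illustrative stand-ins, and the call assumes the standard OpenAI Python client.

```python
# Minimal sketch of a semantic "needle in a haystack" probe.
# The needle shares no keywords with the question; answering it
# requires world knowledge (the Kiasma museum is in Helsinki).
import random

from openai import OpenAI  # assumes the standard OpenAI Python client

client = OpenAI()

filler = "The weather report predicted light rain for the afternoon. " * 2000
needle = "Yuki lives next to the Kiasma museum."
question = "Which character has been to Helsinki? Answer with a name."

# Hide the needle at a random position inside the filler text.
sentences = filler.strip().split(". ")
sentences.insert(random.randrange(len(sentences)), needle)
haystack = ". ".join(sentences)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"{haystack}\n\n{question}"}],
)
print(response.choices[0].message.content)  # expected answer: Yuki
```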


The models found this much more difficult; performance dropped sharply on these semantic questions, and the problem grew worse as the context got longer.

Second, the study looked at distractors: statements similar in content to the needle but incorrect as answers. Adding even a single distractor noticeably reduced success rates, and the size of the drop varied with the specific distractor; with four distractors, the effect was even stronger. The models also failed in different ways: Claude models often refused to answer, while GPT models tended to give wrong but plausible-sounding responses.
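Continuing the sketch above, distractors are statements topically close to the needle that do not answer the question. The specific sentences here are invented for illustration:

```python
# Sketch: the same probe with distractors - statements topically
# close to the needle that do not answer the question.
distractors = [
    "Yuki once considered visiting the Louvre in Paris.",        # hypothetical
    "Aino lives next to a small gallery in Tallinn.",            # hypothetical
    "A museum near Mika's office closed for renovation.",        # hypothetical
    "Yuki's brother wrote an essay about Nordic architecture.",  # hypothetical
]

sentences = filler.strip().split(". ")
for statement in [needle] + distractors:
    sentences.insert(random.randrange(len(sentences)), statement)
haystack_with_distractors = ". ".join(sentences)
# The correct answer is still "Yuki", but every distractor offers a
# plausible-sounding wrong path ("Aino", "Mika", Paris instead of Helsinki).
```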

Image: Hong et al.

Structure matters (but not how you’d expect)

Structure also played a surprising role. Models actually did better when the sentences in a text were randomly mixed, compared to texts organized in a logical order. The reasons aren’t clear, but the study found that context structure, not just content, is a major factor for model performance.
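The manipulation itself is simple to reproduce in the same sketch: shuffling destroys the logical order while keeping the content identical.

```python
# Sketch: same content, two structures - logical order vs. shuffled.
ordered = haystack_with_distractors.split(". ")
shuffled = ordered[:]
random.shuffle(shuffled)  # counterintuitively, models scored higher on this
shuffled_haystack = ". ".join(shuffled)
```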

The researchers also tested more practical scenarios using LongMemEval, a benchmark with chat histories over 100,000 tokens long. In this separate test, a similar performance drop was observed: performance fell when models had to work with the full conversation history, compared to when they were given only the relevant sections.

The study’s recommendation: use targeted “context engineering” – picking and arranging the most relevant information in a prompt – to help large language models stay reliable in real-world scenarios. Full results are available on Chroma Research, and a toolkit for replicating the results is available for download on GitHub.
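A minimal sketch of what such context engineering can look like with a vector database, using Chroma’s Python client; the document chunks and query are illustrative, and a real pipeline would also need to choose chunking and embedding strategies:

```python
# Sketch of "context engineering": retrieve only the passages that
# are relevant to the query instead of passing the full document.
# Uses Chroma's Python client; chunks and query are illustrative.
import chromadb

client = chromadb.Client()
collection = client.create_collection(name="report_chunks")

chunks = [
    "Yuki lives next to the Kiasma museum.",
    "The weather report predicted light rain for the afternoon.",
    "Quarterly revenue grew by four percent.",
]
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

# Only the top-matching chunks go into the prompt.
results = collection.query(
    query_texts=["Which character has been to Helsinki?"],
    n_results=2,
)
prompt_context = "\n".join(results["documents"][0])
```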


Other labs find similar problems

Chroma’s results line up with findings from other research groups. In May 2025, Nikolay Savinov at Google DeepMind explained that when a model receives a large number of tokens, it has to divide its attention across the entire input. Trimming irrelevant content and keeping the context focused therefore pays off: concentrating attention on what matters helps the model perform better.

A study from LMU Munich and Adobe Research found much the same thing. On the NoLiMa benchmark, which avoids literal keyword matches, even reasoning-focused models suffered major performance drops as context length increased.

Microsoft and Salesforce reported similar instability in longer conversations. In multi-turn dialogues where users spell out their requirements step by step, accuracy rates fell from 90 percent all the way down to 51 percent.

One of the most striking examples is Meta’s Llama 4 Maverick. While Maverick can technically handle up to ten million tokens, it struggles to make meaningful use of that capacity. In a benchmark designed to reflect real-world scenarios, Maverick achieved just 28.1 percent accuracy with 128,000 tokens – far below its technical maximum and well under the average for current models. In these tests, OpenAI’s o3 and Gemini 2.5 currently deliver the strongest results.


Read the full article on The-Decoder.com
