Model performance degrades as input length increases, often in surprising and non-uniform ways.
Long context evaluations for these models often demonstrate consistent performance across input lengths. However, these evaluations are narrow in scope and not representative of how long context is used in practice. The most commonly used test, Needle in a Haystack (NIAH), is a simple lexical retrieval task whose results are often generalized into claims about a model's ability to reliably handle long context.
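As a rough illustration of what a NIAH-style trial involves, here is a minimal sketch. The `query_model` helper and the filler sentence are hypothetical placeholders for whatever model client and distractor corpus a real evaluation uses; the needle/question pair mirrors the classic "Dolores Park" example from the original NIAH benchmark.

```python
# Sketch of a NIAH-style trial: bury one relevant sentence (the "needle") in
# repeated filler text (the "haystack") and check whether the model retrieves it.
def query_model(prompt: str) -> str:
    # Hypothetical placeholder: replace with a real LLM API call before running.
    raise NotImplementedError("replace with a real LLM API call")

def build_haystack(needle: str, filler: str, n_sentences: int, depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) in the filler text."""
    sentences = [filler] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)

def run_trial(needle: str, question: str, expected: str, n_sentences: int, depth: float) -> bool:
    haystack = build_haystack(needle, "The sky was a pale grey that afternoon.", n_sentences, depth)
    answer = query_model(f"{haystack}\n\nQuestion: {question}\nAnswer:")
    return expected.lower() in answer.lower()  # simple lexical match, as in NIAH

# NIAH results are typically reported as a grid over context length and needle position.
needle = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
question = "What is the best thing to do in San Francisco?"
for n in (100, 1_000, 10_000):
    for depth in (0.0, 0.5, 0.9):
        print(n, depth, run_trial(needle, question, "Dolores Park", n, depth))
```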
The researchers tested LLMs with focused prompts (~300 tokens) and full prompts (~113k tokens):
Across all models, we see significantly higher performance on focused prompts compared to full prompts.
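To make the two conditions concrete, here is a minimal sketch under assumed names; `query_model`, the question, and the excerpt are hypothetical stand-ins for the study's actual prompts and tasks.

```python
# Sketch of the focused vs. full prompt conditions; all names and strings here
# are illustrative, not the study's actual materials.
def query_model(prompt: str) -> str:
    # Hypothetical placeholder: replace with a real LLM API call before running.
    raise NotImplementedError("replace with a real LLM API call")

QUESTION = "Which configuration flag caused the failing deploy?"
RELEVANT_EXCERPT = "Deploy 42 failed after enable_beta_cache was set to true."
FULL_HISTORY = "..."  # in the study, roughly 113k tokens of accumulated context

# Focused condition (~300 tokens): only the material needed for the task.
focused_prompt = f"{RELEVANT_EXCERPT}\n\nQuestion: {QUESTION}"

# Full condition (~113k tokens): the same question preceded by everything else.
full_prompt = f"{FULL_HISTORY}\n\nQuestion: {QUESTION}"

# Both conditions are scored against the same reference answer, so any gap
# reflects the surrounding context rather than the task itself.
# answer_focused = query_model(focused_prompt)
# answer_full = query_model(full_prompt)
```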
More broadly, our findings point to the importance of context engineering: the careful construction and management of a model's context window. Where and how information is presented in a model's context strongly influences task performance, making context engineering a meaningful direction for future work on optimizing model performance.
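As one simplified example of what context engineering can look like in code, the sketch below ranks candidate passages by a naive keyword-overlap score and keeps only those that fit a token budget, rather than concatenating everything into the prompt. The scoring and token-counting functions are assumptions standing in for a real retriever and tokenizer.

```python
# Sketch of one context-engineering tactic: keep only the most relevant
# passages within a token budget instead of sending the full corpus.
def relevance(query: str, passage: str) -> float:
    # Naive keyword overlap; a real system would use a retriever or embeddings.
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / (len(q_words) or 1)

def approx_tokens(text: str) -> int:
    # Crude whitespace approximation; a real tokenizer gives different counts.
    return len(text.split())

def build_context(query: str, passages: list[str], budget: int = 300) -> str:
    """Select the highest-scoring passages that fit within the token budget."""
    ranked = sorted(passages, key=lambda p: relevance(query, p), reverse=True)
    kept, used = [], 0
    for passage in ranked:
        cost = approx_tokens(passage)
        if used + cost > budget:
            break
        kept.append(passage)
        used += cost
    return "\n\n".join(kept)

# Usage: the returned context plus the query forms the prompt sent to the model,
# keeping the input closer to the "focused" condition above.
```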