60 sats \ 0 replies \ @optimism 12h \ on: Context Rot: How Increasing Input Tokens Impacts LLM Performance AI
@freetx offered a nice analogy in #1027038 about the "teapot test" - basically that LLMs often hyperfocus on some part of the instruction. I've seen this happen with smaller reasoning models and even posted/complained about such hyperfocus-on-nonsense behavior in the past (#987426).
This is why I want to invest a bit in fine-tuned MCP. I'm starting to think that the tool descriptions take too much context away from the problem we want to solve. I'm trying to treat the models as if they have an attention disorder, to see if that improves results.
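To make the context cost concrete, here's a rough sketch in plain Python. The tool schemas are hypothetical (not from any real MCP server), and the ~4-characters-per-token heuristic is a crude approximation, but it shows how much a verbose tool description can eat compared to a terse one:

```python
import json

# Hypothetical MCP-style tool schemas -- names and descriptions are made up
# for illustration, not taken from any real server.
VERBOSE_TOOL = {
    "name": "search_files",
    "description": (
        "Searches the project directory recursively for files whose contents "
        "match the given regular expression. Supports glob filters, result "
        "pagination, case-insensitive matching, and returns line numbers, "
        "byte offsets, and surrounding context for every match found."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "pattern": {"type": "string", "description": "Regex to search for"},
            "glob": {"type": "string", "description": "Optional glob filter"},
        },
        "required": ["pattern"],
    },
}

TERSE_TOOL = {
    "name": "search_files",
    "description": "Regex search over project files.",
    "inputSchema": {
        "type": "object",
        "properties": {"pattern": {"type": "string"}},
        "required": ["pattern"],
    },
}

def approx_tokens(tool: dict) -> int:
    """Crude token estimate: ~4 characters per token on the serialized schema."""
    return len(json.dumps(tool)) // 4

print("verbose:", approx_tokens(VERBOSE_TOOL), "tokens")
print("terse:  ", approx_tokens(TERSE_TOOL), "tokens")
```

Multiply the difference by a few dozen tools and the descriptions alone crowd out a meaningful slice of the window before the actual problem is even stated.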