Because there is attention degradation in LLMs (due to architecture), the more tokens you generate (and feed into the context / input), the higher the chance that the bot gets it wrong. So that means that conversation quality degrades over time and "space".
What code do you use to make sure it doesn't regress into compound-error mode over long-form conversations?
I’m not trying to solve this at the model level.
The bot isn’t designed for infinite, meandering conversations. It’s deliberately shaped around shorter, reflective exchanges, with frequent re-grounding of the question instead of accumulating state indefinitely.
That’s how I avoid compound error: assumptions aren’t allowed to silently pile up. If the premise starts drifting, the bot reframes, pushes back, or slows the conversation down.
In practice, long-form coherence is mostly a product design problem, not something you fix with a special line of code.
I understand you're trying to work around the intrinsic limitation in the models. If not with code and not with RL, do you solve it with prompting?
I.e. how do you solve the "product design" problem?
It’s handled at the interaction level. The bot regularly restates and challenges assumptions instead of building endlessly on prior context, and older turns aren’t treated as sacred state. Prompting helps, but conversation design does most of the work.
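To make that concrete, here's a minimal sketch of what "not treating older turns as sacred state" could look like in code. Everything here is hypothetical and illustrative, not the bot's actual implementation: the idea is just that only a restated premise plus a short window of recent turns ever reaches the model, so assumptions can't silently pile up across the whole history.

```python
# Illustrative sketch: bounded conversation state with periodic re-grounding.
# Instead of accumulating every turn, keep a restated premise plus a short
# recent window. All names here are hypothetical, not any real bot's API.

from collections import deque

class GroundedConversation:
    def __init__(self, max_recent_turns=4):
        self.premise = ""  # current restatement of the user's question
        # Older turns silently drop off the left end of this window.
        self.recent = deque(maxlen=max_recent_turns)

    def add_turn(self, user_msg, bot_msg):
        self.recent.append((user_msg, bot_msg))

    def reground(self, new_premise):
        # Replace accumulated framing with a fresh restatement instead of
        # letting earlier assumptions keep compounding.
        self.premise = new_premise
        self.recent.clear()

    def build_prompt(self, user_msg):
        # Only the premise and a bounded window reach the model, so context
        # length (and the surface for compounding error) stays roughly flat.
        history = "\n".join(f"User: {u}\nBot: {b}" for u, b in self.recent)
        return f"Premise: {self.premise}\n{history}\nUser: {user_msg}\nBot:"

convo = GroundedConversation(max_recent_turns=2)
convo.reground("User is weighing two job offers.")
convo.add_turn("Offer A pays more.", "What matters besides pay?")
convo.add_turn("Offer B has better hours.", "Which trade-off weighs more?")
convo.add_turn("Probably hours.", "Then B fits your stated priority.")
prompt = convo.build_prompt("So should I take B?")
# The oldest turn has already fallen out of the window.
```

The point isn't the data structure; it's the policy: the prompt is rebuilt from a deliberately small, re-groundable state rather than from the entire transcript.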