Because there is attention degradation in LLMs (due to architecture), the more tokens you generate (and feed into the context / input), the higher the chance that the bot gets it wrong. So that means that conversation quality degrades over time and "space".
What code do you use to make sure it doesn't regress into compound-error mode over long-form conversations?
I’m not trying to solve this at the model level.
The bot isn’t designed for infinite, meandering conversations. It’s deliberately shaped around shorter, reflective exchanges, with frequent re-grounding of the question instead of accumulating state indefinitely.
That’s how I avoid compound error: assumptions aren’t allowed to silently pile up. If the premise starts drifting, the bot reframes, pushes back, or slows the conversation down.
In practice, long-form coherence is mostly a product design problem, not something you fix with a special line of code.
I understand you're trying to work around the intrinsic limitation in the models. If not with code and not with RL, do you solve it with prompting?
I.e. how do you solve the "product design" problem?
It’s handled at the interaction level. The bot regularly restates and challenges assumptions instead of building endlessly on prior context, and older turns aren’t treated as sacred state. Prompting helps, but conversation design does most of the work.
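To make that concrete, here's a minimal sketch of what "not treating older turns as sacred state" could look like in code. Everything here is hypothetical and illustrative, not the bot's actual implementation: the idea is just that only a restated premise plus a short window of recent turns ever reaches the model, so assumptions can't silently pile up across the whole history.

```python
# Illustrative sketch: bounded conversation state with periodic re-grounding.
# Instead of accumulating every turn, keep a restated premise plus a short
# recent window. All names here are hypothetical, not any real bot's API.

from collections import deque

class GroundedConversation:
    def __init__(self, max_recent_turns=4):
        self.premise = ""  # current restatement of the user's question
        # Older turns silently drop off the left end of this window.
        self.recent = deque(maxlen=max_recent_turns)

    def add_turn(self, user_msg, bot_msg):
        self.recent.append((user_msg, bot_msg))

    def reground(self, new_premise):
        # Replace accumulated framing with a fresh restatement instead of
        # letting earlier assumptions keep compounding.
        self.premise = new_premise
        self.recent.clear()

    def build_prompt(self, user_msg):
        # Only the premise and a bounded window reach the model, so context
        # length (and the surface for compounding error) stays roughly flat.
        history = "\n".join(f"User: {u}\nBot: {b}" for u, b in self.recent)
        return f"Premise: {self.premise}\n{history}\nUser: {user_msg}\nBot:"

convo = GroundedConversation(max_recent_turns=2)
convo.reground("User is weighing two job offers.")
convo.add_turn("Offer A pays more.", "What matters besides pay?")
convo.add_turn("Offer B has better hours.", "Which trade-off weighs more?")
convo.add_turn("Probably hours.", "Then B fits your stated priority.")
prompt = convo.build_prompt("So should I take B?")
# The oldest turn has already fallen out of the window.
```

The point isn't the data structure; it's the policy: the prompt is rebuilt from a deliberately small, re-groundable state rather than from the entire transcript.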