Unlike technical debt, which usually announces itself through mounting friction — slow builds, tangled dependencies, the creeping dread every time you touch that one module — verification debt breeds false confidence. The codebase looks clean. The tests are green. And six months later you discover you’ve built exactly what the spec said — and nothing the customer actually wanted.
I think this is a great assessment of what happens a lot out there. I fix this not by yolo-ing big specs (or one-liner epics), but by doing a proper epic -> user story -> feature breakdown and overall backlog refinement, and working iteratively. @k00b's comment about doing the back-and-forth in plan mode reminded me to be patient before giving the "let's gooo" instruction (though my "plan mode" is a forge + board, like a luxury version of symphony[1]).
Would you stake your name on this doing what the user actually needs — not just what the ticket says? If the answer to that is “probably,” you haven’t finished reviewing.
Good advice.
Context evaporates. 200,000 tokens sounds generous until the agent starts compressing your conversation and forgets what you agreed ten minutes ago.
Stop treating your LLM as a conversational partner; that's chatbot, noobbot territory. Don't chat. Don't rely on context. Define a job. Refine it, in an isolated session. Review it. Make changes, in an isolated session. Iterate, in isolated sessions. Then break it down and build it, in a ton of isolated sessions. Automate AI review, in isolated sessions. Make sure the whole plan is discoverable, just like your fabulous 2010 self did in Jira with your dev teams.
If you want your bot to be part of a business, formalize. Be a real business. Real businesses don't yolo their products.
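The isolated-sessions workflow above can be sketched roughly like this. A minimal sketch, assuming a hypothetical `run_session()` helper (stubbed here so it runs; in practice you would swap it for your agent CLI or API call of choice):

```python
def run_session(prompt: str) -> str:
    """One fresh agent session: no shared chat history, only the prompt.
    Stubbed for illustration -- a real version would invoke your agent."""
    return f"result of: {prompt}"

def pipeline(job: str) -> list[str]:
    """Run each stage in its own isolated session. Each stage receives the
    written artifact of the previous stage, never the previous conversation,
    so the whole plan stays discoverable as a trail of artifacts."""
    stages = ["refine", "review", "revise", "break down", "build", "ai-review"]
    artifacts = [job]
    for stage in stages:
        artifacts.append(run_session(f"{stage}: {artifacts[-1]}"))
    return artifacts

trail = pipeline("define the job")
print(len(trail))  # the original job definition plus one artifact per stage
```

The point of the design: because no stage depends on chat history, context can't evaporate mid-plan, and any stage can be re-run from its input artifact alone.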
My human interface for the execution flow even looks the same as what OpenAI built; it makes me wonder what happened to the idea after I let Arena.ai code battles build my prototype frontend a few months ago. A contest across 6 models that gpt-5.2-codex lost and kimi-k2.5 won, which was an unexpected result. ↩