I'm not a developer so it's not natural for me to go snooping around the code looking for architectural issues.
Fix this first thing. For every implementation PR, trigger Gemini to compare against the design specs; past a threshold of more than a couple of deviations, adjust the task design to highlight what not to do and throw away the original PR. It's always cheaper to catch things before merge.
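That gate can be sketched in a few lines. This is a hypothetical sketch, not anyone's actual tooling: `deviations` stands in for whatever list a Gemini spec-comparison review returns, and the threshold of 2 matches "more than a couple".

```python
# Hypothetical sketch of the "compare PR against spec, throw if too many
# deviations" gate. The deviation list would come from a Gemini review;
# here it's just passed in.

DEVIATION_THRESHOLD = 2  # "more than a couple" deviations means throw

def gate_pr(deviations):
    """Decide whether to keep the PR for review or throw it away and
    tighten the task design with the deviations as what-not-to-do notes."""
    if len(deviations) > DEVIATION_THRESHOLD:
        return f"throw: redo task, noting {len(deviations)} anti-patterns"
    return "keep: proceed to review"

print(gate_pr(["dup helper", "skipped migration", "wrong layer"]))
```

The point is that the decision is mechanical once the deviation count exists, so it can run on every PR before a human ever looks at it.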
Although I'm working in the terminal, I use descriptive branch names that match dedicated worktrees.
I use feature branch trees too, and even though I keep a human in the loop (bots don't get to have merge rights) this works pretty well. I do review everything tho. If some bot gives me a massive PR I just reject it and tell it not to be a sloppy biatch.
Fix this first thing
I don't have design specs. I just keep sprints as tight as possible and constantly smoke test my way towards what I am trying to build. I don't know what design specs would look like to be honest, as in how extensive and detailed, whether diagrammatic or markdown.
I did try to be spec-driven, but after a few sprints it became apparent that the generated specs were directionally correct but woefully inadequate, and that I just had to feel my way forward. It would be nice to start with a lot of boilerplate that covered every possible design pattern, but I'm using WordPress and don't want to spend time figuring out how to do it right rather than just getting it done as a proof of concept. I can worry about perfect architecture when it's working end to end with real traction.
I don't have design specs.
Then how do you know if something is architecturally right or wrong?
I don't know what design specs would look like to be honest, as in how extensive and detailed, whether diagrammatic or markdown.
I most often just let it plan to implementation, so it's like:
- I want something, I write down a bunch of acceptance criteria
- I let a bot identify gaps and confusing things in my criteria, I clarify
- When I'm happy with that spec (can literally just be a 2-level markdown list) I let a bot plan the design at a high level, break it up into stages
- For each stage, detailed design -> stage design doc
- When I'm happy with that, "execute the plan"
- On PR, dispatch a context-free bot for each of:
  - compliance with acceptance criteria
  - analyzing CI, which includes lint, dep tracking, unit tests + coverage, integration tests (if possible at that point) and security audits
  - ----- synchronize, make decision to keep or throw -----
  - robustness review
  - completeness review
  - analyze all newly written tests for omissions
  - integration issues
  - logic errors
  - ----- synchronize, at this point there are on average 50-80 findings -----
- Prioritize, build a new task list, goto 1.
woefully inadequate
k00b reminded me not to be impatient in the spec phase. I spend about 50% of my time in spec, 40% in review, 10% having a beer.
I can worry about perfect architecture when it's working end to end with real traction.
I get that btw. I don't understand why you'd need so many agents for a PoC though. Just throw it away.
It's a very complex proof of concept. I try to break everything down to the narrowest detail so that it can do its tasks without compaction.
I have 5-10 Claude sessions open at any moment, and another 5 Codex sessions, and then Gemini and Codex conduct code reviews automatically on each PR.
I have it structured so that there are a couple of sessions that work at a high level, then each sprint gets an orchestrator, and each task in a sprint gets an implementation session.
I find my biggest problem is working on something complex, then waiting for it to finish, moving to another sprint to move it along and losing my focus.
I have ADHD and moving between different unrelated complex tasks is difficult, easy to forget what the sprint was about entirely.
I have multiple staging environments and protocols for testing and deployment; different sessions can use Chrome and SSH, so they can test autonomously.
Right now I'm struggling with regression testing and with making sure my implementation sessions don't duplicate code we already have. I'm not a developer, so it's not natural for me to go snooping around the code looking for architectural issues.
I think these problems are solvable with some reasonable UI principles, e.g. recognition over recall. Although I'm working in the terminal, I use descriptive branch names that match dedicated worktrees.
If I could configure my setup so that Claude Code in the terminal got labelled as "Ready" or "Working", using status definitions that I can set, then I'd be able to keep track of the state and progress of agents with less cognitive overhead on the operational side and more attention to the details of the tasks.
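A minimal version of that labelling is possible with nothing but terminal escape codes. This is a sketch under assumptions: the status names and the branch-to-worktree naming are illustrative, and it assumes your terminal emulator honours the standard OSC 0 title sequence (most do). How the agent's actual state would be detected is out of scope here.

```python
# Sketch: relabel a terminal tab/window with a user-defined status so each
# agent session reads as "Ready"/"Working" at a glance. Status names and
# the example branch are illustrative, not from any real setup.
import sys

# User-set status definitions, mapping an agent state to a visible label.
STATUS_LABELS = {"idle": "Ready", "running": "Working", "blocked": "Needs input"}

def set_title(branch, state):
    """Set the terminal title to '[Label] branch' and return it."""
    label = STATUS_LABELS.get(state, state)
    title = f"[{label}] {branch}"
    sys.stdout.write(f"\033]0;{title}\007")  # OSC 0: set window/tab title
    sys.stdout.flush()
    return title

set_title("feature/checkout-flow", "running")
```

With descriptive branch names matching dedicated worktrees, the title alone tells you which agent is in which state without opening the pane: recognition over recall.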