Claude Code falls short if you don't have a great methodology. It needs feedback loops to work fine; otherwise, it derails. One possible feedback loop is a human reviewing code and testing manually. But there's a better/complementary approach if you want it to work autonomously. On both Elo and Bmg.js, I've started by making sure the testing methodology was effective and scientifically sound. Claude writes the tests, executes them, discovers where it's wrong, and corrects itself. Impressive."
Pretty good article