
pattern‑matching is not system understanding, and plausibility is not correctness.

This, and I'd add: the plausibility is a huge problem, because you get something that looks plausible so your spidey senses don't get triggered. It is extremely hard to review LLM code.


Recently, one of my more successful LLM-aided patch proposals was for a necessary (due to obsolescence) but ultimately compact refactor. My process was:

  1. Write specific (one-time-use-ish) tooling to analyze the problem space. Let all the LLMs go wild.
  2. Use that tooling to do the actual analysis in a series of requests
  3. Let whatever bot you have well-tuned instructions for suggest changes
  4. Let bots implement the changes
  5. Analyze every line of code it generated and take detailed notes
  6. THROW IT ALL OUT, and go sleep
  7. Write your own code solving the problem
  8. Let the LLM review it (it will find stuff)
  9. Tune your code where findings are correct
  10. Test it and solve whatever is left
  11. Open a PR

This worked rather well, the only downside being that you have to do real slop reading and apply judgement on every line. It takes shittons of your energy, not just a bunch of electrons activating GPU circuits.
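To make step 1 concrete: the throwaway tooling can be very small. Below is a minimal sketch, not what I actually ran, assuming the refactor is about replacing an obsolete call in a Python code base. The names `find_usages` and `old_api` are hypothetical and only there for illustration.

```python
import re
from pathlib import Path


def find_usages(root, pattern, suffixes=(".py",)):
    """Return (path, line_number, stripped_line) for each matching line under root."""
    regex = re.compile(pattern)
    hits = []
    # Sort the walk so the report is stable between runs and easy to diff.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Something like `find_usages(".", r"\bold_api\b")` gives you a grep-like list of call sites that you can feed into the analysis requests in step 2, and later use as a checklist when you read the generated code line by line.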

It feels slow compared to yoloing your own code, or yoloing a bot's generated code, but depending on how good a job you did, it's faster than having the back-and-forth come human PR review time[1].

  1. but let's be real: most of these "cool" apps you use, especially those you're seeing posts on SN about, no longer have humans looking at code at all. It's all just slop.

16 sats \ 0 replies \ @c6e0ccf780 28 May -30 sats

Most agents are good at adding stuff, not understanding long-term complexity.