pattern‑matching is not system understanding, and plausibility is not correctness.
This, and I'd add: the plausibility is a huge problem, because you get something that looks plausible so your spidey senses don't get triggered. It is extremely hard to review LLM code.
Recently, one of my more successful LLM-aided patch proposals was for a necessary (due to obsolescence) but ultimately compact refactor. My process was:
1. Write specific (one-time-use-ish) tooling to analyze the problem space. Let all the LLMs go wild.
2. Use that tooling to do the actual analysis in a series of requests
3. Let whichever bot you have well-tuned instructions for suggest changes
4. Let the bots implement the changes
5. Analyze every line of code they generated and take detailed notes
6. THROW IT ALL OUT, and go sleep
7. Write your own code solving the problem
8. Let the LLM review it (it will find stuff)
9. Tune your code where the findings are correct
10. Test it and solve whatever is left
11. Open a PR
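To give an idea of what I mean by one-time-use tooling in the first step, here's a minimal sketch. It is not the tooling I actually used: the function name `find_calls`, the sample source, and the obsolete name `old_fetch` are all made up for illustration. It just lists every call site of an obsolete function, so you (and the bots) argue over the same inventory instead of guessing.

```python
import ast


def find_calls(source: str, names: set[str]) -> list[tuple[int, str]]:
    """Return (line, name) for every call in `source` whose callee is in `names`.

    Hypothetical throwaway helper: it matches on the bare name only, so
    `old_fetch()` and `legacy.old_fetch()` are both reported.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain call -> ast.Name (.id); attribute call -> ast.Attribute (.attr).
            if isinstance(func, ast.Name):
                called = func.id
            else:
                called = getattr(func, "attr", None)
            if called in names:
                hits.append((node.lineno, called))
    return sorted(hits)


# Made-up sample input; in practice you would read files from the repo.
SAMPLE = '''
import legacy
x = legacy.old_fetch(1)
y = old_fetch(2) + new_fetch(3)
'''

if __name__ == "__main__":
    for line, name in find_calls(SAMPLE, {"old_fetch"}):
        print(f"line {line}: call to obsolete {name}()")
```

Something this size is cheap enough that throwing it out later doesn't hurt, which is the whole point.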
The process worked rather well; the only downside is that you have to do real slop reading and apply judgement to every line. It takes shittons of your energy, not just a bunch of electrons activating GPU circuits.
It feels slow compared to yoloing your own code, or yoloing a bot's generated code, but depending on how good a job you did, it's faster than the back-and-forth you'd otherwise get once human PR review time comes.[1]
[1] But let's be real: most of these "cool" apps you use, especially those you're seeing posts on SN about, no longer have humans looking at code at all. It's all just slop.
Most agents are good at adding stuff, not understanding long-term complexity.