
After 36 hours of intense Claude 4.5 Sonnet usage (without consistent sleep - I am way too old for that, but Red Bull GmbH is really pleased with me once more), and delighted to see that it improved a lot vs 4.0, I have findings to share.

TL;DR: if you don't know what you're doing, you're still gonna suck at coding; Claude won't fix that.

Let me start by stating that it feels to me that Anthropic has definitely raised the bar with this one. ChatGPT, Grok, Mistral can't beat this right now. Neither can Qwen3-coder, GLM 4.6, or InternVL 3.5. So I must admit: good job Anthropic!
Most important finding: Claude, despite the sales pitch, is definitely not an expert coder.
Claude still fails at things that an expert coder does all day every day, like code that is heavy on concurrency or uses binary protocols - basically anything that isn't JSON - even when I'm doing this in Python. Because of this I haven't tried having it do systems programming: if it cannot do complex operations in Python, then I won't give it Rust or C++. I go step by step, because previous attempts have taught me not to throw it in at the deep end; it just doesn't work like that, and you need to continuously tune the instructions based on what you're observing. Ultimately, I may be able to make a good instruction for it to code C++, but this would take me quite some time, as it's not science (nor engineering) but simple trial & error.
However, once I explain to it what it does wrong and it does the annoying thing[1], it does fix the problems it created. This is why, I'm sorry, noobs can't code with Claude either. You have to understand what you're doing.
It also still has attention issues so you'll see it create tons of bugs, struggle to understand complex project structure, forget that you cannot just change method signatures without checking your implementation, and so on. There are definitely gold mines to be had in selling tokens and selling CI infra, because it's not that great. The more repetitive I make it, the better it performs though (and the more tokens it burns.)
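To make the method-signature point concrete, here is a minimal sketch of the kind of check that catches it: it compares the parameter names of an interface with those of an implementation. `Store`, `MemoryStore`, and `signature_mismatches` are hypothetical names made up for this example, not something from my project.

```python
import inspect


def signature_mismatches(base: type, impl: type) -> list[str]:
    """Return names of public methods whose parameter names differ between base and impl."""
    mismatches = []
    for name, base_fn in inspect.getmembers(base, inspect.isfunction):
        if name.startswith("_"):
            continue  # only public methods are compared in this sketch
        impl_fn = getattr(impl, name, None)
        if impl_fn is None or not callable(impl_fn):
            mismatches.append(name)
            continue
        base_params = list(inspect.signature(base_fn).parameters)
        impl_params = list(inspect.signature(impl_fn).parameters)
        if base_params != impl_params:
            mismatches.append(name)
    return mismatches


class Store:  # hypothetical interface whose save() just gained a ttl parameter
    def save(self, key, value, ttl): ...
    def load(self, key): ...


class MemoryStore(Store):  # hypothetical implementation that was not updated
    def save(self, key, value): ...
    def load(self, key): ...


print(signature_mismatches(Store, MemoryStore))
```

It only compares parameter names, not types or defaults, so treat it as a cheap smoke test rather than a replacement for review.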
Here's the big thing: it is really good at making cli tools using bash, python or nodejs; like exceptionally good. And then it can use the tools it made seamlessly (if you use Claude Code) and boost its own productivity by improving the tools, as long as you remember to nudge it about assessing things that can be improved in the toolset.
This pattern is actually interesting: let the LLM code the toolset, let it use it, improve it, and gain massive productivity. This is now on my list to test with future open-weight coding models. Or even let Claude code tools and let the open model use the tools.
I let it build a CLI tool around git-bug, and with some instruction tuning it now uses its home-grown tool bugctl (I had it do two rounds of making up names to come up with that one) to log things nicely without being dependent on GitHub or other platforms.
It's still a bit eager to close issue reports and jumps the gun very often, like your intern would, which is why I have deny-listed git add/commit/pull/push. I just review everything with git add -i.
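The deny-list idea can be sketched like this. It is not Claude Code's own permission configuration, just an illustration of the check; `is_denied` and `DENIED_GIT_SUBCOMMANDS` are hypothetical names for this example.

```python
import shlex

# The four git subcommands named above.
DENIED_GIT_SUBCOMMANDS = {"add", "commit", "pull", "push"}


def is_denied(command: str) -> bool:
    """Return True if the command is `git <subcommand>` with a deny-listed subcommand.

    Deliberately naive: it only looks at the token right after `git`, so it does
    not handle global options (e.g. `git -C dir push`) or chained shell commands.
    """
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "git":
        return False
    return tokens[1] in DENIED_GIT_SUBCOMMANDS


for cmd in ["git status", "git commit -m 'wip'", "ls -la"]:
    print(cmd, "->", "denied" if is_denied(cmd) else "allowed")
```

The agent can still run read-only commands such as git status or git diff; staging and committing stay with the human.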

Footnotes

  1. IT STILL SAYS "YOU'RE ABSOLUTELY RIGHT", which is Claude-speak for "I'm sorry I fucked up". If it would just stfu and fix the problem, it would save me money and time and annoyance. See also #1246980 lol
100 sats \ 3 replies \ @k00b 5h
I've been using 4.5 in MAX mode in cursor since it was released. It does seem better in some qualitative way. I haven't done a hard vibe with it yet but I was using it to make large changes to my migration and it did a pretty good job.
My biggest gripe is that it more often than not starts on a plausible but faulty premise. It's like it begins recursively reasoning a little too early. Once you've steered it the right way it's pretty great.
After I get my monthly admin done, I'm hoping to vibe some tests with it.
reply
Hmm. Why did you decide that you needed max mode?
reply
100 sats \ 1 reply \ @k00b 4h
I hadn't used it yet so I was curious. Extra cost is worth it if it produces better results with less effort and purportedly that's what it does. Jury is still out though
reply
yeah I'm skeptical about it. I have better results with a clean instruction file and context reset per "task", but I have the issue system now for context, which helps a lot...
even if I just tell Claude to file an issue, then to plan a resolution of that issue, and then to execute the resolution. And clean context between these runs actually works best - I probably will lose some cached tokens this way, but I doubt that it's cheaper to just let it grow forever.
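A rough sketch of that loop, assuming each stage is a separate non-interactive `claude -p` call so that nothing carries over except what is in the issue tracker. The prompt wording, `STAGES`, `run_claude`, and `run_task` are made up for this example.

```python
import subprocess
from typing import Callable

# One prompt per stage (hypothetical wording); each stage gets a fresh session.
STAGES = [
    "File an issue describing this problem: {task}",
    "Plan a resolution for the open issue about: {task}",
    "Execute the planned resolution for the issue about: {task}",
]


def run_claude(prompt: str) -> str:
    """Assumed runner: one `claude -p` invocation per prompt, no shared session."""
    result = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout


def run_task(task: str, runner: Callable[[str], str] = run_claude) -> list[str]:
    """Run file -> plan -> execute as three independent calls and collect the outputs."""
    return [runner(stage.format(task=task)) for stage in STAGES]
```

Passing a different `runner` makes it easy to dry-run the prompts, or to try the same loop with another CLI.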
reply
Took it for a short spin in cursor but quickly reverted back to Gemini
Have always found Claude gets too easily distracted
reply
17 sats \ 1 reply \ @optimism OP 5h
I didn't use Cursor for this but Claude Code. I personally detest IDEs, so this works nicer for me anyway, like Gemini CLI / Codex.
reply
I need to force myself to use gem cli more to see if I can learn to like it, would probably be more efficient if I got used to it... Always rage quit when I try a regular shell command and get reminded they can't just do that
reply
Doesn’t get rid of programmers
Makes good ones better
Bullish!
reply
10 sats \ 1 reply \ @optimism OP 6h
I think so, yes. And I think that this isn't exclusive to programmers. I expect that it makes good lawyers, doctors, PAs, engineers... all... better.
reply
I use ChatGPT a lot, and often call it out for being wrong, and it’ll correct itself.
But even if it’s wrong, I still learn from it and it’s always willing to talk to me and work thru a problem together
reply
100 sats \ 1 reply \ @d680ecaa8e 8h
I dislike the Python approach, because even if you write the backend in Python and the HTML frontend in Django, it is not as efficient as coding in NestJS. Python's purpose is calculation, automation, making games, or handling processes that are a little complicated, like data science (which is another thing).
reply
I wouldn't use python to make a website. The main benefit is that it is extremely readable and this helps when you have some LLM spit out thousands of lines of code that you have to review before you commit. Even though javascript isn't extremely hard to read, I feel that it is less pleasant for review, at least for me personally.
reply