pull down to refresh
100 sats \ 9 replies \ @SimpleStacker 15h \ parent \ on: Why OpenAI’s solution to AI hallucinations would kill ChatGPT tomorrow AI
Agreed. I actually want the AI to be confident. One of my gripes with it is that when I ask it for coding help, it sometimes gives me 3 different implementations. I don't want 3 implementations, I want you to be opinionated on what's the best one. If I don't like it, you can trust me to ask you to re-evaluate.
So the problem isn't that AI's are too confident. The problem is that users put too much trust in the initial output.
reply
That's interesting.
Maybe it's because my trust level in the AI is already low, so I don't expect to actually use any of its implementations (at least word for word). I'm mainly using it to get a sense of "where in the code should I be looking", and "what's the general idea for the solution?" as a quicker alternative than reading and crunching all the code in my own mind.
I'm still gonna crunch enough code to understand what's going on, so the purpose of the AI is more like "find me the best jumping off point"
reply
reply
Usually yes. So far they've done a decent job in finding the right parts of the code to be looking at, and their suggested solutions are usually on the right track (but usually not something you can just copy paste)
Yeah, I rarely use it for code except when I try something new to see what it can do. But then, I've spent 95% of my time reviewing other people's code the last decade, so for me it's not much use in production. I've tried doing AI-enhanced code review where I feed it the resulting code of a diff, but it didn't really work well for me on c++ code. I'm still a skeptic when it comes to production usage really. Maybe autocomplete, but the one in my rich-ish text editor works fine for me.
reply
Lower temperature, non-reasoning may improve here. Also
***IMPORTANT: BE CONCISE!***
at the bottom of the system prompt may work due to the horrors of chat training. Which to me is still the most ridiculous thing ever.I still have to test InternVL 3.5 (#1194686) in coding abilities because they claim to beat Claude 3.7 with a 14b model, so I'd like to see what's what with that, when I get a moment of peace.