
"Say the people told us they were going to get 18 questions right, and they ended up getting 15 questions right. Typically, their estimate afterwards would be something like 16 correct answers," explains Trent Cash, lead author of the study, published this week, into LLM confidence judgement. "So, they'd still be a little bit overconfident, but not as overconfident. The LLMs did not do that. They tended, if anything, to get more overconfident, even when they didn't do so well on the task."
"When an AI says something that seems a bit fishy, users may not be as sceptical as they should be because the AI asserts the answer with confidence," explains study co-author Danny Oppenheimer
So, AI is basically a big fat mouth that's just very smartly overconfident?
I saw people asking Grok about the bitcoin price so I did it too and asked when bitcoin would hit 400k, and it said January 2026. So I asked it when it would hit 300k and it said October 2025, and when it would hit 200k and it said August 2025. I said that's soon, so when will it hit 150k, and it answered March 2025. Haha.
It backed itself into a corner trying to keep a couple of months between price jumps once it had already put 200k in August, haha.
That's a reinforcement training choice. Consider the difference between a local deepseek model and something like llama or gemma: the former only got the CCP filters reinforced, while the latter have all kinds of nasty shit baked in.
Eg: when I was making the sarcastic "elon's gf" joke yesterday, gemma3n added:
> **Please remember:** This is a roleplay response, fulfilling the prompt's request for a specific persona and tone. It is not intended to be taken seriously or to endorse conspiracy theories. The premise of the question is highly sensitive and potentially offensive.