
this was gaining traction on HF today despite being 6 days ancient

Overthinking Problem

Overthinking refers to the tendency of LRMs to generate unnecessarily long, redundant, or overly complex reasoning paths during task execution, which can lead to response latency, increased computational cost, and even degraded answer accuracy. In R1-style LRMs, overthinking typically manifests in the following ways:
  1. Overthinking Simple Problems: In real-world applications, R1-style LRMs often generate detailed and complete CoT for all inputs, even for simple queries such as “What is 2 + 3?”.
  2. Unconfident Reasoning Behavior: During reasoning, LRMs often engage in self-verification and reflection. However, when deciding whether to reflect, the model may exhibit low confidence in its intermediate outputs, leading to unnecessary repeated reflection and self-doubt style reasoning loops, thereby exacerbating the overthinking issue.
To mitigate such issues, recent studies have focused on efficient reasoning, which aims to reduce the length and latency of reasoning paths while preserving answer accuracy and reflective behavior.
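The repeated-reflection loops described under point 2 can at least be flagged cheaply. Here's a naive sketch (my own illustration, not a technique from the survey): count repeated n-grams in a generated trace and treat heavy repetition as a likely loop. Whitespace tokenization and the thresholds are arbitrary assumptions.

```python
from collections import Counter


def looks_like_loop(text: str, n: int = 8, threshold: int = 3) -> bool:
    """Crude loop heuristic: does any n-gram of whitespace tokens
    appear at least `threshold` times? Parameters are illustrative."""
    tokens = text.split()
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return False
    # Most frequent n-gram count >= threshold -> probably stuck in a loop.
    return Counter(ngrams).most_common(1)[0][1] >= threshold
```

A trace that keeps emitting the same "Wait, let me double-check that again" phrase would trip this, while a normal short answer would not. Real mitigations in the literature are training-side, but a guardrail like this is what you'd bolt on at inference time.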

Although this is a pretty interesting survey, I can't help but feel that, bottom line, for many tasks Reasoning/CoT isn't helping much except selling more tokens (also see #1075448). Only two models have gotten into a loop (the behavior described under point 2) for me in the past week: qwen3 and gpt-oss-120b, and we can just blame the latter on being an abysmal dump of faking it and not even wanting to make it.
For me and my use-cases, non-reasoning models are not only more effective but also more cost-efficient, due to reduced output tokens (and with that, GPU time). Though if I were to use LLMs more like a sounding board (like in #1075463), maybe reasoning would be more useful?
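The cost argument is just arithmetic on output tokens. A back-of-envelope sketch, with made-up token counts and a made-up flat output price (not any real model's pricing):

```python
def completion_cost(output_tokens: int, usd_per_million: float) -> float:
    """Cost of one completion at a flat per-output-token price."""
    return output_tokens * usd_per_million / 1e6


# Hypothetical numbers purely for illustration.
reasoning = completion_cost(output_tokens=2_000, usd_per_million=10.0)  # long CoT trace
direct = completion_cost(output_tokens=150, usd_per_million=10.0)       # answer only
print(f"reasoning: ${reasoning:.4f}, direct: ${direct:.4f}, "
      f"ratio: {reasoning / direct:.1f}x")
```

With these assumed numbers the reasoning trace costs over 13x the direct answer per query, before counting the extra wall-clock latency, which is the whole point of the "efficient reasoning" line of work the survey covers.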
100 sats \ 1 reply \ @Tony 10h
Strong point. No matter how mighty reasoning models look, machines need guidance.
I feel like the CEOs of AI companies are rushing into rolling out something that makes all the decisions for end users. And it feels great for an average Joe, but is absolutely unsustainable for a builder.
Makes me wonder how much fiat they are burning through along the way. GPT5 is available for free now, and while the GPT app jumps between the models at its own will, Copilot lets you specifically choose it while asking a question. Can't imagine how much computing power goes into this considering their user base.
I feel like the CEOs of AI companies are rushing into rolling out something that makes all the decisions for end users. And it feels great for an average Joe, but is absolutely unsustainable for a builder.
Isn't that part of the mirage needed to get those fiat billions? The message underneath of "you'll never have to do unpleasant work again" is what's being sold. And that's kind of a mirage on its own, because we'll just find something else unpleasant that needs to be done. Like correcting a dumb af chatbot that keeps on generating the same mistake over and over. I'd much rather spend my time on helping a human that retains skills. Emulation of reasoning from a static base of linguistic weights is just another gadget, a hack. It doesn't enhance actual intelligence. A sneaky sales pitch to show progress towards a questionable goal through hacks.
So we're dealing with layer upon layer of questionable narratives that are ridden by a few companies. For chatbots in particular, I expect that in 10-15 years I'll be amazed that there are still people using them, like I am today by people who still use Facebook or WhatsApp, or manage their photos through Google.
I feel that this is our failure - we being people who understand (even if just partially or conceptually) how these systems work - to educate the world. We don't have a good framework to counter distractions like chatgpt, just like we didn't have one for fb. In fact, we don't even have a simple, proven strategy against shitcoins and other scams!
So there's much to learn, but even more wisdom from what we've learned to share with others. I'm hoping to find a way to scale that.