This was gaining traction on HF today despite being 6 days ancient.
Overthinking Problem
Overthinking refers to the tendency of LRMs to generate unnecessarily long, redundant, or overly complex reasoning paths during task execution, which can lead to higher response latency, increased computational cost, and even degraded answer accuracy. In R1-style LRMs, overthinking typically manifests in the following ways:
- Overthinking Simple Problems: In real-world applications, R1-style LRMs often generate a detailed and complete CoT for all inputs, even for simple queries such as “What is 2 + 3?”.
- Unconfident Reasoning Behavior: During reasoning, LRMs often engage in self-verification and reflection. However, when deciding whether to reflect, the model may exhibit low confidence in its intermediate outputs, leading to unnecessary repeated reflection and self-doubt-style reasoning loops, thereby exacerbating the overthinking issue.
To mitigate such issues, recent studies have focused on efficient reasoning, which aims to reduce the length and latency of reasoning paths while preserving answer accuracy and reflective behavior.
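The survey frames overthinking qualitatively; as a minimal sketch of the "simple problems" case (not from the survey, and with all numbers and the cutoff being my own placeholders), one crude way to flag it is to compare the length of the reasoning trace to the length of the final answer:

```python
# Crude proxy for "overthinking simple problems": how many reasoning tokens
# were spent per answer token. All names, counts, and the threshold below
# are hypothetical placeholders, not values from the survey.

def overthinking_ratio(reasoning_tokens: int, answer_tokens: int) -> float:
    """Ratio of reasoning effort to answer length; higher means more overthinking."""
    return reasoning_tokens / max(answer_tokens, 1)

# "What is 2 + 3?" needs almost no reasoning, so a trace like this gets flagged.
trace = {"reasoning_tokens": 850, "answer_tokens": 5}  # hypothetical token counts
ratio = overthinking_ratio(**trace)
print(f"reasoning/answer ratio: {ratio:.0f}x")

if ratio > 20:  # arbitrary cutoff, for illustration only
    print("flag: likely overthinking a simple query")
```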
Although this is a pretty interesting survey, I cannot help but feel that, bottom line, for many tasks reasoning/CoT is not helping much except selling more tokens (also see #1075448). Only two models have gotten stuck in a loop (the behavior described under point 2) for me in the past week: qwen3 and gpt-oss-120b, where we can just blame the latter for being an abysmal dump that fakes it without even wanting to make it.

For me and my use-cases, non-reasoning models are not only more effective but also more cost-efficient, due to reduced output tokens (and, with that, GPU time); a rough sketch of that arithmetic follows below. Though if I were to use LLMs more like a sounding board (like in #1075463), maybe reasoning would be more useful?
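To make the "reduced output tokens" point concrete, here is a back-of-the-envelope sketch; every price and token count in it is a hypothetical placeholder, not a benchmark:

```python
# Back-of-the-envelope version of the cost argument above. Prices and token
# counts are made-up placeholders; plug in your own numbers.

def output_cost(output_tokens: int, usd_per_million_tokens: float) -> float:
    """Cost attributable to output tokens alone."""
    return output_tokens / 1_000_000 * usd_per_million_tokens

PRICE = 2.50  # hypothetical $ per 1M output tokens, identical for both runs

non_reasoning_tokens = 120        # direct answer only
reasoning_tokens = 120 + 2_000    # same answer plus a long CoT trace

print(f"non-reasoning: ${output_cost(non_reasoning_tokens, PRICE):.4f}")
print(f"reasoning:     ${output_cost(reasoning_tokens, PRICE):.4f}")
# At identical pricing, the reasoning run emits roughly 18x the output tokens
# (and burns correspondingly more GPU time) in this made-up example.
```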