
reasoning abilities
Also see #1080396, which makes this point:
Reasoning model traces are full of phrases like “wait, what if we tried” and “I’m not certain, but let’s see if” and “great, so we know for sure that X, now let’s consider Y”. In other words, reasoning is a sophisticated task that requires a sophisticated tool like human language.
Personally I'm not really a theory guy; I just do things intuitively, and then either it makes sense or it doesn't, and I choose to clean it up or abandon it. But do I go through "wait, what if I tried?" 20 times in the thought process for a single query? I don't think so. But maybe that's just me.
Give us a decade of familiarity with LLMs (probably even less) and we'll be so reliant on being able to turn to them for information or for conversation or for brainstorming that it will likely feel odd to think "alone."
I found myself coding some Rust yesterday and, despite not having much experience with the language, I loved my day. I asked a local LLM two questions (goose, chat-only, against qwen3) where I would otherwise have done a web search, but for the rest I've just been writing code, causing a bunch of compiler errors, and learning some improved "idiomatic code patterns" as I go. 1
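To give a flavour of what I mean (a made-up toy, not yesterday's actual code): pass a `Vec` by value and use it twice, and the compiler walks you to the idiomatic fix of borrowing a slice.

```rust
// Toy example: if `sum` took `Vec<i32>` by value, the second call below would
// fail with E0382 ("use of moved value"), with notes showing exactly where the
// move happened -- which is how you land on the idiomatic `&[i32]`.
fn sum(numbers: &[i32]) -> i32 {
    numbers.iter().sum()
}

fn main() {
    let numbers = vec![1, 2, 3];
    println!("first pass: {}", sum(&numbers));
    // Still usable, because we only lent it out above.
    println!("second pass: {}", sum(&numbers));
}
```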
It is so nice to not need LLMs that talk to you like you're some genius while what you said was mundane af.

Footnotes

  1. The Rust compiler is really awesome for telling you not only what you messed up, but also what you should change.
talk to you like you're some genius
I assume that the simpering lapdog phase of these models is similar to the freaky finger phase -- it will quickly get polished out.
I missed that post (#1080396). Thanks for pointing it out.
The Hacker News comment thread he references is a doozy. There are clearly a lot of people thinking about this topic, and it makes me think I ought to be a little more cautious with my flippant comments.
Whether AI reasoning is “real” reasoning or just a mirage can be an interesting question, but it is primarily a philosophical question.
Nevertheless, here I come with another possibly flippant comment: I suspect that the question of AI reasoning will remain unanswerable. Possibly it isn't even a question at all; rather, it's a choice. We are going to have to choose whether or not we want to treat LLMs as thinking, reasoning entities.
102 sats \ 2 replies \ @optimism 23h
I assume that the simpering lapdog phase of these models is similar to the freaky finger phase -- it will quickly get polished out.
It's polished out of most of my own applications of it, and I've been wondering whether I should spend some GPU time fine-tuning qwen3. It "reasons" that it should give me a polite answer and then it tries to give me virtual oral pleasure, which is a waste of my time. These things are supposed to make my life easier and everything 10x faster! That's the promise, and they are all massively underdelivering. </end_rant>
On a more serious note: looking at what Gemma3 does (and my limited exposure to Gemini and ChatGPT in some tests I ran), I can't help but feel that this is the business model for consumer-facing LLMs: optimization for engagement, like FB, YT, X, and so on.
I suspect that the question of AI reasoning will remain unanswerable.
Why though? It's literally showing what it does, and you can precisely trace activations through a model... digital things are very measurable.
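A trivial sketch of what I mean by measurable (a hand-rolled toy net over plain `Vec<f32>`, not any real model or library): every intermediate activation is just a value you can print, log, or diff.

```rust
// Minimal sketch, not a real model: a two-layer net over plain Vec<f32>,
// just to show that every intermediate activation is an ordinary value you
// can dump and inspect. Interpretability tooling does the same thing at
// scale by hooking layers and logging their outputs.
fn layer(input: &[f32], weights: &[Vec<f32>]) -> Vec<f32> {
    weights
        .iter()
        .map(|row| {
            let z: f32 = row.iter().zip(input).map(|(w, x)| w * x).sum();
            z.max(0.0) // ReLU
        })
        .collect()
}

fn main() {
    let input = vec![1.0, -0.5];
    let w1 = vec![vec![0.8, -0.2], vec![0.1, 0.9]];
    let w2 = vec![vec![1.0, -1.0]];

    let h = layer(&input, &w1);
    println!("layer 1 activations: {:?}", h); // fully observable
    let out = layer(&h, &w2);
    println!("output: {:?}", out);
}
```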
100 sats \ 1 reply \ @Scoresby OP 23h
digital things are very measurable
Isn't this a bit like saying measuring all the various wavelengths bouncing off a painting can help you know whether or not it is art?
I think the question of AI reasoning may remain unanswerable because whatever is going on in an LLM is different enough from what goes on in a human mind that we'll never be at peace with the comparison.
What is my evidence for this? Probably something like Stanislaw Lem's Solaris -- which I admit is a little shaky. But I don't think we have a good handle on what reasoning is in human minds, so, measurement or not, it seems to me that we won't be able to make a clear determination about LLM reasoning. It may be that some people think it counts as thinking and others don't.
204 sats \ 0 replies \ @optimism 21h
Isn't this a bit like saying measuring all the various wavelengths bouncing off a painting can help you know whether or not it is art?
Nice analogy!
I'd say that to the philosopher it definitely is. The article says:
The “principled reasoner” being compared to here simply does not exist. It’s a Platonic ideal.
I agree with that. But it feels like the "template" being reinforced during reasoning training 1 is maybe good for math and coding, but not so awesome for other domains. I think what's needed is new alignment policies that aren't solely derived from the experts that OpenAI/Goog/Anthropic hire - some diversity in alignment would be nice! 2
I don't think we have a good handle on what reasoning is in human minds
I think the problem is trying to emulate it in the first place. I'm personally always amazed by people who tick differently, and they help me break through barriers in my thinking. I don't need people around me who think like me. Compatible, but not similar?

Footnotes

  1. I cannot help but feel that the method and scoring are the same, or at least very similar, for all LLMs except Claude and Gemini at the moment, not least because when DeepMind decided to do it wholly differently, the results kind of sucked.
  2. I was fantasizing the other day that I'd totally want a mixture-of-experts LLM where each expert layer has been aligned by a different Bitcoiner... so you basically get the luke-jr layer arguing with the petertodd layer in the reasoning phase. That would be so awesome for those of us who quit the bird app... once a month, on demand. "you're wrong and you know it." -> "you lie!". hah!