
I assume that the simpering lapdog phase of these models is similar to the freaky finger phase -- it will quickly get polished out.
It's mostly polished out in my own applications of it, and I've been wondering whether I should spend some GPU time fine-tuning qwen3. It "reasons" that it should give me a polite answer and then tries to give me virtual oral pleasure, which is a waste of my time. These things are supposed to make my life easier and everything 10x faster! That's the promise, and they are all massively underdelivering. </end_rant>.
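For the curious, this is roughly the kind of run I have in mind: a LoRA SFT pass with Hugging Face trl/peft over a home-grown dataset of blunt answers. The dataset file, output directory, and hyperparameters below are placeholders, not a recipe I've validated.

```python
# Rough sketch only, not a tested recipe.
# Assumes transformers, datasets, peft and trl are installed, and that
# "anti_sycophancy.jsonl" is a hypothetical dataset with a "text" column
# of terse, non-flattering completions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="anti_sycophancy.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B",  # any Qwen3 checkpoint that fits your GPU
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen3-no-lapdog"),
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```

Whether a few thousand examples can actually outweigh the RLHF prior is an open question; I'd treat it as an experiment rather than a fix.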
On a more serious note: looking at what Gemma3 does (and my limited exposure to Gemini and ChatGPT in some tests I ran) I can't help but feel that this is the business model for consumer-facing LLMs. Optimization for engagement. Like FB, YT, X, and so on.
I suspect that the question of AI reasoning will remain unanswerable.
Why though? It's literally showing what it does and you can precisely track activations through a model... digital things are very measurable.
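To make "measurable" concrete: here's a minimal sketch of capturing the activations from one decoder layer with a plain PyTorch forward hook on a Hugging Face model. The checkpoint and layer index are arbitrary picks, and recording a hidden-state tensor is not the same thing as interpreting it.

```python
# Minimal sketch: record the hidden states flowing out of one decoder layer.
# Model name and layer index are arbitrary; any HF causal LM works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

captured = {}

def hook(module, inputs, output):
    # decoder layers may return a tuple; the hidden states come first
    hidden = output[0] if isinstance(output, tuple) else output
    captured["layer_8"] = hidden.detach()

handle = model.model.layers[8].register_forward_hook(hook)

with torch.no_grad():
    batch = tok("Why is the sky blue?", return_tensors="pt")
    model(**batch)

handle.remove()
print(captured["layer_8"].shape)  # (batch, seq_len, hidden_size)
```

Libraries like TransformerLens basically industrialize this hook pattern so you can cache and compare activations across prompts.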
100 sats \ 1 reply \ @Scoresby OP 23h
digital things are very measurable
Isn't this a bit like saying measuring all the various wavelengths bouncing off a painting can help you know whether or not it is art?
I think the question of AI reasoning may remain unanswerable because whatever is going on in an LLM is different enough from what goes on in a human mind that we'll never be at peace with the comparison.
What is my evidence for this? Probably something like Stanislaw Lem's Solaris -- which I admit is a little shaky. But, I don't think we have a good handle on what reasoning is in human minds, and so measurement or not, it seems to me that we won't be able to make a clear determination about LLM reasoning. It may be that some people think it counts as thinking and maybe others don't.
204 sats \ 0 replies \ @optimism 21h
Isn't this a bit like saying measuring all the various wavelengths bouncing off a painting can help you know whether or not it is art?
Nice analogy!
I'd say that to the philosopher it definitely is. As the article says:
The “principled reasoner” being compared to here simply does not exist. It’s a Platonic ideal.
I agree with that. But what it feels like is that the "template" being reinforced during reasoning training [1] is maybe good for math and coding, but not so awesome for other domains. I think what's needed are new alignment policies that aren't solely derived from the experts OpenAI/Goog/Anthropic hire - some diversity in alignment would be nice! [2]
I don't think we have a good handle on what reasoning is in human minds
I think the problem is trying to emulate it in the first place. I'm personally always amazed by people that tick differently, and they help me break through barriers in my thinking. I don't need people that think like me around me. Compatible, but not similar?

Footnotes

  1. I cannot help but feel that the method and scoring are the same, or at least very similar, for all LLMs minus Claude and Gemini at the moment, also because when DeepMind decided to do it wholly differently, it turned out that the results kind of sucked.
  2. I was fantasizing the other day that I'd totally want a mixture-of-experts LLM where each expert layer has been aligned by a different Bitcoiner... So you basically get the luke-jr layer arguing with the petertodd layer in the reasoning phase. That would be so awesome for those of us that quit the bird app... once a month, on demand. "you're wrong and you know it." -> "you lie!". hah!