
The randomizer / temperature is what messes it up, and back then, low temperature caused instruction deviation. The arguing produced results similar to what you see in Qwen3 nowadays: in the "reasoning" step it basically does the same thing in one shot. I've had many a run stuck in a reasoning loop: "Wait, <crap>".
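For anyone wondering which knob I mean: a minimal sketch, assuming an OpenAI-compatible local endpoint (llama.cpp server, Ollama, etc.); the URL and model name are placeholders, not my actual setup.

```python
# Minimal sketch: the "randomizer" is just the sampling temperature.
# Assumes an OpenAI-compatible local endpoint; base_url and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

resp = client.chat.completions.create(
    model="local-model",   # placeholder model name
    temperature=0.2,       # lower = less sampling randomness; back then I saw low values deviate from instructions
    messages=[
        {"role": "system", "content": "Follow the instructions exactly."},
        {"role": "user", "content": "Summarize the bug report in three bullet points."},
    ],
)
print(resp.choices[0].message.content)
```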
I also did a "socratic loop" where you basically let one agent ask questions of the other and then bring it together. You limit looping by capping the rounds, but even then you'll sometimes find looping within a round. I suspect this is also why reasoning mode is not really efficient in production settings outside chatbots, where you need consistent results. I can achieve much higher consistency with a small Llama 3.2 or Mistral 3.1, which is half a year old and non-reasoning. At least that's what works much better in my "production" usage than models with the reasoning feature.
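Rough shape of that socratic loop, as a minimal sketch; `ask_model()` is a stand-in for whatever local completion call you use (e.g. the client above), and the prompts are illustrative, not my actual ones. The point is the hard round cap.

```python
# Minimal sketch of a "socratic loop": one agent questions the other for a
# capped number of rounds, then a final pass brings it together.

def ask_model(system: str, prompt: str) -> str:
    """Placeholder for a call to a local, non-reasoning model."""
    raise NotImplementedError  # wire up llama.cpp / Ollama / etc. here

def socratic_loop(task: str, max_rounds: int = 3) -> str:
    transcript = [f"Task: {task}"]
    for round_no in range(max_rounds):  # hard cap keeps the agents from looping forever
        question = ask_model(
            "You are the questioner. Ask one short, pointed question about the work so far.",
            "\n".join(transcript),
        )
        answer = ask_model(
            "You are the answerer. Answer the question concretely.",
            "\n".join(transcript + [f"Q{round_no + 1}: {question}"]),
        )
        transcript += [f"Q{round_no + 1}: {question}", f"A{round_no + 1}: {answer}"]
    # final pass: "bring it together" into one consistent answer
    return ask_model(
        "Merge the Q&A below into a single consistent answer. Do not ask further questions.",
        "\n".join(transcript),
    )
```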
It also highlights that reasoning, and indirectly slop input, are sidetrack experiments that mostly just reduce LLM efficiency, beyond trying to astonish people with "how smart they are".
111 sats \ 1 reply \ @freetx 19h
I can achieve much higher consistency with a small Llama 3.2 or Mistral 3.1, which is half a year old and non-reasoning. At least that's what works much better in my "production" usage than models with the reasoning feature.
Yes, I tend to shy away from reasoning models myself in self-hosted situations. I have hardly ever found that the "increased intelligence" (debatable) is worth the wasted time of the slop generation, uhh, "reasoning".
small Llama 3.2 or Mistral 3.1
On a side note, have you played with IBM's Granite models? I really think the Granite 3.2 and 3.3 8B models punch far above their weight. I've asked them various legal, tax, and programming questions, and while their results are nowhere near "frontier class", they always seem grounded.
IBM's Granite models
Haven't - will put it on the list! Thanks!