I can achieve much higher consistency with a small llama 3.2 or mistral 31, which is half a year old; both are non-reasoning. At least that's what works much better in my "production" usage than models with the reasoning feature.
Yes, I tend to shy away from reasoning models myself in self-hosted situations. I have hardly ever found that the "increased intelligence" (debatable) is worth the wasted time of the slop-generation, uhh, "reasoning".
small llama 3.2 or mistral 31
On a side note, have you played with IBM's granite models? I really think the 3.2 and 3.3 8b granite models punch far above their weight. I've asked them various legal, tax, and programming questions, and while their results are nowhere near "frontier class", they always seem grounded.