I've thought about similar ideas... one issue is that I don't see how to effectively manage the "schizo doom-loop" that arises in such a situation.
Even if you set up an "arbitration" model to settle the differences, the arguments would then just break out between the arbitration model and the individual perspective models.
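For concreteness, here's roughly the arrangement I mean, as a rough Python sketch against a local OpenAI-compatible endpoint. The URL, model tag, and prompts are placeholders, and note that the loop isn't actually resolved - it's just cut off by a hard round cap:

```python
import requests

# Placeholder local setup: any OpenAI-compatible chat endpoint (llama.cpp
# server, Ollama, etc.) and whatever small model tag you have pulled.
API_URL = "http://localhost:11434/v1/chat/completions"
MODEL = "llama3.2"

def chat(system, user, temperature=0.2):
    r = requests.post(API_URL, json={
        "model": MODEL,
        "temperature": temperature,
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
    }, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

PERSPECTIVES = {
    "optimist": "Argue the strongest case FOR the proposal.",
    "skeptic": "Argue the strongest case AGAINST the proposal.",
}

def arbitrate(question, max_rounds=2):
    """Perspective models answer, an arbiter rules, rebuttals go back to the
    arbiter - the doom-loop is only stopped by max_rounds, not settled."""
    answers = {name: chat(sys, question) for name, sys in PERSPECTIVES.items()}
    verdict = ""
    for _ in range(max_rounds):
        verdict = chat(
            "You are an arbiter. Give one final, self-contained decision.",
            question + "\n\n" + "\n\n".join(f"[{n}]\n{a}" for n, a in answers.items()),
        )
        # Feeding the verdict back for rebuttal is exactly where the arguing restarts:
        answers = {name: chat(sys, f"{question}\n\nArbiter said:\n{verdict}\n\nRespond briefly.")
                   for name, sys in PERSPECTIVES.items()}
    return verdict

print(arbitrate("Should we self-host a small non-reasoning model for this task?"))
```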
It seems that in humans we avoid this ultimately by some sort of intuition. Although we can argue different pros / cons of different perspectives, we ultimately arbitrate the differences by using some sort of non-verbal intuitive reasoning where we just know what decision "feels better".
That is, I think we actually require "consciousness" (whatever that is) to manage these different perspectives, and disembodied LLMs won't be able to navigate that...
The randomizer / temperature is what messes it up, and back then low temperature caused instruction deviation. The arguing had results similar to what you see in qwen3 nowadays - in the "reasoning" step it basically does this in one shot. I've had many a run stuck in a reasoning loop: "Wait, <crap>".
I also did a "socratic loop" where you basically let one agent ask questions to the other and then bring it together. You limit looping by limiting rounds, but even then you'll sometimes find looping within the rounds. I suspect this is also why reasoning mode is not really efficient in production settings outside chatbots, where you need consistent results - I can achieve much higher consistency with a small Llama 3.2 or Mistral 3.1, half-a-year-old models that are non-reasoning. At least that's what works much better in my "production" usage than models with the reasoning feature.
It also highlights that reasoning, and indirectly slop input, are sidetrack experiments that do little beyond reducing LLM efficiency, apart from trying to astonish people with "how smart they are".
I can achieve much higher consistency with a small Llama 3.2 or Mistral 3.1, half-a-year-old models that are non-reasoning. At least that's what works much better in my "production" usage than models with the reasoning feature.
Yes, I tend to shy away from reasoning models myself in self-hosted situations. I have hardly ever found that the "increased intelligence" (debatable) is worth the time wasted on the slop generation, uhh, "reasoning".
small Llama 3.2 or Mistral 3.1
On a side note, have you played with IBM's Granite models? I really think the 3.2 and 3.3 8B Granite models punch far above their weight. I've asked them various legal, tax, and programming questions, and while their results are nowhere near "frontier class", they always seem grounded.
perspective
can be important for intelligence. It reminds me of an experiment I ran in March where I tried to have a small Llama model with different system prompts argue different points of view and then consolidate the outcome in a final reply.
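Not the original code, but the shape of it was roughly this: the same small model gets different system prompts, each argues its view once, and one final call consolidates the outcome. Endpoint, model tag, and the prompts are just placeholders for whatever you run locally:

```python
import requests

API_URL = "http://localhost:11434/v1/chat/completions"  # placeholder endpoint
MODEL = "llama3.2"                                       # placeholder model tag

def chat(system, user, temperature=0.7):
    r = requests.post(API_URL, json={
        "model": MODEL,
        "temperature": temperature,
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
    }, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

VIEWS = {
    "engineer": "Argue from a practical engineering point of view.",
    "economist": "Argue from a cost/benefit point of view.",
    "user-advocate": "Argue from the end user's point of view.",
}

def debate(question):
    # Each perspective answers once; no back-and-forth, so no loop to manage.
    positions = {name: chat(prompt, question) for name, prompt in VIEWS.items()}
    # One consolidation pass produces the final reply.
    return chat(
        "You have read several positions. Write ONE final reply that weighs them "
        "and commits to a recommendation. Do not restart the debate.",
        question + "\n\n" + "\n\n".join(f"[{n}]\n{p}" for n, p in positions.items()),
    )

print(debate("Should we add a reasoning model to the pipeline?"))
```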