
LLMs themselves are deterministic (a fixed set of weights maps the same input to the same output distribution every time), so randomness is deliberately added at sampling time to keep chatbots from repeating themselves and to make them feel more "human".
For example, samplers apply repetition or frequency penalties: the more often a token (say, "yes") has already appeared, the less likely it is to be picked again. These penalties are configurable, and they are often evaluated over a sliding window of recent tokens rather than the whole conversation.
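A minimal sketch of a frequency penalty, assuming the simplest common scheme (subtract a fixed amount per prior occurrence from each token's logit) — real samplers, e.g. in llama.cpp or hosted APIs, differ in the details:

```python
def apply_frequency_penalty(logits, generated_ids, penalty=0.5):
    # Count how often each token id has already been generated.
    counts = {}
    for tok in generated_ids:
        counts[tok] = counts.get(tok, 0) + 1
    # Lower the logit of each repeated token in proportion to its count,
    # making it less likely to be sampled again.
    return [logit - penalty * counts.get(i, 0)
            for i, logit in enumerate(logits)]
```

The more a token repeats, the harder it gets pushed down, which is why chatbots rarely loop on the same word.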
Most of this is controlled by temperature, which globally scales how much randomness goes into token selection: lower values mean less randomness. You may want to play around with this (I'm fairly sure I've seen a temperature setting in the Venice chat settings).
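Concretely, temperature divides the logits before the softmax, so low values sharpen the distribution toward the most likely token and high values flatten it. A sketch of that (standard temperature sampling, not any particular vendor's implementation):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    # T < 1 sharpens the distribution (more deterministic),
    # T > 1 flattens it (more random).
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index from the resulting distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

At a very low temperature this almost always returns the highest-logit token, which is why lowering it makes output more predictable.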
Depending on the model you use, there are recommended values. If those produce too many hallucinations for your use case, try lowering them, e.g. if you're at 0.5 now, try 0.45 or 0.4.