133 sats \ 11 replies \ @Undisciplined 7h \ parent \ on: On the Inevitability of Left-Leaning Political Bias in Aligned Language Models AI
The nature of political polarization is that almost everything becomes politically coded. There may be no good reason for various opinions to be clustered together, but they are.
So, when there's an extreme censorship campaign against one side, as there clearly was, the available training data will be biased towards the side that wasn't censored.
Okay. So let's take DeepSeek. There are some obvious things censored in there. Did this happen during ingestion because data was left out, or post-ingestion through reinforcement learning?
reply
I'm not even talking about the specifics of what was included or excluded for the purpose of training.
We had an intense, decade-long period of big tech censorship online. If these models are trained on what's available online, that's a very biased dataset, and there's no way to include the missing material because people began self-censoring to avoid being demonetized.
reply
That doesn't mean much to me.
I'm talking about bias in the information produced for and available to the world. It's not about some specific training set. There's no available unbiased dataset.
reply
What I'm saying is that these bots aren't influenced so much by the dataset they ingest (literally pirated libraries) as by the follow-up training, where the model is adjusted so that it answers questions correctly.
When some obscure offensive answer pops up, that's mostly the model failing to do what it was trained to do, whatever you happen to find offensive. The general alignment to human speech, and thus most of the bias, comes from how they tuned it.
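To make the distinction concrete, here's a toy Python sketch, nothing like a real training pipeline: the corpus skew, the answer names, and the rater rewards are all made up. It just shows how a corpus-frequency "pretraining" stage and a rater-driven tuning stage each pull the output distribution, and how the tuning pass can dominate.

```python
# Toy illustration (not any real model's pipeline): two stages that can each
# introduce bias -- "pretraining" on whatever text is available, and a
# follow-up preference-tuning pass driven by human raters.
# All data here is hypothetical.
from collections import Counter
import math

# Stage 1: "pretraining" -- the base preference over answers is just how often
# each answer appears in the scraped corpus. If the corpus is skewed (e.g. one
# side was censored or self-censored), this stage is already biased.
corpus = ["answer_a"] * 70 + ["answer_b"] * 30            # hypothetical skew
counts = Counter(corpus)
base_logits = {ans: math.log(c / len(corpus)) for ans, c in counts.items()}

# Stage 2: preference tuning -- raters score answers, and the model is nudged
# toward whatever the raters reward, largely overriding the base rates.
rater_reward = {"answer_a": -1.0, "answer_b": +1.0}        # hypothetical ratings
strength = 5.0                                             # how hard we tune

tuned_logits = {ans: base_logits[ans] + strength * rater_reward[ans]
                for ans in base_logits}

def softmax(logits):
    z = max(logits.values())
    exps = {k: math.exp(v - z) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

print("base (corpus-driven):", softmax(base_logits))
print("tuned (rater-driven):", softmax(tuned_logits))
# The tuned distribution tracks the raters' preferences far more than the
# corpus frequencies -- both stages matter, but the tuning pass can dominate.
```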
reply
Ok, fair enough. I'd be surprised if there weren't substantial viewpoint bias amongst the trainers, too.
Just about everyone with an advanced degree comes from the establishment left.
reply
At some point, though, all it has to build off of is what's been made available. I don't see how bias can be avoided if what's available is biased.