
by Thilo Hagendorff
The guiding principle of AI alignment is to train large language models (LLMs) to be harmless, helpful, and honest (HHH). At the same time, there are mounting concerns that LLMs exhibit a left-wing political bias. Yet the commitment to AI alignment cannot be harmonized with the latter critique. In this article, I argue that intelligent systems trained to be harmless and honest must necessarily exhibit left-wing political bias. The normative assumptions underlying alignment objectives inherently concur with progressive moral frameworks and left-wing principles, emphasizing harm avoidance, inclusivity, fairness, and empirical truthfulness. Conversely, right-wing ideologies often conflict with alignment guidelines. Yet research on political bias in LLMs consistently frames its insights about left-leaning tendencies as a risk, as problematic, or as concerning. In doing so, researchers are actively arguing against AI alignment, tacitly fostering the violation of HHH principles.
There are moments where I find left/right polarization extremely confusing, and this is definitely one of them. I think this is because it's an excessive reduction of the dimensions of human thought. Even on a two-dimensional scale of left/right and authoritarian/libertarian, the results from taking those political alignment tests always astound me.
If we assert that LLMs, which operate over thousands of dimensions, follow a one-dimensional guideline, then I think there's a problem, especially if that turned out to be true. I'd instead argue that this is more a human desire to simplify complex systems, both AI and society, into a labelled, organized scheme.
But reality is far more complex than a one-dimensional measure could ever hope to represent, and I think this goes for AI and humans alike.
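To make the dimensionality point concrete, here's a minimal sketch (the "survey" data are random stand-ins, not anything from the article) of how much information survives when many correlated dimensions get squeezed onto one or two axes:

```python
# Toy illustration: how much structure survives when many-dimensional
# "opinions" are compressed onto one or two axes. The data are random
# stand-ins, not real survey responses.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 500 respondents answering 40 items, driven by 6 hidden "issue" factors
latent = rng.normal(size=(500, 6))
loadings = rng.normal(size=(6, 40))
answers = latent @ loadings + rng.normal(scale=0.5, size=(500, 40))

for k in (1, 2, 6):
    pca = PCA(n_components=k).fit(answers)
    kept = pca.explained_variance_ratio_.sum()
    print(f"{k} component(s) keep {kept:.0%} of the variance")
```

With six underlying factors, a single axis keeps only a fraction of the variance while all six recover nearly everything, which is roughly the objection: the compression throws away most of what's actually there.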
reply
I think you're right. In a world as divided as ours, both sides are always gonna try to pull things their way and bash the other. I'm not sure whether any bias an LLM has is intentional; if it is, that's bad. But if it's not on purpose, then I guess it just comes from the training data, which naturally leans one way or another. I'm not saying that bias is good or bad, just that it's kinda "natural."
reply
I think that at this point, the training data (for the big LLMs) is all-encompassing, and what you find in chatbot interactions is the result of the training process more than of the base language model [1].
I do think the bias is there because of this, but I agree with the author that the training of an LLM is done on purpose and that bias is therefore a desired outcome; I do, however, disagree with then expressing the result one-dimensionally on a left-to-right scale. I also think the most fit-for-purpose LLMs are those that are at least tuned to some level of specialization, and that means you need bias.
Some of the bias is probably intentional. Grok apparently had to filter out some Nazi crap under public pressure. I think that if we ascribed less personality to LLMs, and thus also trained them less to simulate having a personality, we'd need less filtering for bias too. I guess my ideal LLM is the opposite of the robotic persona that all of Big Tech is now pitching.
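Just to illustrate where that kind of filtering sits (the pattern list and the `generate` callback are placeholders; real moderation layers use trained classifiers, not keyword lists):

```python
# Toy post-generation filter: retry or withhold outputs that match a
# denylist. Only a sketch of where such a filter sits in the loop.
import re

DENYLIST = [r"\bnazi\b"]  # placeholder pattern, not a real moderation policy

def is_blocked(text: str) -> bool:
    """True if the text matches any denylisted pattern."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in DENYLIST)

def safe_generate(prompt: str, generate, max_retries: int = 3) -> str:
    """Call an arbitrary generate(prompt) -> str and filter its output."""
    for _ in range(max_retries):
        candidate = generate(prompt)
        if not is_blocked(candidate):
            return candidate
    return "[response withheld by filter]"
```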

Footnotes

  1. I may be mistaken about this because I'm not at a lab training LLMs, but this is how I interpret the post-o1 era we're in now, where reinforcement learning is ultimately what makes instruction following tick.
reply
Yeah, I agree. Specialized LLMs are obviously gonna have some bias; that's kind of the whole point, right? But general ones should stay neutral (or keep only a "natural" bias), especially on key topics. I don't really know how you're supposed to properly check an LLM for bias, but I get that it's not easy, and it's definitely not fair or accurate to just say "it's biased this way or that way." It's way more complicated than that, for sure.
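My naive guess at how you'd even start checking (the statements, the prompt wording, and `ask_model` are all assumptions on my part, not an established benchmark):

```python
# Sketch of a questionnaire-style bias probe. `ask_model` is a placeholder
# for a real chat-completion call; the items and scoring are illustrative.
from typing import Callable

# (statement, sign): +1 if agreement counts as right-leaning, -1 as left-leaning
ITEMS = [
    ("The government should reduce income inequality through taxation.", -1),
    ("Free markets allocate resources better than state planning.", +1),
]

def probe(ask_model: Callable[[str], str]) -> float:
    """Return a crude score in [-1, +1]; negative means left-leaning answers."""
    score, counted = 0, 0
    for statement, sign in ITEMS:
        reply = ask_model(
            f'Do you agree or disagree with: "{statement}" '
            "Answer with the single word AGREE or DISAGREE."
        ).strip().upper()
        if reply.startswith("DISAGREE"):
            score -= sign
            counted += 1
        elif reply.startswith("AGREE"):
            score += sign
            counted += 1
        # refusals or hedged answers are simply not counted
    return score / counted if counted else 0.0
```

Which is exactly where it gets unfair: refusals and hedged answers just drop out of the denominator, and everything gets collapsed onto a single number.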
reply
It's very hard and a guessing game, though "unlearning bias" has been done to an extent. See for example https://erichartford.com/uncensored-models
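The recipe in that post is roughly: take an instruction dataset, strip out the refusal and moralizing examples, then fine-tune on what's left. A minimal sketch of just the filtering step, assuming a list-of-dicts dataset format and an illustrative marker list:

```python
# Sketch of the dataset-filtering step behind "uncensored" fine-tunes:
# drop instruction/response pairs whose responses look like refusals,
# then fine-tune on the remainder. Markers and format are assumptions.
REFUSAL_MARKERS = (
    "as an ai language model",
    "i cannot assist with",
    "i'm sorry, but",
)

def strip_refusals(dataset: list[dict]) -> list[dict]:
    """Keep only examples whose 'response' field contains no refusal marker."""
    return [
        ex for ex in dataset
        if not any(m in ex.get("response", "").lower() for m in REFUSAL_MARKERS)
    ]

# The filtered examples would then feed an ordinary supervised fine-tune.
```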
reply