
Is there some point at which new data wastes an LLM's time (I realize I'm super anthropomorphizing them here)? Could some data be harmful to a model?
Every time an LLM tells you bs (or deletes your database, lol (#1051154)), it's basically making a connection it shouldn't make, based on either the weighting of your query or the model weights (because the response is an intersection of the two).
Every bad input influences the weights. The idea that with more data the noise from bad data averages out is mathematically correct, except there is waaaayyyy more bs on the interwebs than actual good data, so good luck with that.
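To make the "averaging out" point concrete, here's a toy sketch (all the numbers, the 5.0 "bs value", and the poison fractions are invented for illustration; this is not how any real training pipeline works): estimate a true value from a mix of good and biased samples. More samples shrink the variance, but the estimate settles wherever the bad-data fraction drags it.

```python
import numpy as np

# Toy model of "noise averages out": estimate a true value of 1.0 from a
# mix of good samples (mean 1.0) and biased bs samples (mean 5.0).
# All numbers here are made up for illustration.
rng = np.random.default_rng(42)

def estimate(n_samples, bad_fraction):
    good = rng.normal(loc=1.0, scale=0.5, size=int(n_samples * (1 - bad_fraction)))
    bad = rng.normal(loc=5.0, scale=0.5, size=int(n_samples * bad_fraction))
    return np.concatenate([good, bad]).mean()

for bad_fraction in (0.05, 0.5, 0.9):
    for n in (1_000, 100_000):
        print(f"bad={bad_fraction:.0%}  n={n:>7,}: estimate ~ {estimate(n, bad_fraction):.2f}")

# With 5% bad data the estimate lands near 1.2, close to the truth.
# With 90% bad data it converges, very confidently, to ~4.6 no matter
# how large n gets: more data only helps if the good data dominates.
```

Which is the "good luck with that" problem: the math only saves you when the bs is the minority.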
Why does Grok's bad behavior mimic X users' bad behavior? Because it is fed that crap.
> except there is waaaayyyy more bs on the interwebs than actual good data, so good luck with that.
What I find curious is not trying to be good data, but trying to figure out how to influence LLMs. I'm interested in being the bad data, I think.
If it's just a matter of sheer quantity, how long before people start hiding massive dumps of bad (biased, deceptive) data on the internet to influence LLM outputs? (Sorry if this is so technically off that it just sounds stupid, but it seems like the next logical step to me.) Google was "just" the best search results and slowly became "look at the results that paid to be seen."
Why won't LLMs follow this same trajectory?
In that sense, I take these articles about writing for LLMs to mean writing to influence their outputs.
Why would you need to hide it? You can do it in plain sight. And it is being done in plain sight: we call it Reddit. They used to sell "their" data to Google for LLM training; now they sell it to OpenAI.
It's not malicious in the sense that it's meant to deceive LLMs; it's just bad. It will still make the model form bad connections between token A and token B.
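As a minimal sketch of what a "bad connection between token A and token B" means, take the simplest possible language model, a bigram counter (the sentences and counts below are invented; a real LLM is far more complicated, but the mechanism of co-occurrence pulling probabilities around is the same idea):

```python
from collections import Counter

# A bigram "model" is just co-occurrence counts, so every bad document
# directly shifts P(next token | current token). Sentences are made up.
def next_token_probs(corpus, word):
    pairs = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        pairs.update(zip(tokens, tokens[1:]))
    total = sum(c for (a, _), c in pairs.items() if a == word)
    return {b: c / total for (a, b), c in pairs.items() if a == word}

good = ["the moon orbits the earth"] * 50
bad  = ["the moon is made of cheese"] * 200   # way more bs than good data

print(next_token_probs(good, "moon"))         # {'orbits': 1.0}
print(next_token_probs(good + bad, "moon"))   # {'orbits': 0.2, 'is': 0.8}
```

A transformer isn't a bigram table, but the training objective is still "predict what the data does", so whatever associations dominate the corpus tend to dominate the output.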
> Google was "just" the best search results and slowly became "look at the results that paid to be seen."
That's because people wanted to pay to be seen, and Google, whose principles turned out to exist only on paper, decided to take the money.
> Why won't LLMs follow this same trajectory?
I think this is already the case. Or did you think your chat history was between you and the LLM? Especially if you pay the big bucks, your interactions are worth a lot to the LLM companies, because they can figure out what a premium conversation looks like and adapt their earning models to that.