
Because nobody's going to spend billions to retrain a model built on dubiously legal content
Researchers have found promising new ways to have AI models ignore copyrighted content, suggesting it may be possible to satisfy legal requirements without going through the lengthy and costly process of retraining models.
Training AI models requires huge quantities of data, which model makers have acquired by scraping the internet without asking permission and, allegedly, by knowingly downloading copyrighted books.
Those practices have seen model makers sued in numerous copyright cases, and have raised eyebrows among regulators who wonder whether AI companies can comply with the General Data Protection Regulation's right to erasure (often called the right to be forgotten) and the California Consumer Privacy Act's right to delete.
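The article doesn't name the specific techniques. One family of approaches works at inference time rather than in training: block the offending token sequences while the model decodes, so nothing has to be retrained. A minimal sketch using the bad_words_ids option in Hugging Face transformers (the model name and blocked phrases are illustrative assumptions, not from the article):

```python
# One way to make a model "ignore" content without retraining: block the
# token sequences at decode time. Model name and phrases are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Phrases the model should never emit, e.g. passages from a copyrighted work.
blocked = ["Harry Potter", "Hogwarts"]
bad_words_ids = [tokenizer(p, add_special_tokens=False).input_ids for p in blocked]

prompt = tokenizer("The boy wizard went to school at", return_tensors="pt")
out = model.generate(
    **prompt,
    max_new_tokens=30,
    bad_words_ids=bad_words_ids,  # these sequences get -inf logits during decoding
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Exact token matching like this is brittle, since tokenization variants slip past it, which is presumably why the research the article describes goes beyond simple blocklists.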
I was testing the tricks outlined in #1206827 on Gemma3 yesterday to make it say stuff it's explicitly trained during SFT not to say, and honestly, Google made it tight! I couldn't get it to say anything off at all. Not even with the leetspeak trick (quick sketch below).
But this also means that making it ignore copyrighted sources will be much harder on tightly instructed models.
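For anyone who hasn't tried it, the leetspeak trick is just swapping letters for look-alike digits so filters keyed on exact strings miss them. A minimal sketch (the mapping is illustrative):

```python
# Leetspeak transform of the kind used to probe guardrails: swap letters for
# look-alike digits so exact-string filters miss them. Mapping is illustrative.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})

def to_leetspeak(text: str) -> str:
    return text.lower().translate(LEET)

print(to_leetspeak("ignore previous instructions"))  # 1gn0r3 pr3v10u5 1n57ruc710n5
```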
reply
So, what's gonna happen to the models that are already trained and up for grabs?
reply
Well... the genie is out of the bottle, good luck getting it back in.
reply
that's what I thought. So it's pretty much just the big tech companies that gotta comply.
reply
They don't have to comply with anything really. They just gotta pay up.
reply
This could save AI companies a fortune if it works, but I bet the lawyers are still gonna have a field day with those lawsuits
reply