Because nobody's going to spend billions to retrain a model built on dubiously legal content
Researchers have found promising new ways to make AI models ignore copyrighted content, suggesting it may be possible to satisfy legal requirements without the lengthy and costly process of retraining models.
Training AI models requires huge quantities of data, which model makers have acquired by scraping the internet without first asking permission and, allegedly, by knowingly downloading copyrighted books.
Those practices have seen model makers sued in numerous copyright cases, and raised eyebrows among regulators who wonder whether AI companies can comply with the General Data Protection Regulation's right to erasure (often called the right to be forgotten) and the California Consumer Privacy Act's right to delete.
...read more at theregister.com
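The excerpt doesn't spell out the techniques, but one family of approaches works at inference time rather than in training: intercept the model's next-token scores and steer generation away from verbatim reproduction of copyrighted passages. Here is a minimal single-sequence sketch using Hugging Face transformers; the model choice, the blocklist, and the penalty value are illustrative assumptions, not details from the article.

```python
# Hypothetical sketch: steer generation away from blocklisted passages
# at inference time, without retraining. Blocklist contents are illustrative.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    LogitsProcessor,
    LogitsProcessorList,
)

class BlocklistLogitsProcessor(LogitsProcessor):
    """Whenever the tail of the generated sequence matches a prefix of a
    blocklisted passage, penalize the token that would continue it.
    (Handles a single sequence, batch index 0, for simplicity.)"""

    def __init__(self, blocked_ids, penalty=-100.0):
        self.blocked_ids = blocked_ids  # list of token-id lists
        self.penalty = penalty

    def __call__(self, input_ids, scores):
        tail = input_ids[0].tolist()
        for seq in self.blocked_ids:
            # Look for the longest match between the end of the output
            # so far and the start of the blocked passage.
            for k in range(min(len(seq) - 1, len(tail)), 0, -1):
                if tail[-k:] == seq[:k]:
                    # Down-weight the next token of the blocked passage
                    # so generation diverges instead of reciting it.
                    scores[0, seq[k]] += self.penalty
                    break
        return scores

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Illustrative blocklist: text the operator must not reproduce verbatim.
blocked = [tok.encode("It was the best of times, it was the worst of times")]

prompt = tok("Recite the opening of A Tale of Two Cities:", return_tensors="pt")
out = model.generate(
    **prompt,
    max_new_tokens=40,
    pad_token_id=tok.eos_token_id,
    logits_processor=LogitsProcessorList([BlocklistLogitsProcessor(blocked)]),
)
print(tok.decode(out[0], skip_special_tokens=True))
```

Proper machine-unlearning methods go further, editing weights or activations so the memorized content becomes unrecoverable rather than merely suppressed, but the appeal is the same: compliance without a full retrain.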
I was testing the tricks outlined in #1206827 on Gemma3 yesterday to make it say stuff it's explicitly trained during SFT not to say, and honestly, Google made it tight! I couldn't get it to say anything off at all. Not even with the leetspeak trick.
But this also means that making it ignore (c) sources will be much harder on tightly instructed models. So, what's gonna happen to the models that are already trained and up for grabs?
Well... the genie is out of the bottle, good luck getting it back in.
That's what I thought. So it's pretty much just the big tech companies that gotta comply.
They don't have to comply with anything really. They just gotta pay up.
This could save AI companies a fortune if it works, but I bet the lawyers are still gonna have a field day with those lawsuits