Judge Alsup “has all but ruled that Anthropic’s downloading of pirated books is [copyright] infringement,” leaving “the real issue at trial… the jury’s calculation of statutory damages based on the number of copyrighted books/works in the class.”
Copyright law doesn't make sense to me, and neither does patent law.
In the same ruling, he found that Anthropic’s wholesale downloading and storage of millions of pirated books, via infamous “pirate libraries” like LibGen and PiLiMi, was not covered by fair use at all. In other words: training on lawfully acquired books is one thing, but stockpiling a central library of stolen copies is classic copyright infringement.
This gets at the idea of poisoned data sets: if Anthropic is made to pay heavily here, imagine how careful other companies will need to be with the data they crawl.
The order reiterates a basic tenet of copyright law: every time a pirated book is downloaded, it constitutes a separate violation — regardless of whether Anthropic later purchased a print copy or only used a portion of the book for training.
Copyright law is not about protecting creators.
Even when pirate sites started getting taken down, Anthropic scrambled to torrent fresh copies. After a company co-founder discovered a mirror of “Z-Library,” a database shuttered by the FBI, he messaged his colleagues: “[J]ust in time.” One replied, “zlibrary my beloved.”
Everybody is a pirate at heart. This feels like a classic case of laws trailing woefully behind reality. Reality always wins, but often many normal people become collateral damage along the way. These are the kinds of things that make me hate the state.