Anthropic has scored a major victory in an ongoing legal battle over artificial intelligence models and copyright, one that may reverberate across the dozens of other AI copyright lawsuits winding through the legal system in the United States. A court has determined that it was legal for Anthropic to train its AI tools on copyrighted works, finding that the behavior is shielded by the “fair use” doctrine, which allows for unauthorized use of copyrighted materials under certain conditions.
“The training use was a fair use,” senior district judge William Alsup wrote in a summary judgment order released late Monday evening. In copyright law, one of the main ways courts determine whether using copyrighted works without permission is fair use is to examine whether the use was “transformative,” which means that it is not a substitute for the original work but rather something new. “The technology at issue was among the most transformative many of us will see in our lifetimes,” Alsup wrote.
“This is the first major ruling in a generative AI copyright case to address fair use in detail,” says Chris Mammen, a managing partner at Womble Bond Dickinson who focuses on intellectual property law. “Judge Alsup found that training an LLM is transformative use—even when there is significant memorization. He specifically rejected the argument that what humans do when reading and memorizing is different in kind from what computers do when training an LLM.”
The case, a class action lawsuit brought by book authors who alleged that Anthropic had violated their copyright by using their works without permission, was first filed in August 2024 in the US District Court for the Northern District of California.
Anthropic is the first artificial intelligence company to win this kind of battle, but the victory comes with a large asterisk attached. While Alsup found that Anthropic’s training was fair use, he ruled that the authors could take Anthropic to trial over pirating their works.
While Anthropic eventually shifted to training on purchased copies of the books, it had nevertheless first collected and maintained an enormous library of pirated materials. “Anthropic downloaded over seven million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train its AI (at all or ever again). Authors argue Anthropic should have paid for these pirated library copies. This order agrees,” Alsup writes.
“We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages,” the order concludes.
Anthropic did not immediately respond to requests for comment. Lawyers for the plaintiffs declined to comment.
The lawsuit, Bartz v. Anthropic, was first filed less than a year ago; Anthropic asked for summary judgment on the fair use issue in February. It’s notable that Alsup has far more experience with fair use questions than the average federal judge, as he presided over the initial trial in Google v. Oracle, a landmark case about tech and copyright that eventually went before the Supreme Court.
Prior to this, there had only been one summary judgment decision issued in an AI copyright case. In Thomson Reuters v. Ross, a judge found that the AI startup Ross’s training on materials from Westlaw, the Thomson Reuters-owned legal research platform, was not fair use—but that case is already headed to an appeals court. In almost all of the ongoing copyright lawsuits, the AI company defendants are attempting to lay out “fair use” arguments, so Alsup’s decision will almost certainly play a large role in how these cases are argued going forward, especially cases where piracy isn’t a factor. Fair use advocates are already celebrating this as a win.
“Judge Alsup’s ruling should be a model for other courts assessing whether Gen AI training on copyrighted material is fair use,” says Adam Eisgrau, the senior director of AI, Creativity, and Copyright Policy at the tech trade group Chamber of Progress. “He found it is clearly transformative and affirmed that the purpose of copyright is to promote competition and creativity, not protect monopoly revenue streams.”
Still, this is in many respects a split ruling, as Alsup emphasized in the summary judgment that the piracy was not legally excusable. “The downloaded pirated copies used to build a central library were not justified by a fair use,” he writes. “Every factor points against fair use.”
During the discovery process of the lawsuit, Judge Alsup learned that Anthropic relied partially on downloading vast amounts of material from pirated databases like Books3 to amass books to train AI tools like Claude. He details the process in the summary judgment, noting that Anthropic cofounder Ben Mann downloaded the entirety of Books3 back in winter 2021, and didn’t stop there: “Anthropic’s next pirated acquisitions involved downloading distributed, reshared copies of other pirate libraries. In June 2021, Mann downloaded in this way at least five million copies of books from Library Genesis, or LibGen, which he knew had been pirated. And, in July 2022, Anthropic likewise downloaded at least two million copies of books from the Pirate Library Mirror, or PiLiMi, which Anthropic knew had been pirated,” Alsup writes.
Anthropic is not the only AI company accused of piracy in AI copyright lawsuits. In Kadrey v. Meta, another hotly contested copyright lawsuit brought by authors, plaintiffs’ lawyers have forcefully argued that Meta’s acquisition of books from pirate libraries like LibGen was neither legal nor shielded by the fair use doctrine.
The minimum statutory damages award for this type of copyright infringement is $750 per work, and Alsup notes that Anthropic’s pirate library consisted of at least 7 million books; at that statutory floor alone, that works out to more than $5 billion in potential court-imposed penalties. There is no trial date set yet.