A group of authors has sued Anthropic, accusing it of training its models on pirated books, as reported by Reuters. The proposed class action lawsuit was filed in a California court on Monday and alleges Anthropic “built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books.”
“It is apparent that Anthropic downloaded and reproduced copies of The Pile and Books3, knowing that these datasets were comprised of a trove of copyrighted content sourced from pirate websites like Bibliotik,” the lawsuit reads. The authors want the court to certify their class action, require Anthropic to pay damages, and bar the company from using copyrighted material in the future. Anthropic didn’t immediately respond to The Verge’s request for comment.
The writers suing Anthropic include Andrea Bartz, the author of We Were Never Here; Charles Graeber, who wrote The Good Nurse; and Kirk Wallace Johnson, the author of The Feather Thief. While the lawsuit acknowledges that Books3 has been removed from the “most official” version of The Pile, the original version is still allegedly available elsewhere online. A recent investigation also found that companies like Anthropic and Apple trained their AI models on thousands of scraped YouTube video subtitles available within The Pile.