Anthropic destroyed millions of print books to build its AI models
source: arstechnica.com ↗

Benj Edwards, writing for Ars Technica:
On Monday, court documents revealed that AI company Anthropic spent millions of dollars physically scanning print books to build Claude, an AI assistant similar to ChatGPT. In the process, the company cut millions of print books from their bindings, scanned them into digital files, and threw away the originals solely for the purpose of training AI, details buried in a copyright ruling on fair use whose broader fair use implications we reported yesterday.
…
Ultimately, Judge William Alsup ruled that this destructive scanning operation qualified as fair use—but only because Anthropic had legally purchased the books first, destroyed each print copy after scanning, and kept the digital files internally rather than distributing them. The judge compared the process to “conserv[ing] space” through format conversion and found it transformative. Had Anthropic stuck to this approach from the beginning, it might have achieved the first legally sanctioned case of AI fair use. Instead, the company’s earlier piracy undermined its position.
I know that using copyrighted materials to train these models is a slippery slope, but in this specific case I agree with the ruling.