Mainstream coverage focused on the high-profile lawsuit filed on May 5, 2026 by Scott Turow and major publishers, which accuses Meta of training its Llama models on copyrighted books and journal articles copied from pirate sites. The complaint names specific works and alleges that Mark Zuckerberg authorized the practice; it seeks class-wide statutory damages, an injunction, and destruction of infringing copies. Meta responded that courts have found training on copyrighted material can be fair use and said it will defend the case. Reports tied the filing to earlier revelations about Books3 and to unsealed documents showing that Meta paused licensing in favor of a fair-use strategy, and they recounted related litigation threads going back to 2023.
Missing from much mainstream coverage was key legal and factual context. On the legal side, 2025 rulings such as Kadrey v. Meta and Bartz v. Anthropic found that training models on copyrighted books can qualify as fair use while distinguishing unlawful storage and distribution; the Bartz case led to a reported $1.5 billion settlement with per-work payouts. On the technical side, coverage seldom conveyed the scale involved: Meta has stated that Llama 3 was pretrained on over 15 trillion tokens, and LibGen hosts millions of pirated books. Mainstream outlets also largely omitted opinion, analysis, and social-media perspectives that have debated the nuances of fair use versus copying, the practical differences between training and distribution, and the implications of dataset scale. Readers would benefit from more case-law history, dataset statistics, and independent technical and economic analysis to understand how precedent, dataset composition, and damages outcomes could shape this and other AI–copyright disputes.