The Anthropic Lawsuit and the Void in AI Governance

In August 2024, a group of authors filed a putative class action against Anthropic, one of the most prominent artificial intelligence companies, alleging that its Claude models were trained on unauthorized copies of copyrighted books downloaded from pirate sites, along with personal data scraped from the web. [1]

I. Copyright and Fair Use

The heart of the case is whether training an AI model on copyrighted works is a “fair use” of those works under 17 U.S.C. § 107. The statute directs courts to weigh the purpose and character of the use; the nature of the copyrighted work; the amount and substantiality of the portion used; and the effect of the use on the market for the original work. [2] Each of these factors is strained by the AI training process, in which models ingest massive quantities of data, and determining when the “purpose and character” or “amount and substantiality” of a use crosses the line into infringement is often unclear.

In Authors Guild v. Google, Inc., the Second Circuit held that Google’s scanning and indexing of millions of books to create a searchable database constituted a transformative fair use. [3] The court emphasized that Google’s purpose was not to supplant the original works, but to enhance public access to information and enable new forms of research, thus serving a socially beneficial, non-expressive function. This decision has long been viewed as a cornerstone of modern fair use jurisprudence, particularly in the context of emerging technologies that repurpose copyrighted material for innovative ends.

However, the Supreme Court’s recent ruling in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith appeared to narrow this expansive interpretation. [4] In Warhol, the Court rejected the notion that transformation alone could justify fair use, especially when the secondary use is commercial and competes with the original work’s market. Unlike Google’s database, which offered a new and non-substitutive use for the underlying works, Andy Warhol’s portraits of Prince were found to retain the same essential purpose and character as Goldsmith’s original photograph: to serve as visual art licensed to magazines. The Court’s reasoning thus reoriented the analysis toward the commercial context of the secondary work, emphasizing that even a significant degree of transformation does not override these concerns.

This distinction is particularly relevant to ongoing disputes involving AI companies such as Anthropic. The plaintiffs in Bartz v. Anthropic allege that the company used copyrighted materials, including pirated copies of books, to train its models, activities that serve an obvious commercial purpose. [5] Unlike Google’s archival project, Anthropic’s use of protected works directly supports the development of a marketable product, and its outputs may compete with the original creators’ works in the creative marketplace. Given the Supreme Court’s reasoning in Warhol, AI companies may face increasing difficulty asserting that their training practices constitute transformative fair use when their models are profit-driven and potentially encroach on the same markets as the works they appropriate.

II. Privacy and Data Protection

Beyond copyright, the Bartz v. Anthropic lawsuit also highlights critical privacy concerns surrounding AI training data. Similar allegations of large-scale data scraping arose in Reddit, Inc. v. Anthropic PBC, in which Reddit accused Anthropic of repeatedly accessing its user-generated content without authorization, allegedly in violation of its content policies and licensing terms. [6] Such cases underscore the privacy implications of mass data harvesting by AI developers and the challenges of protecting personal information in the digital ecosystem.

In the United States, the lack of a comprehensive federal privacy statute compounds these issues. In the absence of such a law, protections rest on a patchwork of state statutes, such as the California Consumer Privacy Act, which provide inconsistent safeguards and limited recourse for individuals whose data are misused. [7] This fragmented legal landscape complicates enforcement across jurisdictions and makes it challenging to hold AI companies accountable for the unauthorized use of personal data. As reliance on user-generated information grows, cases like Bartz and Reddit illustrate the urgent need for coherent national standards governing privacy and data use in artificial intelligence systems.

However, even when plaintiffs allege violations of these privacy or data protection laws, they must first establish Article III standing, demonstrating a concrete and particularized injury to bring a case in federal court. [8] This procedural hurdle often limits the ability of individuals to challenge unauthorized data collection. In TransUnion LLC v. Ramirez, the Supreme Court held that a speculative or hypothetical risk of harm is insufficient to confer standing; plaintiffs must show an actual, concrete injury. [9] As a result, many privacy-based claims against AI developers face significant jurisdictional challenges before substantive issues can even be addressed.

III. The Case for Reform

This case underscores a growing gap in the law. Copyright doctrines were drafted for human copying, not for machine learning that ingests billions of works at scale. Some scholars warn that courts are stretching the fair use doctrine to encompass conduct that Congress never contemplated. [10] Possible reforms include a statutory licensing plan for training data (a system where creators are compensated for the use of their works under a standardized agreement, rather than requiring individual negotiations), transparency obligations for AI developers, and a federal privacy law clarifying when data scraping is permissible. [11]

However, not all scholars agree that legislative reform is necessary. Andrew Torrance and Bill Tomlinson argue that the process of training artificial intelligence models should be considered a form of “fair training,” analogous to fair use, because it is a non-consumptive activity that does not replace or diminish the market for the original work. [12] They maintain that existing copyright doctrines already contain sufficient flexibility to accommodate machine learning, provided the use remains transformative and non-substitutive. From this perspective, creating new statutory licensing regimes or regulatory frameworks could impose unnecessary burdens, stifle innovation, and risk entrenching dominant AI developers at the expense of smaller competitors.

Conclusion

The Anthropic lawsuit may determine whether courts alone can adapt existing legal doctrines to the AI era, or whether legislative intervention is required. Either way, its resolution may shape the legal infrastructure of artificial intelligence for years to come.


Sources

  1. Complaint, Bartz v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. filed Aug. 19, 2024).

  2. 17 U.S.C. § 107 (2018).

  3. Authors Guild v. Google, Inc., 804 F.3d 202, 207 (2d Cir. 2015).

  4. Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508, 525 (2023).

  5. Kate Knibbs, Court Rules Anthropic’s Use of Pirated Books Was Not Fair Use, Wired (Aug. 27, 2025).

  6. Mike Isaac, Reddit Sues Anthropic, Accusing It of Illegally Using Data From Its Site, N.Y. Times (June 4, 2025).

  7. California Consumer Privacy Act of 2018, Cal. Civ. Code §§ 1798.100–.199 (West 2023).

  8. U.S. Const. art. III.

  9. TransUnion LLC v. Ramirez, 594 U.S. 413, 426–27 (2021).

  10. Pamela Samuelson, Possible Futures of Fair Use, 90 Wash. L. Rev. 815 (2015).

  11. U.S. Copyright Office, Copyright and Artificial Intelligence, Part 3: Generative AI Training (Pre-Publication Version May 2025).

  12. Andrew W. Torrance & Bill Tomlinson, Training is Everything: Artificial Intelligence, Copyright, and “Fair Training”, 128 Dick. L. Rev. 233 (2023).
