In AI Litigation, Content Creators Challenge Use Of Their Work To “Train” New Technology

Art Law Blog

02/26/2024

We have written before about the many legal questions raised by new and rapidly proliferating artificial intelligence technology. In recent weeks, there have been significant developments in AI-related litigation across the country. As the new year began, the New York Times filed a lawsuit alleging that OpenAI, the maker of ChatGPT, and Microsoft infringed the copyrights in its news content. In February, a federal judge trimmed the scope of a group of lawsuits challenging the use of books to “train” AI. And multiple competing class actions are jostling to determine which one will proceed first. These lawsuits promise to raise difficult questions about how our existing copyright regime should apply to the brave new world of AI-generated content.

New York Times Alleges Copyright Infringement of Its News Content

In a complaint filed during the last week of 2023, the New York Times sued a number of entities related to OpenAI, the creator of AI juggernaut ChatGPT, as well as tech giant Microsoft, which is a significant investor and technological partner in OpenAI and also operates its own AI-powered products (such as Bing Chat). The claims in this case relate to the defendants’ large language models (LLMs). An LLM works by predicting which words are likely to follow a particular string of text. Before it can do so, however, the LLM must be “trained” on large volumes of preexisting content authored by others, which serve as examples. Copies of the training works are stored and then repeatedly passed through the model to help it “learn” how to respond to users’ prompts.

In its filing, the Times asserted that the defendants “trained” their LLMs using content created by the Times (among many other creators). The Times alleges that the defendants have been “copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more,” and, indeed, that the defendants’ models are built to favor Times content as higher in quality than some of the other training content, and therefore as particularly valuable for training purposes.

The copyright claims (which include claims for direct copyright infringement as well as related claims for vicarious and contributory liability) focus on several allegedly infringing actions by the defendants, including: 1) copying and storing Times content for training; 2) creating models that have allegedly “memorized” verbatim or near-verbatim reproductions of Times works; and 3) creating models that regularly produce and provide to users output that is either memorized Times content or substantially similar to such content. In other words, the claims implicate both the AI’s “input” (i.e., the use of preexisting content to train the AI) and its “output” (i.e., the potentially infringing results that users receive from these AI products).

The Times separately asserts trademark claims, alleging that the defendants’ products sometimes provide output that attributes to the Times content the Times did not in fact produce, including inferior and even incorrect information; such “hallucinations,” the Times argues, can damage its brand and reputation. The complaint also asserts claims under the Digital Millennium Copyright Act (DMCA), alleging that the defendants have illegally removed copyright management information from Times content. The Times is seeking damages, injunctive relief to stop the defendants from continuing their conduct, and even “destruction” of the LLMs and training sets that incorporate works created by the Times.

The parties had apparently been negotiating terms that would permit the use of Times content, as the Times has done with other licensees, including creators of other digital products. OpenAI, for its part, has recently reached major deals with other content creators, including the Associated Press and publisher Axel Springer. But here, the parties were unable to reach a mutually agreeable arrangement that provided what the Times viewed as adequate “commercial terms and technological guardrails.”

The complaint anticipates the defendants’ invocation of a “fair use” defense, and preemptively argues that “there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.” In arguing against fair use, the complaint emphasizes the hard work that goes into independently reporting news: everything from the skill and subject-matter expertise of the reporters who conduct in-depth interviews and write in-depth stories, to the physical danger faced by journalists in war zones, to the time and effort required to fact-check and edit stories. It also stresses the heightened importance of this type of journalism in today’s world, and the public policy priorities served by the Times, including, among other things, educating voters and combating online misinformation. The complaint’s overarching narrative is that the defendants are “freeloading” by profiting from Times content without investing the time, money, and effort needed to develop it, and that this conduct is financially damaging the Times and hindering its ability to serve the important social purposes of its mission.

The complaint also emphasizes that the defendants’ products are “substitutive,” making it less likely that readers will actually read the Times itself because they can get its content through the defendants’ products, thereby depriving the Times of subscription, licensing, advertising, and affiliate revenue. This argument is in line with recent trends in fair use case law, which place significant weight on the extent to which an allegedly infringing product affects the market for the original work.

California Court Dismisses Some Claims by Authors In Consolidated AI Litigation

The Times case is just one of several pending lawsuits in which plaintiffs have asserted copyright infringement claims based on the use of their content to train AI tools. Indeed, the Times case has been specifically flagged as “related” to at least three other similar suits pending in the Southern District of New York. Those cases, which have now been consolidated, are putative class actions brought by the Authors Guild and several prominent authors of fiction and nonfiction books.

Meanwhile, on the opposite coast, a different group of authors, led by comedian Sarah Silverman and writer Paul Tremblay, is pursuing a similar consolidated lawsuit in the Northern District of California. But those plaintiffs were recently dealt a setback when a federal judge granted a motion to dismiss several of their claims.

The Silverman plaintiffs allege that OpenAI copied their copyrighted books into the training dataset for its LLM, and that the resulting model could generate accurate summaries of the books’ content and themes. Their causes of action included: (1) direct copyright infringement; (2) vicarious infringement; and (3) violation of the DMCA, as well as claims under California state law for unfair competition, negligence, and unjust enrichment. The defendants moved to dismiss all claims except the direct copyright infringement claim.

The court sided largely with the defendants, dismissing all of the challenged claims except the state-law claim for unfair competition. Among other things, the court was not satisfied that the plaintiffs had adequately alleged that any “output” from the AI system was “substantially similar” to their content. However, the plaintiffs have until mid-March to amend their complaint and attempt to replead those claims. And, of course, the direct copyright infringement claim at the core of the case was not challenged on that motion and remains live.

West Coast Plaintiffs Seek to Intervene and Dismiss Similar Cases In New York

Relatedly, the authors’ proceedings in California and New York appear to be on a collision course as the various groups of plaintiffs try to establish which cases should go forward first. The Silverman plaintiffs seek to represent a class of all people in the U.S. who own a copyright in any work that was used as training data for OpenAI language models during the class period; the Authors Guild case is being brought on behalf of a similar class. Accordingly, the Silverman plaintiffs have sought to intervene in the consolidated Authors Guild case in New York, asking the Southern District of New York to dismiss that case in favor of their California case. They rely on a longstanding policy, the so-called “first-to-file rule,” under which federal courts often hold that where competing cases involving overlapping parties and overlapping factual and legal disputes are pending in different jurisdictions, the first-filed case should proceed and the others should be dismissed or stayed. The rule aims not only to promote efficiency but also to prevent different courts from issuing inconsistent rulings.

The Silverman plaintiffs have also asked the California court to enjoin the defendants from participating in the New York suit, accusing the defendants of “forum-shopping” by seeking to proceed in New York because of a potentially more favorable schedule. The California schedule is set to address class certification before motions for summary judgment, while the New York schedule leaves open the possibility that the defendants could litigate substantive issues before class certification. Neither court has ruled yet, but the question of which case should proceed first may end up having a broad impact on how these AI cases are litigated.

What About Visual Art?

The cases discussed so far all involve print content: novels, autobiographies, nonfiction books, and newspaper stories. But there is active litigation involving visual artists as well. A putative class action brought by multiple visual artists over image-generating AI products like Midjourney is pending in the Northern District of California. The court has already dismissed some claims, and the plaintiffs recently amended their pleading to address issues raised in the court’s ruling. Four defendants (Midjourney, DeviantArt, Stability AI, and Runway AI) have now each filed a separate motion to dismiss, and those motions are currently being briefed.

The briefing is likely to probe the real nature of the plaintiffs’ claims. For example, are the plaintiffs claiming that the models themselves are infringing copies of the copyrighted works, or derivative works based on them? The court may also need to grapple with whether some of the plaintiffs’ state-law claims for unfair business practices are preempted by federal copyright law. And at least one of the defendants has asked the court to dismiss the claims on fair use grounds. Courts are often reluctant to decide fair use at the initial motion-to-dismiss stage of a case and generally prefer to address it on summary judgment or at trial, but this motion may represent the first opportunity for a court to squarely address whether fair use protects defendants who use copyrighted material to train AI.

What’s Next?

All of these cases deal with similar “big picture” questions about whether the “input” and the “output” of AI systems amount to copyright infringement. But each case also raises questions about the unique aspects of specific types of content. For example, a court might analyze fair use differently for news stories than for visual art, since those types of content serve very different purposes and have very different licensing markets. Likewise, the cases may need to grapple with specific differences between defendants, including the precise ways in which their AI products operate.

In another interesting wrinkle, the AI technology at issue continues to evolve in real time, and the defendants may try to proactively remedy some of the problems that the plaintiffs have identified in these suits. For example, one source suggests that OpenAI may already be working to close some of the loopholes that allow users to request verbatim copies of New York Times content in order to bypass paywalls. If that is correct, a court may still need to decide how to address past alleged infringements, even if the defendants have taken steps to prevent that kind of infringement going forward.

It also remains to be seen how the various cases will unfold procedurally. When the Times initially filed, at least one commentator called the lawsuit a “negotiating tactic” designed to pressure the defendants into a settlement. But the case is still open, and it is still proceeding as a single-plaintiff action (in contrast with the other multi-plaintiff or putative class action suits). Meanwhile, the book authors’ lawsuits seem to be reaching a crossroads at which one or both courts will need to decide whether the first-filed California case should “lead the way” and the New York case should be dismissed or stayed. It is possible, of course, that the parties themselves may broker settlements that balance the competing interests at stake; but, as one writer recently noted, it is no easy task to determine how much the artificial intelligence industry should pay copyright holders for using their works.

All of these complexities raise overarching questions about how, and whether, courts should be expected to apply centuries-old copyright concepts to the new world of AI. The same questions are being asked around the world; OpenAI recently warned the U.K.’s House of Lords that effective AI is “impossible” without the use of copyrighted materials. One U.S. copyright scholar has argued that copyright law has a relatively limited function and is not the best vehicle for hashing out the big questions about the role of AI in society; rather, that formidable job is better left to legislative and policy solutions. In October 2023, President Biden issued an executive order on AI that, among other things, directed the head of the U.S. Patent and Trademark Office, in consultation with the U.S. Copyright Office, to make “recommendations to the President on potential executive actions relating to copyright and AI,” addressing issues including “the scope of protection for works produced using AI and the treatment of copyrighted works in AI training.” But with no comprehensive federal approach on the immediate horizon, the judges overseeing these lawsuits will have difficult decisions to make. We will continue to monitor the litigation exploring the tension between AI and copyright law, which will be relevant to anyone who generates creative content or manages the intellectual property rights of creators.

ATTORNEY: Kate Lucas
CATEGORIES: Copyright, Fair Use, Legal Developments, Trademark