14 Artificial Intelligence Part 2: How does the law govern AI use & training?
Rachael Samberg and Katie Zimmerman
Because licensing artificial intelligence (AI) usage and training rights can be so complex, and is best served by a grounding in how scholars use AI in their research, we have divided this training section into four chapters:
- Part 1: How is AI used in research?
- Part 2: How does the law govern AI use and training? (this chapter)
- Part 3: How can we license AI uses and training rights?
- Part 4: An AI negotiation case study
Using AI in TDM research is a fair use
An AI rights clause in an academic library e-resources agreement merely preserves uses that courts have already determined to be fair use, but does so expressly for the avoidance of doubt.
As noted in the text and data mining chapter, previous court cases like Authors Guild v. HathiTrust, Authors Guild v. Google, and A.V. ex rel. Vanderhye v. iParadigms have addressed fair use in the context of TDM and confirmed that the reproduction of copyrighted works to create and conduct text and data mining on a collection of copyright-protected works is a fair use. These cases further hold that making derived data, results, abstractions, metadata, or analysis from the copyright-protected corpus available to the public is also fair use, as long as the research methodologies or data distribution processes do not re-express the underlying works to the public in a way that could supplant the market for the originals.
For the same reasons that TDM processes constitute fair use of copyrighted works in these contexts, the training of AI tools to conduct that text and data mining should also be fair use.[1] This is largely because AI training serves the same transformative purpose (under Fair Use Factor 1) and because, just like “regular” TDM that doesn’t involve AI, AI training does not reproduce or communicate the underlying copyrighted works to the public (which is essential to the determination of market supplantation under Fair Use Factor 4).
But while AI training is no different from other TDM methodologies in terms of fair use, there is an important distinction to make between the inputs for AI training and generative AI’s outputs. The fair use status of generative AI outputs cannot always be predicted in advance: The mechanics of generative AI models’ operations suggest that there are limited instances in which generative AI outputs could indeed be substantially similar to (and potentially infringing of) the underlying works used for training. This substantial similarity is possible, for example, when a training corpus is rife with numerous copies of the same work. A recent case filed by the New York Times against OpenAI and Microsoft addresses this potential similarity problem with generative AI outputs.
Yet, training inputs should not be conflated with outputs: The training of AI models by using copyright-protected inputs corresponds to what courts have already determined in TDM cases to be a transformative fair use. This is especially true when that AI training is conducted for non-profit educational or research purposes, as this bolsters its status under Fair Use Factor 1, which considers both transformativeness and whether the act is undertaken for non-profit educational purposes.
Were a court to suddenly determine that training AI was not fair use, and AI training was subsequently permitted only on “safe” materials (like public domain works or works for which training permission has been granted via license), this would curtail freedom of inquiry, exacerbate bias in the nature of research questions able to be studied and the methodologies available to study them, and amplify the views of an unrepresentative set of creators given the limited types of materials available with which to conduct the studies.
Europe has protected AI and TDM research rights
Even though we are confident that using AI in (at minimum, scholarly) TDM research is a fair use, provided the underlying content isn’t redistributed, there is no need to wait for potentially confusing opinions to be issued in the commercial generative AI cases currently pending before the courts. Where possible, it is advantageous to expressly authorize the scope of AI rights so that, regardless of what a court decides about fair use, you have already negotiated for your scholars to make certain uses of AI.
The European Union (EU) may be of some help to you in your negotiations in this regard. That’s because publishers must agree to allow certain AI uses for their EU academic clients. Why?
EU Copyright Digital Single Market Directive (CDSM)
CDSM Article 3 governs text and data mining (TDM) in the context of (a) scientific research by (b) research organizations and cultural heritage institutions; that is, both conditions must be true.[2] Article 3 carves out TDM rights for scientific research by research organizations and cultural heritage institutions. And Article 7 of the CDSM prohibits license or other agreements from taking those TDM research rights away! In essence, the publishers have to agree to protect TDM rights in the EU, and they can do so in your agreement, too.
EU's AI Act
The recent AI Act (https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138-FNL-COR01_EN.pdf) affirms that use of AI should be considered part of the TDM process, and provides that any restrictions that the AI Act places on AI do not extend to Article 3 TDM. In other words: The use of AI within TDM processes under CDSM Article 3 (i.e., scientific research by research organizations) remains intact without further restriction by the AI Act, and without the right of copyright owners to opt out of having their works used with AI.[3]
Recapping the EU Framework
In sum, for TDM research governed by CDSM Article 3, including TDM research involving AI, publishers must preserve all of the following rights:
- TDM (and TDM Inclusive of AI) Is Permitted: Research institutions can conduct TDM and TDM with AI, and retain copies of mined works for scientific research and verification.
- No Opt-Outs from AI Training: Copyright owners may not opt out of allowing works to be used for AI training for scientific research.
- No Contractual Override: For all Article 3 uses, CDSM Article 7 further mandates that license agreements cannot abrogate these rights.
The only real "constraint" is that the CDSM requires copies made for TDM to be stored with an appropriate level of security. In addition, copyright owners may apply measures to ensure the security and integrity of the networks and databases where the works are hosted, but any such measures shall not go beyond what is necessary to achieve that objective.
Turning legal knowledge into negotiating strategy
With all this in mind, what should you consider aiming for in your negotiation strategy? We address desired outcomes in Part 3.
- As we explained in the TDM chapter, the question of whether AI training is fair use is currently being addressed by the Copyright Office and potentially in some of the pending district court litigation, but neither source may be the final word on the matter.
- CDSM Article 4 governs TDM that occurs outside scientific research by research organizations. This distinction is important.
- For a good explanation of this, see pp. 7-9 of the Martin Senftleben article https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4740268