"

15 Artificial Intelligence Part 3: How can we license AI uses and training rights?

Rachael Samberg and Katie Zimmerman

Because licensing artificial intelligence (AI) usage and training rights can be so complex, and is best served by a grounding in how scholars use AI in their research, we have divided this training section into four chapters. This chapter is Part 3.

Desired result

Ultimately, deciding what to preserve or negotiate for, and how hard to push for it, depends on the institution. Generally speaking, you want to think broadly so that you anticipate scholars’ needs. To that end, newly emerging content license agreements that prohibit usage of AI entirely, or charge exorbitant fees for it as a separately licensed right, will impede research and the advancement of knowledge. In this chapter, we aim to equip you with alternative licensing language that adequately addresses rightsholders’ concerns.

Desired Language

The language you use to preserve AI rights may vary based on whether the rights clause is framed as a “Usage Restriction” or “Authorized Use.” We address both here. (If the agreement is silent on AI rights and you’re adding them using the language below, you could put them under either “Prohibited/Unauthorized Uses” or “Authorized Uses.”)

Preserving AI in Prohibited Uses/Unauthorized Uses Clauses

Restrictions on Use of Subscribed Products: Licensee and Authorized Users may not:

  1. use the Subscribed Products in combination with an artificial intelligence tool to the extent doing so would: create a competing commercial product or service for use by third parties; unreasonably disrupt the functionality of the Subscribed Products; or reproduce or redistribute the original Subscribed Products to third parties. Artificial intelligence tools may not be used without reasonable information security standards to undertake, mount, load, or integrate the Subscribed Products on Licensee’s or Authorized Users’ servers or equipment.

[IF VENDORS WANT ADDITIONAL PROTECTIONS FOR THIRD-PARTY (NOT HOMEGROWN) GENERATIVE AI, IT MAY BE NECESSARY TO ADD THE FOLLOWING FURTHER RESTRICTIONS:

Further, third-party generative artificial intelligence tools may not be used with the Subscribed Products unless: (i) the tool is used locally in a self-hosted environment or closed hosted environment solely for use by Licensee or Authorized Users; and (ii) the tool is not trained or fine-tuned using the Subscribed Products or any part thereof, unless: such training is made by Licensee or Authorized Users only; there is no public release or exchange of the trained artificial intelligence tool or its data with a third party; and neither the Subscribed Products nor any part thereof is shared with a third party.]

Preserving AI in Authorized Uses Clauses

Authorized Uses: Licensee and Authorized Users may:

  1. use the Subscribed Products in combination with an artificial intelligence tool except to the extent that such usage would: create a competing commercial product or service for use by third parties; unreasonably disrupt the functionality of the Subscribed Products; or reproduce or redistribute the original Subscribed Products to third parties. Artificial intelligence tools shall be used with reasonable information security standards to undertake, mount, load, or integrate the Subscribed Products on Licensee’s or Authorized Users’ servers or equipment.

[IF VENDORS WANT ADDITIONAL PROTECTIONS FOR THIRD-PARTY (NOT HOMEGROWN) GENERATIVE AI, IT MAY BE NECESSARY TO ADD THE FOLLOWING FURTHER RESTRICTIONS:

Further, to the extent that Licensee or Authorized Users use the Subscribed Products in combination with a third-party generative artificial intelligence tool, any such third-party generative artificial intelligence tool must: (i) be used locally in a self-hosted environment or closed hosted environment solely for use by Licensee or Authorized Users; and (ii) not be trained or fine-tuned using the Subscribed Products or any part thereof, unless: such training is made by Licensee or Authorized Users only; there is no public release or exchange of the trained artificial intelligence tool or its data with a third party; and neither the Subscribed Products nor any part thereof is shared with a third party.]

Tips & Tricks

Addressing rightsholders’ concerns

How can a library negotiate sufficient AI usage rights while acknowledging the concerns of publishers? We believe publishers or vendors are more inclined to curb AI usage for one or more of the following reasons: (1) concern about the security of their licensed products, and the fear that researchers will leak or release content behind their paywall; (2) concern that AI will be used to create a competing product that could substitute for the original licensed product and undermine their share of the market; (3) concern about users training the public version of a third-party generative tool, or releasing a trained third-party generative tool to the public; (4) downstream obligations, where vendors are not the rightsholders and have themselves signed agreements with the publishers/rightsholders that prohibit AI usage downstream; and (5) a desire to charge separately for AI rights to drive up fees.

While these concerns are valid, they largely reflect longstanding fears over users’ potential generalized misuse of licensed materials. But publishers are already able to, and do, impose contractual provisions disallowing the creation of derivative products and the systematic sharing of licensed content with third parties, so additionally banning the use of AI in doing so is unnecessary.

In any event, the sample licensing language above addresses these concerns by specifying that sharing the licensed materials with the public is not allowed, that creating competing commercial products is not allowed, and that reasonable security measures must be used. Further, in the case of third-party generative AI tools (like ChatGPT), even more robust security protocols may be needed to satisfy publisher concerns, along with a prohibition on publicly sharing the third-party generative tool once it has been trained. Note that this solution may be viable only if your institution also has enforceable enterprise licenses with the vendors of these third-party tools. OpenAI, for example, the company that operates ChatGPT, uses user inputs to further train its model in the free-to-the-public version of ChatGPT [1], but explicitly does not train the model on user data in business license versions of the product [2]. If your institution does not have enterprise licenses for the AI tools that your end users frequently use, then limiting third-party tool use based on security and data privacy may not do much for your users. Be sure to tailor your AI tool terms to the needs of your institution.

Preserve access to facts

Publishers have been pushing hard to foreclose scholars from training and disseminating AI tools that now “know” something based on the licensed content. That is, such publishers wish to prevent tools from learning facts about the licensed content. However, this is precisely the purpose of licensing content. When institutions license content for their scholars to read, they are doing so for the scholars to learn information from the content. When scholars write or teach about the content, they are not regenerating the actual expression from the content, which is the part protected by copyright; rather, they are conveying the lessons learned from the content, which are facts not protected by copyright. Prohibiting the training of AI tools and the dissemination of those tools is functionally equivalent to prohibiting scholars from learning anything from the content that institutions are licensing for that very purpose, and that scholars themselves wrote in the first place! Publishers should not be able to monopolize the dissemination of information learned from scholarly content, especially when that information is used non-commercially.

So, it is worth explaining the fact vs. expression distinction to publishers, and reminding them that the purpose of licensing content is to learn from it and disseminate the facts one extracts.

Competing uses

Many of the mitigations that you may be able to negotiate around restrictive AI terms include the limitation that AI tools trained on the licensed content may not be used to create a commercial product that competes with the licensed materials. This may be necessary to mitigate rightsholder concerns that a generative AI tool could replace the need to subscribe to their content and disrupt their business model. While this is a reasonable concern, and, indeed, language around not creating competitive products is common in e-resource licenses generally (see the Commercial Purposes section of this guide), AI tools complicate this provision and the specific language used becomes increasingly important.

The main complication here is that, increasingly, vendors are including their own AI tools as part of the “Licensed Materials.” These can be relatively straightforward AI tools, such as a chatbot feature that will summarize an article or suggest articles based on a prompt, or something more complex such as Elsevier’s Reaxys Predictive Retrosynthesis AI tool, which will generate chemical syntheses and is starting to be included in subscriptions to the Reaxys database. The danger here is that the AI tools included in the Licensed Materials may look increasingly similar to tools that researchers may create. It was not a problem to agree not to create a “competing product” when the product was a database or set of journals. As vendors move further into analytics, however, there is the potential for increasing overlap between vended products and academic research projects. To avoid limiting the academic freedom of researchers, you should make sure the anticompetitive language is narrowly tailored to an appropriate scope. Our recommended language is “competing commercial product or service for use by third parties,” which prevents the limitation from affecting strictly academic and in-house products (researchers who develop a tool that is later commercialized would potentially still need to negotiate a separate license agreement at the point of commercialization). Another possible approach is to define any vendor AI tools separately from the definition of “Licensed Materials” and then to include only the defined term “Licensed Materials” in the anticompetitive clause.

Summarizing and communicating the clause(s)

For better or worse, negotiating AI usage and training rights winds up being highly nuanced in terms of both what you need and what the publisher/vendor is willing to agree to. For this reason alone, it can be helpful to summarize what you want or are achieving through the AI clause, both so that you understand it yourself and so that you can better explain your needs or position to the vendor.

In addition, it is increasingly common in e-resource license agreements overall to be required to use “reasonable efforts” to make your Authorized Users aware of their rights and obligations under the agreement, and this is frequently required specifically in the context of AI clauses. There are a variety of ways to inform users of the terms, but a terms summary is a useful first step. It can be distilled for, say, the catalog record of a resource, posted on a general terms of use web page, or presented in a click-through. Having ready access to a terms summary is also helpful in answering research and reference questions from users about whether they can conduct text and data mining (TDM) and use AI with the resource.

Here is an example of the kind of summary that might be helpful both in the negotiating process and in the post-execution conveyance of terms. These are the AI terms that the University of California reached in a recent agreement, summarized in a chart:

Type of AI Tool: Home-grown non-generative AI; home-grown generative AI; third-party non-generative AI

Authorized Uses: Can be used and trained, provided:

  • The content is not used to make a competing or commercial product for third parties
  • Use doesn’t disrupt functioning of licensed products
  • Use doesn’t reproduce/redistribute licensed products to third parties
  • Commercially reasonable security measures are undertaken

Subject to the above, no restrictions on dissemination of the AI tool (trained or not)

Type of AI Tool: Third-party generative AI

Authorized Uses: Can be used, provided:

  • the content is used locally in a self-hosted environment or closed hosted environment solely by Subscriber & Authorized Users
  • use does not share Subscribed Product or any parts thereof with third party

Can also be trained, provided:

  • there is a separate license Subscriber enters into with third party that: imposes commercially reasonable security measures, limits use to Subscriber or Authorized Users only, and precludes public release or exchange of the trained artificial intelligence tool or its data with a third party
  • training occurs locally in a self-hosted environment or closed hosted environment solely by Subscriber & Authorized Users
  • training does not share Subscribed Product or any parts thereof with third party
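
If you want to reuse a summary like this in a catalog record or a terms of use page, it may help to keep it in a structured form and generate the text from there. The following is a minimal sketch in Python, not a description of any system the University of California uses. All names here (ToolType, AiTerms, TERMS, summarize) are hypothetical, the condition wording paraphrases the chart above, and the public_release_allowed flag reflects our reading of the chart rather than the agreement text itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class ToolType(Enum):
    """The four categories of AI tool used in the chart (hypothetical names)."""
    HOME_GROWN_NON_GENERATIVE = "Home-grown non-generative AI"
    HOME_GROWN_GENERATIVE = "Home-grown generative AI"
    THIRD_PARTY_NON_GENERATIVE = "Third-party non-generative AI"
    THIRD_PARTY_GENERATIVE = "Third-party generative AI"


@dataclass(frozen=True)
class AiTerms:
    """Conditions attached to using and training one category of AI tool."""
    use_conditions: Tuple[str, ...]
    training_conditions: Tuple[str, ...]
    public_release_allowed: bool  # assumption: our reading of the chart


# Paraphrased from the first row of the chart; use and training share them.
_GENERAL_CONDITIONS = (
    "Content is not used to make a competing or commercial product for third parties",
    "Use does not disrupt the functioning of the licensed products",
    "Use does not reproduce or redistribute the licensed products to third parties",
    "Commercially reasonable security measures are undertaken",
)
_GENERAL_TERMS = AiTerms(_GENERAL_CONDITIONS, _GENERAL_CONDITIONS, True)

TERMS: Dict[ToolType, AiTerms] = {
    ToolType.HOME_GROWN_NON_GENERATIVE: _GENERAL_TERMS,
    ToolType.HOME_GROWN_GENERATIVE: _GENERAL_TERMS,
    ToolType.THIRD_PARTY_NON_GENERATIVE: _GENERAL_TERMS,
    # Paraphrased from the second row of the chart.
    ToolType.THIRD_PARTY_GENERATIVE: AiTerms(
        use_conditions=(
            "Content is used locally in a self-hosted or closed hosted environment",
            "No part of the Subscribed Product is shared with a third party",
        ),
        training_conditions=(
            "A separate license with the tool vendor imposes security measures "
            "and precludes public release of the trained tool or its data",
            "Training occurs locally in a self-hosted or closed hosted environment",
            "No part of the Subscribed Product is shared with a third party",
        ),
        public_release_allowed=False,
    ),
}


def summarize(tool_type: ToolType) -> str:
    """Render a plain-text terms summary for one category of AI tool."""
    terms = TERMS[tool_type]
    lines = [f"{tool_type.value}: can be used, provided:"]
    lines += [f"  - {condition}" for condition in terms.use_conditions]
    lines.append("Can be trained, provided:")
    lines += [f"  - {condition}" for condition in terms.training_conditions]
    release = (
        "permitted, subject to the above"
        if terms.public_release_allowed
        else "not permitted"
    )
    lines.append(f"Public release of the trained tool: {release}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize(ToolType.THIRD_PARTY_GENERATIVE))
```

Keeping the conditions in one place like this means the catalog record, the terms of use page, and any click-through text can all be generated from the same source, which reduces the chance that the summaries drift apart after an amendment.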

Importance and Risk

Modern research increasingly incorporates computational methodology, making both TDM and AI usage and training rights increasingly important. While it is impossible to “future-proof” licenses against every permutation of how scholars will rely on TDM and AI, what is clear is that merely preserving fair use rights won’t be enough in the face of publishers’ express efforts to limit AI. Even if you think your scholars don’t currently have the tools and resources to incorporate AI into their work, the proliferation of third-party-developed tools will only make AI usage and training more accessible. It’s better to be safe than sorry and to try to carve out the right for your users to make use of emerging technologies.

To see all this negotiation guidance in action, we present a case study in Part 4.


  1. https://help.openai.com/en/articles/8983117-how-does-openai-use-my-personal-data
  2. https://openai.com/index/new-tools-for-chatgpt-enterprise/

License

E-Resource Licensing Explained Copyright © 2024 by Sandra Enimil, Rachael Samberg, Samantha Teremi, Katie Zimmerman, Erik Limpitlaw is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.