16 Artificial Intelligence Part 4: AI Negotiation Case Study
Rachael Samberg
Because licensing artificial intelligence (AI) usage and training rights can be so complex, and is best served by a grounding in how scholars use AI in their research, we have divided this training section into four chapters:
- Part 1: How is AI used in research?
- Part 2: How does the law govern AI use and training?
- Part 3: How can we license AI uses and training rights?
- Part 4: An AI negotiation case study (this chapter)
In Part 4, we thought it would be helpful to share an example of AI restrictions that publishers proposed, followed by the outcomes for which we negotiated.
Publisher Proposal
Publisher-proposed language
Customer and its Authorized Users may not:
- directly or indirectly develop, train, program, improve, and/or enrich any artificial intelligence tool (“AI Tool”) accessible to anyone other than Customer and its Authorized Users, whether developed internally or provided by a third party; or
- reproduce or redistribute the Content to any third-party AI Tool, except to the extent limited portions of the Content are used solely for research and academic purposes (including to train an algorithm) and where the third-party AI Tool (a) is used locally in a self-hosted environment or closed hosted environment solely for use by Customer or Authorized Users; (b) is not trained or fine-tuned using the Content or any part thereof; and (c) does not share the Content or any part thereof with a third party.
The publisher’s first paragraph forbids training or improving any AI tool that is accessible or released to third parties. And because it reaches “indirect” development, it also forbids using outputs or analysis derived from the content to train any tool available to third parties. What does this mean? It means that the trained tools we explored in Parts 1-3 of our AI discussions could never be released to third parties.
The second paragraph is perhaps even more concerning. It provides that when using third-party AI tools of any kind, scholars can use only limited portions of the licensed content with them, and can’t do any training of third-party tools at all, even if the tool is non-generative and the work happens in a completely closed environment! In other words, this proposal wouldn’t let scholars train a third-party tool under any circumstances, much less make that tool available to others. It would in essence constitute a flat ban on nearly every research project we’ve explored in these chapters.
Response to Publisher
We explained to the publisher that within the University of California, we have a strong mandate from the president to obtain rights for our scholars to better the world using modern research technologies. In addition, the European Union’s Copyright Digital Single Market Directive and recent AI Act nullify any attempt to circumscribe the text and data mining and AI training rights reserved for scientific research within research organizations. So if the publisher has to honor these rights for research institutions in Europe, it can do the same here.
Finally, we explained the many kinds of research projects that would be precluded by the publisher’s proposal. For instance:
Tools that could not be disseminated
- In 2017, chemists created and trained a generative AI tool on 12,000 published research papers regarding synthesis conditions for metal oxides, so that the tool could identify anticipated chemical outputs and reactions for any given set of synthesis conditions entered into the tool. The generative tool they created is not capable of reproducing or redistributing any licensed content from the papers; it has merely learned conditions and outcomes and can predict chemical reactions based on those conditions and outcomes. And this beneficial tool would be prohibited from dissemination under the publisher’s terms identified above.
- In 2018, researchers trained an AI tool (one they had originally created in 2014) to determine whether a character is “masculine” or “feminine” by looking at the tacit assumptions expressed in the words associated with that character. The trained tool can then examine other texts and identify masculine or feminine characters based on what it learned during training, which means scholars can apply it to texts from different time periods to study representations of masculinity and femininity over time (a conceptual sketch of this kind of classifier appears below). Sharing the trained tool releases no licensed content and no licensed or copyrighted books from a publisher to the world; the tool is merely capable of topic modeling. Yet the publisher’s above language would prohibit its dissemination nevertheless.
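To make that concrete, below is a minimal, purely illustrative sketch of the kind of classifier described above, assuming the scikit-learn library. This is not the researchers’ code; the character word lists and labels are invented. The point it illustrates is that the trained model retains only a vocabulary and learned weights, never the licensed texts it was trained on.

```python
# Minimal illustrative sketch (not the researchers' code), assuming scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training data: words associated with each character, plus a label.
character_words = [
    "sword duty honor rode commanded",
    "gown parlor needlework whispered blushed",
    "pistol wager brandy swore gambled",
    "bonnet garden letters wept fainted",
]
labels = ["masculine", "feminine", "masculine", "feminine"]

# Bag-of-words features feed a simple classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(character_words, labels)

# The fitted pipeline stores a vocabulary and learned weights, not the training
# texts; it can now label characters described in new passages.
print(model.predict(["inherited the estate and rode to the hunt"]))
```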
Tools that could neither be trained nor disseminated
- In 2019, authors used text from millions of books published over 100 years to analyze cultural meaning. They did this by training third-party non-generative AI word-embedding models called Word2Vec and GloVe on multiple textual archives. The tools cannot reproduce content: when shown new text, they merely represent words as numbers, or vectors, to evaluate or predict how semantically or linguistically similar words in a given space are (see the word-embedding sketch after this list). The similarity of words can reveal cultural shifts in understanding of socioeconomic factors like class over time. But the publisher’s above licensing terms would prohibit the training of the tools to begin with, much less the sharing of them to support further or different inquiry.
- In 2023, scholars trained a third-party-created open-source natural language processing (NLP) tool called ChemDataExtractor (CDE). Among other things, CDE can be used to extract chemical information and properties identified in scholarly papers. In this case, the scholars wanted to teach CDE to parse a specific type of chemical information: metal-organic frameworks, or MOFs. Generally speaking, the CDE tool works by breaking sentences into “tokens” like parts of speech and referenced chemicals; by correlating tokens, one can determine that a particular chemical compound has certain synthetic properties, topologies, reactions with solvents, and so on (a conceptual tokenization sketch also appears after this list). The scholars trained CDE specifically to parse MOF names, synthesis methods, inorganic precursors, and more, and then exported the results into an open-source database that identifies the MOF properties for each compound. Anyone can now use both the trained CDE tool and the database of MOF properties to ask different chemical property questions or identify additional MOF production pathways, thereby improving materials science for all. Neither the CDE tool nor the MOF database reproduces or contains the underlying scholarly papers that the tool learned from. Yet neither the training of this third-party CDE tool nor its dissemination would be permitted under the publisher’s restrictive licensing language cited above.
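As referenced in the word-embedding example above, here is a minimal sketch of training a Word2Vec model and measuring how similar two words are, assuming the gensim library. It is not the authors’ code; the tiny tokenized corpus and the word pair are invented for illustration.

```python
# Minimal illustrative sketch (not the authors' code), assuming the gensim library.
from gensim.models import Word2Vec

# A tiny, invented tokenized corpus; the actual study used millions of books.
corpus = [
    ["the", "factory", "worker", "joined", "the", "union"],
    ["the", "laborer", "rented", "a", "small", "tenement"],
    ["the", "banker", "managed", "the", "estate", "portfolio"],
]

# Each word is mapped to a dense vector; words used in similar contexts
# end up with similar vectors.
model = Word2Vec(
    corpus, vector_size=50, window=3, min_count=1, epochs=50, seed=1, workers=1
)

# Cosine similarity between two word vectors (closer to 1.0 means more similar usage).
print(model.wv.similarity("worker", "laborer"))

# The trained model stores only the learned vectors, not the source sentences.
```

Tracking how such similarity scores shift across corpora from different decades is what lets researchers study changing cultural associations, without the model ever retaining the underlying text.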
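Similarly, here is a minimal sketch of the token-based extraction described in the CDE example, assuming ChemDataExtractor’s documented Document interface. The sentence is invented and is not drawn from any licensed paper.

```python
# Minimal illustrative sketch, assuming the ChemDataExtractor (CDE) library.
from chemdataextractor import Document

# An invented sentence of the kind CDE parses; not from any licensed paper.
doc = Document(
    "MOF-5 was synthesized from zinc nitrate and terephthalic acid in DMF at 100 °C."
)

# Chemical entity mentions recognized by tokenizing and tagging the sentence.
print([cem.text for cem in doc.cems])

# Structured records built by correlating tokens (chemical names and properties).
print(doc.records.serialize())
```

Outputs like these, entity mentions and structured records, are what a project exports to a database; the tool does not retain or emit the source articles themselves.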
Accordingly, we marked up the publisher’s language to the extent it fails to confer essential text and data mining and AI usage, training, and dissemination rights. We also conveyed that our changes should appear familiar as the publisher is obligated to honor these same rights with European research institutions under Article 3 of the Copyright Digital Single Market Directive, and the recent AI Act.
Library Counterproposal
Except as explicitly stated in this Agreement or otherwise permitted in writing by Licensor as an amendment to this License Agreement, or as permitted by any Creative Commons licenses or public domain dedications applied to the Content, Customer and its Authorized Users may not:
(i) directly or indirectly develop, train, program, improve, and/or enrich any artificial intelligence tool (“AI Tool”) whether developed internally or provided by a third party, unless: (a) doing so with reasonable information security standards to undertake, mount, load, or integrate the Content on Customer’s or Authorized Users’ servers or equipment, and (b) no Content or any part thereof is shared with anyone other than Customer and its Authorized Users; or
(ii) in the case of third-party generative artificial intelligence tools, train or fine-tune any such tool (including an algorithm) unless: (a) doing so locally in a self-hosted or closed hosted environment solely for use by Customer or Authorized Users, and (b) neither the third-party generative artificial intelligence tool nor the Content is shared with anyone other than Customer or its Authorized Users.
Subject to the above, uses of AI Tools by Customer and Authorized Users are permitted provided they do not create a competing or commercial product or service; disrupt the functionality of the Content or the Platform; or reproduce or redistribute the Content or any part thereof to a third party.
We then summarized for the publisher the effect of our changes:
| Type of AI Tool | Authorized Uses |
| --- | --- |
| Home-grown non-generative AI; Home-grown generative AI; Third-party non-generative AI | Can be used and trained, provided: (a) reasonable information security standards are applied when mounting, loading, or integrating the Content on Customer’s or Authorized Users’ servers or equipment; and (b) no Content is shared with anyone other than Customer and Authorized Users. Subject to the above, no restrictions on dissemination of the AI tool (trained or not). |
| Third-party Generative AI | Can be used and trained, provided: (a) the training occurs locally in a self-hosted or closed hosted environment solely for use by Customer or Authorized Users; and (b) neither the third-party generative AI tool nor the Content is shared with anyone other than Customer or Authorized Users. |