5 Privacy

Beth Cate and Rachael Samberg


Digital humanities scholars are often surprised that TDM questions that seem to present problems of legal privacy often wind up being governed not by U.S. privacy laws, but by professional or disciplinary ethical norms. Because of the specific scope of U.S. federal privacy laws and the strong exceptions under state privacy laws, the privacy concerns that TDM digital humanities researchers face are often matters of “privacy,” but not “legal privacy.”

The Gamergate case study highlights this phenomenon well. In Applying an Ethics of Care to Internet Research: Gamergate and Digital Humanities,[1] authors Suomela et al. give an overview of the Gamergate scandal, which involved the harassment of women who spoke out on Twitter about misogyny within video game development culture. The women who shared their views received rape and death threats. In collecting Tweets from the women as well as their harassers, Suomela and team needed to consider whether their analysis and republication of such materials violated posters’ privacy. What they discovered was that the privacy concerns related to the ethical issue of reamplifying hate messages, not to legal privacy, because the voluntary disclosure of personal information (such as in someone’s own public postings) waives any legal privacy rights, even if the content would otherwise have been protected by law.

In the next chapter, we’ll address the ethical challenges embedded in TDM research. But here, we’ll detail the kinds of privacy laws that scholars confront in the U.S., and the very powerful exceptions that often render concerns as ethical rather than legal.

What is “private” under the law?

When we think of privacy law and TDM, we often think about cleaning our data so as not to reveal personal information about individuals. But what personal information is actually protected by privacy law, and what are we allowed to publish? And specifically, how do privacy law challenges come up in the context of text data mining? In other words, what do we mean when we say “privacy”?

In the U.S., unlike copyright law, which is basically just a matter of federal statute, privacy law actually has multiple sources.

Constitutional privacy

First, there’s the constitutional right of privacy, which protects personal privacy against unlawful government invasion. Note that the Constitution does not explicitly include the right to privacy, but the Supreme Court has found that it implicitly grants a right to privacy against governmental intrusion, and it does this through the First, Third, Fourth, and Fifth Amendments. These Constitutional rights to privacy, however, are typically not what we’re dealing with in the context of humanities text data mining. If you, a researcher or TDM professional, are doing the work, you are not a government actor, so you’re not likely violating someone’s constitutional right to privacy with the research you’re doing. You may be violating their privacy rights, but not those privacy rights arising under the Constitution, because the Constitution protects against government intrusions. Largely, individual researchers’ TDM research does not invoke Constitutional privacy rights.

Federal statutes

There are also federal statutes (laws made by the U.S. Congress) that provide protections for certain types of information, or certain types of individuals. And there are a bunch of them. Federal privacy statutes include the Children’s Online Privacy Protection Act (COPPA), the Fair Credit Reporting Act, the Family Educational Rights and Privacy Act (FERPA), the Financial Services Modernization Act (Gramm-Leach-Bliley Act), the Health Insurance Portability and Accountability Act (HIPAA), the Privacy Act, the Right to Financial Privacy Act, the Stored Communications Act, and more. These statutes impose obligations on how the covered data may be collected, managed, and disclosed.

Likely you will already be complying with institutional review board requirements for human subjects research, and you will already be adhering to federal privacy laws when you’re using financial, medical, and other federally-protected materials. There is little particularly unique to TDM in this type of research, other than that potential privacy problems are exacerbated by the volume of data you might be collecting. If your data set contains covered information from thousands of individuals, you could violate these statutes at much greater scale than with other types of research. This is why robust research data management plans covering the data for the entire lifecycle of your research are critical.

Overall, because these federal laws cover very specific types of research not often implicated in digital humanities research, and because institutional review boards already provide oversight whenever your research does happen to involve federally-protected information, what we want to focus on instead is a third source of privacy law: the one most likely to have particular relevance to humanities TDM research because of the sources of information you may be using.

State statutes or common law (i.e. “torts”)

That third source of law is general privacy law created by states. These laws can either be creatures of state statutes, or what is considered “common law,” that is, law derived from court opinions apart from any statute that might exist. These statutes and common law create what is called a tort cause of action resulting from an unlawful invasion of privacy. A tort is essentially a wrongful act or an infringement of a right (other than under contract) leading to civil legal liability. So a tort is basically a civil (as opposed to criminal) wrong that you could do to someone, and something that’s an infringement of some non-contract based right that either statutes or common law have created. For instance, if you have trespassed on someone’s property, you may have committed a tort. If you have interfered with their livelihood, you may have committed a tort. If you have defamed someone and caused them harm, you may have committed a tort. These are civil wrongs that infringe various personal rights that people hold.

In the context of privacy, there are typically four torts we need to be aware of as TDM researchers. Now, the existence and recognition of these privacy torts vary by state, making these waters very murky for cross-border research. There will always be questions of which state’s law applies, and in turn, what privacy torts are at issue. Typically, tort issues are determined by the local law of the state which has the most significant relationship to the occurrence of the invasion and the parties. But generally speaking anyway, these are the four privacy torts that most states recognize in some form or another, whether through statute or common law rights.

Although recognition of these four harms goes back much further, the torts were articulated by William Prosser in his California Law Review article titled “Privacy” in 1960, and they include:

  1. Intrusion upon seclusion or solitude, or into private affairs;
  2. Public disclosure of embarrassing private facts;
  3. Painting someone in a false light in the public eye; and
  4. Appropriation of name or likeness.

As set forth in the legal encyclopedia American Jurisprudence,[2] the rationale behind recognizing these four torts is that:

One has a public persona, exposed and active, and a private persona, guarded and preserved, and the heart of our liberty is choosing which parts of our lives will become public and which parts we hold close…Courts have a unique and essential role in protecting the individual’s private life and ‘space’ from well-intentioned but ultimately oppressive, insulting, degrading, and demeaning intrusions, whether these intrusions come from the omnipresent forces of the state, or from the equally omnipresent and inescapable forces of the market.

To understand the Prosser torts, it’s also important to know that the right protected by a tort action for invasion of privacy is a personal right, specific to the individual whose privacy is invaded. In the absence of a state statute providing otherwise, the cause of action is not assignable, and it cannot be maintained by other persons, such as members of the individual’s family. This is why, as we’ll see, a person’s death typically extinguishes their right to privacy. We may feel bad—ethically—about disclosing the private affairs of deceased people, but the deceased people typically do not bear a privacy right in that information anymore under state statutes.

Digging into Prosser torts

So let’s talk about what these four torts protect.

  1. Both intrusion upon seclusion and public disclosure of embarrassing private facts require the invasion of something secret, secluded, or private. For there to be a tort on these grounds, a person must have had an objectively reasonable expectation of seclusion or solitude in the particular invaded place or as to the particular topic or matter intruded upon. In order for a defendant to be considered to have intruded into a place, conversation, or matter as to which the plaintiff has a reasonable expectation of privacy, the defendant must have penetrated some zone of physical or sensory privacy or obtained unwanted access to data by electronic or other covert means, in violation of the law or social norms. A defendant is not liable for invasion of privacy under the theory of intrusion upon seclusion if the plaintiff is already in public view at the time of the alleged invasion. This setup reveals that community standards are often important for gauging privacy invasions. Intrusion into private matters is not binary; there are nuances to societal recognition of expectations of privacy. By the same token, the fact that the privacy one expects in a given setting is not complete or absolute does not render the person’s expectation unreasonable as a matter of law. Notably, the law does not recognize a right of privacy in connection with further publication or amplification of information that is already public, or known to many people, or a matter of public record, or otherwise open to the public eye. For a fact to be considered private, someone must demonstrate an actual expectation that the disclosed fact remain private, and that society would recognize this expectation of privacy as reasonable and be willing to respect it. So again we see that community standards are important for gauging whether a privacy violation has occurred under these first two Prosser torts.
  2. Painting someone in a false light. This privacy tort is similar to the tort of defamation, but there are different standards of proof. You’ve painted someone in a false light if you’ve published the information widely (i.e., not to just a single person, as in defamation); the publication identifies the plaintiff; there is an element of fiction or falsity; that falsity would be highly offensive to a reasonable person; and you were at fault in publishing the information.
  3. Appropriation of name or likeness protects a person’s exclusive use of his or her own identity. The phrase “name or likeness” embraces the concept of a person’s character. The tort does not protect one’s name per se but rather the value associated with that name, and typically only when the appropriation is for commercial gain. We’re typically talking about leveraging someone’s name or likeness for your personal gain: trying to obtain for yourself the reputation, prestige, social or commercial standing, public interest, or other values of the underlying subject. You’re unlikely to have any such intentionality in non-profit research.

So we can see that mostly the two torts you’d be concerned about in the type of TDM research you’re doing are the first two Prosser torts: Intrusion upon seclusion, and public disclosure of embarrassing private facts. And further, intrusion upon seclusion requires some kind of invading of someone’s space where they have a realm of privacy to capture content; this is possible, but frankly unlikely in digital humanities research. This leaves “public disclosure of embarrassing private facts” as being the most likely Prosser tort to be at issue in TDM digital humanities research.

So the question becomes: Now that we know what we do about the common privacy torts (and specifically the public disclosure tort) that can arise in TDM research, what do we need to know about exceptions to that tort when making research choices?

Safeguards supporting TDM research

There are some inherent protections built into the nature of what a plaintiff must show to sustain a claim for Prosser torts that insulate you from some risk. Some protections are in the nature of burden of proof, and others are express exceptions.

Regarding burden of proof, in order to succeed on a claim for intrusion on seclusion or public disclosure of private facts, under state statute or common law, plaintiffs must usually show:

  • That a reasonable person would have been offended or injured (not just that they are hypersensitive).
  • In turn, a determination of whether a defendant’s actions were reasonable is made by balancing the interests of the plaintiff in protecting his or her privacy from serious invasions with a defendant’s interest in pursuing its course of conduct.
  • And further, to sustain a claim, a plaintiff must show they actually suffered harm, such as mental distress or embarrassment.

These required showings provide important risk mitigations for researchers, as they are an impediment to lawsuits being filed or moving forward.

Exceptions supporting TDM research

Perhaps more importantly, though, there are critical exceptions to various Prosser privacy torts that are very favorable to TDM researchers:

  1. Public Interest: When it comes to public disclosure of private facts, the right of privacy is not violated by comment or disclosures as to matters of legitimate public interest. Relatedly, tort liability might also be inconsistent with the free speech and free press provisions of the First Amendment to the U.S. Constitution, as applied to state law through the Fourteenth Amendment. In these cases, courts often have to balance a person’s right to keep information private with your First Amendment right to disseminate information to the public. In achieving this balance, courts sometimes look to whether the facts you’re seeking to disclose are of legitimate public concern and/or would be highly offensive to a reasonable person.
  2. Death: As we said earlier, a person’s death ends their right of privacy, though not necessarily their commercial right of publicity; that depends on state statute. However, you’re likely not doing your research for commercial gain anyway.
  3. Unidentifiability: There are no privacy concerns if the people are not identifiable.
  4. Consent: And finally, if someone has released the information themselves (such as on social media sites) or given you permission, they cannot sustain a privacy tort claim.
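
For the unidentifiability exception, one common preparatory step is to strip direct identifiers from a corpus before analysis or publication. The following is a minimal sketch only, assuming a hypothetical record layout with `author` and `text` fields and a hypothetical project key; it replaces usernames and @-mentions with keyed hashes. Whether the result makes people “not identifiable” in the legal sense is a separate question that code cannot answer, since quoted text can often be traced back to its author by searching for it.

```python
import hashlib
import hmac
import re

# Hypothetical secret key (an assumption for illustration); in practice it
# would be stored separately from the data set.
SECRET_KEY = b"replace-with-a-project-secret"

# Matches @-mentions such as "@someone" inside the text of a post.
MENTION = re.compile(r"@\w+")


def pseudonym(handle: str, key: bytes = SECRET_KEY) -> str:
    """Map a handle to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, handle.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]


def pseudonymize(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of the record with the author and @-mentions replaced."""
    cleaned = dict(record)
    cleaned["author"] = pseudonym(record["author"], key)
    cleaned["text"] = MENTION.sub(
        lambda m: "@" + pseudonym(m.group()[1:], key), record["text"]
    )
    return cleaned
```

Because the same handle always maps to the same pseudonym under a given key, analyses that depend on who replied to whom remain possible without the original usernames appearing in the working data.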

An approach to mitigating risk

We want to take a moment to highlight here for you a potential practical approach to integrating consideration of these privacy torts and exceptions into your TDM research.

It may come as no surprise to you that the same legal literacies researchers and professionals need to understand to navigate TDM research are critical for libraries to understand in determining what collections or corpora to make available for TDM research to begin with. At the UC Berkeley Library, we have launched what we call a Digital Lifecycle Program through which we digitize certain of our collections and make them available for free online for TDM and other research.

We have to answer the same copyright, contracts, privacy, and ethics questions in making the content available that you have to answer in using and publishing with it. And when it comes to the four privacy torts, we rely on exceptions similar to those that you as researchers would rely on. You can see from our own “Responsible Access Workflows” that if the subjects of the collections are no longer living, or the subject matter is newsworthy or of public interest, then from a state tort privacy perspective, digitization can proceed through the remaining workflows. We hope our workflows can be a practical way to help you work through privacy and other questions as your research proceeds.

International intersections

So far we’ve covered only U.S. law. What about international collaboration, or whether and how international privacy standards bleed into U.S. research? We know that researchers are not guaranteed to be insulated from international privacy regulation simply because their data collection is conducted within the United States. Data that is collected solely within the U.S. may be produced, say, in France, or created by French citizens. The data may have been originally provided with the expectation and under the terms of use that appropriate local data protections would be followed. Many of these factors that should be taken into consideration may not be documented or readily accessible to a diligent researcher who inspects information prior to collection.

Ethically, legally, and practically, it is not safe to assume that the U.S. definition of privacy is the sole relevant consideration. The contexts in which individuals share information online should play an important role in the sharing and use of information, even if U.S. or state privacy law doesn’t cover it.


Our guidance for Building LLTDM focuses mainly on U.S. law, but there are two international intersections that bear some attention regardless. The first is the General Data Protection Regulation, or “GDPR.” The GDPR was adopted in April 2016 and became enforceable beginning May 2018, and it deals with the protection of privacy and the collection and management of data. Basically, if a business doesn’t process an individual’s data in the correct way, it can be fined by data protection regulators in the EU.

At the core of GDPR is personal data as defined under European law. This is the type of information that allows a living person to be directly or indirectly identified from data that’s available, and it is much broader than under U.S. law. Personal data for purposes of GDPR can include something obvious, such as a person’s name, location data, or an online username, or it can be something that may be less apparent, such as an IP address. There are also a few special categories of personal data that are given greater protections, including information about racial or ethnic origin, political opinions, religious beliefs, membership of trade unions, genetic and biometric data, health information and data around a person’s sex life or orientation.

The GDPR aims to give individuals better control over their personal data. It sets requirements that dictate how businesses and other entities process the personal data of individuals in the EU. Businesses and data controllers are required to put safeguards in place to protect user data so that datasets are not publicly available by default, and can’t be used to identify subjects.

Even though the GDPR is focused on protecting individuals in the EU, it can also apply to entities that are based outside of the EU. So, if a business located in the U.S. does business or has users in the EU, then the GDPR could apply to it. In turn, TDM researchers should care about regulations such as the GDPR because social media companies and other organizations that provide products and services to individuals in the EU are directly affected by these data protection rules.

First, let’s take a brief look at how the GDPR applies to data processors. These processors must follow seven protection and accountability principles when dealing with personal data.

  1. Processing must be lawful, fair, and transparent to the data subject.
  2. Processing must only be for the legitimate purposes specified explicitly to the data subject at the time of collection.
  3. Processors should collect and process only as much data as absolutely necessary for the purposes specified.
  4. Processors must keep personal data accurate and up to date.
  5. Processors may only store personally identifying data for as long as necessary for the specified purpose.
  6. Processing must be done in such a way as to ensure appropriate security, integrity, and confidentiality of the data.
  7. And finally, the data controller is responsible for being able to demonstrate GDPR compliance with all of these principles.

Next, let’s briefly look at the rights that must be provided to the individuals who are the subjects of the data collection. They have:

  1. The right to be informed
  2. The right of access
  3. The right to rectification
  4. The right to erasure
  5. The right to restrict processing
  6. The right to data portability
  7. The right to object
  8. And other rights in relation to automated decision making and profiling.

As we saw from the previous list, one of the user rights is the right to erasure, otherwise known as “the right to be forgotten.” Article 17 of the GDPR states, “The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay.” This right can be invoked in particular situations. Some of these include:

  • When the personal data are no longer necessary in relation to the purposes for which they were collected or processed;
  • When the data subject withdraws consent;
  • When the personal data have been unlawfully processed;
  • And when the personal data must be erased to comply with a legal obligation under EU or Member State law;
  • As well as a few other reasons.

We can see that under the GDPR there are powerful mechanisms for the protection of personal data, and also ways that users can demand that personal data be redacted from the holdings of data processors.

So, what does this mean for you as a TDM researcher? How could complicated regulations like GDPR affect the utility of particular data sets for research? If a dataset was thought to have been processed appropriately and a researcher wishes to use it to conduct TDM, what are the effects later if the dataset begins to develop holes since some of the information has been removed due to the right to be forgotten, or another redaction? It’s clear from reading parts of the GDPR that if personal data are being processed for scientific research purposes, the regulation indeed applies to that processing.

But, as under U.S. law, there are important limitations and exceptions to the rules that can provide a safety valve for particular types of activities. For example, we were just talking about Article 17, the right to be forgotten. This right is not an absolute user right. Article 17, as well as Article 89, delves more deeply into the safeguards relating to processing for archiving purposes that are in the public interest, as well as scientific, historical, and statistical research purposes.

These safeguards say that certain GDPR provisions, including the right to erasure, will not apply when certain circumstances arise. For example,

  • for exercising the right of freedom of expression and information;
  • for reasons of public interest in the area of public health;
  • for archiving purposes in the public interest, and for scientific, historical, and statistical research purposes;
  • for the establishment, exercise or defence of legal claims;
  • And for a few other reasons.

So, while the GDPR has some strong protections for the privacy rights of individuals in the EU, it also has some strong limitations and exceptions that support applicable research, including text and data mining. These limitations can give TDM researchers some flexibility in conducting their research without violating the law.
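
Where erasure requests do reach a research corpus, it can help to record how large a hole each round of removals leaves, so that later analyses can account for the missing material. The sketch below is one possible approach under stated assumptions: the `author_id` field, the `ErasureReport` type, and the `apply_erasure` function are all hypothetical names chosen for illustration, and nothing in the GDPR prescribes this workflow.

```python
from dataclasses import dataclass


@dataclass
class ErasureReport:
    kept: list            # records that remain in the corpus
    removed: int          # number of records dropped
    share_removed: float  # fraction of the original corpus dropped


def apply_erasure(records: list, erased_ids: set) -> ErasureReport:
    """Drop every record whose author_id is in erased_ids and report the gap."""
    kept = [r for r in records if r["author_id"] not in erased_ids]
    removed = len(records) - len(kept)
    # Guard against an empty corpus to avoid dividing by zero.
    share = removed / len(records) if records else 0.0
    return ErasureReport(kept=kept, removed=removed, share_removed=share)


# Hypothetical four-record corpus in which one author requests erasure.
corpus = [
    {"author_id": "a1", "text": "first post"},
    {"author_id": "a2", "text": "second post"},
    {"author_id": "a1", "text": "third post"},
    {"author_id": "a3", "text": "fourth post"},
]
report = apply_erasure(corpus, {"a1"})
print(report.removed, report.share_removed)  # prints: 2 0.5
```

Keeping the share removed alongside the data set gives a simple way to judge whether results computed before and after a removal are still comparable.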

Chapter summary

We’ve seen the fairly circumscribed intersections between state tort “legal privacy” and digital humanities TDM research. Those risk junctures are so limited largely because of important research and public-interest related exceptions to state privacy law. But that doesn’t mean that we feel great about collecting, analyzing, and disseminating this content even if it is not technically “private” from a legal perspective. In the next chapter, we’ll address privacy from an ethical perspective.

  1. Suomela, T., Chee, F., Berendt, B., & Rockwell, G. (2019). Applying an Ethics of Care to Internet Research: Gamergate and Digital Humanities. Digital Studies/le Champ Numérique, 9(1), 4. DOI: http://doi.org/10.16995/dscn.302
  2. 62A Am. Jur. 2d Privacy § 1



To the extent possible under law, Beth Cate and Rachael Samberg have waived all copyright and related or neighboring rights to Building Legal Literacies for Text Data Mining, except where otherwise noted.
