32 An Overview of Resource Description

We describe resources so that we can refer to them, distinguish among them, search for them, manage access to them, preserve them, and make predictions about what might happen to them or what they might do. Each purpose may require different resource descriptions. We use resource descriptions in every communication and conversation; they are the enablers of organizing systems.

Naming {and, or, vs.} Describing

Resources in Organizing Systems discussed how to decide what things should be treated as resources and how names and identifiers distinguish one resource from another. Names can suggest the properties and principles an organizing system uses to arrange its resources. We can see how societies organize their people by noting that among the most common surnames in English are descriptions of occupations (Smith, Miller, Taylor), descriptions of kinship relations (Johnson, Wilson, Anderson), and descriptions of appearance (Brown, White).[1]

In many cultures, one spouse or the other takes a name that describes their marital relationship. In many parts of the English-speaking world, married women have often referred to themselves using their husband’s name.[2]

Similarly, many other kinds of resources have names that are property descriptions, including buildings (Pentagon, White House), geographical locations (North America, Red Sea), and cities (Grand Forks, Baton Rouge).

Every resource can be given a name or identifier. Identifiers are especially efficient resource descriptions because, by definition, identifiers are unique over some domain or collection of resources. Names and identifiers do not typically describe the resource in any ordinary sense because they are usually assigned to the resource rather than recording a property of it.

However, the arbitrariness of names and identifiers means that they do not serve to distinguish resources for people who do not already know them. This is why we use what linguists call referring expressions or definite descriptions, like “the small black dog” rather than the more efficient “Blackie,” when we are talking to someone who does not know that is the dog’s name.[3]

Similarly, when we use a library catalog or search engine to locate a known resource, we query for it using its name, or some specific information we know about it, to make it easier to find. In contrast, when we look for resources to satisfy an information need but do not have specific resources in mind, we query for them using descriptions of their content or other properties. In general, information retrieval can be characterized as comparing the description of a user’s needs with descriptions of the resources that might satisfy them.
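The idea of retrieval as a comparison of descriptions can be made concrete with a minimal sketch. It assumes that both the user's need and each resource are described by a short phrase, and it scores them by word overlap (Jaccard similarity). The catalog entries and the function names are hypothetical, and real search engines use far more sophisticated comparisons.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase a description and split it into a set of words."""
    return set(text.lower().split())


def overlap_score(query: str, description: str) -> float:
    """Jaccard overlap between the words of a query and of a description."""
    q, d = tokenize(query), tokenize(description)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


# Hypothetical collection: resource name -> description of the resource.
catalog = {
    "Blackie": "small black dog",
    "Rex": "large brown dog",
    "Tom": "small grey cat",
}


def rank(query: str) -> list[str]:
    """Order resource names by how well their descriptions match the query."""
    return sorted(
        catalog,
        key=lambda name: overlap_score(query, catalog[name]),
        reverse=True,
    )


print(rank("small black dog"))  # "Blackie" is ranked first
```

Someone who already knows the name can skip the comparison and look up "Blackie" directly, which is why names are the more efficient description for those who know them.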

“Description” as an Inclusive Term

Up to now we have used the concept of “description” in its ordinary sense to mean the labeling or explaining of the visible or important features that characterize or represent something. However, the concept is sometimes used more precisely in the context of organizing systems, where resource description is often more formal, systematic, and institutional. In the library science context of bibliographic description, a descriptor is one of the terms in a carefully designed language that can be assigned to a resource to designate its properties, characteristics, or meaning, or its relationships with other resources. In the contexts of conceptual modeling and information systems design, the terms in resource descriptions are also called “keywords,” “index terms,” “attributes,” “attribute values,” “elements,” “data elements,” “data values,” or “the vocabulary.” In business intelligence, predictive analytics, or other data science contexts these are called “variables,” “features,” “properties,” or “measurements.” In contexts where descriptions are less formal or more personal the description terms are often called “labels” or “tags.” Rather than attempt to make fine distinctions among these synonyms or near-synonyms, we will use “description” as an inclusive term except where conventional usage overwhelmingly favors one of the other terms.

Many of these terms come from a narrow semantic scope in which the purpose of description is to identify and characterize the essence, or aboutness, of a resource. However, as it becomes trivial to associate computationally generated information with resources, many additional kinds of information beyond strict “aboutness” can support additional interactions. We describe many of these purposes and the types of information needed to enable them in “Determining the Purposes”. We apply resource description in an expansive way to accommodate all of them.

Resources in Organizing Systems introduced the distinction of “Resource Focus” to contrast primary resources with resources that describe them, which we called Description Resources. We chose this term as a more inclusive and more easily understood alternative to two terms that are well established in organizing systems for information resources: bibliographic descriptions and metadata. We will also distinguish resource description as a general concept from the narrower senses of statistical description, tagging of web resources, and the Resource Description Framework (RDF) language used to make statements about web resources and physical resources that can be identified on the Web.

Bibliographic Descriptions

The purposes and nature of bibliographic description are the foundation of library and information science and have been debated and systematized for nearly two centuries. Bibliographic descriptions characterize information resources and the entities that populate the bibliographic universe, which include works, editions, authors, and subjects.[4]

A bibliographic description of an information resource is typically realized as a structured record in a standardized format that describes a specific resource. The earliest bibliographic records in the nineteenth century were those in book catalogs, which organized for each author a list of his authored books, with separate entries for each edition and physical copy. Relationships between books by different authors were described using cross-references.

The nature and extent of bibliographic descriptions were highly constrained by the book catalog format, which also made the process of description a highly localized one because every library or collection of resources created its own catalog. The adoption of printed cards as the unit of organization for bibliographic descriptions around the turn of the twentieth century made it easier to maintain the catalog, and also enabled the centralized creation of the records by the Library of Congress.

The computerization of bibliographic records made them easier to use as aids for finding resources. However, digitizing legacy printed card-oriented descriptions for online use was not a straightforward task because the descriptions had been created according to cataloging rules designed for collections of books and other physical resources and intended only for use by people.

Metadata

Metadata is often defined as “data about data,” a definition that is nearly as ubiquitous as it is unhelpful. A more content-full definition of metadata is that it is structured description for information resources of any kind. Metadata is more useful when supported by a metadata schema that defines the elements in the structured description.[5]

The concept of metadata originated in information systems and database design in the 1970s, so it is much newer than that of bibliographic description. The earliest metadata schemas, called data dictionaries, documented the arrangement and content of data fields in the records used by transactional applications on mainframe computers. A more sophisticated type of metadata emerged as the documentation of the data models in database management systems, called database schemas, which described the structure of relational tables, attribute names, and legal data types and values for content.
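A data dictionary or database schema can be thought of as a description of what a valid record looks like. The following minimal sketch assumes a hypothetical schema for book records, expressed as a mapping from field names to an expected type and a required flag. The field names and the `validate` function are illustrative and do not follow any particular standard.

```python
# Hypothetical schema: field name -> (expected type, is the field required?)
BOOK_SCHEMA = {
    "title": (str, True),
    "author": (str, True),
    "year": (int, True),
    "pages": (int, False),
}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems found when checking a record against a schema."""
    problems = []
    for field_name, (expected_type, required) in schema.items():
        if field_name not in record:
            if required:
                problems.append(f"missing required field: {field_name}")
            continue
        if not isinstance(record[field_name], expected_type):
            problems.append(f"{field_name} should be {expected_type.__name__}")
    # Fields that the schema does not define are also reported.
    for field_name in record:
        if field_name not in schema:
            problems.append(f"unknown field: {field_name}")
    return problems


good = {"title": "Moby-Dick", "author": "Herman Melville", "year": 1851}
bad = {"title": "Moby-Dick", "year": "1851"}

print(validate(good, BOOK_SCHEMA))  # no problems reported
print(validate(bad, BOOK_SCHEMA))   # missing author, year has the wrong type
```

The schema is itself a description resource: it describes the structured descriptions rather than any primary resource.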

In 1986, the Standard Generalized Markup Language (SGML) formalized the Document Type Definition (DTD) as a metadata form for describing the structure and content elements in hierarchical and hypertextual document models. SGML was superseded in 1997 by eXtensible Markup Language (XML), whose purpose was structured and computer-processable web content.[6]

Today, XML schemas and other web- and compute-friendly formats for resource description have broadened the idea of resource description far beyond that of bibliographic description to include the description of software components, business and scientific datasets, web services, and computational objects in both physical and digital formats. The resource descriptions themselves serve to enable discovery, reuse, access control, and the invocation of other resources needed for people or computational agents to effectively interact with the primary ones described by the metadata.[7]

Tagging of Web-based Resources

The concept of metadata has been extended to include the tags, ratings, bookmarks or other types of descriptions that individuals apply to individual photos, blog or news items, or any other resource with a web presence. The practice of tagging has emerged as a way to apply labels to content in order to describe and identify it. Sets of tags are useful in managing one’s collection of websites or digital media, in sharing them with others, and enabling new types of interactions and services.[8] For example, users of Last.fm tag music with labels that describe its nature, era, mood, or genre, and Last.fm uses these tags to generate radio stations that play music similar to that tag and related tags.

But tagging has a downside. The tendency for users to tag intuitively and spontaneously revives the vocabulary problem (“Naming Resources”) because one photographer’s “tree” is another’s “oak.” Likewise, unsystematic word choice leads to morphological inconsistency (“Relationships among Word Forms”); the same photo might be tagged with “burning” and “trees” and also with “burn” and “tree” by another. This disparity in the descriptors people use to categorize the same or similar resources can turn systems that use tagging into a “tag soup” lacking in structure.[9]

Some social media sites have incorporated mechanisms to make the tagging activity more systematic and to reduce vocabulary problems. For example, on Facebook, users can indicate that a specific person is in an uploaded picture by clicking on the faces of people in photographs, typing the person’s name, and then selecting the person from a list of Facebook friends whose names are formatted the way they appear on the friend’s profile. Some social media systems suggest the most popular tags, perform morphological normalization, or allow users to arrange tags in bundles or hierarchies.[10]
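Morphological normalization can be illustrated with a deliberately naive sketch that lowercases tags and strips a couple of common English suffixes, so that “burning” and “burn,” or “trees” and “tree,” are counted as the same tag. The suffix list and function names are assumptions made for illustration. Real systems use proper stemmers or lemmatizers, and this crude rule will mangle words such as “glass.”

```python
from collections import Counter


def normalize_tag(tag: str) -> str:
    """Lowercase a tag and strip a few common English suffixes (naive)."""
    tag = tag.strip().lower()
    for suffix in ("ing", "s"):
        # Only strip the suffix when a stem of at least three letters remains.
        if tag.endswith(suffix) and len(tag) - len(suffix) >= 3:
            return tag[: -len(suffix)]
    return tag


def tag_counts(taggings: list[list[str]]) -> Counter:
    """Count how many users applied each normalized tag to a resource."""
    counts = Counter()
    for tags in taggings:
        # A set ensures each user counts at most once per normalized tag.
        counts.update({normalize_tag(t) for t in tags})
    return counts


# Two users tag the same photo with different word forms.
photo_tags = [["burning", "trees"], ["burn", "tree"]]
print(tag_counts(photo_tags))
```

After normalization both users agree on two tags, which is the kind of convergence that makes a tag collection easier to search.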

Resource Description Framework (RDF)

The Resource Description Framework (RDF) is a standard model for making computer-processable statements about web resources; it is the foundation for the vision of the Semantic Web.[11] We have been using the word “resource” to refer to anything that is being organized. In the context of RDF and the web, however, “resource” means something more specific: a resource is anything that has been given a Uniform Resource Identifier (URI). URIs can take various forms, but you are probably most familiar with the URIs used to identify web pages, such as http://springfield-elementary.edu/. (You are probably also used to calling these URLs instead of URIs.) The key idea behind RDF is that we can use URIs to identify not only things “on” the web, like web pages, but also things “off” the web like people or countries. For example, we might use the URI http://springfield-elementary.edu/ to refer to Springfield Elementary itself, and not just the school’s web page.

RDF models all descriptions as sets of “triples,” where each triple consists of the resource being described (identified by a URI), a property, and a value. Properties are resources too, meaning they are identified by URIs. For example, the URI http://xmlns.com/foaf/0.1/schoolHomepage identifies a property defined by the Friend of a Friend (FOAF) project for relating a person to (the web page of) a school they attended. Values can be resources too, but they do not have to be: when a property takes simple values like numbers, dates, or text strings, these values do not have URIs and so are not resources.

Because RDF uses URIs to identify described resources, their properties, and (some) property values, the triples in a description can be connected into a network or graph. Figure: RDF Triples Arranged as a Graph shows four triples that have been connected into a graph. Two of the triples describe Bart Simpson, who is identified using the URI of his Wikipedia page.[12] The other two describe Lisa Simpson. Two of the triples use the property age, which takes a simple number value. The other two use the property schoolHomepage, which takes a resource value, and in this case they happen to have the same resource (Springfield Elementary’s home page) as their value.

 

RDF Triples Arranged as a Graph

 

Depicts a set of RDF statements, related to Bart and Lisa Simpson, arranged as a graph.

Two RDF triples can be connected to form a graph when they have a resource, property, or value in common. In this example RDF triples that make a statement about the home page of the elementary school attended by Bart Simpson and Lisa Simpson can be connected because they have the same value, namely the URI for Springfield Elementary.
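The four triples can be sketched as plain Python tuples, without any RDF library, to show how a shared value connects them. Several details here are assumptions: the `example.org` URIs are hypothetical placeholders for the Wikipedia page URIs, the age property is assumed to come from the FOAF namespace, and the age values are illustrative rather than taken from the figure.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"
SCHOOL = "http://springfield-elementary.edu/"
# Hypothetical placeholders standing in for the Wikipedia page URIs.
BART = "http://example.org/wiki/Bart_Simpson"
LISA = "http://example.org/wiki/Lisa_Simpson"

# Each triple is (resource being described, property, value).
triples = [
    (BART, FOAF + "age", 10),                 # simple value, assumed
    (BART, FOAF + "schoolHomepage", SCHOOL),  # resource value
    (LISA, FOAF + "age", 8),                  # simple value, assumed
    (LISA, FOAF + "schoolHomepage", SCHOOL),  # resource value
]


def subjects_sharing_value(triples):
    """Group described resources by the (property, value) pairs they share."""
    index = defaultdict(set)
    for subject, prop, value in triples:
        index[(prop, value)].add(subject)
    # Keep only the pairs that connect two or more described resources.
    return {key: subs for key, subs in index.items() if len(subs) > 1}


for (prop, value), subjects in subjects_sharing_value(triples).items():
    print(prop, value, sorted(subjects))
```

Only the schoolHomepage triples connect Bart and Lisa, because they share the same property and the same resource value; the age triples do not, because their values differ.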

Using URIs as identifiers for resources and properties allows descriptions modeled as RDF to be interconnected into a network of “linked data,” in the same way that the web enabled information to be interconnected into a massive network of “linked documents.” Proponents of RDF claim that this will greatly benefit knowledge discovery and inference.[13] But the benefits of RDF’s highly prescriptive description form must be weighed against the costs; turning existing descriptions into RDF can be labor-intensive.

RDF can be used for bibliographic description, and some libraries are exploring whether RDF transformations of their legacy bibliographic records can be exposed and integrated with resource descriptions on the open web. This activity has raised technical concerns about whether the RDF model of description is sophisticated enough, and more fundamental concerns about the desirability of losing control over library resources.[14]

Aggregated Information Objects

In the pre-digital age, information objects came with explicit tangible boundaries. Books consisted of pages bound within a cover, a vinyl record album physically bound together a set of songs (you could even see the groove pattern separating the songs), a movie was delivered on a strip of film spooled onto a reel, and a collection was (usually) demarcated as a designated shelf or room in a library.

Boundaries of information objects in the digital realm are neither tangible nor obvious. Consider the simple notion of a web page. What we perceive as a single thing rendered in our browser window (e.g., some formatted text with an associated image) is actually, in web architecture terms (Jacobs & Walsh, 2004), three information objects (aka resources): the HTML encoding the text, the CSS that defines the formatting rules, and the JPEG that encodes the image. All three have URLs and can independently be retrieved and linked. The situation is even more ambiguous for the common notion of a web site, the boundaries of which are not defined technically and are cognitively difficult to express.

Aggregations can be convenience methods for simplifying dissemination or organization, but they can also be transformative; resources can derive nearly all their value from their inclusion in an aggregation. On a web page, the CSS file is virtually useless on its own, since its role is to style the HTML file. In iTunes, the playback and organization functions are optimized for pop music, where individual songs can usually stand on their own when separated from the rest of an album. Classical music fans often struggle with this, because the individual “tracks” of a recording, split up to reduce file size and facilitate navigation through long works, are not separable; pieces are meant to be listened to in their entirety, and it can be difficult to ensure that they are aggregated together and have the proper metadata assigned to their aggregations. In other words: you can’t listen to symphonies on shuffle.[15]

The problem here is how to architecturally and technically express the notion of an aggregation, a set of information objects that, when considered together, compose another named information object. Aggregations are prevalent all over our digital information space: the web page and site mentioned above; a scholarly publication consisting of text, figures, and data; a dataset that is the composition of multiple data files. Notably the notion is both recursive and non-exclusive. An object that is itself an aggregation may be aggregated into another object. Information objects included in one aggregation may also be included in other aggregations, allowing reuse and re-factoring of existing information objects. A solution to this problem is a critical aspect of organizing digital information because, without well-defined boundaries we cannot deterministically identify, reference, or describe information objects.
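The recursive and non-exclusive character of aggregation can be sketched with a small data structure. This is a minimal illustration rather than any of the frameworks described in this section; the `InfoObject` class and the `example.org` URIs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InfoObject:
    """A named information object that may aggregate other objects."""
    uri: str
    parts: list["InfoObject"] = field(default_factory=list)

    def all_parts(self) -> set[str]:
        """Return the URIs of every object reachable from this aggregation."""
        seen: set[str] = set()
        stack = list(self.parts)
        while stack:
            obj = stack.pop()
            if obj.uri not in seen:
                seen.add(obj.uri)
                stack.extend(obj.parts)  # recursion: parts may be aggregations
        return seen


html = InfoObject("http://example.org/index.html")
css = InfoObject("http://example.org/style.css")
jpeg = InfoObject("http://example.org/photo.jpg")

# A web page aggregates three independently identified resources.
page = InfoObject("http://example.org/page", [html, css, jpeg])
# Non-exclusive: the same stylesheet is reused in a second aggregation.
about = InfoObject(
    "http://example.org/about",
    [InfoObject("http://example.org/about.html"), css],
)
# Recursive: a site is an aggregation of aggregations.
site = InfoObject("http://example.org/site", [page, about])

print(sorted(site.all_parts()))
```

Because every aggregation has its own identifier, the page and the site can be referenced and described as objects in their own right, separately from their parts.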

The following are a number of technical approaches to the aggregation problem.

 

Kahn-Wilensky Digital Object Framework. Robert Kahn and Robert Wilensky coined the term “digital object” in a paper describing the core components of digital library infrastructure: the content, naming scheme, repository configuration, and access protocol. The digital object they describe is a uniquely identified container packaging multiple data and metadata components. The model incorporates the recursive notion of a container of containers. The Kahn/Wilensky framework was the inspiration for the Fedora (Flexible Extensible Digital Object Repository Architecture) system, open source software deployed worldwide in information applications that leverage the container model.[16]

Warwick Framework. An early result of work by the Dublin Core Metadata Initiative, the Warwick Framework was motivated by the desire to associate multiple metadata packages with content (e.g., descriptive, rights, administrative). It specifies a container architecture with distinct metadata packages that could be included directly or by-reference, allowing reuse of individual packages. The Warwick Framework was the inspiration for METS, described next.[17]

METS. The Metadata Encoding and Transmission Specification is a widely deployed XML-based container format that packages together multiple metadata components and content, each included either directly or by reference. Metadata packages are classified into one of a set of pre-defined types: descriptive, administrative, rights, and structural. METS is specified by an XML schema that has been extended for a variety of specialized applications.[18]

OAI-ORE. The Open Archives Initiative’s Object Reuse and Exchange specification was motivated by a desire for an aggregation architecture fully congruent with web architecture principles, thus exposing aggregations to standard web tools, e.g. browsers, crawlers, HTTP servers. OAI-ORE introduces two types of URI-identified web resources: a resource map and an aggregation. When de-referenced through its respective URI, the former returns an RDF/XML formatted description that establishes the identity of the aggregation, the resources that are included in the aggregation, and the semantics of the relationships among them.[19]

Frameworks for Resource Description

The broad scope of resources to which descriptions can be applied and the different communities that describe them means that many frameworks and classifications have been proposed to help make sense of resource description.

 

Architectures for Resource Description

 

Two contrasting architectures for resource description are depicted. The first architecture, labeled “Separate Descriptions,” presents a central oval labeled “Primary Resource” surrounded by six ovals labeled “Description Resource” each with an arrow to the central oval. The second architecture, labeled “Package of Descriptions,” presents a collection of “Resource Description” ovals with a single arrow to an oval labeled “Primary Resource.”

Two contrasting architectures for resource description are separate descriptions and packaged descriptions. Packaged descriptions were dominant in library catalogs, where each printed card contained a set of descriptions about a resource.

The dominant historical view treats resource descriptions as a package of statements; this view is embodied in the printed library card catalog and its computerized analog in the MARC21 format (an exchange format for library catalog records), which contains many fields about the bibliographic characteristics of an object like author, title, publication year, publisher, and pagination. An alternate architecture for resource description focuses on each individual description or assertion about a single resource, as the RDF and linked data approaches do. These two alternatives are contrasted in Figure: Architectures for Resource Description.
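The difference between the two architectures can be sketched by converting between them. In this illustration a packaged description is a dictionary of fields about one resource, and separate descriptions are individual (resource, property, value) statements. The field names are simplified stand-ins rather than actual MARC fields, and the identifier and function names are hypothetical.

```python
def unpack(resource_id: str, record: dict) -> list[tuple]:
    """Split a packaged record into separate statements about the resource."""
    statements = []
    for prop, value in record.items():
        # A field may hold one value or a list of values.
        values = value if isinstance(value, list) else [value]
        for v in values:
            statements.append((resource_id, prop, v))
    return statements


def repack(statements: list[tuple]) -> dict:
    """Group separate statements back into one package per resource."""
    records: dict = {}
    for subject, prop, value in statements:
        records.setdefault(subject, {}).setdefault(prop, []).append(value)
    return records


# Packaged description, in the spirit of a catalog card (simplified fields).
record = {
    "author": "Elaine Svenonius",
    "title": "The Intellectual Foundation of Information Organization",
    "year": 2000,
}
book_id = "urn:example:svenonius-2000"  # hypothetical identifier

statements = unpack(book_id, record)
for statement in statements:
    print(statement)

print(repack(statements)[book_id]["year"])
```

The same information survives the round trip, which suggests that packaging is an implementation decision rather than a difference in what is being described.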

In either case, these common ways of thinking about resource description emphasize, or perhaps even overemphasize, two implementation decisions:

  • The first is whether to combine multiple resource descriptions into a structural package or to keep them as separate descriptive statements.

  • The second is the choice of syntax in which the descriptions are encoded.

Both of these implementation decisions have important implications, but are secondary to the questions about the purposes of resource description, how resource properties are selected as the basis for description, how they are best created, and other logical or design considerations. In keeping with a fundamental idea of the discipline of organizing (introduced in “The Concept of “Organizing Principle””), it is imperative to distinguish design principles from implementation choices. We treat the set of implementation decisions about character notations, syntax, and structure as the form of resource description and we will defer them as much as we can until The Forms of Resource Descriptions.

In library and information science, it is very common to discuss resource descriptions using a classification proposed by Arlene Taylor, which distinguishes administrative, structural, and descriptive metadata.[20] A similar typology proposed by Gilliland breaks metadata down into five types: administrative, descriptive, preservation, use, and technical.[21]

Resource description is not an end in itself. Its many purposes are all means for enabling and using an organizing system for some collection of resources. As a result, our framework for resource descriptions aligns with the activities of organizing systems we discussed in Activities in Organizing Systems: selecting, organizing, interacting with, and maintaining resources.


  1. (Reaney and Wilson 1997) classify surnames as local, surnames of relationship, surnames of occupation or office, and nicknames. The dominance of occupational names reflects the fact that there are fewer occupations than places. While there are only a handful of kinship relationships used in surnames (patronymic or father-based names are most common), because the surname includes the father’s name there is more variation than for occupations.

  2. This odd convention is preserved today in wedding invitations, causing some feminist teeth gnashing (Geller 1999).

  3. See (Donnellan 1966). A contemporary analysis from the perspective of cognitive science is (Heller, Gorman, and Tanenhaus 2012).

  4. Despite the “biblio-” root, bibliographic descriptions are applied to all of the resource types contained in libraries, not just books. Note also that this definition includes not just the information resources being described as distinct instances, but also as sets of related instances and the nature of those relationships.

    An excellent source for both the history and theory of bibliographic description is The Intellectual Foundation of Information Organization by Elaine Svenonius (Svenonius 2000). She divides bibliographic descriptions into “those that describe information from those that describe its documentary embodiments,” contrasting conceptual or subject properties from those that describe physical properties (p. 54). A more radical contrast was proposed by (Wilson 1968, p. 25), who distinguished descriptions according to the kind of bibliographic control they enabled. Descriptive control is objective and straightforward, lining up a population of writings in any arbitrary order. Exploitative control, defined as the ability to make the best use of a body of writings, requires descriptions that evaluate resources for their suitability for particular uses. Wilson argued that descriptive control was a poor substitute for exploitative control, but recognized that evaluative descriptions were more difficult to create.

  5. (Gill 2008)

  6. (Rubinsky and Maloney 1997) capture this transitional perspective. A more recent text on XML is (Goldberg 2008).

  7. See (Sen 2004), (Laskey 2005).

  8. See (Marlow, Naaman, Boyd, and Davis 2006). These authors propose a conceptual model of tagging that includes (1) tags assigned to a specific resource, (2) connections or links between resources, and (3) connections or links between users and explain how any two of these can be used to infer information about the other.

  9. (Hammond, Hanney, Lund, and Scott 2004) coined the phrase “tag soup” in a review of social bookmarking tools written early in the tagging era that remains insightful today. Many of the specific tools are no longer around, but the reasons why people tag are still the same.

  10. Making tagging more systematic leads to “tag convergence” in which the distribution of tags for a particular resource stabilizes over time (Golder and Huberman 2006). Consider three things a user might do if his tag does not match the suggested tags: (1) Change the tag to conform? (2) Keep the tag to influence the group norm? (3) Add the proposed tag but keep his tag as well?

  11. (RDF Working Group 2004). The official source for all things RDF is the W3C RDF page at http://www.w3.org/RDF/.

  12. Some argue that the resource being described is thus Bart Simpson’s Wikipedia page, not Bart Simpson himself. Whether or not that is an important distinction is a controversial question among RDF architects and users.

  13. (Heath and Bizer 2011) and http://linkeddata.org are excellent sources.

  14. (Byrne and Goddard 2010) present a balanced analysis of the cultural and technical obstacles to the adoption of RDF and linked data in libraries. (Yee 2009) is a highly specific technical demonstration of converting bibliographic descriptions to RDF. A detailed analysis / rebuttal of Yee’s article is at http://futurelib.pbworks.com/w/page/13686677/YeeRDF.

  15. (Pancake 2012)

  16. (Kahn & Wilensky, 1995) (Lagoze, Payette, Shin, & Wilper, 2005)

  17. (Lagoze, 1996)

  18. (McDonough, 2006)

  19. (Lagoze et al., 2008)

  20. (Taylor and Joudrey 2009) Taylor’s book on The Organization of Information, now in its 3rd edition (with co-author Daniel Joudrey), has been widely used in library science programs for over a decade.

  21. (Gilliland-Swetland 2000).

License


The Discipline of Organizing: 4th Professional Edition Copyright © 2020 by Robert J. Glushko is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
