Much of our thinking about classification comes from the bibliographic domain. Libraries and the classification systems for the resources they contain have been evolving for millennia, shaped by the intellectual, social, and technological conditions of the societies that created them. As early as the third millennium BCE, there were enough written documents (papyrus scrolls or clay tablets) that the need arose to organize them. Some of the first attempts, by Mesopotamian scribes, were simple lists of documents in no particular order. The ancient Greeks, Romans, and Chinese created more principled systems, both sorting works by features such as language and alphabetical order, and placing them into semantically significant categories such as topic or genre. Medieval European libraries were tightly focused on Christian theology, but as secular books and readers proliferated thanks to new technologies and increased literacy, bibliographic classifications grew broader and more complex to accommodate them. Modern classification systems are highly nuanced and designed to encompass all knowledge; however, they retain some of the same features and biases as their forebears.
We will briefly describe the most important systems for bibliographic classification, especially the Dewey Decimal Classification (DDC) and Library of Congress Classification (LCC) systems. However, there are several important ways in which bibliographic classification is distinctive, and we will discuss those first:
- Scale, Complexity, and Degree of Standardization:
Department stores and supermarkets typically offer tens of thousands of different items (as measured by the number of “stock keeping units” or SKUs), and popular online commerce sites like Amazon.com and eBay are of similar scale. However, the standard product classification system for supermarkets has only about 300 categories. The classifications for online stores are typically deeper than those for physical stores, but they are highly idiosyncratic and non-standard. In contrast, scores of university libraries have five million or more distinct items in their collections, and they almost all use the same standard bibliographic classification system that has about 300,000 distinct categories.
- Legacy of Physical Arrangement, User Access, and Re-Shelving:
A corollary of the previous distinction is that bibliographic classification systems have long been shaped, and continue to be shaped, by the legacy of physical arrangement, user access to storage locations, and the re-shelving they support. These requirements constrain the evolution and extensibility of bibliographic classifications, making them less able to keep pace with changing concepts and new bodies of knowledge. Amazon classifies the products it sells in huge warehouses, but its customers do not have to pick out their purchases there, and most goods never return to the warehouse. Amazon can add new product categories and manage the resources in warehouses far more easily than libraries can.
With digital libraries, constraints of scale and physical arrangement are substantially eliminated, because the storage location is hidden from the user and the resources do not need to be returned and re-shelved. However, when users can search the entire content of the library, as they have learned to expect from the web, they are less likely to use the bibliographic classification systems that have painstakingly been applied to the library’s resources.
The Dewey Decimal Classification
The Dewey Decimal Classification (DDC) is the world’s most widely used bibliographic classification system, applied to books in over 200,000 libraries in 135 countries. It is a proprietary and de facto standard, and it must be licensed for use from the Online Computer Library Center (OCLC).
In 1876, Melvil Dewey invented the DDC when he was hired to manage the Amherst College library immediately after graduating. Dewey was inspired by Bacon’s attempt to create a universal classification for all knowledge and considered the DDC a numerical overlay on Bacon’s scheme, with 10 main classes, each divided into 10 more, and so on. Despite Dewey’s explicit rejection of literary warrant, however, his classification was strongly influenced by the existing Amherst collection, which reflected Amherst’s focus at the time on the “education of indigent young men of piety and talents for the Christian ministry.”
The resulting nineteenth-century Western bias in the DDC’s classification of religion seems startling today, yet it persists in the 23rd revision (see Figure: “Religion” in Dewey Decimal Classification.). “Religion” is one of the 10 main classes, the 200 class, with nine subclasses. Six of these nine subclasses are topics with “Christian” in the name; one is for the Bible alone; and another is entitled “Natural theology.” Everything else related to the world’s many religions is lumped under 290, “Other religions.”
Figure: “Religion” in Dewey Decimal Classification.
- 200 Religion
- 210 Natural Theology
- 220 Bible
- 230 Christian theology
- 240 Christian moral and devotional theology
- 250 Christian orders and local church
- 260 Christian social theology
- 270 Christian church history
- 280 Christian sects and denominations
- 290 Other religions
The notational simplicity of a decimal system makes the DDC easy to use and makes it easy to subdivide existing categories. So-called subdivision tables allow facets for language, geography, or format to be added to many classes, making the classification more specific. But the overall system is not very hospitable to new areas of knowledge, because new topics must be squeezed into the number ranges already allocated to the 10 main classes.
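Because each digit of a DDC number narrows the one before it, the broader classes containing a number can, in principle, be read off its prefix. The Python sketch below illustrates this under a simplifying assumption: that the notation always mirrors the hierarchy (main class, then division, then section, then one level per decimal digit), which the real DDC does not guarantee. The names `ddc_ancestors` and `labelled_path` are hypothetical, the labels are only the ten religion classes quoted in this section, and subdivision tables are ignored entirely.

```python
# Labels for the 200 class as quoted in the text; other numbers are unlabeled.
RELIGION_CLASSES = {
    "200": "Religion",
    "210": "Natural Theology",
    "220": "Bible",
    "230": "Christian theology",
    "240": "Christian moral and devotional theology",
    "250": "Christian orders and local church",
    "260": "Christian social theology",
    "270": "Christian church history",
    "280": "Christian sects and denominations",
    "290": "Other religions",
}


def ddc_ancestors(number: str) -> list:
    """Return the chain of broader notations implied by a DDC-style number.

    Assumes (simplistically) that notation mirrors hierarchy:
    main class (e.g. 200) -> division (290) -> section (294) -> decimals.
    """
    integer, _, fraction = number.partition(".")
    if len(integer) != 3 or not integer.isdigit():
        raise ValueError(f"not a DDC-style number: {number!r}")
    if fraction and not fraction.isdigit():
        raise ValueError(f"not a DDC-style number: {number!r}")
    chain = []
    # Main class, division, and section; duplicates collapse (e.g. "200").
    for candidate in (integer[0] + "00", integer[:2] + "0", integer):
        if candidate not in chain:
            chain.append(candidate)
    # Each further decimal digit narrows the section by one more level.
    for i in range(1, len(fraction) + 1):
        chain.append(f"{integer}.{fraction[:i]}")
    return chain


def labelled_path(number: str) -> list:
    """Attach the quoted labels where known, '?' elsewhere."""
    return [f"{c} {RELIGION_CLASSES.get(c, '?')}" for c in ddc_ancestors(number)]


if __name__ == "__main__":
    print(labelled_path("230"))
    print(ddc_ancestors("294.3"))
```

For example, `labelled_path("230")` places Christian theology directly under Religion, while a longer number such as `294.3` expands to four levels. Inferring levels from string prefixes keeps the example short; a fuller model would look each level up in the published schedules rather than trust the digits.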
The Library of Congress Classification
The US Library of Congress is the largest library in the world today, but it got off to a bad start after being established in 1800. In 1814, during the War of 1812, British troops burned down the US Capitol building where the library was located, and the 3,000 books in the collection went up in flames. The library was restarted a year later when Congress purchased the personal library of former president Thomas Jefferson, which was over twice the size of the collection that the British burned. Jefferson was a deeply intellectual person, and unlike the narrow historical and legal collection of the original library, Jefferson’s library reflected his “comprehensive interests in philosophy, history, geography, science, and literature, as well as political and legal treatises.”
Restarting the Library of Congress around Jefferson’s personal collection and classification had an interesting implication. When Herbert Putnam formally created the Library of Congress Classification (LCC) in 1897, he meant it not as a way to organize all the world’s knowledge, but as a practical way to organize and later locate items within the Library of Congress’s collection. However, despite Putnam’s commitment to literary warrant, the breadth of Jefferson’s collection made the LCC more intellectually ambitious than it might otherwise have been, and this probably contributed to its dominance in university libraries.
The LCC has 21 top-level categories, identified by letters instead of using numbers like the DDC (see Figure: Top Level Categories in the Library of Congress Classification.). Each top-level category is divided into about 10-20 subclasses, each of which is further subdivided. The complete LCC and supporting information takes up 41 printed volumes.
Figure: Top Level Categories in the Library of Congress Classification.
- A — GENERAL WORKS
- B — PHILOSOPHY. PSYCHOLOGY. RELIGION
- C — AUXILIARY SCIENCES OF HISTORY (GENERAL)
- D — WORLD HISTORY (EXCEPT AMERICAN HISTORY)
- E — HISTORY: AMERICA
- F — HISTORY: AMERICA
- G — GEOGRAPHY. ANTHROPOLOGY. RECREATION
- H — SOCIAL SCIENCE
- J — POLITICAL SCIENCE
- K — LAW
- L — EDUCATION
- M — MUSIC
- N — FINE ARTS
- P — LANGUAGE AND LITERATURE
- Q — SCIENCE
- R — MEDICINE
- S — AGRICULTURE
- T — TECHNOLOGY
- U — MILITARY SCIENCE
- V — NAVAL SCIENCE
- Z — BIBLIOGRAPHY. LIBRARY SCIENCE
Bias is apparent in the LCC, as it is in the DDC, but is somewhat more subtle. A library for the US emphasizes its own history: American history gets two top-level categories (E and F), while the history of the rest of the world shares one (D). “Naval science” was vastly more important in the 1800s, when it was given its own top-level category separate from “Military science” (which had a subclass for “Cavalry”).
The LCC is highly enumerative, and together with the uniqueness principle, this creates distortions over time and sometimes requires contortions to incorporate new disciplines. For example, it might seem odd today that a discipline as broad and important as computer science does not have its own second-level category under Q (Science); but because computer science was first taught in mathematics departments, the LCC places it at QA76, within QA, the subclass for mathematics.
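The letter-based LCC notation can be decomposed in a similar way. The Python sketch below is a hypothetical helper, not a Library of Congress tool: it splits the class portion of a notation such as `QA76` into its main class (`Q`), subclass (`QA`), and class number (`76`), and rejects letters that are not among the 21 top-level categories. It assumes subclass letters run to at most three characters and deliberately ignores the Cutter numbers and dates that follow the class in a full call number.

```python
import re
from typing import NamedTuple, Optional

# The 21 top-level LCC letters listed in the text (I, O, W, X, Y are unused).
MAIN_CLASSES = set("ABCDEFGHJKLMNPQRSTUVZ")

# Class portion only (assumption): 1-3 letters, then an optional class number.
CLASS_PATTERN = re.compile(r"([A-Z]{1,3})(\d+(?:\.\d+)?)?")


class LccClass(NamedTuple):
    main_class: str        # single letter, e.g. "Q" (Science)
    subclass: str          # full letter sequence, e.g. "QA" (Mathematics)
    number: Optional[str]  # class number within the subclass, e.g. "76"


def parse_lcc_class(notation: str) -> LccClass:
    """Split the class part of an LCC notation into its levels (sketch)."""
    match = CLASS_PATTERN.fullmatch(notation.strip().upper())
    if match is None:
        raise ValueError(f"not an LCC class notation: {notation!r}")
    letters, number = match.groups()
    if letters[0] not in MAIN_CLASSES:
        raise ValueError(f"unknown LCC main class: {letters[0]!r}")
    return LccClass(main_class=letters[0], subclass=letters, number=number)


if __name__ == "__main__":
    # Computer science sits inside mathematics, which sits inside science.
    print(parse_lcc_class("QA76"))
```

Representing the result as a named tuple makes the nesting explicit: the same parse shows that computer science (`QA76`) has no level of its own between Science and Mathematics, which is the contortion the text describes.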
The BISAC Classification
A very different approach to bibliographic classification is represented in the Book Industry Standards Advisory Committee classification (BISAC). BISAC is developed by the Book Industry Study Group (BISG), a non-profit industry association that “develops, maintains, and promotes standards and best practices that enable the book industry to conduct business more efficiently.” The BISAC classification system is used by many of the major businesses within the North American book industry, including Amazon, Baker & Taylor, Barnes & Noble, Bookscan, Booksense, Bowker, Indigo, Ingram, and most major publishers.
The BISAC classifications are used by publishers to suggest to booksellers how a book should be classified in physical and online bookstores. Because of its commercial and consumer focus, BISAC follows a principle of use warrant, and its categories are biased toward common language usage and popular culture. Some top-level BISAC categories, including Law, Medicine, Music, and Philosophy, are also top-level categories in the LCC. However, BISAC also has top-level categories for Comics & Graphic Novels, Cooking, Pets, and True Crime.
The differences between BISAC and the LCC are understandable because they are used for completely different purposes and generally have little need to come into contact. This changed in 2004, when Google began its ambitious project to digitize the majority of the world’s books (see the sidebar, What Is a Library?). To the dismay of many people in the library and academic community, Google initially classified books using BISAC rather than the LCC.
In addition, some new public libraries have adopted BISAC rather than the DDC because they feel the former makes the library friendlier to its users. Some librarians believe that their online catalogs need to be more like web search engines, so a less precise classification that uses more familiar category terms seems like a good choice.
One of the earliest known libraries, at Nippur in Mesopotamia, was small enough that its catalog needed no particular organization: the list of titles in the collection fit onto two easily scanned clay tablets. As collections grew, scribes made it easier to browse the contents of a collection by adding “colophons,” brief descriptions containing a document’s title, author, and place in a sequence of tablets (Casson 2002). A further step was the sorting of works into categories. A temple in the ancient Egyptian city of Edfu placed books into different trunks based on their topics, including royal duties, temple management, and timekeeping, as well as two trunks each for astronomy and protection from crocodiles. The fabled library of Alexandria, a Greek institution in Ptolemaic Egypt, used categories based on Aristotle’s three modes of thought: theoretical (e.g., mathematics, physics, metaphysics), practical (ethics, politics, economics), and poetic (poetry, music, and art), plus a fourth “meta-category,” logic, that applied to all of them. Callimachus, one of the library’s directors, created the Pinakes, a library catalog whose top-level distinction was between poetry and prose (followed by genre, author, and work). A few centuries later, librarians in the Chinese Wei and Jin dynasties (third to fifth centuries CE) settled on four major categories (classics, philosophy, history, and literature) that lasted well into the twentieth century (Shamurin 1955). Unlike the Greek system, which classified authors, the Chinese system classified individual works; some authors have suggested that this reflects Western cultures’ greater emphasis on the individual.
Medieval libraries adapted ancient practices for their own needs: monastery libraries had separate cabinets for topics such as Bibles, Church history, and Christian poets, and divided their collections into Christian and secular literature (meanwhile, scholars in the intellectually flourishing Muslim world classified knowledge into Muslim and non-Muslim sciences) (Christ 1984). Today’s classification systems reflect both their debt to earlier systems and the biases of their own cultures: the first category of the Universal Decimal Classification, just like Aristotle’s “logic” category, is a meta-category covering organization, documentation, and information science, while the first top-level category of the Chinese Classification System is “Marxism, Leninism, Maoism, and Deng Xiaoping theory.”
(Taylor and Joudrey 2009, Ch. 3) is a historical review of library classification. (Svenonius 2000) reviews the evolution of its theoretical foundations. (Kilgour 1998) focuses on the evolution of the book; the story of the co-evolution of libraries and classification comes along for the ride.
Supermarkets typically carry anywhere from 15,000 to 60,000 SKUs (depending on the size of the store), and may offer a service deli, a service bakery, and/or a pharmacy. 300 standard product categories (
Dewey Decimal Classification:
https://www.amherst.edu/aboutamherst/history. Today Amherst is aggressively co-ed and secular.
That was not a typo. The “War of 1812” lasted well into 1815. The persistence of an inaccurate name for this war reflects its unique characteristics. Wars (in the English language) are generally named for the location of the fighting or the enemy being fought (the Mexican-American War, the Korean War, the Vietnam War, the Iraq War), or for a particular ideal or ambition (the Revolutionary War, the Civil War). The War of 1812 does not satisfy any of these naming conventions: the war was fought across a huge range of geography, from eastern Canada to Louisiana, among a diverse range of groups, from Canadians to Native American tribes, with national armies getting involved very late in the war. While nominally fought over freedom of the seas, the war quickly morphed into one about territorial ambition in North America. Of course, if the world were a place where people could agree on naming standards for wars, we would likely no longer have wars. See
For additional examples, see (Shirky 2005).
Cognitive science has an even harder time finding its proper place in the LCC because it emerged as the intersection of psychology, linguistics, computer science, and other disciplines. Cognitive science books can be found scattered throughout the LCC, with concentrations in BF, P, and QA.
The Book Industry Study Group (BISG) is first and foremost focused on resource description and classification as means to business ends; this purpose contrasts with the goals of the DDC and the LCC. BISG classifications are used for barcodes and shipping labels to support supply chain and inventory management, marketing, and promotion activities. See
What some call the “Perry Rebellion” or the “Dewey Dilemma” began in 2007, when the new Perry Branch Library in Gilbert, Arizona, opened with its books classified using BISAC rather than Dewey (Fister 2009). This is a heated controversy that pits advocates of customer service and usability against the library establishment, which despises the idea of turning to retailing as inspiration when designing and operating a library. Even if BISAC becomes more widely adopted in public libraries, it is unimaginable that it could be used in research libraries.