By Gracen Brilmyer, December 2014.
Overview. The CalBug project, housed at the Essig Museum of Entomology at the University of California, Berkeley, is a collaborative initiative among nine California institutions with the goal of digitizing over a million specimens. Digitization involves imaging both specimens and their labels, as well as storing their collection information in a database. The CalBug project is also attempting to georeference these million specimens (some dating back to the 18th century), that is, to determine the latitude and longitude at which each was originally collected, so that they can be better used for research. The project relies on many student workers, graduate students, and volunteers to capture the images and data. Over the past few years, it has participated in the Notes from Nature project, which helps connect citizen scientists to scientific research. Working from the images of specimen labels generated by the team at the Essig Museum, citizen scientists digitally transcribe the data that can be read from each image. After each label is transcribed by 24 citizen scientists, the Essig runs an R program to find the most accurate transcription and transfer it into the Essig’s database. These combined efforts have produced over 209,000 specimen records and over 400,000 images, and counting. The project has a large scope and an ever-increasing scale.
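The reconciliation of multiple transcriptions mentioned above can be pictured with a small sketch. This is not the Essig's R program, whose method is not described here; it is a minimal Python illustration that assumes a simple per-field majority vote over normalized text. The function names, field names, and sample label values are all hypothetical.

```python
from collections import Counter

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not split the vote."""
    return " ".join(text.lower().split())

def consensus(transcriptions):
    """For each field, return (most common normalized value, share of non-empty votes).

    Assumption: a plain majority vote stands in for "most accurate transcription".
    Ties go to the value that was seen first.
    """
    values_by_field = {}
    for record in transcriptions:
        for field, value in record.items():
            if value and value.strip():  # ignore blank answers
                values_by_field.setdefault(field, []).append(normalize(value))

    result = {}
    for field, values in values_by_field.items():
        winner, votes = Counter(values).most_common(1)[0]
        result[field] = (winner, votes / len(values))
    return result

# Invented transcriptions of one label by three volunteers.
sample = [
    {"locality": "Strawberry  Canyon", "collector": "J. Doe"},
    {"locality": "strawberry canyon", "collector": "J. Doe"},
    {"locality": "Strawbery Canyon", "collector": ""},
]

for field, (value, share) in consensus(sample).items():
    print(f"{field}: {value!r} (agreement {share:.0%})")
```

A low agreement share could be used to flag a label for review by museum staff rather than loading it into the database automatically.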
What is being organized? The insect specimens in the CalBug project are digitized on an individual level, with unique identifying numbers, and new specimen records and their associated data are continually being added to the digital collection. Both the specimens and their data are being organized. Existing groups of specimens are prioritized for digitization and new physical specimens are accessioned into the collection and are databased upon arrival.
Why is it being organized? An individual specimen’s associated data can be highly variable; however, as long as a specimen has the time and place of its collection (no matter how vague) associated with it, it is valuable research material. The physical specimens are organized to facilitate the collection manager’s use of the collection. When physical specimens need to be borrowed, they must be efficiently found, packaged, and sent out on loan, so fastidious organization is key when locating thousands of specimens. The digital organization of the collection also facilitates the duties of museum staff and the collection manager by allowing expanded interaction with the collection through the database. The digital collection’s web interface, undergoing a redesign at the time of this writing, makes the collection accessible to researchers and novices alike and fosters data sharing with other data repositories. Since the specimen data follows digital curatorial standards, a web interface that makes these fields easily searchable and navigable can open the collection to a broader audience, which is a major impetus for the redesign.
How much is it being organized? As discussed in the previous section, the specimens and their information are subject to multiple levels of organization, and each level of organization supports a different type of user. The data of the CalBug Project is organized according to Darwin Core (DwC), a standard “designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections.”1 Certain specimen attributes have concrete institutional parameters, such as unique identifying numbers and taxonomic identification, while others have less strict parameters (e.g. a precise location of where a specimen is found), although they still must use specific DwC fields. Although there are institutional taxonomies in place for information associated with a specimen’s collection and identification, the CalBug search interface design in Figure: CalBug search interface allows for an outward-facing reorganization of the existing fields.
When is it being organized, and by whom? Categorization and organization happen at multiple times for a single specimen. If identified, the specimen is already inserted into the taxonomic classification scheme, the hierarchy of how species are related. This scientific warrant is inherited and replicated in the physical curation of the collection, and specimens are further sorted (within a taxon) by geographic region. Aligning with taxonomic categories provides a clear hierarchy for sorting and locating physical specimens and, because changes in taxonomy have to be published, makes collection maintenance fairly consistent.
The specimens are organized a second time when they are databased, either by interns or through Notes from Nature. The data is stored in a MySQL database that uses mostly DwC fields, an institutional taxonomy for specimen data. Because digitization uses DwC institutional semantics, it makes collection maintenance, governance, and interaction easier: the collection manager can search in a multifaceted manner, better understand the holdings of the museum, and track specimens for loans. The unique specimen numbers allow for individual tracking, and the other DwC fields provide multiple areas for accurate search and retrieval.
For the CalBug web search interface, the specimens retain their classification hierarchy within the database. However, the outward-facing search fields aim to serve a broader audience, not just the collection manager and museum staff. Thus the search application organizes the resources a third time “on the way out” of the database in response to a user query. As this design is optimized for researchers and students, the classification appears to focus more on taskonomy than on the institutional taxonomy (see Figure: CalBug search interface). The 20 search fields in the interface, which together query the roughly 100 fields in the database, facilitate precise information retrieval. Although fewer search fields might be expected to yield lower accuracy, user testing has shown that the new search design improves accuracy by not requiring users to know exactly which DwC field to query.
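One way to picture how a small set of outward-facing search fields can cover a larger set of database fields is a lookup table from each search field to several DwC terms. The sketch below is illustrative only and is not taken from the CalBug application: the search-field names, the grouping, and the sample records are hypothetical, although the record keys (`order`, `family`, `stateProvince`, `county`, `recordedBy`, and so on) are genuine Darwin Core term names.

```python
# Hypothetical mapping from task-oriented search fields to Darwin Core terms.
SEARCH_FIELD_MAP = {
    "taxon": ["order", "family", "genus", "scientificName"],
    "location": ["country", "stateProvince", "county", "locality"],
    "collector": ["recordedBy"],
}

def search(records, search_field, term):
    """Return records in which any DwC field behind `search_field` contains `term`."""
    needle = term.lower()
    dwc_fields = SEARCH_FIELD_MAP[search_field]
    return [
        record
        for record in records
        if any(needle in str(record.get(field, "")).lower() for field in dwc_fields)
    ]

# Invented specimen records for illustration.
records = [
    {"catalogNumber": "EX0001", "order": "Coleoptera", "family": "Carabidae",
     "stateProvince": "California", "county": "Alameda", "recordedBy": "J. Doe"},
    {"catalogNumber": "EX0002", "order": "Lepidoptera", "family": "Nymphalidae",
     "stateProvince": "California", "county": "Marin", "recordedBy": "A. Smith"},
    {"catalogNumber": "EX0003", "order": "Coleoptera", "family": "Scarabaeidae",
     "stateProvince": "Nevada", "county": "Washoe", "recordedBy": "J. Doe"},
]

for hit in search(records, "location", "alameda"):
    print(hit["catalogNumber"])
```

With a mapping like this, a user who types a county name into a single "location" box does not need to know whether the value is stored under `county`, `locality`, or another DwC term.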
The search is further expanded by a “Search any field” box, which looks in every DwC field for a term, as well as a “Common Name” field that supports novice searches, such as “beetle” and “butterfly” instead of “Coleoptera” and “Lepidoptera.” The intrinsic properties of the specimens lend the results to simple (alphabetic and numeric) sorting as well as filtering (through the “Refine” option) on the list view of the results pages. Additional views of results, including a map view showing collection locations and a grid view that displays specimen photos, help users locate desired specimens and reorganize the results to suit their needs.
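The two novice-friendly searches described above can be sketched in a few lines. This is an assumed illustration, not the CalBug implementation: the function names and sample records are hypothetical, and the common-name table contains only the two pairs given in the text.

```python
# Common names paired with orders, using only the examples from the text.
COMMON_NAME_TO_ORDER = {
    "beetle": "Coleoptera",
    "butterfly": "Lepidoptera",
}

def search_any_field(records, term):
    """Return records in which any field value contains `term` (case-insensitive)."""
    needle = term.lower()
    return [
        record
        for record in records
        if any(needle in str(value).lower() for value in record.values())
    ]

def search_common_name(records, common_name):
    """Translate a common name to an order, then match on the DwC `order` field."""
    order = COMMON_NAME_TO_ORDER.get(common_name.strip().lower())
    if order is None:
        return []  # unknown common name: no results rather than a guess
    return [r for r in records if r.get("order", "").lower() == order.lower()]

# Invented specimen records for illustration.
records = [
    {"catalogNumber": "EX0001", "order": "Coleoptera", "county": "Alameda",
     "recordedBy": "J. Doe"},
    {"catalogNumber": "EX0002", "order": "Lepidoptera", "county": "Marin",
     "recordedBy": "A. Smith"},
    {"catalogNumber": "EX0003", "order": "Coleoptera", "county": "Washoe",
     "recordedBy": "J. Doe"},
]

print([r["catalogNumber"] for r in search_common_name(records, "beetle")])
print([r["catalogNumber"] for r in search_any_field(records, "marin")])
```

Sorting the returned list alphabetically or numerically, for example with `sorted(results, key=lambda r: r["catalogNumber"])`, corresponds to the simple sorting offered in the list view.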