
Organization & Representation of Information


To read the transcript of this video, go to Transcripts.

Objectives
Implement metadata standards associated with digital libraries.
Define and apply controlled vocabularies.

This page provides an overview of how all the elements that librarians use in organization and representation of information work together to create digital objects that are stored and accessed in digital collections.

Background Information

Students enter this section of the course with varying backgrounds. The following courses establish the skills necessary to create large digital library projects:

If you haven't had these courses, you may need to spend extra time exploring the resources on this page. Everything you need is here if you take the time to work through the documentation provided. If you plan to work with digital collections, you might also want to read Miller, Steven J. (2011). Metadata for Digital Collections (How-to-Do-It Manual). New York: Neal-Schuman. This is the course textbook for the metadata course.

Read!
If you haven't taken the metadata course, read the following TWO articles.

Phillips, Jennifer (2013). Learning about metadata. In J. Monson, LITA Guide: Jump-Start Your Career as a Digital Librarian, 127-144. American Library Association. Available as an ebook through IUPUI.

Southwick, Silvia B. & Skoric, Jane (2013). Metadata into practice. In J. Monson, LITA Guide: Jump-Start Your Career as a Digital Librarian, 127-144. American Library Association. Available as an ebook through IUPUI.

If you've had all three courses, you can focus on the articles directly related to digital libraries.

The Big Picture

"Vast digital collections are of limited use if researchers cannot access the materials of interest to them. While this statement may seem banal at first, there in fact remains a critical gulf between the amount of material that is available to researchers and the ability researchers have to find the materials they need within digitized collections of primary materials" (Lorang, Soh, Datla, Kulwicki, 2015).

This problem is evident across academic areas. While a growing number of digital collections exist, the discovery and use of these collections remain a problem. Digital humanities researchers want more effective ways to locate and analyze texts, while scientists are seeking tools to help them visualize data. One of the keys to addressing these problems is developing and applying standards to the description of digital objects to make them more easily discoverable. Another solution is building enhanced tools for data analysis.

Watch!
Skim the video Using Metadata from the Public Library Partnerships Project curriculum.
Think about how metadata might apply to your own digital project ideas.

This page will focus on ways that metadata can be used to increase the discoverability of items in digital libraries.

Metadata Primer

Metadata is simply "data about data". It's descriptive information about an object that is used for resource discovery. Metadata provides digital identification of items, helps organize resources, and supports preservation.

Metadata provides information about creation of the item, purpose, time, date, creator, location, and much more. For a digital image, metadata might include the size of the file, the color depth, the image resolution, the date of creation, the photographer, and other information such as a description.

In some cases, metadata such as file sizes and indexing information is generated automatically by the content management software. In most cases, however, a human must supply descriptive information such as titles, authors, and abstracts.

According to Reitz (2014), metadata is

"literally, 'data about data.' Structured information describing information resources/objects for a variety of purposes. Although AACR2/MARC cataloging is formally metadata, the term is generally used in the library community for nontraditional schemes such as the Dublin Core Metadata Element Set, the VRA Core Categories, and the Encoded Archival Description (EAD). Metadata has been categorized as descriptive, structural, and administrative".

There are three major categories of metadata:

Descriptive metadata is used for resource identification and discovery. It also facilitates indexing and selection. Information such as author, title, publisher, subject headings, and keywords is used for discovery. Common schemas include Dublin Core, MARC, MARCXML, and MODS.

Skim!
Skim Best Practices for Descriptive Metadata from University of Illinois at Urbana-Champaign Library.

Structural metadata describes how the elements of an object are organized including the internal structure of information resources. For example, it might state how pages are ordered to form chapters in a book.

Skim!
Skim Best Practices for Structural Metadata from University of Illinois at Urbana-Champaign Library.

Administrative metadata provides information that assists in the management of resources. It may include file type, creation date, and software for creation. It might also include rights management, preservation information, and technical data describing the physical characteristics of a resource.

Skim!
Skim Best Practices for Administrative Metadata from University of Illinois at Urbana-Champaign Library.
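To make the three categories concrete, here is a minimal Python sketch that groups the metadata for one digital photograph under descriptive, structural, and administrative headings. The class name, field names, and sample values are all invented for illustration; they do not come from any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObjectMetadata:
    """Hypothetical container that keeps the three categories separate."""
    descriptive: dict = field(default_factory=dict)     # identification and discovery
    structural: dict = field(default_factory=dict)      # how the parts are organized
    administrative: dict = field(default_factory=dict)  # management, rights, technical data


# Invented sample values for a single scanned photograph.
photo = DigitalObjectMetadata(
    descriptive={
        "title": "Main Street, looking north",
        "creator": "Unknown photographer",
        "subject": ["Streets", "Storefronts"],
    },
    structural={"part_of": "album-01", "sequence": 3},
    administrative={"file_type": "image/tiff", "created": "2015-06-01", "rights": "Public domain"},
)


def searchable_fields(meta: DigitalObjectMetadata) -> list:
    """Return the descriptive field names, the ones a search tool would index."""
    return sorted(meta.descriptive)


print(searchable_fields(photo))
```

Keeping the categories apart like this mirrors the way many systems treat them: descriptive fields feed search, while structural and administrative fields support display and management.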

Metadata becomes useful when both the contents and context of data files are described. For instance, metadata about a web page may include the tools used to create it, the language used to write it, and the links to similar pages.

Digital asset management systems and digital library content management systems have built-in metadata creation tools. Examples include CONTENTdm and Omeka.

[Image: how metadata is entered into an Omeka digital collection]

Digital Objects

When you combine a digital file with metadata, you have a digital object. Standards have been established for different material types. People are most familiar with cataloging books and other texts. However, metadata may be written into an audio, image, video, or other type of file. Photographic Metadata Standards govern metadata related to images.

Choosing Metadata Standards

The rest of this page will explore dozens of options for metadata standards. You'll want to apply those standards that make the most sense for your situation. Some general considerations include:

Watch!
Browse the Metadata Matters Webinar Series from The University of Illinois CARLI Project if you haven't taken the Metadata course. These videos provide valuable background on metadata.

Read!
Read Best Practices for Shareable Metadata from the Digital Library Federation. It's essential that metadata be shareable.

Read!
Read ONE of the following articles based on your interests and background:

Gill, Tony, Gilliland, Anne J., Whalen, Maureen, & Woodley, Mary S. (2008). Introduction to Metadata. Online Edition, Version 3.0. Available online. Also available as a PDF.

Hider, Philip (2013). Chapter 1. In Information Resource Description: Creating and Managing Metadata. ALA Editions.

Schaffer, Jennifer (2015). The metadata is the interface: better description for discovery of archives and special collections synthesized from user studies. In OCLC Research, Making Archival and Special Collections More Accessible, 85-97. Available online.

Warren, John W. (Summer 2015). Zen and the art of metadata maintenance. The Journal of Electronic Publishing, 18(3). Available online.

Metadata Levels and Services

In some cases, librarians will outsource metadata services. Or, librarians may need to select from a number of levels of service. For example, the Mountain West Digital Library offers four levels when they digitize a collection. Each level includes the lower level(s) of service if applicable:

Read!
Read Gregory, Lisa & Williams, Stephanie (July/August 2014). On being a hub: some details behind providing metadata for the Digital Public Library of America. D-Lib Magazine, 20(7/8). Available online.

Visualizing Metadata

It's easy to get overwhelmed by all the standards.

Try It!
Go to Seeing Standards: A Visualization of the Metadata Universe by Jenn Riley. It's available as a PDF.

Go to The Metadata Standards Crosswalk by Patricia Harpring. It provides an excellent overview of the key elements for each standard.

Compare the two approaches to visualizing standards. Create your own chart to help you visualize all the elements as you work your way through the rest of this page.

Key Metadata Standards

This section discusses metadata standards used by digital libraries. Metadata structure standards include MARC 21, Dublin Core, EAD, and TEI. Metadata value standards include RDA, which is used in conjunction with MARC 21 and Dublin Core for many digital projects.

MARC 21

You're probably most familiar with MARC, so we'll start here.

The Machine-Readable Cataloging (MARC) standards housed at the Library of Congress are a set of digital formats for the description of items. The current family of standards is known as MARC 21. In addition to the format for bibliographic records, this version includes formats for authority records, holdings records, classification schedules, and community information. According to Reitz (2014), Machine-Readable Cataloging (MARC) is

"an international standard digital format for the description of bibliographic items developed by the Library of Congress during the 1960s to facilitate the creation and dissemination of computerized cataloging from library to library within the same country and between countries... there are several versions of MARC in use in the world, the most predominant being MARC 21, created in 1999...

Widespread use of the MARC standard has helped libraries acquire predictable and reliable cataloging data, make use of commercially available library automation systems, share bibliographic resources, avoid duplication of effort, and ensure that bibliographic data will be compatible when one automation system is replaced by another".

The MARC record has three components:

According to Reitz (2014), "the MARC record is divided into fields, each containing one or more related elements of bibliographic description. A field is identified by a three-digit tag designating the nature of its content. Tags are organized as follows in hundreds, indicating a group of tags, with XX in the range of 00-99".

0XX fields - Control information, numbers, codes
1XX fields - Main entry
2XX fields - Titles, edition, imprint
3XX fields - Physical description, etc.
4XX fields - Series statements (as shown in item)
5XX fields - Notes
6XX fields - Subject added entries
7XX fields - Added entries other than subject or series
8XX fields - Series added entries (other authoritative forms)
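The tag groups listed above can be expressed as a small lookup keyed on the first digit of a three-digit tag. This is only a sketch: the function name is made up for illustration, and the table covers just the groups listed on this page.

```python
# Tag groups exactly as listed above, keyed by the first digit of the tag.
MARC_TAG_GROUPS = {
    "0": "Control information, numbers, codes",
    "1": "Main entry",
    "2": "Titles, edition, imprint",
    "3": "Physical description, etc.",
    "4": "Series statements (as shown in item)",
    "5": "Notes",
    "6": "Subject added entries",
    "7": "Added entries other than subject or series",
    "8": "Series added entries (other authoritative forms)",
}


def tag_group(tag: str) -> str:
    """Return the group name for a three-digit MARC tag such as '245'."""
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"expected a three-digit tag, got {tag!r}")
    # Tags whose first digit is not in the list above get a placeholder label.
    return MARC_TAG_GROUPS.get(tag[0], "Not covered by the list above")


print(tag_group("245"))
print(tag_group("650"))
```

Tags are kept as strings rather than integers so that leading zeros, as in 008 or 020, are preserved.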

MARC and Digital Libraries

MARC is the format used for library catalogs. The value standards for MARC are now RDA (Resource Description & Access) for most libraries. Online Public Access Catalogs (OPACs) are about the only systems that use MARC data.

The OPAC is an important element of today's digital libraries. In most cases, it includes traditional print materials and also provides access to electronic materials, covering both physical items (e.g., books, DVDs) and born-digital items (e.g., ebooks, streaming video, audiobooks).

[Image: how MARC data is shown to end users in WorldCat]

Try It!
Go to MARC 21 standards from the Library of Congress. Read the MARC Format Overview.

RDA: Resource Description & Access

Resource Description and Access (RDA) is "a set of content standards for cataloging materials held in libraries and other cultural institutions, RDA was developed over a six-year period to replace Anglo-American Cataloging Rules, 2nd edition, 2002 revision (AACR2). RDA was published in 2010 under the title RDA Toolkit by the American Library Association, the Canadian Library Association, and CILIP (UK)" (Reitz, 2014).

Users describe items in four areas:

RDA and Digital Libraries

RDA has replaced AACR2 for describing items. It's used along with MARC in OPACs.

Try It!
Go to Resource Description and Access (RDA) from the Library of Congress. Read the latest updates.
Review the Open Metadata Registry for RDA Vocabularies.
Browse RDA Record Examples from the Library of Congress.

Digital Library Spotlight
The Digital Media Repository at Ball State University has been working through the process of updating their online collections to be compliant with RDA. With more than 200 collections and 200,000 plus items across material types, this is a large project.

Read!
Read Leigh, Katharine & Leigh, Richard N. (January 2015). Implementing RDA for digital libraries. Computers in Libraries, 35(1), 11-14.

RDA Resources

Dublin Core

"Dublin Core is the result of an international cross-disciplinary consensus achieved through the ongoing efforts of the Dublin Core Metadata Initiative (DCMI), aimed at providing a foundation for standardized bibliographic description of information resources available via the Internet. In 2007, the Dublin Core Metadata Element Set was published by the International Organization for Standardization."

The Dublin Core focuses on object description. Most librarians use some version of Dublin Core metadata. In some cases, users have tailored metadata to meet specific needs of projects or collections. However, it's important to have a common layer of descriptive data to make browsing and searching across all object types feasible.

According to Reitz (2014), Dublin Core is

"a standard set of 15 inter-operable metadata elements designed to facilitate the description and recovery of document-like resources in a networked environment. The descriptive elements are:
Title (name given to the resource)
Creator (entity primarily responsible for making the content of the resource)
Subject (topic of the content of the resource, typically expressed as keywords, key phrases, or classification codes)
Description (abstract, table of contents, free-text account of the content, etc.)
Publisher (entity responsible for making the resource available)
Contributor (entity responsible for making contributions to the content of the resource)
Date (typically associated with the creation or availability of the resource)
Type (nature or genre of the content of the resource)
Format (physical or digital manifestation of the resource)
Identifier (an unambiguous reference to the resource within a given context, such as the URL, ISBN, ISSN, etc.)
Source (reference to a resource from which the present resource is derived)
Language (the language of the intellectual content of the resource)
Relation (reference to a related resource)
Coverage (extent or scope of the content of the resource)
Rights (information about rights held in and over the resource)"

No elements are required and all elements are repeatable. The content or value standards for Dublin Core aren't required, but some are recommended. Many digital libraries apply RDA to elements including coverage, date, format, language, identifier, relation, source, subject, and type.
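Because no element is required and every element is repeatable, a Dublin Core record can be as small or as large as the item calls for. The sketch below builds a simple record with Python's standard library. The namespace URI is the one used for the 15-element set; the `record` wrapper element, the function name, and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields: dict) -> ET.Element:
    """Build a record from {element name: [values]}; any element may repeat or be absent."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            element.text = value
    return record


# Invented sample values; note the repeated subject and the omitted elements.
record = build_dc_record({
    "title": ["All about Canines"],
    "creator": ["Doe, Jane"],
    "subject": ["Dogs", "Pets"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

In practice the wrapper element depends on where the record is going; a harvesting protocol or content management system will usually define its own container.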

Try It!
Explore some of Dublin Core's pages that provide examples including the Using Dublin Core - The Elements and User Guide/Creating Metadata.

Read the Describing Your Materials page from North Carolina.

Dublin Core and Digital Libraries

Dublin Core is a commonly used schema for digital collections and is built into many digital asset management systems such as CONTENTdm and Omeka. It is particularly effective for cross-collection searching and sharing. It's also good for cross-domain discovery. Metadata sharing is also easy. Dublin Core is particularly good for novice metadata creators and simple collections.

Dublin Core is the schema you're most likely to use when creating a digital collection.

[Image: how Dublin Core metadata is shown to end users in a CONTENTdm digital image collection at the University of Utah]

Try It!
Go to the Dublin Core website. Browse information about the Dublin Core.

Try It!
Compare the metadata presented to users in different digital collections.
2013 Small Grains Report, University of Idaho Library
Afrique (Dunham) Dance Photo, New York Public Library
F.H. Dewitt & Co. Seed Catalog, New York Botanical Garden
Geronimo, Arizona Historical Society Library and Archives, Tucson
Strokes (audio recording), University of Utah
When They Re-discover America..., Duke Digital Collections

Encoded Archival Description (EAD)

According to Reitz (2014), the EAD Document Type Definition (DTD)

"is a non-proprietary standard for encoding in Standard Generalized Markup Language (SGML) or Extensible Markup Language (XML) the finding aids (registers, inventories, indexes, etc.) used in archives, libraries, museums, and other repositories of manuscripts and primary sources to facilitate use of their materials. EAD was developed in 1993 on the initiative of the UC Berkeley Library and is maintained by the Library of Congress, in partnership with the Society of American Archivists".

Read!
Read Combs, Michele, Matienzo, Mark A., Proffitt, Merrilee, & Spiro, Lisa (2015). Over, under, around, and through: getting around barriers to EAD implementation. In OCLC Research, Making Archival and Special Collections More Accessible, 39-62. Available online.

"EAD is an international standard for encoding finding aids established to meet the needs of both end-users and archivists. EAD is represented in XML (Extensible Markup Language), a platform neutral data format that ensures data longevity when migrated from one software environment to another. EAD ensures the long-term viability of your data by encoding intellectual rather than only presentational data (HTML, for example, only accomplishes the latter). EAD can be produced from (or mapped to) a variety of formats, including relational databases, MARC, Dublin Core, HTML and others, which makes it an excellent format for porting data. In addition researchers can have a more robust interaction with EAD finding aids because EAD enables better searching and subsequent delivery from a single source document." (Combs, Matienzo, Proffitt, & Spiro, 2015, 41).

EAD and Digital Libraries

EAD is maintained by the Library of Congress in partnership with the Society of American Archivists. If you're working on a collaborative project with an archivist, you will likely apply this standard. Since EAD can be produced from MARC and Dublin Core, it's a common format for porting data.

Try It!
Go to EAD (Encoded Archival Description). Library of Congress.

To learn more, go to Finding Aids. Library of Congress.

Text Encoding Initiative

According to Reitz (2014), the Text Encoding Initiative (TEI)

"is an international interdisciplinary standard intended to assist libraries, museums, publishers, and scholars in representing literary and linguistic texts in digital form to facilitate research and teaching. The encoding scheme is designed to maximize expressivity and minimize obsolescence. TEI began as a research project organized cooperatively by the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing, funded by research grants from the National Endowment for the Humanities, the European Union, the Canadian Social Science Research Council, the Mellon Foundation, and others."

TEI and Digital Libraries

If you're going to be dealing with lots of text documents such as literary texts and letters, it's likely you'll be using the TEI standard. It's often used in digital humanities projects.

Try It!
Go to Text Encoding Initiative. Read the latest updates.


Metadata Standards for Digital Libraries

In addition to the key standards already discussed, many additional standards apply specifically to digital libraries and are maintained by the Network Development and MARC Standards Office of the Library of Congress. This section contains the standards you're likely to use; however, there are many more that could apply depending on your situation.

XML: Extensible Markup Language

Extensible Markup Language (XML) is a simple, flexible text format used in the exchange of data, particularly on the Web.

An XML schema is a description of a type of XML document. This description includes the structure and content of these documents beyond the basics of XML.

MARCXML

The MARCXML standards provide a framework for working with MARC data in an XML environment. This connection between MARC and XML is important in the digital library environment because it's a means of facilitating the sharing and networking of bibliographic information. Developed by the Library of Congress, MARCXML is based on the MARC21 standards. MARCXML is used as an aggregation format. The goal is to provide a simple, flexible way to present data through XML stylesheets.
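As a rough illustration of MARC data carried in XML, the sketch below parses a tiny MARCXML record with Python's standard library and pulls out the title and main entry. The namespace and the element names (`record`, `leader`, `controlfield`, `datafield`, `subfield`) follow the MARCXML schema; the record content and the helper function name are invented for this example.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record; real records carry many more fields.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">demo0001</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">All about canines /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
</record>"""


def subfields(record: ET.Element, tag: str, code: str) -> list:
    """Return the text of every subfield with this code inside fields with this tag."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, MARCXML_NS)]


record = ET.fromstring(SAMPLE)
print(subfields(record, "245", "a"))  # title proper
print(subfields(record, "100", "a"))  # main entry, personal name
```

The numeric tags and subfield codes are the same ones used in a traditional MARC record; only the container has changed.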

Try It!
Go to MARC 21 XML Schema from the Library of Congress. Browse information about MARCXML.

ALTO

ALTO is a technical metadata standard for Optical Character Recognition (OCR). Housed at the Library of Congress, it is important for those scanning texts.

Try It!
Go to ALTO from the Library of Congress. Browse information about ALTO schemas.

AudioMD and VideoMD

audioMD and videoMD are XML schemas that detail metadata for audio and video-based digital objects. They are extensions of METS and PREMIS.

Try It!
Go to audioMD and videoMD from the Library of Congress. Browse information about these XML schemas.

METS

The Metadata Encoding and Transmission Standard (METS), housed at the Library of Congress, is used for encoding descriptive, administrative, and structural metadata for the digital objects in a digital collection. It is expressed using the XML schema language.

According to Reitz (2014), the Metadata Encoding and Transmission Standard (METS) is

"an XML schema for encoding descriptive, structural, and administrative metadata for digital objects. METS can be used to facilitate the standardized exchange of digital objects between repositories, the development of common presentation utilities, and the archiving of digital objects. METS was developed by the Digital Library Federation and is maintained by the Library of Congress with the advice of the METS Editorial Board".

Try It!
Go to Metadata Encoding and Transmission Standard (METS) from the Library of Congress. Browse information about METS.
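To give a feel for the structural side of METS, the sketch below builds a small structural map that records how pages are ordered within chapters of a book. The `structMap` and `div` elements and the `TYPE`, `LABEL`, and `ORDER` attributes come from the METS schema. Everything else is an assumption for illustration: the function name and sample chapters are invented, and a complete METS document would wrap this fragment in a `mets` root element and link each page to its files.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)
DIV = f"{{{METS_NS}}}div"


def build_struct_map(chapters: list) -> ET.Element:
    """Build a structMap fragment from (chapter label, page count) pairs."""
    struct_map = ET.Element(f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    book = ET.SubElement(struct_map, DIV, {"TYPE": "book"})
    order = 1  # running page order across the whole book
    for label, page_count in chapters:
        chapter = ET.SubElement(book, DIV, {"TYPE": "chapter", "LABEL": label})
        for _ in range(page_count):
            ET.SubElement(chapter, DIV, {"TYPE": "page", "ORDER": str(order)})
            order += 1
    return struct_map


# Invented example: a two-chapter book with five pages in total.
struct_map = build_struct_map([("Chapter 1", 2), ("Chapter 2", 3)])
page_orders = [d.get("ORDER") for d in struct_map.iter(DIV) if d.get("TYPE") == "page"]
print(page_orders)
```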

MIX

MIX (NISO Metadata for Images in XML), housed at the Library of Congress, is an XML schema for image metadata. The schema establishes a format for the interchange of the data specified in the Data Dictionary.

Try It!
Go to MIX from the Library of Congress. Browse information about MIX.

MODS

The Metadata Object Description Schema (MODS), housed at the Library of Congress, is an XML schema for a bibliographic element set used in digital library settings. It is a derivative of MARC 21, so it includes a subset of MARC fields but uses language-based tags rather than numeric ones.

The Metadata Object Description Schema (MODS) is

"an XML schema developed by the Library of Congress for representing MARC-like semantics in the XML markup language. MODS can be used to carry selected data from MARC 21 records or for creating original resource description records according to a specification richer than Dublin Core but less complex than full MARC. MODS cannot be used for the conversion of MARC to XML without loss of data (MARCXML was designed for that purpose)".

Try It!
Go to Metadata Object Description Schema (MODS) from the Library of Congress. Browse the MODS User's Guidelines (Version 3).
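The difference between numeric MARC tags and the language-based tags of MODS is easiest to see in a record. The sketch below builds a minimal MODS description in Python. The namespace and the element names (`titleInfo`, `title`, `name`, `namePart`, `subject`, `topic`) come from the MODS schema; the function name and sample values are invented, and a real record would usually carry more detail.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(name: str) -> str:
    """Qualify an element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"


def build_mods(title: str, author: str, topics: list) -> ET.Element:
    """Build a minimal MODS record; the comments note the roughly equivalent MARC field."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))   # MARC 245
    ET.SubElement(title_info, q("title")).text = title
    name = ET.SubElement(mods, q("name"))              # MARC 100
    ET.SubElement(name, q("namePart")).text = author
    for topic in topics:                               # MARC 650
        subject = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject, q("topic")).text = topic
    return mods


mods_xml = ET.tostring(build_mods("All about canines", "Doe, Jane", ["Dogs"]), encoding="unicode")
print(mods_xml)
```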

Open Archives Initiative

The Open Archives Initiative (OAI) is "an organization funded by the Digital Library Federation, the Coalition for Networked Information, and the National Science Foundation to develop and promote interoperability standards as a means of facilitating the exchange of digital information content. Its program originated in the desire to advance scholarly communication by improving access to distributed repositories of e-prints, known as 'archives.' The main product of the OAI is a framework for harvesting and aggregating metadata from multiple repositories and a harvesting protocol known as the OAI Protocol for Metadata Harvesting (OAI-PMH)" (Reitz, 2014).

According to their website, the Open Archives Initiative "develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content." In other words, it's important that your digital objects can easily move from one platform to another.

Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH) v2.0 is used by most projects. In general OAI support is implemented using OCLC's OAICat Open Source Software so that item records are available for harvesting.
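Harvesting with OAI-PMH comes down to sending HTTP requests with a small set of parameters. The sketch below only builds a request URL and does not contact any server. `ListRecords` is one of the protocol's verbs and `oai_dc` is the Dublin Core format every OAI-PMH repository must support. The base URL and set name are placeholders, and the function name is made up for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc", set_spec: str = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: limit the harvest to one set (collection)
    return f"{base_url}?{urlencode(params)}"


# Placeholder repository address and set name, not a real service.
url = list_records_url("https://example.org/oai", set_spec="photos")
print(url)
```

A harvester would fetch this URL, parse the XML response, and repeat the request with a resumption token if the repository returns its records in batches.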

Try It!
Go to Open Archives Initiative. Read about the OAI Protocol for Metadata Harvesting (OAI-PMH).

TextMD

TextMD (Technical Metadata for Text) is an XML schema that details technical metadata for text-based digital objects. It allows for properties such as encoding information, character information, languages, fonts, markup information, processing and textual notes, technical requirements for printing and viewing, and page ordering and sequencing.

Try It!
Go to TextMD. Explore the schema.

VRA Core

The VRA Core is a data standard and XML schema housed at the Library of Congress. It is "a metadata element scheme developed by the Visual Resources Association for describing works of visual culture and images that document them, to facilitate the sharing of information among visual resources collections" (Reitz, 2014).

Try It!
Go to the VRA Core Categories support pages and the VRA Core website at the Library of Congress. Explore the latest updates.

PREMIS Data Dictionary for Preservation Metadata

The PREMIS Data Dictionary for Preservation Metadata is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. Developed by an international team of experts, PREMIS is implemented in digital preservation projects around the world, and support for PREMIS is incorporated into a number of commercial and open-source digital preservation tools and systems. 

Try It!
Go to PREMIS Data Dictionary for Preservation Metadata from the Library of Congress. Browse the standards.

Other Useful Standards and Resources

It's likely you'll encounter many more standards depending on the digital collection you're building. If you're dealing with works of art, you'll probably use CDWA: Categories for the Description of Works of Art. If you're working with video, you'll want to consider MPEG: Moving Picture Experts Group.

Controlled Vocabulary

Controlled vocabularies offer pre-selected words or phrases rather than presenting a free-form natural language vocabulary. This approach reduces the likelihood of inaccurate results. According to Reitz (2014), controlled vocabulary is

"an established list of preferred terms from which a cataloger or indexer must select when assigning subject headings or descriptors in a bibliographic record, to indicate the content of the work in a library catalog, index, or bibliographic database. Synonyms are included as lead-in vocabulary, with instructions to see or USE the authorized heading. For example, if the authorized subject heading for works about dogs is 'Dogs,' then all items about dogs will be assigned the heading 'Dogs,' including a work titled All about Canines. A cross-reference to the heading 'Dogs' will be made from the term 'Canines' to ensure that anyone looking for information about dogs under 'Canines' will be directed to the correct heading. Controlled vocabulary is usually listed alphabetically in a subject headings list or thesaurus of indexing terms. The process of creating and maintaining a list of preferred indexing terms is called vocabulary control."

Both general and discipline-specific thesauri are available to assist librarians in selecting subject headings and descriptors. Examples include

[Image: example from AAT: Art & Architecture Thesaurus Online]
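The "Dogs" and "Canines" example from the definition of controlled vocabulary can be sketched as a simple lookup: authorized headings are accepted as they are, and lead-in terms are redirected to the authorized heading. The tiny vocabulary and the function name below are invented for illustration and are not drawn from any published thesaurus.

```python
# Hypothetical vocabulary: authorized headings plus lead-in (USE) terms.
AUTHORIZED = {"Dogs", "Cats"}
LEAD_IN = {"canines": "Dogs", "puppies": "Dogs", "felines": "Cats"}


def authorized_heading(term: str) -> str:
    """Return the authorized heading for a term, following USE references."""
    cleaned = term.strip().lower()
    for heading in AUTHORIZED:
        if heading.lower() == cleaned:
            return heading
    if cleaned in LEAD_IN:
        return LEAD_IN[cleaned]  # cross-reference, e.g. Canines USE Dogs
    raise ValueError(f"{term!r} is not in the controlled vocabulary")


print(authorized_heading("Canines"))
print(authorized_heading("dogs"))
```

Rejecting unknown terms instead of accepting them as typed is the point of vocabulary control: every record about dogs ends up under the same heading.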

Classification

From Library of Congress Classification (LCC) to the Dewey Decimal Classification (DDC), you learned about classification systems in your cataloging class. According to Reitz (2014), classification is "the process of dividing objects or concepts into logically hierarchical classes, subclasses, and sub-subclasses based on the characteristics they have in common and those that distinguish them."

While you probably connect classification with OPACs, classification systems are also used in many digital collections.

Other Metadata Structure Standards

Other Metadata Value Standards

Other useful documents associated with metadata from the Library of Congress:

Major Standards Organizations

Read!
Read at least TWO of the following articles related to a specific aspect of metadata.

Dappert, Angela & Enders, Markus (2010). Digital Preservation Metadata Standards. Information Standards Quarterly, 22(2).

Gartner, Richard & Lavoie, Brian (2013). Technology Watch Report 13-3: Preservation Metadata (2nd edition). Digital Preservation Coalition.

Gregory, Lisa, Kenney, Kathleen, & Rudersdorf, Amy (2013). NC Preservation Metadata for Digital Objects.

Harpring, Patricia (2010). Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and other Cultural Works. Online Edition. Getty Publications. Available online.

Lee, J. H., Clarke, R. I. and Perti, A. (2015). Empirical evaluation of metadata for video games and interactive media. Journal of the Association for Information Science and Technology. 

Lee, J., Tennis, J., Clarke, R., & Carpenter, M. (2013). Developing a video game metadata schema for the Seattle Interactive Media Museum. International Journal On Digital Libraries, 13(2), 105-117. 

Metadata Standards and Guidelines Relevant to Digital Audio (2011). ALCTS: PARS Task Force.

Nakasone, Sonoe & Sheffield, Carolyn (November/December 2013). Descriptive metadata for field books: methods and practices of the field book project. D-Lib Magazine, 19(11/12). Available online.

Subject Indexing

"(A subject index is) an alphabetically arranged list of headings selected by an indexer to represent the subject content of one or more works, with locators (usually page numbers) to direct the user to the corresponding text. Names are usually included in the subject index, but some publications have a separate name index and even a separate geographic index of place names. In some publications, the subject index is combined with the author index in a single alphabetic sequence." (Reitz, 2014).

Subject indexing is particularly important in digital collections. For instance, subject indexing is critical for the effective use of images. However, there are challenges in both creating and using these indexes for accessing images.

Pavel Rygiel (2012, 287) identified an issue with “subject indexing of architectural object images situated in the regions which in the past belonged to various countries.” Rygiel noted that paintings, drawings, engravings, and photographs are associated with the history of various places. It’s important to identify these locations for information retrieval; however, historical events influence the geopolitical situation of the place where the objects are located. In other words, place names change.

Baracho and Cendon (2012) developed a classification scheme for identifying engineering drawings.

Digital libraries are effective at indexing and retrieval using descriptive terms. However, they may not be capturing the emotions associated with the collection. Kathryn Knautz (2012) studied the emotional description of films, images and music along with the implications for indexing.

Caroline Whippey (2012) investigated the non-textual information found in the game World of Warcraft. The auditory and visual aspects of game design have implications for designing more effective and interactive information search and retrieval systems.

 

Beyond the Basics

Beyond the basics, many new standards and guidelines are emerging that will increase the discoverability of digital objects. According to Seth van Hooland and Ruben Verborgh (2014, xiii),

"never before has so much of our global cultural heritage been at our fingertips. Yet as billions have been spent so far on digitization, both public and private, it still feels as though we are in the very earliest stages of what might be possible. Truly usable and intuitive interfaces notwithstanding, there is still much to do in terms of simple search and discovery tools across multiple collections."

The Semantic Web and Ontology

According to the W3C website, the Semantic Web "provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries". The standards promote common data formats and exchange protocols, specifically the Resource Description Framework (RDF).

It's hoped that the Semantic Web will be able to integrate across content, applications, and systems. Unfortunately, the Semantic Web remains vast, vague, and uncertain. Although it faces challenges, it has a huge following that's likely to grow as more practical applications emerge. Digital libraries are already using aspects of the approach including URI identifiers, XML syntax, and RDF data interchange. Increasingly, ontology languages such as OWL are being integrated.

According to W3C, the OWL 2 Web Ontology Language "is an ontology language for the Semantic Web with formally defined meaning. OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents."
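To make RDF a little more concrete, here is a minimal sketch of how a description can be broken into subject-predicate-object statements (triples). It uses only the Python standard library. The item and person URIs are hypothetical, the serializer is a simplified N-Triples-style writer written for this example, and treating any string that starts with "http" as a URI is a shortcut. A real project would more likely rely on an RDF library.

```python
# An RDF statement is a triple: (subject, predicate, object).
# Subjects and predicates are URIs; objects are URIs or literal values.
DCTERMS = "http://purl.org/dc/terms/"
ITEM = "http://example.org/collection/item/42"  # hypothetical item URI

triples = [
    (ITEM, DCTERMS + "title", "Main Street, looking north"),
    (ITEM, DCTERMS + "creator", "http://example.org/people/jane-doe"),
]


def to_ntriples(statements):
    """Serialize triples in a simplified N-Triples style, one statement per line."""
    lines = []
    for subject, predicate, obj in statements:
        if obj.startswith(("http://", "https://")):
            # Shortcut for this sketch: strings that look like URLs are URIs.
            obj_text = f"<{obj}>"
        else:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            obj_text = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Notice that the creator is recorded as a URI rather than a text string. That choice is what allows other datasets to refer to the same person unambiguously.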

Try It!
Browse the W3C Data Activity Page: Building the Web of Data and W3C Semantic Web: SKOS (Simple Knowledge Organization System) page.

Semantic Enrichment

Semantic enrichment involves adding meaning to data through the addition of information. These additions may include information such as geographical coordinates and links to external resources. As library collections become more connected, these types of enriched data become more powerful. According to IBM (2015), semantic enrichment means

"enriching the content/context of data by tagging, categorizing, and/or classifying data in relationship to each other, to dictionaries, and/or other base reference sources. At its simplist, this means adding additional contextual information to some existing data set (think of adding traffic data to road maps where the traffic data provides context of road conditions, probability of delay, length of projected obstructions, condition of road, etc.)".

Semantic enrichment is useful across academic areas. However, the specifics of the approach may vary from discipline to discipline. Bontcheva and Kieniewicz (2015) note that

“environmental science is a broad, interdisciplinary subject area that spans biology, chemistry, earth sciences, physics, and engineering. Due to this breadth of subject scope, information discovery and sharing in environmental science is often a challenge… Linked Open Data (LOD), when coupled with semantic enrichment and search methods, offers an opportunity to improve the process of information discovery through enriching and contextualizing scientific publications with respect to unique, machine-readable, interlinked open vocabularies.”

Xu and Wang (2015) point out that semantic description and annotation are essential when working with digital images. They suggest combining semantic description with a domain thesaurus.
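The kind of enrichment described in this section, adding geographical coordinates and links to external resources, can be sketched in a few lines. This is a minimal illustration only. The gazetteer, its URI, the field names, and the `enrich` function are assumptions made for the example, and the coordinates are approximate.

```python
# Hypothetical reference source: place name -> approximate coordinates and a link.
GAZETTEER = {
    "Indianapolis": {
        "latitude": 39.77,
        "longitude": -86.16,
        "same_as": "http://example.org/places/indianapolis",
    },
}


def enrich(record, gazetteer):
    """Return a copy of the record with place context added when a match exists."""
    enriched = dict(record)
    match = gazetteer.get(record.get("place"))
    if match:
        enriched["coordinates"] = (match["latitude"], match["longitude"])
        enriched["place_uri"] = match["same_as"]
    return enriched


record = {"title": "Monument Circle at night", "place": "Indianapolis"}
print(enrich(record, GAZETTEER))
```

The original record is left untouched and the added context lives in separate fields, so the enrichment can be regenerated or removed if the reference source changes.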

Linked Data

“Linked data is about making connections between related data using the semantic web. As libraries increasingly use Resource Description Framework (RDF), Uniform Resource Identifiers (URI), World Wide Web Consortium (W3C) standards, and other best practices in the management of data, researchers benefit from the ability to more easily discover data… Libraries have the opportunity to empower users by providing rich and deep content platforms with tools that facilitate discovery and analysis, which ultimately enables them to make information connections that contribute to the creation of new knowledge” (ACRL, 2015, 15).
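The idea of "making connections between related data" can be illustrated with a small sketch. When two sources use the same URI for the same person, their statements can be combined without any name matching. The datasets, URIs under example.org, and the `statements_about` function below are hypothetical and exist only for this example.

```python
# Two hypothetical datasets that identify the same person with the same URI.
AUTHOR = "http://example.org/people/jane-doe"

library_triples = [
    ("http://example.org/books/1", "http://purl.org/dc/terms/creator", AUTHOR),
]
archive_triples = [
    (AUTHOR, "http://xmlns.com/foaf/0.1/name", "Jane Doe"),
    ("http://example.org/letters/7", "http://purl.org/dc/terms/creator", AUTHOR),
]


def statements_about(uri, *datasets):
    """Collect every triple, from any dataset, that mentions the URI."""
    return [
        triple
        for dataset in datasets
        for triple in dataset
        if uri in (triple[0], triple[2])
    ]


for triple in statements_about(AUTHOR, library_triples, archive_triples):
    print(triple)
```

A researcher who starts from the book in the library dataset can follow the shared URI to the letter held by the archive, which is the kind of discovery the quotation describes.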

Below is an example of linked data from WorldCat.

[Image: example of linked data from WorldCat]

Try It!
Browse the Linked Data website.

Beyond Text-based Querying

Text-based querying remains the primary approach to digital collection access.

“One way in which the basic functionality of digital libraries has stalled is that text nearly always serves as the primary, and most often the only, basis for retrieval and analysis in conventional systems. Such text-based querying and retrieval does not meet all, or even all routine, use cases, and singular focus on text-based retrieval limits the types of questions researchers can imagine and pursue within the collections. Regardless of the questions researchers have, or the materials they seek to study, the standard methods for retrieving information of relevance in digital collections are text-based. Such collections are locked in to, and by, a model that positions text-based querying as normative and fail to imagine additional models of engagement. One way to open up digital collections to new types of questions and modes of discovery is through image analysis” (Lorang, Soh, Datla, Kulwicki, 2015).

Library researchers are working on ways to address the need for image analysis. For instance, Lorang, Soh, Datla, and Kulwicki (2015) are part of the Image Analysis for Archival Discovery (Aida) project team at the University of Nebraska-Lincoln. They are making poetry published in newspapers more available for study through advanced work in digital image discovery.

Describing Science Data

In 2011, the Data Stewardship Committee of ESIP (Earth Science Information Partners) developed the “Provenance and Context Content Standard (PCCS)” to ensure that all content items contain consistent, comprehensive information. In addition, eight categories of content items were identified along with attributes that constitute a PCCS Matrix. These guidelines provide the foundation for development of standards in this area.

Read!
Read THREE of the following articles focusing on emerging approaches to data and discovery.

Bendib, Issam, Laouar, Mohamed Ridda, Hacken, Richard, & Miles, Mathew (2014). Semantic ontologies for multimedia indexing (SOMI). In J. Chang, W. Zhang and I. Alon (eds.), Library Hi Tech, Volume 32: Structuring the digital domain. Bradford, GBR: Emerald Insight, 206-218. Available as an ebook through IUPUI.

Bontcheva, Kalina, Kieniewicz, Johanna, & Wallis, Michael (January/February 2015). Semantic enrichment and search: a case study on environmental science literature. D-Lib Magazine, 21(1/2). Available online.

Downs, Robert R., Duerr, Ruth, Hills, Denise J. & Ramapriyan, H.K. (July/August 2015). Data stewardship in earth sciences. D-Lib Magazine, 21(7/8).

Keller, Michael A., Persons, Jerry, Glaser, Hugh, & Calter, Mimi (October 2011). Linked Data for Libraries, Museums, and Archives: Survey and Workshop Report. Council on Library and Information Resources. Available online.

Kovács, Béla Lóránt & Takács, Margit (2014). New search method in digital library image collections: a theoretical inquiry. Journal of Librarianship and Information Science, 46(3), 217-225.

van Veen, Theo, Lonij, Juliette, & Koppelaar, Hanna (July/August 2015). Semantic enrichment: a low-barrier infrastructure and proposal for alignment. D-Lib Magazine, 21(7/8). Available online.

Yang, Seungwon & Farag, Mohamed Magdy Gharib (2014). Ontologies. In E. Fox, R. Torres, Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security. Synthesis Lectures on Information Concepts, Retrieval, and Services, 63-88. Morgan & Claypool Publishers. Available as an ebook through IUPUI.

Xu, Lei & Wang, Xiaoguang (May/June 2015). Semantic description of cultural digital images: using a hierarchical model and controlled vocabulary. D-Lib Magazine, 21(5/6). Available online.

Metadata Application Profiles (MAPs)

With all the different standards, it can be easy to get overwhelmed. To maintain sanity, it's essential to create a set of guidelines you and your staff can apply when creating metadata for items in your collection. A Metadata Application Profile (MAP) provides a detailed description of metadata elements that will be applied in a particular digital collection.
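One way to picture a MAP is as a small set of rules that every record in the collection can be checked against. The sketch below is a simplified illustration, not the format used by any particular library. The element names, the `required` and `repeatable` flags, and the `validate` function are assumptions chosen for the example.

```python
# Hypothetical profile: element name -> whether it is required and repeatable.
PROFILE = {
    "title": {"required": True, "repeatable": False},
    "creator": {"required": False, "repeatable": True},
    "date": {"required": True, "repeatable": False},
    "subject": {"required": False, "repeatable": True},
}


def validate(record, profile):
    """Return a list of problems found when checking a record against a profile."""
    problems = []
    for element, rules in profile.items():
        values = record.get(element, [])
        if rules["required"] and not values:
            problems.append(f"missing required element: {element}")
        if not rules["repeatable"] and len(values) > 1:
            problems.append(f"element is not repeatable: {element}")
    for element in record:
        if element not in profile:
            problems.append(f"element not in profile: {element}")
    return problems


record = {
    "title": ["Field notebook"],
    "subject": ["Botany", "Ecology"],
    "color": ["green"],
}
print(validate(record, PROFILE))
```

Here the record is flagged because it lacks a date and uses an element (color) that the profile does not define. Written guidelines serve the same purpose for human catalogers that this check serves for a script.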

Below is an example of a MAP. To see the entire MAP, go to Digital Public Library of America (MAP, Version 4).

[Image: example of a MAP]

Read!
Read Deciding on Fields to be used in ICO Collections for an example. Notice the sections on field selection, vocabulary sources, field usage, and special fields.

Try It!
Read Miller, Steven J. (2011). Examples of Metadata Functionality, Application Profiles, and Records. Metadata for Digital Collections: A How-To-Do-It Manual. ALA Neal-Schuman. Now, explore some digital libraries and collections searching for additional examples.

To create a metadata application profile, you need a clear understanding of the standards as well as a handle on the types of digital objects that you will be including in your collection.

Read!
Read Introduction to the DPLA Metadata Model. This report provides an overview to the standards that the DPLA uses to generate metadata.

Digital Library Spotlight
According to the Digital Public Library of America (DPLA) website, "the DPLA Metadata Application Profile (MAP) is the basis for how metadata is structured and validated in DPLA, and guides how metadata is stored, serialized, and made available". Download Version 4 of the MAP.

Examples

Try It!
Read Mountain West Digital Library Dublin Core Application Profile (July 20, 2011).
Read Dublin Core Metadata Guide: Indiana Memory Project (February 8, 2007).
Notice how both digital libraries are using Dublin Core. Compare these with some of the other examples above.

Try It!
Explore the Data Dictionaries from the University of Washington Libraries. They contain schemas and Metadata Application Profiles (MAPs) for their projects and collections.

Building a Metadata Application Profile (MAP)

Use the following steps to create your own MAP for a digital collection you're building.

  1. User Needs. Think about the functional requirements of your collection. Ask yourself:
    • What do users need to know about the items in the collection (e.g., creator, date created, significance)?
    • How much detail do users need about each item?
    • What will users do with this collection?
    • How will users access information (e.g., search, browse, view)?
    • What will users be seeking in this collection?
    • What aspects of the collection will users be most interested in exploring?
    • How important is it that users can distinguish items from each other?
    • How important is it that users can link items together or connect items?
    • What other features will be included (e.g., link items together, link pages together, filter by...)?
  2. The Items. Once you've identified the scope of your proposed digital collection, select a dozen representative items to explore in depth. Examine your representative items and think about how these items can be described as well as the relationships among items.
    • What kind of information do you need to describe these items?
    • What are the material types?
    • What is the range of creators and publishers?
    • What are the items "about"?
    • What are the properties of each material type?
  3. Metadata Elements. Identify the elements you will need to define this metadata scheme. Brainstorm the elements (e.g., creator) you will include; a definition, explanation, description, or note about each element (e.g., the individual or entity primarily responsible for creating the content of the item); examples (e.g., Johnson, Larry; CNN); and implementation (e.g., mandatory, optional, repeatable). Create a spreadsheet using the following columns: element, notes, examples, implementation.
  4. Values. Determine the requirements for element values and add two additional columns to the spreadsheet: value control (e.g., yes, no) and control mechanism (e.g., LC Name Authority).
    • Will the value be controlled? (e.g., yes, no, how)
    • How will it be controlled? (e.g., guidelines, vocabulary list, rules)
  5. Crosswalks. Identify the element sets that will be used, such as Dublin Core (e.g., dc:language). Create a column for each source schema. Also create a column for degree of mapping (e.g., exactMatch, broadMatch, narrowMatch, closeMatch).
  6. Element Set Specifications. Create complete specifications for the metadata application profile.
  7. Guidelines. Prepare a set of guidelines for use of the MAP. Include element-by-element usage guidelines, examples of terms, examples of linked data, spreadsheet template, and flowcharts.

Adapted from Metadataetc.: Marcia Lei Zeng & Jian Qin (2008)
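The spreadsheet described in steps 3 through 5 can be sketched as a CSV file. The column names below follow the steps above, but the row contents, the use of ISO 639-2 codes for language, and the `write_map` function are illustrative assumptions rather than a recommended profile. A real MAP would reflect your own collection and local decisions.

```python
import csv
import io

# Column headings drawn from steps 3-5 (elements, values, crosswalks).
COLUMNS = [
    "element", "notes", "examples", "implementation",
    "value control", "control mechanism",
    "Dublin Core", "degree of mapping",
]

# Illustrative rows only; a real profile would record local decisions.
ROWS = [
    {
        "element": "creator",
        "notes": "Individual or entity primarily responsible for creating the content",
        "examples": "Johnson, Larry; CNN",
        "implementation": "optional, repeatable",
        "value control": "yes",
        "control mechanism": "LC Name Authority",
        "Dublin Core": "dc:creator",
        "degree of mapping": "exactMatch",
    },
    {
        "element": "language",
        "notes": "Language of the content of the item",
        "examples": "eng; spa",
        "implementation": "optional, repeatable",
        "value control": "yes",
        "control mechanism": "ISO 639-2 codes",
        "Dublin Core": "dc:language",
        "degree of mapping": "exactMatch",
    },
]


def write_map(rows, columns):
    """Return the profile as CSV text that can be opened in any spreadsheet tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(write_map(ROWS, COLUMNS))
```

Keeping the profile in a plain tabular format makes it easy to share with staff and to add more crosswalk columns later if you map to additional schemas.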

Example Dublin Core Elements and Refinements

For more information about each term, go to Dublin Core Metadata Element Set, Version 1.1.
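As a quick illustration of what a record using these elements might look like, the sketch below serializes a simple description as XML using the Dublin Core element namespace. The fifteen element names come from the Dublin Core Metadata Element Set, Version 1.1. The wrapper element `record`, the sample values, and the `to_dc_xml` function are assumptions made for this example, and systems such as CONTENTdm have their own export formats.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set, Version 1.1.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def to_dc_xml(record):
    """Serialize a dict of element -> list of values as simple Dublin Core XML."""
    root = ET.Element("record")  # hypothetical wrapper element
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


record = {
    "title": ["Field notebook, 1912"],
    "creator": ["Doe, Jane"],
    "language": ["eng"],
}
print(to_dc_xml(record))
```

Because every value is stored in a list, repeatable elements such as subject or creator can carry more than one value without changing the structure.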

Digital Library Spotlight
The University of Washington Libraries have created an outstanding set of Metadata Guidelines for Collections using CONTENTdm. Their document contains decisions about metadata, formatting data, setting up field properties, and much more. They also provide links to very specific examples. Be sure to check out their examples.

Keep in mind that not all libraries maintain a MAP. However, many libraries have a user guide that includes much of this information.

Digital Library Spotlight
The Bracero History Project provides a User Guide to assist volunteers and administrators working on the digital collections. It includes a section on metadata.

Metadata Best Practices

Use the following resources to get a better sense for how digital libraries are applying metadata best practices.

The following four documents are used by the Mountain West Digital Library for metadata.

Examples

The Real World

“The creation of metadata for research and repository content is an essential part of the scholarly communication process and is necessary for the long-term access and preservation of our digital (and digitized) heritage. Metadata choices and practices affect the findability of resources in the online environment, and these choices, influenced by the content itself, also reflect the institutions, stakeholders, and users of specific repositories. Content in repositories may be one-of-a-kind, with academic libraries creating digital repositories to house and make available the campus's unique intellectual capital… Other institutions such as the American Museum of Natural History and the New York Public Library have also chosen to make curated digitized information freely available on the open Web” (Moulaison, Dyka & Gallant, 2015).

In 2014, the University of Missouri conducted a study of metadata practices of OpenDOAR repositories. They found that most repositories are applying library standards including Dublin Core, MODS, MARC, and LCSH.

Read!
Read Moulaison, Heather Lea, Dyka, Felicity, & Gallant, Kristen (March/April 2015). OpenDOAR repositories and metadata practices. D-Lib Magazine, 21(3/4). Available online.

In the "real world", libraries are sometimes confronted with unique challenges related to representation and organization of information. Increasingly, librarians seek out a combination of approaches to make projects work.

"Academic libraries find themselves confronted with the challenge of adding digital preservation activities to their ongoing services while continuing to provide access to digital collections through digital asset management systems (DAMS). The Marriott Library, which uses CONTENTdm as its DAMS, recently implemented Ex Libris Rosetta digital preservation system. In order to accommodate unrelated preservation and access systems, the Library was forced to revise its workflow for processing digital content. One major element of this revision was the Marriott Library's development of a platform-agnostic tool which would help relevant departments manage the digitization workflow, input and edit descriptive metadata, and package digital content for ingestion into disparate systems" (Neatrour and others, 2014).

In the "real world", you'll need to apply the skills you have and learn new skills as you go. Much will depend on your individual situation. Here are some possibilities.

Resources

ACRL (March 2015). Environmental Scan 2015. Available online.

Baracho, Renata Maria Abrantes & Cendon, Beatriz Valadares (2012). An image based retrieval system for engineering drawings. In D.R. Neal, Knowledge and Information: Indexing and Retrieval of Non-Text Information. Walter de Gruyter. Available as an ebook through IUPUI.

Combs, Michele, Matienzo, Mark A., Proffitt, Merrilee, & Spiro, Lisa (2015). Over, under, around, and through: getting around barriers to EAD implementation. In OCLC Research, Making Archival and Special Collections More Accessible, 39-62. Available online.

Gartner, Richard & Lavoie, Brian (2013). Technology Watch Report 13-3: Preservation Metadata (2nd edition). Digital Preservation Coalition.

Gill, Tony, Gilliland, Anne J., Whalen, Maureen, & Woodley, Mary S. (2008). Introduction to Metadata. Online Edition, Version 3.0. Available online. Also, available as PDF.

Gregory, Lisa & Williams, Stephanie (July/August 2014). On being a hub: some details behind providing metadata for the Digital Public Library of America. D-Lib Magazine, 20(7/8). Available online.

Hider, Philip (2013). Information Resource Description: Creating and Managing Metadata. ALA Editions.

IBM (2015). Using semantic enrichment to enhance big data solutions.

Keller, Michael A., Persons, Jerry, Glaser, Hugh, & Calter, Mimi (October 2011). Linked Data for Libraries, Museums, and Archives: Survey and Workshop Report. Council on Library and Information Resources. Available online.

Knautz, Kathrin (2012). Emotion felt and depicted: consequences for multimedia retrieval. In D.R. Neal, Knowledge and Information: Indexing and Retrieval of Non-Text Information. Walter de Gruyter. Available as an ebook through IUPUI.

Kovács, Béla Lóránt & Takács, Margit (2014). New search method in digital library image collections: a theoretical inquiry. Journal of Librarianship and Information Science, 46(3), 217-225.

Leigh, Katharine & Leigh, Richard N. (January 2015). Implementing RDA for digital libraries. Computers in Libraries, 35(1), 11-14.

Lorang, Elizabeth, Soh, Leen-Kiat, Datla, Maanas Varma, & Kulwicki, Spencer (July/August 2015). Developing an image-based classifier for detecting poetic content in historic newspaper collections. D-Lib Magazine, 21(7/8). Available online.

Miller, Steven J. (2011). Metadata for Digital Collections (How-to-Do-It Manual). New York: Neal-Schuman.

Moulaison, Heather Lea, Dyka, Felicity, & Gallant, Kristen (March/April 2015). OpenDOAR repositories and metadata practices. D-Lib Magazine, 21(3/4). Available online.

Nakasone, Sonoe & Sheffield, Carolyn (November/December 2013). Descriptive metadata for field books: methods and practices of the field book project. D-Lib Magazine, 19(11/12). Available online.

Neatrour, Anna, Brunsvik, Matt, Buckner, Sean, McBride, Brian, & Myntti, Jeremy (July/August 2014). The SIMP tool: facilitating digital library, metadata, and preservation workflow at the University of Utah’s J. Willard Marriott Library. D-Lib Magazine, 20(7/8). Available online.

Phillips, Jennifer (2013). Learning about metadata. In, J. Monson, LITA Guide: Jump-Start Your Career as a Digital Librarian, 127-144. American Library Association. Available as an ebook through IUPUI.

Reitz, Joan M. (2014). Online Dictionary for Library and Information Science. Libraries Unlimited. Available: http://www.abc-clio.com/ODLIS/odlis_a.aspx.

Rygiel, Pawel (2012). Subject indexing of images: Architectural objects with complicated history. In D.R. Neal, Knowledge and Information: Indexing and Retrieval of Non-Text Information. Walter de Gruyter. Available as an ebook through IUPUI.

Schaffner, Jennifer (2015). The metadata is the interface: better description for discovery of archives and special collections synthesized from user studies. In OCLC Research, Making Archival and Special Collections More Accessible, 85-97. Available online.

Southwick, Silvia B. & Skoric, Jane (2013). Metadata into Practice. In, J. Monson, LITA Guide: Jump-Start Your Career as a Digital Librarian, 127-144. American Library Association. Available as an ebook through IUPUI.

van Hooland, Seth & Verborgh, Ruben (2014). Linked Data for Libraries, Archives, and Museums: How to Clean, Link and Publish Your Metadata. ALA Editions.

van Veen, Theo, Lonij, Juliette, & Koppelaar, Hanna (July/August 2015). Semantic enrichment: a low-barrier infrastructure and proposal for alignment. D-Lib Magazine, 21(7/8). Available online.

Warren, John W. (Summer 2015). Zen and the art of metadata maintenance. The Journal of Electronic Publishing, 18(3). Available online.

Westbrook, R. Niccole, Johnson, Dan, Carter, Karen, & Lockwood, Angela (May/June 2012). Metadata clean sweep: a digital library audit project. D-Lib Magazine, 18(5/6). Available online.

Whippey, Caroline (2012). Non-textual information in gaming: A case study of World of Warcraft. In D.R. Neal, Knowledge and Information: Indexing and Retrieval of Non-Text Information. Walter de Gruyter. Available as an ebook through IUPUI.

Xu, Lei & Wang, Xiaoguang (May/June 2015). Semantic description of cultural digital images: using a hierarchical model and controlled vocabulary. D-Lib Magazine, 21(5/6). Available online.

Yang, Seungwon & Farag, Mohamed Magdy Gharib (2014). Ontologies. In E. Fox, R. Torres, Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security. Synthesis Lectures on Information Concepts, Retrieval, and Services, 63-88. Morgan & Claypool Publishers. Available as an ebook through IUPUI.

Zeng, Marcia Lei & Qin, Jian (2015). Metadata, Second Edition. ALA Neal-Schuman.


© 2015-2016 Annette Lamb

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.