Step 4: Planning for metadata collection

Introductory video: AILLA Metadata

Planning for metadata collection and following your repository’s metadata schema

Types of metadata

Metadata, which is commonly described as data about data, is the information about your materials and files that helps to make them easier to organize, find, and reuse. Metadata falls into several different categories: descriptive, structural, technical, rights, and preservation.

Descriptive metadata is contextual information about the materials that explains what they are, provides contextual details about their production and helps users locate and reuse them.
Structural metadata is information about where an object is located in a sequence, hierarchy or file structure.
Technical metadata is information about the form, size, and specifications of digital objects.
Rights metadata includes information about an object’s copyright status, the copyright holder, and any relevant licenses.
Preservation metadata includes information about the preservation status of a digital file that is used to ensure that the data contained in the file has not been corrupted or lost.

Figure 35: Grid that explains the five types of metadata: descriptive, structural, technical, rights, and preservation.

Much technical, structural and preservation metadata will (or can) be created by archivists or archival software, but the descriptive metadata and the rights metadata are areas where you—the collector, compiler, and/or depositor—can make the biggest contributions to your collection’s metadata.
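
As a rough illustration of the kind of technical and preservation metadata that software can generate automatically, the following Python sketch records a file's size, extension, and SHA-256 checksum, and later re-checks the checksum to detect changes. The function names and dictionary keys are hypothetical and are not taken from any particular archive's software.

```python
import hashlib
import os
import tempfile


def describe_file(path):
    """Collect basic technical and preservation metadata for one file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large media files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "extension": os.path.splitext(path)[1].lower(),
        "sha256": sha256.hexdigest(),
    }


def is_unchanged(path, recorded):
    """Return True if the file still matches its recorded checksum."""
    return describe_file(path)["sha256"] == recorded["sha256"]


# Demonstration with a small temporary file standing in for a recording.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo_recording.WAV")
    with open(demo_path, "wb") as handle:
        handle.write(b"example audio bytes")
    demo_record = describe_file(demo_path)
    demo_ok = is_unchanged(demo_path, demo_record)
    with open(demo_path, "ab") as handle:
        handle.write(b"!")  # simulate corruption by altering the file
    demo_corrupted = not is_unchanged(demo_path, demo_record)
```

None of these values require input from the depositor, which is why the descriptive and rights metadata deserve most of your attention.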

Descriptive metadata provides contextual information about the data you collect. The more descriptive metadata that accompanies a file, the more useful that file will be for research and reuse. Descriptive metadata about a video recording, for example, will include:

The name(s) of the person or people being recorded;
The name of the person doing the recording;
The date the recording was made;
The location where the recording was made;
The language(s) being spoken or signed;
The topic or subject of the recording.

None of this information is necessarily inherent to the recording, so it must be documented at the time the recording is made or soon afterwards. Without this information, the recording will be less findable and useful in the future, possibly even for the researcher or documenter who recorded it.
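
One way to keep track of these details while recording is a simple structured record. The sketch below is only an illustration: the class name, field names, and placeholder values are assumptions, and your archive's metadata schema determines the fields you actually need.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class RecordingDescription:
    """Descriptive metadata for one recording (hypothetical field names)."""
    file_name: str
    speakers: list
    recorded_by: str
    date_recorded: date
    location: str
    languages: list
    topic: str

    def missing_fields(self):
        """List the fields that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


example = RecordingDescription(
    file_name="story_001.mp4",
    speakers=["Speaker A"],
    recorded_by="Researcher B",
    date_recorded=date(2020, 1, 15),
    location="",  # not yet documented
    languages=["Language X"],
    topic="Traditional story",
)
print(example.missing_fields())  # ['location']
```

Checking for empty fields soon after each recording session makes it easier to fill gaps while the details are still fresh.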

The descriptive metadata should include any information that explains relationships between files. For example, if a set of digital photographs of woven designs in fabric is meant to accompany a PDF document that describes weaving techniques, this information should be included in the descriptive metadata about these objects. Note that such relationships between files are also relevant as structural metadata.
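
A minimal sketch of how such relationships might be noted, assuming a simple list of (source, relation, target) entries. The file names and the relation label "accompanies" are illustrative only; your archive may define its own relation terms.

```python
# Each relation is (source file, relation type, target file).
# The relation label "accompanies" is illustrative, not an archive standard.
relations = [
    ("weaving_photo_01.jpg", "accompanies", "weaving_techniques.pdf"),
    ("weaving_photo_02.jpg", "accompanies", "weaving_techniques.pdf"),
]


def files_related_to(target, relations):
    """Return every file that points at `target`, with its relation type."""
    return [(source, kind) for source, kind, dest in relations if dest == target]


related = files_related_to("weaving_techniques.pdf", relations)
```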

Rights metadata, also known as rights management metadata, provides information about a data file such as the copyright status, licensing agreements, and restrictions or limitations on access, sharing and use. Some types of research data are automatically protected by copyright, while others are not, and every country has its own copyright laws. When you deposit data in an online data repository, you are essentially publishing the data in the country where that archive is located. Furthermore, when you deposit data into a digital repository, you are licensing that archive to do certain things with the files, such as distributing, copying, and displaying them. Many digital repositories allow you to draft your own rights statement, which is a statement about who owns the data and how it may be shared and used, or to pick a license to apply to your data. In the latter case, the available license choices will vary between repositories, but some frequently used possibilities include the following:

Creative Commons licenses for any work to which copyright would normally apply;
Open Data Commons licenses for databases and code;
Local Contexts Traditional Knowledge licenses to be used by or in collaboration with Indigenous Peoples or Communities; and
GNU licenses for software.

If you do not pick a license to apply to the data, then traditional copyright, which varies slightly from country to country, will be the default. Some repositories have a standard license that is applied systematically to everything in the repository. Check with your chosen repository to find out its requirements for rights metadata.
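
The fallback described above can be sketched as follows. This is a simplified illustration: the function name, dictionary keys, and default wording are assumptions, and a repository that applies its own standard license would override this behavior.

```python
# Hypothetical wording for the default case where no license is chosen.
DEFAULT_RIGHTS = "All rights reserved (default copyright)"


def rights_statement(holder, license_name=None):
    """Build a simple rights entry; fall back to default copyright."""
    return {
        "rights_holder": holder,
        "license": license_name if license_name else DEFAULT_RIGHTS,
    }


licensed = rights_statement("Depositor A", "CC BY-NC 4.0")
unlicensed = rights_statement("Depositor A")
```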

Rights metadata also includes any restrictions or limitations on access to the data. Some digital repositories will let you restrict access to your data files for a certain amount of time (called an embargo); other repositories are completely open access, meaning that anyone can access, download, and reuse the data. Finally, some archives can apply traditional protocols for accessing the data; such protocols should be determined in collaboration with language and culture experts from the represented community.
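
To make the idea of an embargo concrete, here is a minimal sketch of a date-based access check. The access labels "open" and "embargoed" are hypothetical; real repositories define their own access levels, and community protocols cannot be reduced to a date comparison like this one.

```python
from datetime import date


def is_accessible(access_level, today, embargo_until=None):
    """Decide whether a file can be served to the public on `today`.

    access_level: "open" or "embargoed" (hypothetical labels).
    """
    if access_level == "open":
        return True
    if access_level == "embargoed":
        # An embargoed file opens once the embargo end date is reached.
        return embargo_until is not None and today >= embargo_until
    # Unknown levels (for example community protocols) are not decided here.
    return False
```

For example, `is_accessible("embargoed", date(2024, 1, 1), date(2025, 1, 1))` returns `False`, while the same call with `today` set to `date(2025, 1, 1)` returns `True`.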

Metadata schema

All digital archives utilize at least one metadata schema, which is a standardized system of organizing metadata for cataloging purposes. There are many different metadata schemas, and each one has its own predetermined set of metadata elements (or fields, such as author, title, date created, location, etc.) and rules for organizing those elements. Some commonly used metadata schemas in digital archives that specialize in language documentation materials include Dublin Core Metadata Initiative (DCMI, or simply DC), Component MetaData Infrastructure (CMDI), Open Language Archives Community Metadata (OLAC), and Metadata Object Description Schema (MODS), to name just a few. Many digital archives use only one or two specific metadata schemas, while other digital repositories, especially data repositories, may allow you to choose a particular metadata schema from a pre-programmed list that their repository software can support. While all archives use one or more specific metadata schemas, they might not utilize all of the elements that are represented in those schemas. Finding out which metadata schema(s) are used by your chosen archive, including the specific metadata elements for each schema, will help you determine what metadata elements you should collect as you create your collection of data and files.
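
As an illustration of planning around a schema, the sketch below maps a collector's own field names onto the fifteen elements of the basic Dublin Core element set. The element names are those of Dublin Core; the local field names and the choice of which element each one maps to are assumptions made for this example, and your archive's guidelines should decide the actual mapping.

```python
# The fifteen elements of the basic Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical local field names and an illustrative mapping.
FIELD_TO_DC = {
    "title": "title",
    "recorded_by": "creator",
    "speakers": "contributor",
    "date_recorded": "date",
    "location": "coverage",
    "languages": "language",
    "topic": "subject",
    "license": "rights",
}


def to_dublin_core(record):
    """Split a record into mapped DC elements and unmapped field names."""
    mapped, unmapped = {}, []
    for name, value in record.items():
        element = FIELD_TO_DC.get(name)
        if element in DC_ELEMENTS:
            mapped.setdefault(element, []).append(value)
        else:
            unmapped.append(name)
    return mapped, unmapped
```

Fields that come back as unmapped are a prompt to ask the archive whether that information has a place in its schema.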

Controlled Vocabularies

Many metadata elements will utilize controlled vocabularies specific to that field, and these controlled vocabularies will vary from archive to archive. A controlled vocabulary is a fixed set of terms used to describe a given element of metadata such that each term indexes a particular concept. While the term controlled vocabulary may be new to you, if you have ever used a drop-down menu on a website form, then you are already familiar with the concept; that field can only be filled with one of a fixed set of responses. Archives use controlled vocabularies to index the records so that they can be searched and retrieved. In digital language archives, controlled vocabularies are used to describe the genres of language use, how different speech or research participants contributed to a collection, the relationships between files, or recording frequencies for audio recordings, to name just a few.

Figure 36: Chart that shows a controlled vocabulary from the Data Documentation Initiative, organized by Name, Title, Description, File Type, and Version.
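
The drop-down analogy can be sketched in a few lines of Python: a value is accepted only if it belongs to the fixed set of terms. The genre terms below are hypothetical and do not come from any particular archive's vocabulary.

```python
# Hypothetical controlled vocabulary for a "genre" metadata element.
GENRE_VOCABULARY = {"narrative", "conversation", "song", "interview"}


def validate_term(value, vocabulary):
    """Return the normalized term, or raise if it is not in the vocabulary."""
    term = value.strip().lower()
    if term not in vocabulary:
        allowed = ", ".join(sorted(vocabulary))
        raise ValueError(f"'{value}' is not an allowed term; choose from: {allowed}")
    return term
```

Here `validate_term(" Song ", GENRE_VOCABULARY)` returns `"song"`, while a term outside the list, such as `"lullaby"`, raises an error instead of being silently accepted.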

A special type of controlled vocabulary is an authority file, a kind of master list for specific persons, places, or things that is used to identify one specific person, place or thing and disambiguate it from others with the same or similar names. Archives typically use authority files to identify specific geographic locations, languages, and sometimes people. The most important authority file with respect to archiving language data is, without a doubt, the language authority file, which we discuss in the next subsection on “Language Metadata.”
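
The disambiguating role of an authority file can be illustrated with a small sketch in which each entry has a unique identifier and records refer to that identifier rather than to a name. The identifiers, place names, and regions below are invented placeholders, not entries from a real authority file.

```python
# Hypothetical place authority file: unique ID -> details about one place.
PLACE_AUTHORITY = {
    "place-001": {"name": "San Juan", "region": "Region A"},
    "place-002": {"name": "San Juan", "region": "Region B"},
    "place-003": {"name": "Santa Rosa", "region": "Region A"},
}


def candidates(name, authority):
    """Return the IDs of every authority entry that shares this name."""
    wanted = name.strip().lower()
    return sorted(
        key for key, entry in authority.items()
        if entry["name"].lower() == wanted
    )


# Two places share the name, so the name alone is ambiguous.
matches = candidates("San Juan", PLACE_AUTHORITY)

# Storing the ID in the record removes the ambiguity.
recording = {"file_name": "story_001.mp4", "place_id": "place-002"}
place = PLACE_AUTHORITY[recording["place_id"]]
```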

Review your archive’s controlled vocabularies and authority files before you begin to collect your data so that you know how to distinguish between the terms you will need to use, or so that you can learn how to map labels that you or the community already use onto the archive’s categories.
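
Mapping existing labels onto an archive's categories can be planned with a simple lookup table. In this sketch, both the archive terms and the local labels are hypothetical; labels without a mapping are set aside for review rather than guessed.

```python
# Hypothetical archive vocabulary and a table of labels already in use.
ARCHIVE_GENRES = {"narrative", "conversation", "song"}

LOCAL_TO_ARCHIVE = {
    "story": "narrative",
    "chat": "conversation",
    "lullaby": "song",
}


def map_labels(labels):
    """Translate local labels to archive terms; collect the rest for review."""
    mapped, needs_review = [], []
    for label in labels:
        term = LOCAL_TO_ARCHIVE.get(label.strip().lower())
        if term in ARCHIVE_GENRES:
            mapped.append(term)
        else:
            needs_review.append(label)
    return mapped, needs_review
```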
