Step 5: Tracking and recording metadata in the field

As you collect primary data in the field, you should also create your metadata. The best practice is to create the metadata about each file at the same time that you create the file; this is the most reliable way to ensure that the metadata gets documented. However, in practice, this is not always possible. A next-best practice is to create metadata on a daily basis, ideally before you back up the data so that the metadata will be backed up too. Regardless of whether you follow the best practice or the next-best practice, the metadata needs to be created as soon as possible, and you should make metadata creation a part of your regular workflow during your period of data collection. (See Figure 41 below for a review of types of metadata.)

Figure 41:

Infographic summarizing descriptive, structural, technical, rights, and preservation metadata.
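
One way to make the daily routine described above easier to keep is a small script that lists recordings that do not yet have a metadata record, run before each backup. The following is a minimal Python sketch, not part of any archive's tooling; the function names, the set of file extensions, and the assumption that the documented file names are available as a simple list are all hypothetical and chosen for illustration.

```python
from pathlib import Path

# Hypothetical set of extensions; adjust to the formats your project produces.
MEDIA_EXTENSIONS = {".wav", ".mp4", ".eaf"}


def undocumented_files(file_names, documented_names):
    """Return the sorted media file names that have no metadata record yet."""
    documented = set(documented_names)
    return sorted(
        name
        for name in file_names
        if Path(name).suffix.lower() in MEDIA_EXTENSIONS
        and name not in documented
    )


def undocumented_in_folder(folder, documented_names):
    """Apply the same check to every file directly inside a folder."""
    names = [p.name for p in Path(folder).iterdir() if p.is_file()]
    return undocumented_files(names, documented_names)
```

Calling `undocumented_in_folder` on the day's recording folder, with the file names already entered in your metadata tool, gives a to-do list to clear before the backup runs.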

Metadata Tracking Tools

Your archive may provide a tool for recording metadata that replicates the metadata fields it requires and presents information about its controlled vocabularies. If it does provide such a tool, you should use it to record your descriptive metadata or at least study it enough to know how the metadata elements that you have already collected will map to the metadata elements used by the archive. If your archive does not provide such a tool, you would benefit from contacting a representative of the archive to find out what metadata tool they recommend or what metadata schema the archive uses.

Figure 42:

Screenshot of the CMDI Maker metadata tool, which was previously used by ELAR and collected metadata about sessions, projects, content, actors, and resources.

Though many metadata management software applications are available (just google “metadata management” to see for yourself), few of these programs were designed with language documentation in mind. Two exceptions are the recently released (April 30, 2020) beta software lameta (https://sites.google.com/site/metadatatooldiscussion/) and the SIL International program SayMore (https://software.sil.org/saymore/), first released in 2010. Lameta, which is available for both macOS and Windows, is based on SayMore, which is only available for Windows 7, 8 and 10. Both lameta and SayMore allow you to enter and edit metadata records and then link the metadata records to specific media files; you can also batch the files into folders that can then be exported, along with the relevant metadata, as an archival package that you can submit directly to your chosen archive. Whereas SayMore has other functions in addition to metadata tracking (audio recording, segmenting and transcription/translation), the sole purpose of lameta is to record descriptive, technical, structural and rights metadata. The decision to limit lameta to just the metadata collection and linking tasks was based on the overwhelming preference of language documenters to use other existing tools for recording, transcription and translation tasks. Basic instructions for how to use lameta can be found on the lameta webpage (linked above) and in the blog post announcing its release (see ELAR Archive (2020) in “For Further Reading” below). For information about how to use SayMore, see its website (linked above), as well as Hatton (2013) and Moeller (2014) in “For Further Reading” below.

Figure 43:

Logo for the Metadata Editor for Transparent Archiving, or lameta.

In lieu of using metadata tracking software, many people use database software or spreadsheets to track metadata by creating a metadata template based on the metadata schema used by their chosen archive. Teams who have access to the internet and need to collaborate in real time might consider using Airtable (https://airtable.com/) or the Open Science Framework (https://osf.io/).
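
A spreadsheet template of this kind can also be generated and filled programmatically. The sketch below, in Python, writes a header row from a list of field names and then one row per record. The column names are hypothetical placeholders, not the schema of any particular archive; replace them with the fields your chosen archive actually requires.

```python
import csv
import io

# Hypothetical column names; substitute the fields in your archive's schema.
TEMPLATE_FIELDS = [
    "file_name", "title", "date_recorded", "location",
    "languages", "participants", "genre", "description", "access_level",
]


def records_to_csv(records, fields=TEMPLATE_FIELDS):
    """Return CSV text: a header row, then one row per metadata record.

    Missing fields are left blank. A record containing a column that is not
    in the template raises ValueError, so stray fields are caught early.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

Calling `records_to_csv([])` yields an empty template containing only the header row, which can be saved to a file and opened in any spreadsheet program.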

Some archives, such as the Archive of the Indigenous Languages of Latin America (AILLA), the Kaipuleohone Language Archive, and the Native American Languages Collection at the Sam Noble Museum of Natural History, have detailed metadata spreadsheets that are specific to the metadata schemas used by those archives. AILLA has even created a series of videos explaining how to complete each sheet or tab of its metadata spreadsheet; the playlist is available on AILLA’s YouTube channel (see the “Helpful Links” section under the “For Further Reading” page).

Anyone interested in metadata collection using mobile devices should read Richard Griscom’s (2020, see “For Further Reading”) description of repurposing the existing Open Data Kit for two language documentation projects in Tanzania. According to Griscom, some benefits of using the Open Data Kit include:

Scalability - the app can be used by various team members working independently and simultaneously;
Mobile technology - the app can be used in areas with no electricity or internet connection;
Open source software - the app can be adapted according to the metadata needs of each project.

However, some drawbacks include the following:

Open text fields will almost always contain errors that must be found and corrected;
Version control for metadata forms must be carefully managed; and
Some coding skills are required for creating a metadata output that archives will accept.
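
The first drawback, errors in open text fields, can be partly addressed by checking entries against a controlled vocabulary. The following Python sketch is one possible approach and is not taken from Griscom's workflow; the genre list and the function name are hypothetical, and the similarity cutoff of 0.6 is simply the default used by the standard library's `difflib.get_close_matches`.

```python
import difflib

# Hypothetical controlled vocabulary; use the terms your archive defines.
GENRES = ["narrative", "procedural text", "elicitation", "prayer", "speech"]


def check_value(value, vocabulary=GENRES):
    """Return (cleaned_value, is_valid, suggestions) for a free-text entry.

    The entry is lowercased and its whitespace is collapsed before it is
    compared with the vocabulary. Invalid entries get up to three
    close-match suggestions for a human to review.
    """
    cleaned = " ".join(value.split()).lower()
    if cleaned in vocabulary:
        return cleaned, True, []
    suggestions = difflib.get_close_matches(cleaned, vocabulary, n=3, cutoff=0.6)
    return cleaned, False, suggestions
```

The function only flags and suggests; it deliberately does not auto-correct, since a person who knows the recording should confirm each change.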

Whatever tool you use to track your metadata, keep these pointers in mind:

The best time to track and document metadata is as soon as possible after you create a new recording, file, annotation, etc.
Whatever methodology you use, use it diligently!
Remember that the data will be almost worthless if there is no accompanying metadata.

Of the types of metadata discussed in Step 4, the ones that you need to be the most concerned about during data collection are the descriptive and rights-related metadata. Each of these is discussed below.

Descriptive metadata

For each recording session, you should record as much descriptive information about the context of the recordings as possible. Of particular interest and relevance are:

The participants in the data/file creation. If it is a recording, who is seen/heard in it and who is making it? Is anyone else involved? What is their role? Note everyone involved in the process.
Date and location. Where and when is a recording being made?
Languages that are spoken in each recording. This is especially important in a multilingual environment.
The context and content of the file. Why is the recording being made?
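
The items listed above can be captured in a simple per-session record with a completeness check. This is a minimal Python sketch under stated assumptions: the class name, the field names, and the choice of which fields count as required are all hypothetical and should be adapted to the schema of your chosen archive.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRecord:
    """Descriptive metadata for one recording session (hypothetical fields)."""
    file_name: str
    date: str = ""                                   # e.g. "2020-04-30"
    location: str = ""
    languages: list = field(default_factory=list)    # every language spoken
    participants: dict = field(default_factory=dict) # name -> role
    purpose: str = ""                                # why the recording was made

    def missing_fields(self):
        """Return the names of descriptive fields that are still empty."""
        required = ("date", "location", "languages", "participants", "purpose")
        return [name for name in required if not getattr(self, name)]
```

Checking `missing_fields()` at the end of each recording day shows which sessions still need attention while the details are fresh.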

Figure 44:

Shows a paper on a clipboard with a checklist reading "research and media languages, research participants, recording location, date of creation, media type, research content".

Keeping track of the purpose of each recording can help you organize your materials later and explain their contents to future archive users. Was a recording made to learn more about a particular cultural practice or aspect of local history? Was the recording session designed to elicit particular words or grammatical structures, or was the session designed to produce a particular kind of text such as a procedural text or a travel narrative? This information can help to fill out required metadata fields and open-ended descriptions.

In the process of collecting the descriptive metadata, you should also try to decide on meaningful names for your recording sessions that could serve as titles for the language events in your archived collections. Try to put these titles in the Indigenous language whenever possible, and make every effort to determine native terms for various genres of language events as well (e.g., prayers, speeches, etc.). Some archives allow you to put descriptive metadata in the Indigenous language as well as the interface language of the archive, so record as much of the metadata as possible in the Indigenous language.

A backup method for descriptive metadata creation is to include spoken (or signed) metadata headers at the beginning of each audio or video recording, or even title cards in videos. Metadata headers are useful in the event that your metadata files become corrupted or lost, and are especially useful for language data collections whose languages are unlikely to be understood by archivists.

Tracking this descriptive metadata as you create language documentation materials will help you record this information in a detailed way while it is fresh in your mind; it will add only minutes of additional work for each recording event, and it gives you more opportunities to answer any questions about your research that come up. This information should be written down or entered into your metadata tool (be it a database, a spreadsheet, or a notebook) for every recording.

Rights Metadata

While most language documenters are aware that they need to get informed consent from potential research participants before they engage in the data collection, many language documenters do not put appropriate thought into the details of the resulting intellectual property. They might not realize that research participants should be viewed as co-authors or co-creators of the data. However, this is becoming less of an issue as language documentation projects become more collaborative (more on this below).

As mentioned in Step 4, some of the “data” that results from language documentation is protected by copyright law, but some of it is not. For example, words in a language are not protected by copyright, but how words are arranged and presented in a dictionary is. While traditional knowledge genres such as mythology, cosmology and folklore are not protected by copyright, a particular performance of or a published version of a folktale can be protected. Though copyright laws are similar in all countries that have signed the Berne Convention (see “For Further Reading”), the details of the laws nevertheless vary between countries. However, all Berne Convention participants (see Figure 45) require some amount of originality in order for copyright to take effect. You should be especially aware that copyright law for databases varies greatly from country to country. While the organization of a database is protected by copyright almost everywhere, the contents of the database usually are not, though this is not necessarily the case in the European Union. See Creative Commons (2018, 2019, 2020), Morris, Waters & Burt (2020), and the US Copyright Office (no date) in “For Further Reading” to learn more about copyright and the Creative Commons licenses.

Figure 45:

World map demonstrating which countries have signed the Berne Convention, which is almost all but a few.

Most language archives have an All Rights Reserved approach to the data, meaning that all rights holders retain their copyrights to the archived materials. When you deposit data into an archive, you give the archive, and in some cases the archive's users, a non-exclusive license to reproduce, share and display the materials. Additionally, many data repositories allow depositors to choose an open license to apply to the data, which facilitates its downstream use (see Step 4 for a discussion on license types); when an open license is applied to a recording such as a spoken performance of a traditional story, that license makes it clear to an archive user how they are allowed to use that recording in their own work. Before you begin collecting data, make sure that you know the policy of your chosen archive, and make a plan for how you will explain the licensing requirements to your collaborators in the Indigenous Community.

Before you start making recordings or written texts, you and your collaborators need to have a conversation about the copyright to the data, and you need to come to an agreement about how you will publish and share the data. If your funder requires you to put all or most of the data into an archive, you must be up front about this and get the necessary permission from your data co-creators. One way to do this is to request and document permission for publishing data in an archive at the same time that you get informed consent for participation in the project. Alternatively, you might want to draft a general rights statement to share with your collaborators and modify it as necessary according to the context and the wishes of each participant.

More and more often, communities that are involved in language documentation projects will elect to draft a Memorandum of Understanding (MoU) with any outside researchers who wish to participate in language documentation in that community. The MoU will clearly describe the collaboration, the roles and tasks of all parties, the output or deliverables that will result from the collaboration, who will own the copyright to the output, and what different participants are allowed to do with that output. For an example of an MoU made between a research unit and an Aboriginal Community, see Thom (2006); for guides to creating an MoU, see Mirza, Currier & Ossom Williamson (2016) and Ossom Williamson, Currier & Mirza (2016) in “For Further Reading” below.

Figure 46:

Screenshot of a metadata record showing rights and access statements.

Finally, remember that rights metadata also includes any restrictions or limitations on access to the data. Most language archives have some form of graded access (Johnson 2004, Seyfeddinipur et al. 2019) that will allow some data files to be publicly accessible to archive users while keeping others restricted to just certain users or circumstances. It is important not to assume that all graded access systems work the same way because they do not. Make sure that you understand the graded access system used by your chosen archive so that you can discuss access options with your collaborators. We will discuss access considerations more in Step 6.

Archiving for the Future