Step 6: Quality, uniqueness, and sensitivity

Initial evaluation: Quality, uniqueness, & sensitivity

Shortly after creating your materials (and, if you are doing fieldwork, ideally before you leave your fieldsite), you should review them to flag items that should either be left out of your collection or handled with special attention. Three things to look for are low-quality materials, (near) duplicates, and sensitive materials.

Appraisal is necessary for two key reasons: (1) there are real costs associated with data archiving, so you might not be able to afford to archive every single file you create, and (2) archiving is publishing, so you might not want to archive every file you create. 

Archiving digital materials is not free, which is why many digital repositories limit the total size of each collection, charge depositors a fee, or both. Every archived file takes up space on storage media, and as their digital collections grow, archives (or their host institutions) must purchase or subscribe to more servers, cloud storage, or LTO tape. All types of storage media have finite lifespans, and replacements must be purchased as media wear out or as newer technologies emerge. Media contents must also be migrated from one carrier to another, which requires personnel, time, and equipment. As a result, most digital archives are not very interested in preserving materials that are of low quality, that are duplicated (or nearly duplicated) within the collection, or that must remain permanently restricted.

In addition to reducing the archive’s financial and technical burden, weeding these materials out of your collection will help you to produce a curated collection that is worthy of publication (Woodbury 2014). When you submit files to any sort of digital repository, you are essentially publishing the contents of those files (Johnson 2004). You should put the same care and effort into selecting materials for an archive that you would put into selecting words, ideas, and arguments for an article or book that you plan to publish. Moreover, weeding out sensitive, low-quality, and duplicated materials will make your archived collection easier to navigate and use in the future, because users will not have to sort through many duplicated items, unlistenable recordings, and metadata records for files that they are not allowed to access.

Low-quality materials 

Many good recordings have been ruined by clucking chickens, rain on a metal roof, dwindling batteries, or poorly placed microphones. Audio files with lots of intrusive noise can be difficult to listen to and are poor candidates for reuse, especially if the noise cannot be removed by filtering. The advantage of appraising these files as soon as possible, especially while you are still collecting data, is that if you determine that a recording's audio quality is too poor to archive but its content is unique and important, you will still have an opportunity to record again. You can either re-record in a different location or at a different time of day, or ask someone to respeak the recording while listening to it, producing a cleaner version that is easier to understand. However, if you are not able to re-record, you might have no choice but to archive the low-quality recording.

Susan Kung tells of a recording she made of two men working together in a carpentry shop. While discussing how to saw pieces of wood to create the shapes they needed, they used many different numeral classifiers that Kung had never heard used in any other context. The recording is quite difficult to listen to because of the sawing and hammering the men were doing, but Kung did not have the opportunity to record a respeaking (this was not a common methodological practice in 2001). This recording is therefore an example of a low-quality recording with high-value content. In this case, Kung chose to archive the recording, along with a warning in the descriptive metadata (see Kung, Vigueras Huerta & Vigueras Patricio 2001).

Sometimes only part of a recording will be low quality. For example, a video of the live taping of a radio broadcast could add quite a bit of information about the context of the audio recording, especially if the camera captures the recording setting and if there are multiple participants in the broadcast. Much less useful would be a tight close-up of a single broadcaster whose mouth is obscured by the radio microphone. In cases like this, and especially if space is a concern, the audio could be separated from the video file and, if wanted, a still image from one frame of the video could be included with the audio file.

Another thing to consider when evaluating your materials is whether they are relevant to your collection or project. While a few candid slice-of-life photographs, ambient sound recordings, or landscape photos do add value to a collection, avoid adding so many that they become a significant part of your collection. In the medley of photographs of cute animals in Figure 47 below, the photographs in the top row are arguably more relevant to include in the materials that you will submit to your chosen digital repository, because they also show other aspects of community life (from left to right, a clay pot, a woodshed, and the side of a house). The relevance of the photos in the bottom row is less obvious; their inclusion in a language documentation collection would need to be justified, for example if the photos accompany a recorded narrative about Canela and Pinta, the two dogs that appear in them. It is good practice to regularly sort your photos into those that you plan to archive and those that are part of your own personal memorabilia.

Figure 47:

Six photos of dogs and cats taken in the field.

Near duplicates

After a point, having multiple copies of the same thing brings diminishing returns. While a few different recordings of a single narrative or wordlist can be beneficial for acoustic analysis or grammatical study, multiple photographs of the same subject are generally not very useful (e.g., Figure 48). Pick the best photograph of your subject and identify it in your metadata as the one to deposit in the archive.

Figure 48:

Two nearly identical images of a man bending down to pick up sticks.
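Before comparing near duplicates by eye, it can help to find exact copies automatically, for example the same photo copied twice off a camera card under different file names. The following is a minimal Python sketch, not part of any repository's tooling, that groups files in a folder (including subfolders) by SHA-256 checksum; the function names `file_sha256` and `find_exact_duplicates` are hypothetical. It only detects byte-identical files, so near duplicates like the two photos in Figure 48 still require your own judgment about which one is best.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(folder: str) -> list[list[Path]]:
    """Hypothetical helper: group byte-identical files found anywhere under `folder`.

    Returns only groups with two or more files, each group sorted by path.
    Near duplicates (e.g., two separate shots of the same scene) are NOT detected,
    because even tiny differences produce completely different checksums.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            groups[file_sha256(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

For example, calling `find_exact_duplicates("photos")` on a hypothetical `photos` folder returns a list of groups; from each group you would keep one file and note in your metadata which copy is to be deposited. Checksums are used rather than file names or sizes because identical names or sizes do not guarantee identical contents.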

Sensitive materials

You should check that the materials you are collecting are in fact suitable for depositing in an online digital archive and appropriate to share with the worldwide public via the archive. You should get permission to archive recordings before you record, and confirm afterwards with the people being recorded that they are comfortable with you archiving what they have just recorded. During the transcription and translation process, you should also be vigilant for any materials that should not be part of your online collection. Sensitive or private information may turn up in the midst of talk that is otherwise fine to make publicly available. If you learn that any materials are sensitive, you should also determine whether there are any conditions under which they may circulate. This kind of work should be done continually throughout your data collection so that you have the chance to clarify any questions that crop up. Keep in mind that every culture has different practices for handling and accessing information, knowledge, music, art, religion, etc.; what might seem like ordinary information to you might be sensitive to your collaborators. For a much more detailed discussion of materials that might be culturally sensitive, please see the Protocols for Native American Archival Materials (see Figure 49).

Figure 49:

Screenshot from the Protocols for Native American Archival Materials, listing related policy and legal topics.

As you learned in Step 1, most digital repositories have technical ways to restrict access to files, e.g., graded access, granular access, or time embargoes. Make sure that you understand how access restrictions work in your chosen repository so that you can accurately explain the options to your collaborators. However, if particular materials are so sensitive that they will require complete restriction or special protocols for access, you should consider carefully whether those files should be placed in a digital repository that is outside the jurisdiction or control of the speech community at all. While some DELAMAN archives and other digital repositories are able and willing to collaborate directly with a community organization to ensure that the community controls all access to its cultural heritage recordings, many repositories are not able to provide this extra level of service. Furthermore, rather than relying on a digital archive to restrict access, you should consider whether it is appropriate to remove sensitive materials from the community or deposit them in a digital archive in the first place. To quote Kim Christen, “[...] not all objects are ripe for inspections, documentation, and use” (2018, p. 407), and the same applies to recorded Traditional Knowledge. Finally, Indigenous members of language documentation projects might want to consider applying Traditional Knowledge (TK) Labels to some or all of the materials that are to be archived. TK Labels are a way to indicate the specific cultural protocols that should be applied to digital cultural heritage materials. These labels should be applied only in consultation and collaboration with members of the speech community.
