Step 2: Seven filenaming tips

Tip 1: Be Brief

Filenames with more than 25 characters are problematic for some scripts or programs. Instead of creating a long filename, abbreviate the name to a shorter but still understandable version. While many people like to put all kinds of information into a file’s name, such as a topic, date, location, language, speaker’s name, researcher’s name, and theme, this kind of contextual and descriptive information is better stored either in a metadata log, within the contents of the files themselves, or both.

EasternChatino-Cieneguilla-MargaritaJ-20200212-spinning_thread.mov

EasternChatino-Cieneguilla-MargaritaJ-20200212-spinning_thread-es_Pablo.wav

EasternChatino-Cieneguilla-MargaritaJ-20200212-spinning_thread-es_Pablo.eaf

EasternChatino-Cieneguilla-MargaritaJ-20200212-spinning_thread-es_Pablo-en_Josephine.txt

In the example above, someone recorded Margarita Jiménez spinning cotton into thread on February 12, 2020 in the town of Cieneguilla as part of a project documenting the Eastern Chatino language, and made the MOV file as a result. Later, Pablo watched the video and recorded himself translating Margarita’s Chatino words into Spanish, making the WAV file. Pablo then transcribed his recording using ELAN, making the EAF file. Finally, Josephine translated Pablo’s Spanish transcription into English, making the TXT file. Their filenames store lots of information about the history of each file, but 

  • this information is better stored in a metadata register (and recorded or written in the files themselves if possible);
  • the filenames are very long and can cause problems for technical processes; and
  • personally identifiable information is written out plainly.

Another way this project’s files could have been named is 

ctp-cien-20200212-thread.mov

ctp-cien-20200212-thread-es.wav

ctp-cien-20200212-thread-es.eaf

ctp-cien-20200212-thread-en.txt

Here the language being documented is represented by a three-letter code ctp, the recording location is marked by an abbreviation of its name, cien, the date is kept in its eight-digit form, a one-word label for the main topic of the recording is given, and the language of the translation is given at the end with a two-letter code, es or en. The names of participants and contributors to the materials are not stored in the filenames, but are recorded in the project’s metadata register, saving space and keeping personally identifiable information out of filenames.

Many parts of even these filenames could be omitted if they are not relevant to the project. For example, the recording location, topic, and target language labels could be considered irrelevant for this project, resulting in even more compact filenames like ctp-20200212.txt. Anticipating what kinds of files your project will produce will help you know what kinds of information to include in filenames to ensure that no names are duplicated. If you know that your project will translate a collection of stories into two languages, it would be useful to include the target language in filenames (e.g. ctp-20200212-en.txt, ctp-20200212-fr.txt, and so on). If multiple people will be creating files for your project in multiple locations, it could be useful to include some location or recorder information in the filename.

If keeping track of participants is necessary (for example, if multiple people participate in an experiment), you should consider referring to them in filenames with a code. Speakers and signers could be given a short alphanumeric code that is added to their recordings’ filenames. Beyond shortening the length of filenames, this also keeps personally identifiable information out of a part of the collection that will be very public and visible: in most digital archives, all filenames are visible even if access to the files themselves is restricted. If a participant decides after the initial deposit that they would like their participation to be anonymous, it may be possible to remove their name from catalog records, but it could be much more difficult (or impossible) to remove their name from the filename of an archived file.
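As a rough illustration of the approach above, the following Python sketch joins short codes into a filename and checks the result against the 25-character guideline. The function names build_filename and is_brief are hypothetical helpers invented for this example, and the limit is applied to the whole filename including the extension, which is an assumption about how the guideline is counted.

```python
MAX_LENGTH = 25  # guideline from Tip 1; counting the extension is an assumption


def build_filename(parts, extension):
    """Join short codes with hyphens and append the file extension."""
    return "-".join(parts) + "." + extension


def is_brief(filename, limit=MAX_LENGTH):
    """Return True if the whole filename fits within the limit."""
    return len(filename) <= limit


short_name = build_filename(["ctp", "20200212", "en"], "txt")
long_name = "EasternChatino-Cieneguilla-MargaritaJ-20200212-spinning_thread.mov"

print(short_name, len(short_name), is_brief(short_name))  # ctp-20200212-en.txt 19 True
print(len(long_name), is_brief(long_name))                # 66 False
```

Participant names, locations, and other details left out of the parts list would go in the metadata register instead.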

Tip 2: Pay attention to characters

Use only letters, numbers, dashes, and underscores in the filenames. Do not use any special characters like parentheses, and avoid using characters with diacritics, like accent marks and tildes. 

Most operating systems have restrictions on how a file or a folder can be named. In Windows systems, for example, a slash cannot appear in a file or folder name since that character has a special function (being used to separate different levels of structure in file paths). However, not all operating systems, file systems, and programming languages used to process files share all of the same restrictions. A file or folder can be given a name in one system that is not allowed in other systems. This is especially common when files are transferred between UNIX systems and Windows systems. Some special characters to avoid using in filenames are * . " / \ [ ] : ; | ,.
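One way to check existing filenames against this rule is sketched below in Python. It reports every character in the name (before the extension) that is not an unaccented letter, a number, a dash, or an underscore. The helper name problem_characters is made up for this example, and treating the last period as the start of the extension is an assumption.

```python
import re

# Anything other than ASCII letters, digits, underscores, and hyphens.
DISALLOWED = re.compile(r"[^A-Za-z0-9_-]")


def problem_characters(filename):
    """List the characters before the extension that fall outside the safe set."""
    stem, dot, _extension = filename.rpartition(".")
    if not dot:  # no period at all, so the whole name is the stem
        stem = filename
    return DISALLOWED.findall(stem)


print(problem_characters("ctp-cien-20200212-thread.mov"))  # []
print(problem_characters("notes(final).txt"))              # ['(', ')']
print(problem_characters("Jim\u00e9nez.wav"))              # ['é']
```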

While many file systems will allow characters with diacritic marks, care should be taken since not all systems and applications will correctly handle such characters, and some complex letters which look the same to a human reader will be read differently by computers. For example, the letter â can either be written as a single character (Latin small letter A with circumflex) or as a sequence of two characters (Latin small letter A plus combining circumflex accent). In Table 1 below, note how the machine-readable UTF-8 hex codes for the two ways to write â look very different.

Table 1:

Table that shows that there are two hex codes for what seems to be the same letter "a" with a circumflex on top
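The two ways of writing â can be reproduced with Python's standard unicodedata module, as in the sketch below. It prints the UTF-8 bytes of each form and shows that they compare as different until they are normalized. The strip_diacritics helper is a hypothetical convenience for producing plain letters for filenames; it is one possible approach, not a required step.

```python
import unicodedata

composed = "\u00e2"     # one character: Latin small letter A with circumflex
decomposed = "a\u0302"  # two characters: letter a plus combining circumflex

print(composed == decomposed)               # False, although both display as â
print(composed.encode("utf-8").hex(" "))    # c3 a2
print(decomposed.encode("utf-8").hex(" "))  # 61 cc 82

# Normalizing to the composed form (NFC) makes the two strings identical.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True


def strip_diacritics(text):
    """Decompose each letter, then drop the combining marks."""
    decomposed_text = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed_text if not unicodedata.combining(c))


print(strip_diacritics("Jim\u00e9nez"))  # Jimenez
```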

Tip 3: Do not use spaces

Spaces might interrupt or stop programs or scripts that the digital archive needs to run on the files. While spaces are allowed in the names of files and folders in many operating systems, handling filenames that contain spaces requires special treatment in many programming languages, and processing files with spaces in their names may cause some digital archival tools to fail or not function properly. If you need to separate parts of a file or folder’s name, use hyphens, underscores, or CamelCase (capitalizing the first letter of each word), as shown in Table 2 below.

Table 2:

Table showing ways of writing a filename without using spaces: using hyphens, underscores, or Camel Case.
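The three alternatives to spaces can be produced mechanically. The Python sketch below converts a name containing spaces into each style; the function names are invented for this example, and the input is assumed to be the part of the filename before the extension.

```python
def to_hyphens(name):
    """Replace each run of spaces with a single hyphen."""
    return "-".join(name.split())


def to_underscores(name):
    """Replace each run of spaces with a single underscore."""
    return "_".join(name.split())


def to_camel_case(name):
    """Remove spaces and capitalize the first letter of each word."""
    return "".join(word[:1].upper() + word[1:] for word in name.split())


original = "spinning thread"
print(to_hyphens(original))      # spinning-thread
print(to_underscores(original))  # spinning_thread
print(to_camel_case(original))   # SpinningThread
```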

Tip 4: Do not use periods

Another commonly used but sometimes problematic character is the period. Computer scripts may recognize the period as the symbol that separates the filename from the file extension, which indicates the file type, so an extra period inside the name can cause the wrong part of the name to be read as the extension. Use a period only to set off the file extension from the rest of the filename.
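A minimal Python sketch of this rule follows. It keeps the last period, which is assumed to mark the extension, and replaces any earlier periods with a hyphen. The function name fix_periods is hypothetical, and files with compound extensions such as .tar.gz would need separate handling.

```python
def fix_periods(filename, replacement="-"):
    """Keep only the final period; replace earlier ones in the name."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:  # no period at all, so there is nothing to fix
        return filename
    return stem.replace(".", replacement) + dot + extension


print(fix_periods("ctp.2020.02.12.wav"))  # ctp-2020-02-12.wav
print(fix_periods("ctp-20200212.wav"))    # ctp-20200212.wav (unchanged)
```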

Tip 5: Follow the international archival standard for dates

The international archival standard format for dates (ISO 8601) gives the year using four digits, followed by two digits for the month, followed by two digits for the day. For example, May 23rd, 1974 would be written as 1974-05-23 or 19740523, with or without dashes, respectively. There are many ways a date could be represented in a filename, as seen in Table 3. Using a standardized format like YYYY-MM-DD or YYYYMMDD reduces ambiguity when reviewing materials later (without an agreed format, was cta-20180503.wav recorded on the 3rd of May or the 5th of March?) and allows for materials to be sorted by date when sorted alphanumerically by filename.

Table 3:

Table shows how there are multiple ways of representing a date, which may cause confusion.
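Python's standard datetime module can write dates in both standardized forms, as the sketch below shows. It also illustrates that filenames carrying YYYYMMDD dates fall into chronological order under an ordinary alphanumeric sort. The filenames are illustrative only.

```python
from datetime import date

recorded = date(1974, 5, 23)
print(recorded.isoformat())         # 1974-05-23
print(recorded.strftime("%Y%m%d"))  # 19740523

# With YYYYMMDD dates, a plain alphanumeric sort is also a chronological sort.
names = ["ctp-20200212.wav", "ctp-20191105.wav", "ctp-20200103.wav"]
print(sorted(names))
# ['ctp-20191105.wav', 'ctp-20200103.wav', 'ctp-20200212.wav']
```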

Tip 6: Use leading zeros

To maintain the correct order of numbered files, remember to use leading zeros in the filenames. For example, 25 files in a series in which all other descriptors are the same will need to be numbered sequentially using leading zeros for the single-digit numbers (i.e., 01, 02 … 09). This will prevent files numbered in the teens from being sorted before the file numbered two. See Figures 19 and 20 for what this looks like.

Figure 19:

Image shows how not putting a leading zero in the file name "tee dash carnaval 2" places it behind "tee dash carnaval 14".

Figure 20:

Image shows how putting a leading zero in the file name "tee dash carnaval 02" places it before "tee dash carnaval 14".
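The sorting behavior shown in the two figures can be reproduced with a short Python sketch. The exact filenames here are illustrative (the hyphen before the number and the .wav extension are assumptions), and numbered_name is a hypothetical helper.

```python
def numbered_name(prefix, number, extension, width=2):
    """Build a filename whose number is padded with leading zeros."""
    return f"{prefix}-{number:0{width}d}.{extension}"


unpadded = ["tee-carnaval-2.wav", "tee-carnaval-14.wav", "tee-carnaval-1.wav"]
print(sorted(unpadded))
# ['tee-carnaval-1.wav', 'tee-carnaval-14.wav', 'tee-carnaval-2.wav']

padded = [numbered_name("tee-carnaval", n, "wav") for n in (2, 14, 1)]
print(sorted(padded))
# ['tee-carnaval-01.wav', 'tee-carnaval-02.wav', 'tee-carnaval-14.wav']
```

A series that may grow past 99 files would need a width of 3 from the start, since changing the width later means renaming the earlier files.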

Tip 7: Practice versioning

For multiple versions of the same file, do not use terms like "final". Instead, distinguish the different versions with the letter "v" for version and then the version number (for example, v01, v02). And don't forget the leading zero!
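As a sketch of how versioning might be automated, the Python function below adds or increments a version label at the end of a filename. Placing the label as a "-v01" suffix just before the extension is an assumption made for this example, and next_version is a hypothetical helper name.

```python
import re

VERSION = re.compile(r"-v(\d+)$")  # a version label at the end of the stem


def next_version(filename):
    """Return the filename with its version number increased by one.

    A file with no version label yet receives -v01.
    """
    stem, dot, extension = filename.rpartition(".")
    if not dot:  # no extension present
        stem, extension = filename, ""
    match = VERSION.search(stem)
    if match:
        digits = match.group(1)
        number, width = int(digits) + 1, len(digits)
        stem = stem[:match.start()]
    else:
        number, width = 1, 2
    return f"{stem}-v{number:0{width}d}{dot}{extension}"


print(next_version("ctp-20200212-en.txt"))      # ctp-20200212-en-v01.txt
print(next_version("ctp-20200212-en-v01.txt"))  # ctp-20200212-en-v02.txt
print(next_version("ctp-20200212-en-v09.txt"))  # ctp-20200212-en-v10.txt
```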

You should now have a better understanding of the importance of establishing a filenaming system that will work for your project as well as an idea of how to create that system. If you already have data files that do not follow the guidance and tips given here, perhaps you can modify your existing system. However, we do not advocate changing your filenames unless you have a compelling reason to do so (e.g., your existing filenames have spaces or problematic characters). Before you make any changes to your existing filenames, make sure that your data files are backed up on at least two different media types (e.g., external hard drive and cloud storage) to avoid accidental data loss. Always document all changes that you make in a way that will allow you to recover, if necessary, any information that you change or lose.
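If you do decide to rename existing files, one way to document the changes is to write every old and new name to a log as the renaming happens. The Python sketch below does this with the standard csv and pathlib modules. The function rename_with_log, the log filename, and the example filenames are all hypothetical, and the demonstration runs in a temporary folder so that no real data is touched. Back up your files before trying anything like this on real data.

```python
import csv
import tempfile
from pathlib import Path


def rename_with_log(folder, new_names, log_path):
    """Rename files (old name -> new name) and record each change in a CSV log."""
    folder = Path(folder)
    with open(log_path, "a", newline="", encoding="utf-8") as log:
        writer = csv.writer(log)
        for old, new in new_names.items():
            source, target = folder / old, folder / new
            if not source.exists() or target.exists():
                continue  # skip missing files and never overwrite an existing one
            source.rename(target)
            writer.writerow([old, new])


# Demonstration in a temporary folder with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "field notes (final).txt").write_text("example")
    log_file = folder / "rename-log.csv"
    rename_with_log(folder, {"field notes (final).txt": "ctp-20200212-v01.txt"}, log_file)
    renamed = sorted(p.name for p in folder.iterdir())
    log_rows = log_file.read_text(encoding="utf-8").splitlines()

print(renamed)   # ['ctp-20200212-v01.txt', 'rename-log.csv']
print(log_rows)  # ['field notes (final).txt,ctp-20200212-v01.txt']
```

Because the log keeps the original name beside the new one, the renaming can be reversed later by reading the log and swapping the two columns.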
