Step 3: Enduring, open, and supported file formats

Introductory video: Sustainable File Types

Selecting enduring, open, and supported file formats

The mission of data repositories is to preserve materials so that they can be accessed in the future. However, even if a digital file is faithfully preserved, a future user may still be unable to open or read it if its format is no longer supported by the software available at that time.

Figure 21:

Image of a cuneiform tablet from the 26th century BC.

When preparing materials for long-term preservation, it is important to make sure that the files are in formats that future users will be able to read and use. Preserving files that are not in enduring, open, and supported formats runs the risk of turning those files into something like digital cuneiform tablets. Cuneiform tablets (shown in Figure 21 above) are thousands of years old and the marks on them can still be clearly seen today, yet very few people can even tell what language is written on them, let alone read them (Watkins & Snyder 2003). Likewise, even when a file’s bitstream (all the ones and zeros) is preserved intact, changes in software and hardware over time can make it hard to recover the information within.

Enduring file formats

The files that you preserve should be in enduring formats that will be useful well into the future. While it is hard to say which formats will be accessible in the future, open formats are more likely to be accessible than proprietary formats, and formats supported by your archive are more likely to be accessible in the future than formats your archive does not support. 

Open formats

With respect to digital files, open formats are free from the restrictions placed upon proprietary formats, and are more likely to be accessible in the future. Many of the most widely used programs for making files and media are proprietary software distributed by for-profit businesses. The creators of proprietary software control its use, modification, and distribution, since it is their intellectual property. These intellectual property rights include the right to keep other people from using their formats, or to charge fees for doing so. The proprietor can also choose to stop supporting a format or the applications that open and read it. This means that a file created in a proprietary format may only be accessible to people who have purchased the necessary software, and even then only if that software is still available and compatible with the hardware and operating systems of the future. Since open formats do not carry the same kinds of restrictions, applications that read them can be developed or modified by third parties without relying on the proprietor, which makes it more likely that such applications will be available in the future.

Figure 22:

Image shows a man deciding between two choices: proprietary formats, represented by a red devil, and open formats, represented by an angel.

Examples of proprietary formats include those produced by the Microsoft Office suite (Figure 23 below), such as DOCX documents or XLSX spreadsheets. If you have files in proprietary formats that you would like to archive, you should consider exporting their contents to an open format. Documents can be converted to plain text (TXT) files or archival PDF files (also called PDF/A, more on this document type below); spreadsheets can be saved as tab-separated (TSV) or comma-separated (CSV) text files that can be imported into any spreadsheet program.

Figure 23:

Image showing the icons of various software within the Microsoft Office umbrella.

Of course, some functionality of the proprietary formats may be lost, but often an alternative can be found to preserve the information within. For example, the slideshow presentation application PowerPoint allows users to write text in a box of speaker notes (“Presenter’s notes”) that is visible when editing the slide deck or on the presenter’s (but not the audience’s) screen during a presentation. One way to make a PPTX file suitable for preservation is to export it as an archival PDF file. This method, however, does not preserve the speaker notes, so if they contain meaningful content, another method is needed to preserve them. For example, a handout with the slide images and speaker notes could be produced and saved as an archival PDF file, or the text of the speaker notes could be extracted from the slides and saved in a TXT file that is archived alongside the PDF/A of the full slide images. Note that a single file in an unsupported or proprietary format can correspond to multiple archival files. This is especially true of Excel workbooks: a CSV or TSV file can hold only one sheet, so each sheet of an XLSX workbook must be exported separately, producing one file per sheet.
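To illustrate how one workbook becomes several archival files, here is a minimal Python sketch that writes one CSV file per sheet. It is only an illustration under stated assumptions: the function name `export_sheets`, the file-naming pattern, and the in-memory layout of the workbook (a dictionary mapping each sheet name to its rows) are hypothetical, and reading the XLSX file into that structure is assumed to have been done beforehand with a tool of your choice.

```python
import csv
import re
from pathlib import Path


def export_sheets(workbook_name, sheets, out_dir):
    """Write one CSV file per sheet and return the paths that were created.

    `sheets` is a hypothetical in-memory workbook: a dict that maps each
    sheet name to its rows (a list of lists). Loading the XLSX file into
    this structure is assumed to happen elsewhere.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for sheet_name, rows in sheets.items():
        # Replace characters that are awkward in file names with underscores.
        safe_name = re.sub(r"[^A-Za-z0-9_-]+", "_", sheet_name).strip("_")
        path = out_dir / f"{workbook_name}_{safe_name}.csv"
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written


if __name__ == "__main__":
    example = {
        "Word list": [["word", "gloss"], ["kitab", "book"]],
        "Speakers": [["id", "year_of_birth"], ["S01", "1950"]],
    }
    for path in export_sheets("fieldwork", example, "archive_ready"):
        print(path.name)
```

Including the workbook name in each file name keeps the exported sheets recognizably grouped once they sit side by side in the archive. Formulas, formatting, and charts are not carried over by a plain-text export like this, so anything meaningful in them would need to be documented separately.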

Figure 24:

Summarizes the benefits of open formats, including that software code is publicly accessible and that they are less likely to become obsolete.

Supported formats and archives

While considering which enduring and open file formats would work for your data, you should check with your archive to see which of those formats it supports. Because of technical and structural limitations, a repository may not support some kinds of files, even if they are open and enduring. For example, repository software whose media viewers let users view or stream media within the browser itself may work only with certain kinds of media files and not others.

Figure 25:

Image shows an Apple computer with popup messages saying that a presentation cannot be opened with the application Keynote because both the presentation and the application are too old.

The more file formats an archive supports, the more complex and expensive the processes involved in repository maintenance and long-term digital preservation will be. It is therefore both pragmatic and cost-effective for a digital repository to limit the number of file formats it manages. Furthermore, different digital repositories use different tools and software, so one archive might not support the same file formats as another. For both reasons, it is important to determine which file formats are supported by your chosen archive.
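Once you know which formats your archive supports, a quick check of your files against that list can show what still needs converting. The following Python sketch assumes a made-up list of supported extensions (`SUPPORTED_EXTENSIONS`) and a hypothetical helper named `unsupported_files`; substitute the formats your archive actually documents.

```python
from pathlib import Path

# Hypothetical list: replace it with the formats your archive documents.
SUPPORTED_EXTENSIONS = {".txt", ".csv", ".tsv", ".pdf"}


def unsupported_files(filenames, supported=SUPPORTED_EXTENSIONS):
    """Return the file names whose extension is not in the supported set."""
    flagged = []
    for name in filenames:
        # Compare extensions case-insensitively, so "DATA.CSV" matches ".csv".
        if Path(name).suffix.lower() not in supported:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    files = ["notes.txt", "budget.xlsx", "DATA.CSV", "slides.pptx"]
    for name in unsupported_files(files):
        print(f"Needs conversion or a question to the archive: {name}")
```

A file extension is only a hint about a file's format. For instance, a `.pdf` extension does not show whether a file is an archival PDF/A, so a check like this is a first pass rather than a substitute for your archive's own guidance.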
