Step 3: Text and image formats

Media Formats

We will now take a closer look at the primary media file formats that are found in digital language repositories: text-based media including archival PDF, images, audio, and video. 



Text formats

Text documents are meant to be read. They include documents intended for people, such as books, articles, and word processing documents, as well as files meant to be read by machines, such as tagged documents (HTML or XML) and code. Many human-readable text documents are created with proprietary software such as the Microsoft Office suite. Files in these proprietary formats are not suitable for long-term preservation, but many applications can save files in enduring formats that are likely to be supported well into the future. Perhaps the most widely supported such format, and one that preserves a great deal of formatting and layout, is the archival PDF format, PDF/A. Depending on the kind of text, less structured (and potentially more reusable) formats may also be suitable, such as plain text (TXT) or XML (pictured in Figure 26 below).

Figure 26:

This XML file includes both English and Spanish translations for a resource description.
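To illustrate why a tagged format like XML can be more reusable than a formatted document, here is a minimal sketch of reading a bilingual resource description with Python's standard library. The element names (resource, description) and the sample text are invented for illustration and are not taken from the file shown in Figure 26; the sketch only assumes that each translation is marked with the standard xml:lang attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names and text are invented for illustration.
SAMPLE = """<resource>
  <description xml:lang="en">Recording of a traditional story.</description>
  <description xml:lang="es">Grabación de un cuento tradicional.</description>
</resource>"""

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def descriptions_by_language(xml_text):
    """Return a dict mapping each language code to its description text."""
    root = ET.fromstring(xml_text)
    return {el.get(XML_LANG): el.text for el in root.iter("description")}


if __name__ == "__main__":
    for lang, text in descriptions_by_language(SAMPLE).items():
        print(lang, "->", text)
```

Because the languages are labeled in the markup itself, a program can pull out either translation without any knowledge of how the document was laid out on a page.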


Archival PDF documents

Some document formats can contain types of media that an archive may not support, but such documents can often be converted to a format that will be supported. This is the case with PDF files. PDF documents are intended to display a document as it would be printed on a page, but they can also contain features that are difficult to sustain, such as special fonts, transparent layers, and embedded audio and video media. Before sending PDF files to your archive, you should attempt to convert them to the archival PDF/A format. This specification allows only those features of PDF documents whose continued support has been agreed to, so an archival PDF/A document is expected to remain readable in the future, whereas other kinds of PDF files may not be.

When saving files as PDF/A, you may discover some unsupported features in your documents, such as transparent objects and embedded audio and video files. While you run the risk of losing these features when you convert a document into a PDF/A file, they are the very features that would be problematic for the long-term preservation and future use of that file.

When you open a PDF file in Adobe Acrobat (the free reader or the professional version), you will know it is a PDF/A if there is a blue banner across the top with the message “This file claims compliance with the PDF/A standard and has been opened read-only to prevent modification” (see Figure 27 below). You will not see this message if you are using the Preview app on a Mac computer.

Figure 27:

Example file with message saying the file has been opened read only and would need to have editing enabled in order to edit the document.
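If you do not have Adobe Acrobat available, the same claim of compliance can usually be found in the file's embedded XMP metadata, where PDF/A files record a pdfaid:part value (and normally a pdfaid:conformance value). The following is a rough sketch, not a validator: the function name is hypothetical, it assumes the metadata is stored as plain UTF-8 text in the file, and, like the Acrobat banner, it only reports what the file claims about itself.

```python
import re

# The PDF/A identification properties may appear either as XML elements
# (<pdfaid:part>1</pdfaid:part>) or as attributes (pdfaid:part="1").
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def claimed_pdfa_level(data):
    """Return a label such as 'PDF/A-1b' if the bytes claim PDF/A, else None.

    Hypothetical helper: it reads the claim only and does not check that
    the document actually satisfies the PDF/A specification.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = _CONF.search(data)
    if conformance is not None:
        level += conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + level


if __name__ == "__main__":
    import sys

    # Usage: python check_pdfa.py path/to/file.pdf
    with open(sys.argv[1], "rb") as handle:
        print(claimed_pdfa_level(handle.read()) or "no PDF/A claim found")
```

A file that passes this check has only declared itself to be PDF/A. Confirming that it truly conforms requires a dedicated PDF/A validation tool.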


Image formats

Digital image files are either raster images or vector images. Images produced by cameras or scanners are always raster images, whereas files produced by image editing programs can be either raster or vector images.

Raster image formats commonly supported by digital archives include TIFF, JPEG, and PNG. Some archives also support additional formats such as JPEG 2000, and images may also be embedded in PDF/A files. The differences among these formats in image quality and preservation are not great, and any format that your archive supports would be suitable for preservation. For detailed information on compression and digital images, see the FADGI digital image guidelines (Figure 28).

Figure 28:

Shows the FADGI star rating system, from 1 star to 4 stars.
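File extensions are not always reliable, so before depositing images it can be useful to confirm that each file really is in one of the raster formats named above. One simple approach, sketched here, is to compare the first bytes of the file against each format's well-known signature. The function name is hypothetical, and the list covers only classic TIFF, JPEG, PNG, and the JPEG 2000 (JP2) container; other variants, such as BigTIFF or raw JPEG 2000 codestreams, are not recognized by this sketch.

```python
# Leading byte signatures for the raster formats discussed in this section.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000"),  # JP2 container only
]


def sniff_raster_format(header):
    """Return the format name matching the leading bytes, or None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    import sys

    # Usage: python sniff_image.py path/to/image
    with open(sys.argv[1], "rb") as handle:
        print(sniff_raster_format(handle.read(16)) or "unrecognized format")
```

A result of None does not mean the file is unusable, only that it should be examined more closely before it is sent to the archive.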

Vector images do not specify information pixel by pixel; instead, they describe lines and curves mathematically. Unlike raster images, vector images are scalable and do not become pixelated when magnified. Vector images are commonly made with proprietary software such as Adobe Illustrator (AI). Proprietary vector image formats should either be converted into a supported enduring raster image format (at an appropriate image size and resolution, meaning the amount of information in a given unit of the image) or be converted into an open vector image format such as SVG.
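The difference between the two kinds of image can be made concrete with a small sketch. Both function names are invented for illustration. The first enlarges a raster image in the simplest possible way, by repeating pixels, which is what produces a blocky, pixelated result. The second writes a tiny SVG file in which the circle is described once, in its own coordinate system, so displaying it larger changes only the requested width and height.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a raster grid by repeating each pixel factor x factor times."""
    enlarged = []
    for row in pixels:
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row so each source pixel becomes a square block.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


def circle_svg(size):
    """Return an SVG document that draws the same circle at any display size."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        'viewBox="0 0 100 100"><circle cx="50" cy="50" r="40"/></svg>'
    )


if __name__ == "__main__":
    tiny = [[0, 1],
            [1, 0]]
    for row in upscale_nearest(tiny, 3):
        print(row)  # each original pixel is now a 3 x 3 block
    print(circle_svg(100))
    print(circle_svg(1000))  # same shape description, larger display size
```

Enlarging the raster grid adds no new detail, only bigger blocks, whereas the SVG text that describes the circle is identical at every size.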
