(This post originally appeared on the PEGI Project.)
At the PEGI Mini-Forum held in conjunction with the Depository Library Council meeting in Arlington, Virginia, in mid-October 2017, we asked some of our attendees to explore the concept of risk as it pertains to long-term access. What types of government information are most at risk? What factors affect the collection and preservation of these materials? Participants, most of whom were government information librarians, acknowledged that some of the factors for born-digital content are quite similar to print and other formats, while others reflect the fact that our existing preservation infrastructure was developed to manage tangible and discrete materials. While there is a complex network of issues to consider as we work toward development of a national agenda, the discussion below focuses on three vectors, or congruent categories, for risk of information loss.
One vector for risk is format. While preservation best practices are developed and improved for almost all kinds of materials held in stewardship by cultural heritage institutions, these best practices do not often filter down to a heterogeneous collection such as federal information. Audiovisual content, both in digitally disseminated formats and in tangible formats like DVDs and VHS tapes, is susceptible to the same threats of obsolescence, digital rights restrictions, and physical damage as any content in that format. However, it may be included in collections for which mitigating these risks is not the primary focus. This content has also been harder to identify and add to government information collections since it is not always disseminated following the same practices as text-based content.
Another example of format-related risk is official content made public as a presentation. Government entities may post slides in Microsoft PowerPoint format, which does not meet standards for digital preservation, or they may post presentations using an external service. When the content is converted into a PDF, the conversion may change the layout and graphical appearance, and may even remove textual elements. And again, this content can easily be missed when collectors focus on more traditional formats such as documents and reports.
These risk factors make it all the more necessary that government information sources be identified as in need of preservation in collaboration with projects seeking to collect and preserve content by format type. For example, the Internet Archive’s United States Government Films collection fills an important gap for both tangible and digital formats, even though it does not fully overlap with traditional library collections for this area.
A second notable vector is the extent to which content, once collected, can be organized in ways that make it discoverable and useful using means and practices that are primarily adapted to print dissemination. When a report is disseminated as a multi-part PDF, for example, additional information may be needed to identify the parts, their relationships, and their relative versioning. This “collectibility” factor affects all forms of content, but when it is combined with format-related risk, the effect is amplified.
Content is also difficult to collect when there is discontinuity between the ways in which the content was provided in the past and how it is provided now. For example, when a publication is initially provided as a serial or journal, it is structured with sequential markers such as volumes and issues. When that same content is subsequently developed for and made available on a website that is updated at irregular intervals and changed rather than added to, the connections between the versions may not be clear, and the way it was described in the past may no longer be applicable.
Adding descriptive metadata and repurposing existing embedded metadata help to bridge discontinuity and address these access issues. Research, experimentation, and development undertaken by the UNT Libraries Digital Projects Unit and its many partners demonstrate an important model for finding solutions to these challenges.
A third identified risk vector is the commitment of cultural heritage institutions to collecting and preserving different kinds of content. This risk can take many forms: some institutions may determine that their core users have little to no likely future need for this content, or may not recognize or acknowledge the possibility that their core users may someday want to access materials that are currently available online. In these cases, no action is taken to collect content. Alternatively, the need may be acknowledged, but due to resource constraints (which can include budget, personnel, and infrastructure) no action is taken.
For institutions that do collect born-digital government information, some content is inherently of greater interest with respect to existing collections, known user needs, and topics of local concern. Existing print preservation programs demonstrate that certain topics and agencies are attractive for libraries to commit to retain and preserve; it must be presumed that others are less so, and consequently more endangered for the long term. Similarly, while focused collection building and deliberate preservation strategies are essential to guarantee long-term access to digital content, much of what is published will only be caught in broad-net collection projects, if at all.
Because of this, it is important to recognize that both precise, item-by-item collection building and content capture at scale have important roles in a broad, coordinated, and participatory strategy for preservation. To its credit, the Government Publishing Office has adopted both approaches, with the Cataloging & Indexing program processing born-digital content at the title and item level, while the Web Archiving program collects snapshots of websites.
We hope this overview will provoke new conversations on how to categorize risk and identify ways to address it, while recognizing that existing initiatives are engaging with these challenges in ways that others can learn from and adopt.
Written on February 8th, 2018 by Shari Laster