By Martin Klein, Scientist in the Research Library at Los Alamos National Laboratory (LANL), Shawn M. Jones, Ph.D. student and Graduate Research Assistant at LANL, Herbert Van de Sompel, Chief Innovation Officer at Data Archiving and Network Services (DANS), and Michael L. Nelson, Professor in the Computer Science Department at Old Dominion University (ODU).
Links on the web break all the time. We frequently experience the infamous “404 – Page not found” message, also known as “a broken link” or “link rot.” Sometimes we follow a link and discover that the linked page has significantly changed and its content no longer represents what was originally referenced, a scenario known as “content drift.” Both link rot and content drift are forms of “reference rot”, a significant detriment to our web experience. In the realm of scholarly communication where we increasingly reference web resources such as blog posts, source code, videos, social media posts, datasets, etc. in our manuscripts, we recognize that we are losing our scholarly record to reference rot.
Robust Links background
As part of The Andrew W. Mellon Foundation funded Hiberlink project, the Prototyping team of the Los Alamos National Laboratory’s Research Library together with colleagues from Edina and the Language Technology Group of the University of Edinburgh developed the Robust Links concept a few years ago to address the problem. Given the renewed interest in the digital preservation community, we have now collaborated with colleagues from DANS and the Web Science and Digital Libraries Research Group at Old Dominion University on a service that makes creating Robust Links straightforward. To create a Robust Link, we need to:
- Create an archival snapshot (memento) of the link URL and
- Robustify the link in our web page by adding a couple of attributes to the link.
Robust Links creation
The first step can be done by submitting a URL to a proactive web archiving service such as the Internet Archive’s “Save Page Now”, Perma.cc, or archive.today. The second step guarantees that the link retains the original URL, the URL of the archived snapshot (memento), and the datetime of linking. We detail this step in the Robust Links specification. With both done, we truly have robust links with multiple fallback options. If the original link on the live web is subject to reference rot, readers can access the memento from the web archive. If the memento itself is unavailable, for example, because the web archive is temporarily out of service, we can use the original URL and the datetime of linking to locate another suitable memento in a different web archive. The Memento protocol and infrastructure provides a federated search that seamlessly enables this sort of lookup.
Robust Links sustainability
- we moved the source files into the IIPC GitHub repository so they can be maintained (and versioned) by the community and served with the correct mime type via GitHub Pages and
The other sustainability issue relates to the Memento infrastructure to automatically access mementos across web archives (2nd fallback mentioned above). Here we continue our path in that LANL and ODU, both IIPC member organizations, maintain the Memento infrastructure.
Acknowledgements and feedback
Lastly, we would like to thank DataCite for granting two DOIs to the IIPC for this effort at no cost. We are also grateful to ODU’s Karen Vaughan for her help minting the DOIs.
For feedback/comments/questions, please do not hesitate and get in touch (martinklein0815[at]gmail.com)!