Architecturally, the Time Travel portal operates in a manner similar to distributed search. Hence, it faces challenges related to query routing, response time optimization, and response freshness. The new infrastructure includes some rule-based mechanisms for intelligent routing, but a thorough solution is being investigated in the IIPC-funded Web Archive Profiling project. A background cache continuously fetches TimeMap information from distributed archives that are either natively or by-proxy compliant with the Memento protocol. Its collection consists of a seed list of popular URIs, augmented with URIs requested by Memento clients. Whenever possible, responses are delivered from a front-end cache that remains in sync with the background cache using the ResourceSync protocol. If a request cannot be served from cache, because cached content is unavailable or stale, realtime TimeGate requests are sent to Memento-compliant archives only. This setup achieves a satisfactory balance between response times, response completeness, and response freshness. If needed, the front-end cache can be bypassed and a realtime query can explicitly be initiated using the regular browser refresh approach, e.g. Shift-Reload in Chrome.
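The cache-first strategy with realtime fallback can be sketched roughly as follows. This is an illustrative outline only, not our actual implementation: the cache structure, the freshness window, and the `query_timegates` hook are all hypothetical names standing in for the real components.

```python
# Illustrative sketch of cache-first lookup with realtime TimeGate fallback.
# All names and the 24-hour freshness threshold are assumptions.
from datetime import datetime, timedelta

CACHE_TTL = timedelta(hours=24)  # assumed freshness window

def lookup_timemap(uri, front_end_cache, query_timegates, force_refresh=False):
    """Return TimeMap entries for `uri`, preferring the front-end cache.

    `force_refresh` models the explicit cache bypass (e.g. Shift-Reload).
    """
    entry = front_end_cache.get(uri)
    if entry is not None and not force_refresh:
        age = datetime.utcnow() - entry["fetched_at"]
        if age < CACHE_TTL:
            return entry["timemap"]  # fresh cached response, no archive traffic
    # Cache miss, stale entry, or explicit bypass: query
    # Memento-compliant archives in realtime and refresh the cache.
    timemap = query_timegates(uri)
    front_end_cache[uri] = {"timemap": timemap,
                            "fetched_at": datetime.utcnow()}
    return timemap
```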
The development of the Time Travel portal was also strongly motivated by the desire to lower the barrier for developing Memento-related functionality, especially at the browser side. Memento protocol information is – appropriately – communicated in HTTP headers. However, browser-side scripts typically do not have header access. Hence, we wanted to bring Memento capabilities within the realm of browser-side development. To that end, we introduced several RESTful APIs:
- An API that provides a redirect to a Memento via a Wayback-style URI, e.g. http://timetravel.mementoweb.org/memento/20081128/http://apple.com. This redirect will be to the Memento that is temporally closest to the conveyed datetime, and that is held by one of the systems covered by the Memento Aggregator.
- An API that provides a JSON description of a Memento via a Wayback-style URI, e.g. http://timetravel.mementoweb.org/api/json/20081128/http://apple.com. This description contains information about the Memento that is temporally closest to the conveyed datetime, but also of duplicates thereof (if any exist) as well as of the previous/next/first/last Mementos. Again, responses are for Mementos in all systems covered by the Memento Aggregator.
- Access to two different TimeMaps, both available in application/link-format and JSON serialization. One TimeMap, which can be promptly delivered, merely provides pointers to potential TimeMaps in various systems covered by the Memento Aggregator, leaving the work of collecting actual information about Mementos to the client. The other TimeMap is completely assembled by our service. Depending on the state of the cache, it may take a while to deliver a response to the client.
- A machine-actionable Archive Registry that lists systems covered by the Memento Aggregator and their properties, e.g. whether they are natively or by-proxy compliant with the Memento protocol.
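To give a feel for how little code these APIs require, here is a minimal sketch that constructs the Wayback-style URIs from the examples above and fetches a JSON description with the Python standard library. The URI patterns are taken from the examples; the structure of the returned JSON document is not shown here, so the fetch simply returns the parsed response.

```python
# Minimal client sketch for the Time Travel APIs, using only the stdlib.
# URI patterns follow the examples in the post; no response fields are assumed.
import json
import urllib.request

BASE = "http://timetravel.mementoweb.org"

def memento_redirect_uri(datetime14, uri):
    """Wayback-style URI that redirects to the temporally closest Memento."""
    return f"{BASE}/memento/{datetime14}/{uri}"

def memento_json_uri(datetime14, uri):
    """Wayback-style URI for the JSON description of the closest Memento."""
    return f"{BASE}/api/json/{datetime14}/{uri}"

def fetch_memento_info(datetime14, uri):
    """GET the JSON API and return the parsed response document."""
    with urllib.request.urlopen(memento_json_uri(datetime14, uri)) as resp:
        return json.load(resp)
```

For example, `fetch_memento_info("20081128", "http://apple.com")` retrieves the description of the Memento of apple.com closest to November 28, 2008, across all systems covered by the Memento Aggregator.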
We are thrilled by the continuous growth in the usage of these APIs and would be interested to learn what kinds of applications people out there are building on top of our infrastructure. We know that the new version of the Mink browser extension uses the new APIs. Also, the Time Travel’s Reconstruct service, based on pywb, leverages our own APIs. Memento for Chrome now obtains its list of archives from the Archive Registry. Also, the Robust Links approach to combat reference rot is based on API calls, but that will be the subject of another blog post.
IIPC members that operate public web archives that are not yet Memento compliant are reminded that Open Wayback and pywb natively support Memento. From the perspective of the Time Travel portal, compliance means that we don’t have to operate a Memento proxy, that archive holdings can be included in realtime queries, and that both Original URIs and Memento URIs can be used to Find/Reconstruct. From a broader perspective, it means that the archive becomes a building block in a global, interoperable infrastructure that provides a time dimension to the web.
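A natively compliant archive advertises its TimeGate in an HTTP Link header, as specified by the Memento protocol (RFC 7089). The sketch below shows a simplified way to pull the TimeGate URI out of such a header; it is illustrative only and not a full, spec-complete Link-header parser.

```python
# Simplified extraction of a TimeGate URI from an HTTP Link header.
# A compliant response carries something like:
#   Link: <http://archive.example/timegate/http://apple.com>; rel="timegate"
# This parser is a rough sketch, not a complete RFC 8288 Link parser.
import re

def find_timegate(link_header):
    """Return the first URI with rel "timegate" from a Link header, or None."""
    for part in link_header.split(","):
        match = re.search(r'<([^>]+)>\s*;\s*rel="?([^";]+)"?', part.strip())
        if match and "timegate" in match.group(2).split():
            return match.group(1)
    return None
```

A client (or our Aggregator) that finds such a header knows it can negotiate in the time dimension with the archive directly, with no proxy in between.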