By Friedel Geeraert, Researcher on the PROMISE project at the Royal Library of Belgium
It all began in 2016 when the State Archives and KBR (the Royal Library of Belgium) decided to join forces and set up a joint web archiving project at the federal level in Belgium. Belgium is, sadly, one of the few European countries without a national web archive. Together with the universities of Ghent and Namur and the university college Bruxelles-Brabant, they set themselves the task of developing a federal strategy for the preservation of the Belgian web. Funding was secured via the BRAIN.be programme of the Belgian Science Policy Office and in July 2017 the PROMISE project (Preserving Online Multiple Information: towards a Belgian strategy) kicked off.
One of the strengths of the PROMISE project is the interdisciplinarity of the research team. The State Archives and KBR provide expertise in collection curation and information and documentation management, while the University of Namur (Research Centre in Information, Law and Society) provides the legal expertise. The University of Ghent (Research Group for Media, Innovation and Communication Technologies; Ghent Centre for Digital Humanities) and the university college Bruxelles-Brabant (HE2B) collaborated on the technical aspects of the project. The former also worked on analysing the user requirements for web archives. This approach not only ensured the necessary expertise but also led to cross-fertilisation between the different research domains.
Our objectives and how we learned from others
The project team worked on four main objectives:
- Identify best practices in the field of web archiving
- Develop a strategy for archiving the Belgian web
- Set up a pilot project for the archiving of the Belgian web and providing access to these collections
- Make recommendations for the implementation of a sustainable web archiving service
More than two years on, a lot has happened within the project. To achieve the first objective, the research team carried out an extensive literature review of web archiving practices. This was supplemented by in-depth interviews with representatives of 13 web archiving institutions in Europe and Canada. These interviews covered operational, technical and legal aspects, and the phase was very instructive for all researchers involved. The research results were published in the International Journal of Digital Humanities.
Inspired by the first phase, KBR and the State Archives outlined a strategy that covers the entire web archiving workflow. The legal analysis done within the project also informed both institutions about what they are legally allowed or required to do. The results of a survey on user requirements were another important source of input, since KBR and the State Archives intend to focus on the user when developing a functional web archive.
The strategy also included elaborate cost calculations based on three scenarios, each linked to a different selection strategy: limited selective collections only; elaborate selective collections in combination with a limited broad crawl; and elaborate selective collections in combination with an extensive broad crawl. A list of tasks and necessary infrastructure was drafted for each of these scenarios, spanning the different functions of the OAIS model with the addition of the functions selection and capture. For each task, the time needed was estimated per job profile involved. The total number of hours per profile was then multiplied by an average wage for that profile to arrive at a total cost for each scenario. The purpose of this exercise was to allow the boards of directors of the State Archives and KBR to make informed decisions about which web archiving strategy is preferable and financially viable.
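The arithmetic behind such a scenario costing can be sketched in a few lines of Python. This is only an illustration of the hours-times-wage calculation described above, not the project's actual model: the task names, job profiles, hours and wages are all hypothetical placeholders, and infrastructure costs are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One task in a scenario, with estimated hours per job profile."""
    name: str
    hours_per_profile: dict  # profile name -> estimated hours


def scenario_cost(tasks, hourly_wage):
    """Sum the hours per profile over all tasks, then multiply each
    profile's total hours by its average hourly wage."""
    total_hours = {}
    for task in tasks:
        for profile, hours in task.hours_per_profile.items():
            total_hours[profile] = total_hours.get(profile, 0.0) + hours
    return sum(h * hourly_wage[p] for p, h in total_hours.items())


# Hypothetical figures, for illustration only.
hourly_wage = {"curator": 40.0, "developer": 50.0}

limited_selective = [
    Task("selection", {"curator": 100.0}),
    Task("capture", {"curator": 20.0, "developer": 80.0}),
]

cost = scenario_cost(limited_selective, hourly_wage)
print(cost)  # 120 h * 40 + 80 h * 50 = 8800.0
```

Running the same function over a task list for each scenario gives figures that can be compared side by side.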
Selection and metadata
The third research phase consisted of a number of elements: creating seed lists for selective collections in accordance with the collection development policies of KBR and the State Archives, creating descriptive metadata based on a recent study by OCLC, doing a pilot broad crawl based on samples of 10,000 and 100,000 domain names, capturing these collections and providing access to them. The prototype for access is in its final stages of development, after which we aim to evaluate the entire pilot project.
The project was completed at the end of December 2019 and the PROMISE project team is now working on making recommendations for the implementation of a sustainable web archiving service including legal considerations concerning access to web archives, operational procedures, a business model and technical and functional requirements for web archiving tools.
So how promising is the future of the Belgian web archive? As is the case with many new endeavours, structural financing plays a key role. It is the intention of KBR and the State Archives to approach the political level in Belgium and make a convincing case for the necessity of a Belgian web archive. During the concluding colloquium ‘Saving the web: the promise of a Belgian web archive’ that was held on 18 October 2019, Niels Brügger, Valérie Schafer and many others shared inspiring ideas with the PROMISE project team that can be used to make a very strong case. It is the sincere hope of both institutions that the results of the PROMISE project will live on in a sustainable web archive at the federal level in Belgium.
The end of the project also invites reflection. Over the course of the project, the team had the pleasure of being introduced to the (inter)national web archiving community, for which the IIPC and RESAW provide very important platforms. We feel that we owe a lot to the exchanges we had with other web archiving professionals and researchers. We would like to thank you all for the inspiration you have given us over the years, and we look forward to many exchanges to come.
- PROMISE project
- ‘Saving the web: the promise of a Belgian web archive’, KBR, 18 October 2019
- Vlassenroot, E., Chambers, S., Di Pretoro, E. et al. Web archives as a data resource for digital scholars. Int J Digit Humanities 1, 85–111 (2019)
- Vlassenroot, E., PReserving Online Multiple Information: towards a Belgian strategy (results of a survey on user requirements), 2018
- Geeraert, F., Soyez, S. The first steps towards a Belgian web archive: a federal strategy. IIPC WAC 2019, Zagreb