COVID-19: Collecting so that we don’t forget

by Martine Renaud, Librarian, Bibliothèque et Archives nationales du Québec [1]

The COVID-19 pandemic has dominated the news for months because of its sheer scale and its impact on our economy and social life as well as our health. How will it be remembered in a few years? The Spanish flu epidemic of 1918-1919 is sometimes described as the forgotten pandemic[2]. This time, how can we make sure nothing is forgotten? Preserving the memory of this turbulent and exceptional time is crucially important for tomorrow’s researchers.

Capturing the Web

The Web and social media are playing a key role in the pandemic. They enable the instant spread of information (as well as fake news) and provide a space for exchange and communication in a context of social distancing. BAnQ has been collecting Québec websites on a selective basis since 2009. The result of this harvesting is largely available on the BAnQ portal. Sites for which BAnQ has not obtained permission are preserved but not made publicly available; they can, however, be accessed for research purposes.

Collaborative Collection

In February 2020, the International Internet Preservation Consortium (IIPC) called on its members, including BAnQ, to create a collaborative collection of websites dealing with the emerging pandemic.

BAnQ’s contribution to this collection formed the basis of the Québec collection, which we decided to create once the scale of the crisis became apparent. BAnQ had already created several collections on special events, for example the 375th anniversary of the city of Montreal. The collection on the pandemic is part of this corpus on exceptional events.

The Québec collection includes Québec government websites, and sections of websites, dealing with the pandemic. It also includes the websites of public health authorities (Directions de la santé publique), Québec’s National Public Health Institute (INSPQ), and the CISSS and CIUSSS (Integrated Health and Social Services Centres). Web pages about the pandemic from a number of cities and towns are included, as are those of universities, CEGEPs (senior high schools), and school boards. Websites of companies that are particularly affected by the pandemic, such as financial institutions and supermarket chains, are also included.

Articles dealing with COVID-19 from Québec-wide and regional papers are collected, as well as parts of the websites of professional orders and associations. Of course, sites that have emerged or been in the news since mid-March are also harvested. At the time of writing, over 15,000 URLs have been collected, and new ones are added every week.

Capturing Social Media

As for social media, BAnQ collects the Twitter feeds and Facebook pages of personalities and public bodies involved in front-line management of the crisis, such as Premier François Legault, Québec’s health ministry (Santé Québec), and the City of Montréal’s police department (Service de police de la Ville de Montréal).

All over the world, memory institutions are working to preserve traces of the pandemic. Thanks to these efforts, it is our hope that nothing will be forgotten.


[1] This article will appear in the June 2020 issue of À rayons ouverts – Chroniques de Bibliothèque et Archives nationales du Québec, No. 106 (Spring/Summer 2020), p. 26.

[2] Alfred W. Crosby, America’s Forgotten Pandemic – The Influenza of 1918, 2nd edition, Cambridge, Cambridge University Press, 2003 (accessed May 4, 2020).

Quebec Websites: A Decade of Harvesting

This year Bibliothèque et Archives nationales du Québec (BAnQ) celebrates its 10th anniversary of archiving Québec websites. We are delighted to announce that BAnQ will be hosting the next IIPC General Assembly and Web Archiving Conference. The events will be held on 11-13 May 2020.

By Martine Renaud, Librarian, Legal Deposit and Acquisitions Department at Bibliothèque et Archives nationales du Québec 

About Bibliothèque et Archives nationales du Québec
At once national library, national archives and public library of a major metropolitan city, Bibliothèque et Archives nationales du Québec (BAnQ) brings together, preserves and promotes heritage materials from or related to Quebec.

In 2009, after several years of work and reflection, BAnQ began to harvest and archive Québec websites. As discussed in an article on BAnQ’s blog (in French), these heritage materials are often volatile and ephemeral. Harvests were initially carried out as part of a pilot project.

BAnQ takes a selective approach to Web harvesting. Three factors make it difficult to harvest the Quebec Web thoroughly. The first is the size of the body of materials to be collected, given BAnQ’s limited resources. The second is legal: BAnQ must obtain a license from the website producer or other copyright owners granting permission to make their site accessible. The third is context: Quebec does not have its own domain name.

In the news in 2009 
The reach of these first harvests was modest: about 25 government organizations, chiefly ministries.

Looking at the sites collected in 2009, what do we see? Obviously, they reflect what was topical at the time. In 2009, much attention was paid to the influenza epidemic. Does anyone still remember the infamous H1N1 virus? A major vaccination campaign was underway during the winter of 2009, and the Quebec government had a website dedicated to this topic:

The Pandémie influenza website, which is no longer in existence. 

On the Quebec Ministry of Finance website, a number of documents dealt with the effects of the 2008 global financial crisis on Quebec’s economy:

Quebec Ministry of Finance website, 2009.

Still in the news today
While the flu pandemic and the economic crisis are presumably behind us, some news items from 2009 are still topical today. In 2009, reports submitted as part of the Bouchard-Taylor Consultation Commission on Accommodation Practices Related to Cultural Differences were available on the Commission website:

Website of the Bouchard-Taylor Consultation Commission, which no longer exists.

The website is not online anymore, and yet cultural differences and accommodations are still in the news today.

The Quebec National Assembly
Quebec’s National Assembly website also provides interesting historical perspectives. It includes a page dedicated to Quebec’s current Premier, François Legault, who at the time was simply an elected member of the Parti Québécois. As Premier, he is now leader of the Coalition Avenir Québec, a party he co-founded in 2011.

Quebec Web harvests since 2009
Ten years later, harvests have become more numerous. They are broader in scope and much more diverse, with BAnQ’s reach now extending beyond government websites.

The following table compares the 2009 harvests and those carried out as of March 1, 2019:

| | 2009 | 2009-2019 |
|---|---|---|
| Number of harvests | 16 | 12,823 |
| Number of organizations whose website is made available | 25 | 1,295 |
| Documents harvested | 17,026,257 | 149,647,697 |
| Total size of archives (terabytes) | 0.90 | 31 |

It is interesting to see how the use of images, and audio and video materials, has increased:

| Type of documents harvested | Number (2009) | Size in GB (2009) | Number (2009-2019) | Size in GB (2009-2019) |
|---|---|---|---|---|
| HTML pages | 15,073,735 | 306 | 122,146,682 | 4,967 |
| Images | 1,275,183 | 49 | 18,159,220 | 1,454 |
| Applications (PDF, Word, Excel, etc.) | 644,117 | 526 | 5,702,995 | 3,695 |
| Video materials | 17,009 | 19 | 1,309,413 | 20,288 |
| Audio materials | 7,458 | 4 | 79,660 | 320 |
| Other | 8,755 | 0.01 | 2,249,727 | 235 |
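To make the shift toward audiovisual material concrete, the figures in the table above can be turned into each document type's share of the total archive size. The short Python sketch below is only an illustration: the numbers are transcribed from the table, sizes are assumed to be in gigabytes, and the names `harvests` and `size_shares` are hypothetical, not part of any BAnQ tool.

```python
# Figures transcribed from the table above: (number of documents, size in GB).
# The structure and names here are illustrative assumptions, not a BAnQ format.
harvests = {
    "2009": {
        "HTML pages": (15_073_735, 306),
        "Images": (1_275_183, 49),
        "Applications": (644_117, 526),
        "Video materials": (17_009, 19),
        "Audio materials": (7_458, 4),
        "Other": (8_755, 0.01),
    },
    "2009-2019": {
        "HTML pages": (122_146_682, 4_967),
        "Images": (18_159_220, 1_454),
        "Applications": (5_702_995, 3_695),
        "Video materials": (1_309_413, 20_288),
        "Audio materials": (79_660, 320),
        "Other": (2_249_727, 235),
    },
}


def size_shares(period):
    """Return each document type's share of the period's total size, in percent."""
    rows = harvests[period]
    total = sum(size for _, size in rows.values())
    return {kind: round(100 * size / total, 1) for kind, (_, size) in rows.items()}


if __name__ == "__main__":
    for period in harvests:
        print(period)
        for kind, share in size_shares(period).items():
            print(f"  {kind}: {share}%")
```

Read this way, video materials account for roughly 2% of the archive size in 2009 and about two thirds of it for 2009-2019, even though they remain a small fraction of the number of documents.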

Proliferating applications are a major challenge for institutions that harvest websites. BAnQ relies on the Heritrix web crawler for its harvests.

Contents to explore and to work with
Since 2009, harvests have progressively widened their scope. They now provide a number of corpuses of interest to researchers, particularly in the digital humanities field. Websites dealing with Quebec provincial elections in 2012, 2014 and 2018 have been harvested (major parties, political blogs, news sites, etc.). The municipal elections of 2013 and 2017 have also been covered. In addition, we harvest what are known as “thematic” (i.e. non-governmental) sites: cultural organizations (museums, libraries, and archives), community organizations, professional associations, regional newspapers, and so on.

Websites harvested by BAnQ can be accessed through an interface. Interested researchers may also access the data directly on request.