The French coronavirus (COVID-19) web archive collection: focus on collaborative networks

BnF’s Covid-19 web archive collection has drawn considerable media attention in France, including coverage in Le Monde, 20 minutes and TV Channel France 3. The following blog post was first published in Web Corpora, BnF’s blog dedicated to web archives.


By Alexandre Faye, Digital Collection Manager, Bibliothèque nationale de France (BnF)
English translation by Alexandre Faye and Karine Delvert

The current global coronavirus pandemic (Covid-19) poses an unprecedented challenge for web archiving activities. The impact on society is such that the ongoing collection requires several levels of coordination and cooperation, both nationally and internationally.

Since it spread beyond China and later developed in Europe, the coronavirus outbreak has become a pervasive theme on the web. This health crisis is being experienced in real time by populations that are at once confined and largely connected, with a sense of emergency as well as underlying questions. Archived websites, blogs and social media should make up a coherent, significant and representative collection. They will be primary sources for future research, and they are already the trace and memory of the event.

#jenesuispasunvirus

At the end of January 2020, while the megacity of Wuhan was under quarantine, the first hashtags #JeNeSuisPasUnVirus and #CORONAVIRUSENFRANCE appeared on Twitter. They denounced and exposed the stigma experienced by the Asian community in France. The Movement against racism and for friendship between peoples (Mouvement contre le racisme et pour l’amitié entre les peuples, MRAP) quickly published a page on its website entitled “a virus has no ethnic origin”. This was the first webpage related to coronavirus to be selected, crawled and preserved under French legal deposit.

Group dynamics

The coronavirus collection is not conceived as a project, in the sense that it is not programmed, has no precise calendar and is not limited to predetermined topics. It grows as part of both the National and local news media collection and the Ephemeral News Current Topics collection. The National and local news media collection brings together a hundred national and local press websites, including editorial content such as headlines and related articles, as well as Twitter accounts, which are collected once a day. The News Current Topics collection, which requires both a technical and an organizational approach, relies on the coordination of an internal network of digital curators, each working in their own field. It facilitates dynamic and reactive identification of web content related to contemporary issues and important events. By documenting the evolution, spread and overall impact of the pandemic in France, the archiving policy embraces all facets of the public health crisis: medical, social, economic and political aspects and, more broadly, scientific, cultural and moral ones.

“A virus has no ethnic origin”. Movement Against Racism and for Friendship Between Peoples (MRAP) website. Archive of February 21, 2020.

Seventy selected seed URLs were crawled in January and February, while the spread of the virus outside China still seemed limited and under control. Since March 17, the date of the French lockdown, 500 to 600 seed URLs per week have been selected and assigned a crawl frequency: several times a day for social networks, daily for national and local press sites, weekly for news sections dedicated to the coronavirus, and monthly for articles and dedicated websites created ex nihilo. Thus the section of the economic review L’Usine nouvelle is crawled weekly because it carries a steady stream of articles. The less dynamic recommendation pages of the National Research and Security Institute (INRES) are assigned a monthly frequency.
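The frequency rule described above can be pictured as a simple lookup from seed category to crawl interval. The following is only an illustrative sketch, not the BnF's actual tooling: the category names, the `crawl_interval` and `crawls_per_week` helpers, the 8-hour value standing in for "several times a day" and the 30-day value standing in for "monthly" are all assumptions made for the example.

```python
from datetime import timedelta

# Hypothetical category names; the intervals mirror the frequencies listed
# in the text. "Several times a day" is assumed to mean every 8 hours, and
# "monthly" is approximated as 30 days.
CRAWL_INTERVALS = {
    "social_network": timedelta(hours=8),
    "press_site": timedelta(days=1),
    "coronavirus_news_section": timedelta(weeks=1),
    "article_or_dedicated_site": timedelta(days=30),
}


def crawl_interval(category: str) -> timedelta:
    """Return the interval between two crawls for a seed category."""
    try:
        return CRAWL_INTERVALS[category]
    except KeyError:
        raise ValueError(f"unknown seed category: {category!r}") from None


def crawls_per_week(category: str) -> float:
    """Return how many crawls one seed of this category triggers per week."""
    return timedelta(weeks=1) / crawl_interval(category)


if __name__ == "__main__":
    # The two examples given in the text, labelled by name rather than URL.
    examples = {
        "L'Usine nouvelle coronavirus section": "coronavirus_news_section",
        "INRES recommendation pages": "article_or_dedicated_site",
    }
    for label, category in examples.items():
        print(f"{label}: one crawl every {crawl_interval(category).days} days")
```

Keeping the rule in a single table makes it easy for a curator to reassign a seed when a site becomes more or less active, which is the kind of adjustment the weekly selection process implies.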

By mid-April 2020, more than 2,000 selections and settings had been created. This reactivity is all the more necessary because certain web pages selected in the first phase have already disappeared from the live web.

The regional dimension

The geographical approach is also at the core of the archiving dynamics. The web does not entirely do away with territorial dimensions, as research on this topic has shown. One may even think that they have been reinforced, since the health crisis hit France just as the municipal election campaign was under way.

Curators at partner institutions across France have spontaneously enriched the selections on the coronavirus health crisis by contributing local and regional content. This network is a key element of the national cooperation framework. Initiated in 2004 by the BnF, it relies on 26 regional libraries and archive services, which share the mission of print and web legal deposit by participating in collaborative nominations. Its contribution has proved significant: over 50% of the websites nominated up to April 15 refer to local or regional content.

Simplified access to teleconsultation. ARS Guyana. Archived, April 5, 2020.

As a corollary, the crawl devoted to the local elections was not suspended after the first round (held on March 15), although the second round (due to take place the following weekend) was postponed and the whole electoral process suspended because of the crisis. In particular, the Twitter and Facebook accounts of the mayors elected in the first round and of the candidates still in contention for the second round have continued to be collected. These archives, which record the statements of mayors and candidates on the web during the weeks before and after the first round, already appear to be a major source for both electoral history and the history of the coronavirus pandemic in France.

Historic abstention rate in the local elections in the Oise “cluster”. francetvinfo.fr. Capture of March 16, 2020.

International cooperation

At the international level, the BnF, and through it the other participating French libraries, contributes to the archiving project “Novel Coronavirus (2019-nCoV) outbreak”. This initiative, launched in February 2020, is supported by the IIPC Content Development Group (CDG) in association with the Internet Archive. It brings together about thirty libraries and institutions around the world collaborating on this web archive collection. By the end of May, more than 6,800 preserved websites representing 45 languages had been put online on Archive-it.org and indexed in full text.


The BnF has for many years pursued a policy of cooperation with the IIPC to promote the preservation and use of web archives on an international scale. One of the research challenges is to facilitate comparisons between different national webs, in particular for global and transnational phenomena such as #MeToo and the current health crisis. A first contribution was sent to the IIPC at the end of February. It consisted of a selection of 80 seeds made during the first phase of the pandemic, just before Europe overtook China as the main active center. Some of these pages have already disappeared from the live web.

In line with the IIPC’s new recommendations and considering the evolution of the pandemic in France, the next contribution to the IIPC should be a tight selection (almost 5% of the French collection) linked to the high-priority subtopics: information about the spread of infection; regional or local containment efforts; medical and scientific aspects; social aspects; economic aspects; and political aspects. A third of those websites cover the medical domain. Another third provide information about French territories that are remote from Europe: French Guiana and the West Indies, Réunion and Mayotte. The remaining third concerns citizens’ initiatives and debates during the lockdown.

For example, INED’s dedicated website gives information on local excess mortality; articles from Madinin’art, Montray Kreyol and Free Pawol were selected by a local curator; and banlieues-sante.org is the website of an NGO which fights medical inequality and has created a YouTube channel explaining protection measures in 24 languages, including sign language.

Dr François Ehlinger on EHPAD. Nicole Bertin’s Blog. Website capture from the Charente-Maritime region. Capture on April 3, 2020

What’s next?

Some of the websites nominated by the BnF and its partners tend to constitute a collective memory of the event. Until mid-April, social networks represented 40% of the nominations, with a slight predominance of Twitter over Facebook. Although a large share is devoted to official accounts of institutions or associations (@AssembleeNat, @restosducoeur, @banlieuesante) and to accounts created ex nihilo (@CovidRennes, @CoronaVictimes, @InitiativeCovid), hashtags prevail in the set of selections.

The aim is to archive a representative part of individual and collective expression by capturing tweets around the most significant hashtags. These include multiple variations on the terms “coronavirus” and “confinement” (#coronavacances, #ConfinementJour29) and criticism of the way the crisis has been managed (#OuSontLesMasques, #OnOublieraPas). Hashtags that spread instructions or express sympathy show a unique and characteristic mobilisation of citizens, while others follow the pace of the news (#chloroquine, #Luxfer).

Daniel Bourrion, “The virus journals” on face-ecran.fr. Archived April 3, 2020.

Because they document the effects of the health crisis and of the lockdown in various domains, the archives relating to the coronavirus end up overlapping with themes to which the BnF and its partners pay particular attention, or for which focused crawls have already been conducted or are planned. Examples include digital literature and confinement diaries, relationships between the body and public health policies, epidemiology and artificial intelligence, and family life in confinement and feminism.

“Next” is not just a matter of finding a single way of promoting this special archive collection, which remains a work in progress. It is neither a delimited project nor one that is already closed. It is documentation for many kinds of research projects and also a heritage for all of us.

Guide for confined parents. The French Secretariat for Equality (Le Secrétariat d’Etat chargé de l’égalité entre les femmes et les hommes et de la lutte contre les discriminations). Capture of April 10.
