This year Bibliothèque et Archives nationales du Québec (BAnQ) celebrates its 10th anniversary of archiving Québec websites. We are delighted to announce that BAnQ will be hosting the next IIPC General Assembly and Web Archiving Conference. The events will be held on 11-13 May 2020.
By Martine Renaud, Librarian, Legal Deposit and Acquisitions Department at Bibliothèque et Archives nationales du Québec
About Bibliothèque et Archives nationales du Québec
At once a national library, a national archives and the public library of a major metropolitan city, Bibliothèque et Archives nationales du Québec (BAnQ) brings together, preserves and promotes heritage materials from or related to Quebec.
Context
In 2009, after several years of work and reflection, BAnQ began to harvest and archive Québec websites. As discussed in an article on BAnQ’s blog (in French), these heritage materials are often volatile and ephemeral. Harvests were initially carried out as part of a pilot project.
BAnQ takes a selective approach to Web harvesting. Several factors make it difficult to harvest the Quebec Web thoroughly: the size of the body of materials to be collected, given BAnQ’s limited resources; legal constraints, namely the requirement to obtain a license from the Web producer or other copyright owners before making their site accessible; and context, because Quebec does not have its own domain name.
In the news in 2009
The reach of these first harvests was modest: about 25 government organizations, chiefly ministries.
Looking at the sites collected in 2009, what do we see? Obviously, they reflect what was topical at the time. In 2009, much attention was paid to the influenza epidemic. Does anyone still remember the infamous H1N1 virus? A major vaccination campaign was underway during the winter of 2009, and the Quebec government had a website dedicated to this topic:

On the Quebec Ministry of Finance website, a number of documents dealt with the effects of the 2008 global financial crisis on Quebec’s economy:

Still in the news today
While the flu pandemic and the economic crisis are presumably behind us, some news items from 2009 are still topical today. In 2009, reports submitted as part of the Bouchard-Taylor Consultation Commission on Accommodation Practices Related to Cultural Differences were available on the Commission website:

The website is not online anymore, and yet cultural differences and accommodations are still in the news today.
The Quebec National Assembly
Quebec’s National Assembly website also provides interesting historical perspectives. It includes a page dedicated to Quebec’s current Premier, François Legault, who at the time was simply an elected member of the Parti Québécois. He is now Premier and leader of the Coalition Avenir Québec, a party he co-founded in 2011.
Quebec Web harvests since 2009
Ten years later, harvests have become more numerous. They are broader in scope and much more diverse, with BAnQ’s reach now extending beyond government websites.
The following table compares the 2009 harvests and those carried out as of March 1, 2019:
Indicator | 2009 | 2009-2019 |
Number of harvests | 16 | 12,823 |
Number of organizations whose website is made available | 25 | 1,295 |
Documents harvested | 17,026,257 | 149,647,697 |
Total size of archives (terabytes) | 0.90 | 31 |
It is interesting to see how the use of images, and audio and video materials, has increased:
Type of documents harvested | Number (2009) | Size in GB (2009) | Number (2009-2019) | Size in GB (2009-2019) |
HTML pages | 15,073,735 | 306 | 122,146,682 | 4,967 |
Images | 1,275,183 | 49 | 18,159,220 | 1,454 |
Applications (PDF, Word, Excel, etc.) | 644,117 | 526 | 5,702,995 | 3,695 |
Video materials | 17,009 | 19 | 1,309,413 | 20,288 |
Audio materials | 7,458 | 4 | 79,660 | 320 |
Other | 8,755 | 0.01 | 2,249,727 | 235 |
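For readers who want to quantify this shift, the short Python sketch below computes each document type's share of the total archive size for both periods. The figures are copied from the table above. The dictionary and function names are illustrative only and do not come from any BAnQ tool, and sizes are assumed to be in gigabytes.

```python
# Figures copied from the table above: (number of documents, size in GB).
# All names in this sketch are hypothetical; they are not part of any BAnQ tool.
HARVEST_2009 = {
    "HTML pages": (15_073_735, 306),
    "Images": (1_275_183, 49),
    "Applications": (644_117, 526),
    "Video materials": (17_009, 19),
    "Audio materials": (7_458, 4),
    "Other": (8_755, 0.01),
}

HARVEST_2009_2019 = {
    "HTML pages": (122_146_682, 4_967),
    "Images": (18_159_220, 1_454),
    "Applications": (5_702_995, 3_695),
    "Video materials": (1_309_413, 20_288),
    "Audio materials": (79_660, 320),
    "Other": (2_249_727, 235),
}


def total_documents(harvest):
    """Return the total number of documents across all types."""
    return sum(count for count, _ in harvest.values())


def size_shares(harvest):
    """Return each document type's share of the total size, in percent."""
    total_size = sum(size for _, size in harvest.values())
    return {kind: 100 * size / total_size for kind, (_, size) in harvest.items()}


if __name__ == "__main__":
    periods = (("2009", HARVEST_2009), ("2009-2019", HARVEST_2009_2019))
    for label, harvest in periods:
        print(f"{label}: {total_documents(harvest):,} documents")
        for kind, share in size_shares(harvest).items():
            print(f"  {kind}: {share:.1f}% of total size")
```

Under these figures, video materials account for roughly 2% of the archive by size in 2009 and roughly 65% over 2009-2019, even though HTML pages still make up the large majority of documents by number.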
Proliferating applications are a major challenge for institutions that harvest websites. BAnQ relies on the Heritrix crawler to carry out its harvests.
Contents to explore and to work with
Since 2009, harvests have progressively widened their scope. They now provide a number of corpuses of interest to researchers, particularly in the digital humanities field. Websites dealing with Quebec provincial elections in 2012, 2014 and 2018 have been harvested (major parties, political blogs, news sites, etc.). The municipal elections of 2013 and 2017 have also been covered. In addition, we harvest what are known as “thematic” (i.e. non-governmental) sites: cultural organizations (museums, libraries, and archives), community organizations, professional associations, regional newspapers, and so on.
Websites harvested by BAnQ can be accessed through an interface. Interested researchers may also access the data directly on request.