By Sabine Schostag, Web Curator, The Royal Danish Library
Introduction – a provocative cartoon
In a sense, the story of Corona and the national Danish Web Archive (Netarchive) starts at the end of January 2020 – about 6 weeks before Corona came to Denmark. A cartoon by Niels Bo Bojesen in the Danish newspaper “Jyllands-Posten” (2020-01-26), showing the Chinese flag with a circle of yellow coronaviruses instead of the stars, caused indignation in China and captured attention worldwide. We focused on collecting reactions on different social media and in the international news media. Particularly on Twitter, a seething discussion arose with vehement comments and memes about Denmark.
From epidemic to pandemic
After that, the curators again focused on the daily routines in web archiving, as we believed that Corona (Covid-19) was a closed chapter in Netarchive’s history. But this was not the case. When the IIPC Content Development Working Group launched the Covid-19 collection in February, the Royal Danish Library contributed the Danish seeds.
Suddenly, the Corona virus arrived in Europe and the first infected Dane came home from a skiing trip in Italy. The epidemic turned into a pandemic. On March 12, the Danish Government decided to lock down the country: all public employees were sent to their home offices and borders were closed. Not only the public sector shut down; trade and industry, shops, restaurants, bars etc. had to close too. Only supermarkets were still open, and people in the health care sector had to work overtime.
While Denmark came to a standstill, so to speak, the Netarchive curators worked at full throttle on the coronavirus event collection. Zoom became the most important work tool for the following 2½ months. In daily Zoom meetings, we coordinated who worked on which facet of this collection. To put it briefly, we curators had coronavirus on our minds.
Event crawls in Netarchive
The Danish Web Archive crawls all Danish news media at frequencies ranging from several times a day to once a week, so there is no need to include news articles in an event crawl. Thus, with an event crawl we focus on increased activity on social media, blog articles, new sites emerging in connection with the event – and reactions in news media outside Denmark.
Coronavirus documentation in Denmark
The Danish Web collection on coronavirus in Denmark is part of a general documentation of the corona lockdown in Denmark in 2020. This documentation is a cooperation between several cultural institutions: the National Archives (Rigsarkivet), the National Museum (Nationalmuseet), the Workers Museum (Arbejdermuseet), local archives and, last but not least, the Royal Danish Library. The corona lockdown documentation was supposed to be done in two steps: the “here and now” collection of documentation during the corona lockdown and a more systematic follow-up by collecting materials from authorities and public bodies.
“Days with Corona” – a call for help
All Danes were asked to contribute to the corona lockdown documentation, for instance by sending photos and narratives from their daily life during the lockdown. “Days with Corona” is the title of this part of the documentation of the Danish Folklore Archives run by the National Museum and the Royal Library.
Netarchive also asked the public to help by nominating URLs of web pages related to coronavirus, social media profiles, hashtags, memes and any other relevant material.
Help from colleagues
Web archiving is part of the Department for Digital Cultural Heritage at the Royal Library. Almost all colleagues from the department were able to continue with their everyday work from their home offices. Many colleagues from other departments were not able to do so. Some of them helped the Netarchive team by nominating URLs, as this event crawl could keep curators busy more than 7½ hours a day. We used a Google spreadsheet for all nominations (Fig. 1).
The Queen’s 80th birthday
On April 16, Queen Margrethe II celebrated her 80th birthday. One of the first things she did after the Corona lockdown, on March 13, was to cancel all her birthday celebration events. In a way, she set a good example, as everybody was asked not to meet with more than ten people. Ideally, we should only socialize with members of our own household.
As part of the Corona event crawl, we collected web activity related to the Queen’s birthday, which mainly consisted of reactions on social media.
The big challenge – capturing social media
Knowledge of the coronavirus Covid-19 changes continuously. Consequently, authorities, public bodies, private institutions, and companies frequently change the information and precaution rules on their webpages. We try to capture as many of these changes as possible. Companies and private individuals offering safety gear for protection against the virus were another facet of the collection. However, capturing all relevant activity on social media was much more challenging than capturing the frequent updates on traditional web pages. Most of the social media platforms use technologies that Heritrix (used by Netarchive for event crawling) is not able to capture.
More or less successfully, we tried to capture content from Facebook, TikTok, Twitter, YouTube, Instagram, Reddit, Imgur, Soundcloud, and Pinterest. Twitter is the platform we are able to crawl with Heritrix with rather good results. We collect Facebook profiles with an account at Archive-It, as they have a better set of tools for capturing Facebook. With frequent Quality Assurance and follow-ups, we also get rather good results from Instagram, TikTok and Reddit. We capture YouTube videos by crawling the watch-URLs with a specific configuration using youtube-dl. One of the collected YouTube videos comes from the Royal family’s YouTube channel: the Queen’s address to the people on how to behave to prevent or limit the spreading of the coronavirus (https://www.youtube.com/watch?v=TZKVUQ-E-UI, Fig. 2).
As Heritrix has problems with dynamic web content and streaming, we also used Webrecorder.io, although we have not yet implemented this tool in our harvesting setup. However, captures with Webrecorder.io are only drops in the ocean. The use of Webrecorder.io is manual: a curator clicks on all the elements on a page we want to capture. An example is a page on the BBC website, with a video of the reopening of Danish primary schools after the total lockdown (https://www.bbc.com/news/av/world-europe-52649919/coronavirus-inside-a-reopened-primary-school-in-the-time-of-covid-19, Fig. 3). There is still an issue with ingesting the resulting WARC files from Webrecorder.io into our web archive.
Danes produced a range of podcasts on coronavirus issues. We crawled the podcasts we had identified. We get good results when we have a URL to an RSS feed, which we crawl with XML extraction.
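To illustrate what extracting episode URLs from a podcast RSS feed can look like, here is a minimal Python sketch. It is not Netarchive's Heritrix configuration; it only shows the general idea under the assumption that the feed follows RSS 2.0, where each episode is an item whose audio file is referenced by an enclosure element. The function name, the sample feed and the example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_enclosure_urls(rss_xml: str) -> List[str]:
    """Return the media URLs found in <enclosure> elements of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    urls = []
    # Each episode is an <item>; its audio file is given by the
    # url attribute of the item's <enclosure> element.
    for item in root.iter("item"):
        for enclosure in item.findall("enclosure"):
            url = enclosure.get("url")
            if url:
                urls.append(url)
    return urls


# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example corona podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.org/audio/ep1.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="https://example.org/audio/ep2.mp3" type="audio/mpeg" length="1"/>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for url in extract_enclosure_urls(SAMPLE_FEED):
        print(url)
```

The resulting list of URLs could then serve as seeds for a crawl, so that the audio files themselves are harvested and not only the feed.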
Capture as much as possible – a broad crawl
Netarchive runs up to four broad crawls a year. We launched our first broad crawl for 2020 right at the beginning of the Danish Corona lockdown – on March 14. A broad crawl is an in-depth snapshot of all dk-domains and all other Top Level Domains (TLDs) where we have identified Danish content. A side benefit of this broad crawl might be getting Corona-related content into the archive – content which the curators do not find with their different methods. We identify content both with common keyword searches and with a variety of link scrapers.
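As a rough illustration of how link scraping can be combined with a keyword filter, the following Python sketch collects the links on a page and keeps those whose URL or link text mentions a keyword. It is not one of the tools the curators used; the keyword list, the class and function names, and the sample page are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keyword list; a real list would be chosen by the curators.
KEYWORDS = ("corona", "covid")


class LinkCollector(HTMLParser):
    """Collect (absolute URL, link text) pairs from the <a> tags of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def candidate_seeds(html, base_url, keywords=KEYWORDS):
    """Return unique links whose URL or link text contains a keyword."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    seeds = []
    for url, text in collector.links:
        haystack = (url + " " + text).lower()
        if any(k in haystack for k in keywords) and url not in seeds:
            seeds.append(url)
    return seeds


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """<html><body>
<a href="/nyheder/corona-status">Seneste tal</a>
<a href="https://example.org/sport">Sport</a>
<a href="https://example.net/blog/1">Livet under Covid-19</a>
</body></html>"""

if __name__ == "__main__":
    for seed in candidate_seeds(SAMPLE_PAGE, "https://example.dk/"):
        print(seed)
```

A filter like this only produces candidates. A curator would still need to review them before they are added to an event crawl.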
Is the coronavirus-related web collection of any value to anybody?
In accordance with the Danish personal data protection law, the public has no access to the archived web material. Only researchers affiliated with Danish research institutions can apply for access in connection with specific research projects. We have already received an application for one research project dealing with values in the Covid-19 communication. We hope that our collection will inspire more research projects.