Archive of Tomorrow – Capturing online health (mis)information

By Alice Austin, Web Archivist, Archive of Tomorrow

Centre for Research Collections, Main Library, University of Edinburgh

Copyright ©2021 R. Stevens / CREST (CC BY-SA 4.0)

It goes without saying that the Covid-19 pandemic has cast a harsh light across our society and exposed fault lines in a number of areas, not least in the fragility of our information infrastructures. Over the last two years we have seen misinformation spread at a similar speed to the virus, with the consequence that any future attempt to examine the medical pandemic as an historical and social phenomenon will also have to reckon with the misinformation pandemic. Government and medical websites have changed on a daily basis as new information has emerged, and there has been a massive proliferation of comment on social media and other online platforms about the virus and other health issues. Clinical advice, data and scientific evidence have been contested, revised, used and misused with dramatic and sometimes tragic consequences, and yet the digital record of this is fragile and difficult to access. There have been sustained and laudable efforts to ensure that inaccurate and potentially harmful information is taken down swiftly, with the result that a researcher exploring, for example, the emergence of ivermectin as a Covid ‘miracle cure’ might come up against a lot of dead ends and 404s.

Goals of the Archive of Tomorrow

In response, the Archive of Tomorrow project hopes to capture an accurate record of how people use the internet to find, share, and discuss health and health-related topics so that current and future researchers can understand public health practices in the digital age. We hope to capture 10,000 targets – ranging from official, ‘approved’ and verified sources, to unofficial, sometimes controversial publications – and to secure access permission for this content to produce a ‘research-ready’ collection. The project is ambitious, not just in its intention to build a useful evidence base of historical web resources but also in its attempt to develop an ethical and meaningful precedent for archiving possible mis- or disinformation. Because it crystallises so many of these issues, Covid is one subject that we’re focusing on in detail, but we’re also looking at capturing other health-related debates such as those that surround reproductive rights, ‘alternative’ medicines, assisted dying, and the use of medical cannabis.


Having launched in February 2022, the project is still in the early stages of development. It’s being led by the National Library of Scotland with web archivists based in university libraries in Edinburgh, Oxford and Cambridge, and invaluable input from the British Library’s web archiving team. This kind of collaborative working feels very much representative of the Covid era – it’s hard to imagine a project like this emerging in the days when remote working and Zoom meetings were the exception rather than the norm! We’ll be talking more about the collaborative nature of the project at the IIPC WAC conference in May – and registration is open now!

Selecting ‘health information’

Thinking about how work practices have changed throughout the pandemic brings us to something that has been a challenge for the project team to unravel – how to define the boundaries around ‘health information’: where it begins and ends, and how health relates to other spheres like politics, law, employment and so on. We have to impose boundaries on our collecting, and while some boundaries are legislative or technological (such as the exclusion of broadcast media like podcasts and videos from the collection), some are cultural: for example, to what extent do protests against Covid measures such as masks and lockdowns count as health information? What about artistic responses to the pandemic? And how well are we able to represent health information-seeking behaviours in languages other than English?

Welsh COVID-19 Pandemic guide: what to do and not do. Copyright © 2020 G. Hegasy (CC BY-SA 4.0)

Archivists have long understood that we can’t collect everything – and we don’t try to! As with so much collecting, the challenge lies in how to communicate our selection decisions without dictating the way the archived material is used and encountered. In this case, we’re trying to capture public health discourse and not be part of the conversation ourselves, but we do have a degree of responsibility when considering health mis/dis/information – to what extent should inaccurate, refuted or dangerous content be flagged in the UKWA interface? How do we make such content available responsibly without inserting our perspective into the debates?

Archive of Tomorrow workshop

At this stage we have more questions than answers, and we anticipate that this will continue. The project isn’t designed to solve these problems, but rather, to articulate them in a way that opens the door for future work and solutions. Our first activity towards this goal is the workshop that we’re hosting at the end of the month. We hope that by engaging with current and future researchers with an interest in online information-seeking behaviours or public health we can develop and produce a valuable, research-ready collection that will give real insight into how the internet has been used for health information during the pandemic and beyond.
