By Nicola Bingham, Lead Curator Web Archives, British Library and Co-Chair of the IIPC Content Development Working Group
I was delighted to present an update on the Content Development Group’s (CDG) activities at the 2020 IIPC General Assembly (GA) on behalf of myself, Alex and the curators that have worked so hard on collaborative collections over the past year.
Although it was disappointing not to have been in Montreal for the GA and Web Archiving Conference (WAC), there are many advantages to attending a conference remotely. Apart from cost and time savings, it meant that many more staff members from our organisations could attend. I liked the fact that I could see many “old” web archiving friends online and it did feel like the same friendly, enthusiastic, innovative environment that is normally fostered at IIPC events. I was also delighted to see some of the attendees’ pets on screen, although it did highlight that other people’s cats are generally much more affectionate than my own, who has, I have to say, contributed little to the field of web archiving over the years, although he did show a mild interest in Warcat.
Several things become clear when you are tasked with pre-recording a presentation with a time limit of 2 to 3 minutes. Firstly, it is extremely difficult to fit everything you need to say into such a short space of time. Secondly, what you do want to say must be tightly scripted, although this does have the advantage that there is no room for the pauses or “errs” that can sometimes pepper my in-person presentations. Thirdly, recording even a two-minute video calls for a surprising number of retakes, taking many hours for no apparent reason. Fourthly, naively explaining these facts to the Programme and Communications Officer leads quite seamlessly to the suggestion of writing a blog post in order that one can be more expansive on the points bulleted in the two-minute presentation…
CDG Collection Update
Since our last General Assembly in Zagreb in June 2019, the CDG has continued working on several established collections and two new ones:
- The International Cooperation Organizations Collection was initiated in 2015 and is led by Alex Thurman of Columbia University Libraries. It previously consisted of all known active websites in the .int top-level domain (available only to organizations created by treaties), but was expanded to include a large group of similar organizations with .org domain hosts, and renamed Intergovernmental Organizations this year. This increased the collection from 163 to 403 intergovernmental organizations, all of which will continue to be crawled each year.
- The National Olympic and Paralympic Committees collection, led by Helena Byrne of the British Library, was initiated in 2016 and consists of websites of national Olympic and Paralympic committees and associations, as identified from the listings of these groups on the official sites http://www.olympic.org and http://www.paralympic.org.
- Online News Around the World is led by Sabine Schostag of the Royal Danish Library. This collection of seeds was first crawled in October 2018 to document a selection of online news from as many countries as possible. It was crawled again in November 2019. The collection was promoted at the Third RESAW Conference, “The web that was: archives, traces, reflections”, in Amsterdam in June 2019 and at the IFLA News Media Conference at Universidad Nacional Autónoma de México, Mexico City, in March 2020.
- New in 2019, the CDG undertook a Climate Change Collection, led by Kees Teszelszky of the National Library of the Netherlands. The first crawl took place in June, with a final crawl shortly after the UN Climate summit in September 2019.
- New in 2019, a collection on Artificial Intelligence was undertaken between May and December, led by Tiiu Daniel (National Library of Estonia), Liisi Esse (Stanford University Libraries) and Rashi Joshi (Library of Congress).
The main collecting activity in 2020 has been around the Covid-19 global pandemic. This has involved a huge effort by IIPC members, with contributions from over 30 members as well as public nominations from over 100 individuals and institutions.
We have been very careful with scoping rules so that we are able to collect a diverse range of content within the data budget – and Archive-It generously increased the data limit for this collection to 5TB. Collecting will continue to run, budget permitting, while the event is of global significance.
Publicly available CDG collections can be viewed on the Archive-It website (https://archive-it.org/home/IIPC), and an overview of the collection statistics can be seen below.
Researcher-use of Collections
The CDG has worked closely with the Research Working Group co-chairs to promote and facilitate use of the CDG collections, which are now available through the Archives Unleashed Cloud thanks to the Archives Unleashed project. The collections have been analysed and a large number of derivatives are available to researchers at IIPC-led events and/or research projects. For more information about how to access these collections, please refer to the guidelines.
Next Steps/Getting in touch
We would very much welcome new members to the CDG. We will be having an online meeting in the next couple of months which would be an excellent opportunity to find out more. In the meantime, any IIPC member is welcome to suggest and/or lead on possible 2021 collaborative collections. For more information please contact the co-chairs or the Programme and Communications Officer.
Nicola Bingham & Alex Thurman CDG co-chairs