Reflections on the 2024 IIPC General Assembly and Web Archiving Conference

By Friedel Geeraert, Expert in web archiving at KBR | Royal Library of Belgium


This year’s IIPC General Assembly and Web Archiving Conference took place at the Bibliothèque nationale de France (BnF) in Paris. It was wonderful to be welcomed once again into the warm web archiving community, especially in the superb surroundings the BnF had to offer. The welcome reception in the oval reading room at the BnF Richelieu site was especially memorable in that respect. Other than the lovely encounters with web archiving colleagues from around the world, the General Assembly and the Web Archiving Conference program had a lot to offer.

Opening remarks by the President of the BnF, Gilles Pécout, in Salle Ovale.
Photo credit: Guillaume Murat, BnF

The General Assembly gave insight into the strategic plan for 2026-2031 and the reflections of the Steering Committee during their meeting that took place the day before. The transparency about their discussions and the active call for participation of members in determining the strategic priorities of the IIPC was greatly appreciated. The historical overview of the changes that have taken place in the Consortium Agreement was also fun to see, as it showed how the IIPC has grown as an organization over the decades. 

Workshops offered participants opportunities to gain hands-on experience in becoming confident trainers in the domain of web archiving, running their own full-stack SolrWayback, and crawling with Browsertrix Cloud, among others. Panel discussions and keynotes allowed for deepening one’s knowledge about Skyblog (a French pioneer in social networks), the archivability of websites, archiving social media, and training Large Language Models. Sessions focused on a myriad of subjects such as capturing unique content (ads, digital artworks, memes, etc.), digital preservation, and planning (tenders, sustainability of web archiving programs, training, etc.). The poster sessions and the drop-in and lightning talks allowed participants to gather information on a whole range of concepts very efficiently.

This is only a selection of themes that were covered during the conference. The program comprised three parallel sessions, all covering interesting topics, thereby inspiring a significant level of FOMO in participants.

Friedel Geeraert presenting a KBR drop-in talk. Photo credit: Olga Holownia.

At KBR, there are currently three projects in the pipeline:

  1. Setting up a web archive on a voluntary basis (via a public tender)

  2. Extending the legal deposit legislation to online content

  3. The BelgicaWeb research project. The project is funded by BELSPO, the Belgian Science Policy Office, through the BRAIN 2.0 program and aims to make the born-digital heritage of Belgium accessible and FAIR.

Bearing in mind this institutional context, a number of elements raised during the General Assembly and Web Archiving Conference are particularly useful. Within the BelgicaWeb project, we will further look into SolrWayback and Browsertrix Cloud. APIs offered by organizations such as Arquivo.pt are also sources of inspiration. Initiatives such as Datasheets for Web Archives by Emily Maemura and Helena Byrne can also prove useful in describing the provenance of collections of archived web content. Using PWIDs to reference web sources archived in certain web archive collections has also been adopted as best practice within the BelgicaWeb project.

As a member of the Preservation Working Group at KBR, I found the session on Digital Preservation especially useful. The Danish Royal Library proved itself once again as one of the leading examples in Europe where digital preservation of born-digital content is concerned. Thanks to their presentations, we will be looking further into Bitrepository.org.

All in all, this was another great edition of the IIPC GA & WAC. I can’t wait for the next conference in Oslo!

2023 General Assembly and Web Archiving Conference Wrap-up

Link to album: https://flic.kr/s/aHBqjAGaxc
Credits: Jacqueline van der Kort (Beeldstudio KB), Ode-Louise Eshuis (www.byode.nl) & Olga Holownia.

The IIPC annual General Assembly (GA) and Web Archiving Conference (WAC) are the most important events on the IIPC calendar. They have long provided a forum for our members and the broader web archiving community to present their work, exchange ideas, and network. For us, each GA and WAC represents a new collaboration with hosting member institutions as we build connections with different hosts and web archiving teams every year. We had the pleasure of co-organizing GA and WAC with not one but two member institutions this year: the Netherlands Institute for Sound and Vision and KB, the National Library of the Netherlands.

The 2023 WAC marked the first in-person IIPC event in four years, since the 2019 conference in Zagreb, and it took some time to adjust and prepare for a face-to-face gathering. The organization of in-person events always presents its own set of challenges, and effective communication with the local team is crucial. We were very fortunate to work over the course of nine months with our excellent colleagues from KB and Sound & Vision on the organizing committee, all of whom went above and beyond to help make the conference a success. We were uncertain about the transition back to an in-person conference after such a lengthy break until the start of the GA – from the first moments of the registration and coffee hour, the atrium was buzzing with excitement, an atmosphere that could not be replicated on Zoom. We loved getting to meet delegates in person for the first time, and it was also great to reconnect with delegates we met at previous conferences.


The 2023 IIPC annual event returned to the Netherlands after 12 years: the IIPC General Assembly was previously held at KB in The Hague in May 2011. This year, KB hosted the 2023 Steering Committee meeting. The 2023 GA and WAC took place in Hilversum, in the Netherlands Institute for Sound & Vision’s refreshingly colorful building that houses one of the largest audiovisual collections in Europe as well as a newly opened Media Museum and web archiving collection of audiovisual content. The conference got off to a high-energy start with opening remarks from Sound & Vision’s Eppo van Nispen, setting the tone for a busy and enthusiastic few days of web archiving discussions.

Our opening keynote by Eliot Higgins of Bellingcat discussed the importance of open source investigation, leading to a thoughtful Q&A chaired by Sound & Vision’s Johan Oomen.

The conference closing keynote by Marleen Stikker of Waag Futurelab (chaired by Martijn Kleppe from KB) highlighted the story of the Digital City, provoking a conversation on public values in the digital domain. 

Jeffrey van der Hoeven of KB (and IIPC’s 2023 Vice-Chair) ended the program with a thoughtful closing speech that showcased all of the work of the 2023 GA&WAC, and that highlighted the words of former KB colleague Kees Teszelszky: “Keep archiving & keep collecting and describing your own heritage. A collection without context is only half of use for researchers.”


Photo of Jeffrey van der Hoeven (KB) delivering WAC 2023 closing remarks.
Photo credit: Yves Maurer. https://twitter.com/yvesmaurer/status/1657042261790556165?s=20

One of the benefits of co-organizing our events is that they provide an opportunity to promote the work of our hosts both at home and abroad. Earlier this year, we published pages on web archiving in the Netherlands, highlighting specific programs at KB and Sound & Vision as well as national initiatives. On this theme, our hosts often offer a public event which typically accompanies the conference to raise awareness of regional and global digital preservation activities. This year we were able to include a public panel discussion on the Netherlands UNESCO projects as well as an introduction to collaborative, transnational web archiving.

Photo of Tamara van Zwol (Dutch Digital Heritage Network) and panelists from IIPC WAC 2023 Public Event: Building Digital Heritage Together: Dutch and Transnational Perspectives, 10 May.
Photo credit: Jacqueline van der Kort; Beeldstudio KB | nationale bibliotheek

The IIPC WAC 2023 program and online resources

This year’s “Resilience and Renewal” conference theme reflected on two decades of practice and collaboration. The conference marked the start of the Consortium’s 20th-anniversary celebration, including “efforts to support a community that has grown beyond its own membership to include a transnational range of institutions involved in developing and delivering web archiving practice and programs” (WAC 2023 CfP). Presentations, panels and workshops for the in-person and online program were reviewed and selected by the WAC Program Committee chaired by Lauren Ko of the University of North Texas Libraries (UNT), who also serves as co-chair of the IIPC Tools Development Portfolio.    

This year, the authors and delegates across all events represent over 170 organisations from over 40 countries. We were fortunate to bring together around 90 presenters for the in-person and online WAC. We immensely appreciate our presenters for sharing their time and expertise with us, as well as for their willingness to share slides and recordings of talks, which has enabled us to add to our growing collection of web archiving resources available online. We also had several excellent workshops, lightning talks, and drop-in talks at the conference in Hilversum. 

This year’s program also saw the launch of a Mentoring Program organized by the IIPC Membership Engagement Portfolio, pairing experienced mentors with delegates that were new to the conference or to the web archiving profession. This year’s Mentoring Program saw 16 mentor/mentee matches and we are hoping to offer this networking opportunity again at our future conferences. 

Online Day

We have always provided online meetings and workshops to connect with our global membership. The range of these events was significantly increased in 2017 with the addition of our research and technical webinars. Due to the pandemic, we had to boost our online offerings even more, including running the last few WACs (co-hosted by the National Library of Luxembourg in 2021 and the Library of Congress in 2022) entirely online. The 2023 Online Day allowed us to cover different time zones and made sure that we could offer at least one day of programming to those who couldn’t travel to Hilversum. This year we used the format developed for previous online editions of WAC, which comprised pre-recorded talks made available to conference delegates ahead of WAC and live Q&A sessions. Both pre-recorded talks and recorded Q&As for the WAC 2023 Online Day are also now available on YouTube.

The General Assembly


The General Assembly has been a member benefit from the early years of the IIPC. First hosted by the National Library of France in 2007, the IIPC annual event gradually expanded to include Open Days, now known as WAC. The GA has been highly valued for its practitioner-oriented content, fostering informal networking opportunities and facilitating important strategic discussions within the consortium. 

This first return to an in-person meeting since 2019 was a big success with 39 member institutions from 27 countries attending.

The GA is an excellent opportunity for IIPC networking and catching up in person, as well as a chance for Working Groups to meet and plan for the next season of events. It is also an opportunity for members to get updates on IIPC activities and planning. This year’s General Assembly program began with an address from IIPC’s 2023 Chair, Youssef Eldakar from Bibliotheca Alexandrina. Reports from the 2023 Executive Board, Working Groups, and Portfolios followed, and then there were member updates after a break. The afternoon’s program included meetings for each of our working groups: Content Development, Research, Training, and the Tools Portfolio. It was wonderful to see so many of our members in person, and to hear their updates. We also launched our 2023 Member Survey and Member Activity Survey at the GA, a Membership Engagement Portfolio initiative that will shape future programming and IIPC strategic planning.

Thank you!

Organization of our annual events would not have been possible without the incredible support of our members. Our Program Committee was, as always, outstanding, and we would like to thank all the volunteers who chaired sessions during the online and in-person events. We would particularly like to thank Lauren Ko (University of North Texas), Paul Koerbin (National Library of Australia), and Meghan Lyon (Library of Congress) for providing additional assistance with the 2023 conference theme, program, and reviews. For the third year in a row, we’ve also been lucky enough to work with Robin Saklatvala to deliver the online day. 

We’ve already mentioned the amazing work of the 2023 Organizing Committee, all of whom worked tirelessly to make this year’s conference a success. We’d also like to thank the staff at Sound & Vision for all of their help. All of their work was crucial to the conference’s success, from the excellent catering staff and chef providing our attendees with delicious food and coffee, to the hard-working audiovisual technicians making sure that presentations were able to run smoothly, to Marloes and Rachel’s incredible help managing the registration desk. Thank you also to our student volunteers for their support during the conference.

We would also like to thank the UNT Digital Library for working with us for the past five years to provide a home for slides and other IIPC materials, making them accessible, well-preserved, and easily citable. The 2023 presentations are available in the IIPC Conference collection. 

Last but not least, we’d like to thank all GA and WAC 2023 delegates for attending both online and in person. Your questions to presenters, engaging discussions during breaks, and presence at the conference and the Online Day all helped to make the conference what it was. Thank you also to our GA and WAC delegates for providing us with such great and thorough feedback. We’ve used it to help prepare the 2024 WAC Call for Proposals and will be considering it further in the organization of future events. 

There have been some wonderful post-conference blog posts, summarizing some of the highlights for different conference attendees. It has been great to see what had the biggest impact for delegates at the 2023 conference! We have included a list of these posts below, and will keep updating it as new materials become available. Please feel free to contact us if you have a post that isn’t yet included. 

We’re now taking the opportunity to invite everyone to both explore the 2023 collection of recordings and slides and to submit a proposal for the 2024 conference hosted by the National Library of France. We hope to see you all in Paris next year!


The 2023 IIPC Web Archiving Conference Reflections

By Friedel Geeraert, Expert in web archiving at KBR | Royal Library of Belgium


The IIPC Web Archiving Conference 2023 took place in Hilversum in the Netherlands at the beautiful building of Sound and Vision. The warm atmosphere of the web archiving community gathered there more than compensated for the cold rain outside. Over the two-day conference, presentations were given about themes such as new initiatives and collections, COVID-19, collaborations, digital scholarship and research, tool development, quality assurance, outreach, inclusive representation, data management, preservation and infrastructure. The different workshops organised during both days provided the opportunity to gain more hands-on experience. The programme was so interesting that it was difficult to choose which track to follow or which workshop to attend.

Netherlands Institute for Sound and Vision in Hilversum
Photo: Olga Holownia | IIPC

Open Source Investigation and Public Values in the Digital Domain

The two keynote speakers, Eliot Higgins of Bellingcat and Marleen Stikker of Waag Futurelab, shared their expertise and vision. Higgins provided insight into Bellingcat, the independent group of investigative journalists, and their ethical digital investigation into conflicts such as the war in Ukraine to debunk misinformation. Bellingcat also initiates programmes to teach students to think critically about online information and sources, thereby helping them to make better informed decisions and formulate well-founded opinions, which offers hope in light of the polarisation of society.

IIPC WAC 2023 keynote: Eliot Higgins
Eliot Higgins | Bellingcat & Johan Oomen | Sound & Vision
Photo: Olga Holownia | IIPC

Stikker explained her alternative history of the internet, focusing on its social roots instead of its military origins, and the role we can all play in managing the internet as a commons and governing it accordingly. She suggests assessing the foundation by asking critical questions about the underlying assumptions and the organisation of government, guaranteeing human rights and ensuring a regenerative socio-economic model (as opposed to the current extractive model). Above all, she argues for taking action, for example by moving towards platforms such as Signal and Mastodon that are not governed by big commercial corporations.

IIPC 2023 WAC keynote: Marleen Stikker
Marleen Stikker | Waag Futurelab
Photo: Olga Holownia | IIPC

Thoughts, tips and takeaways

As always, participants came away with their heads filled with ideas and useful information. Armed with numerous pages filled with notes and lists of people I need to contact in the coming months to obtain more information, I returned to KBR in Belgium. I will be using the coming year to look further into ARCH, the Archive Research Compute Hub, developed by the Archives Unleashed Project; Browsertrix Cloud, developed by the Webrecorder team; and SolrWayback, developed by the Danish Royal Library. Providing more descriptive information about web archive collections was another interesting idea, raised both by the web archiving team of the BnF and by Emily Maemura and Helena Byrne in their ‘datasheets for datasets’ concept.

IIPC WAC 2023: workshops
“Describing Collections with Datasheets for Datasets” workshop
Photo: Jacqueline van der Kort | Beeldstudio KB

Web Archiving Conference: May 2023
Jefferson Bailey | Internet Archive, ARCH workshop
Photo: Jacqueline van der Kort | Beeldstudio KB

Other aspects that sparked my interest are preservation practices in the context of web archiving, for example WARC validation presented by the team of the National Archives of the Netherlands; the need for consistent use of data repositories such as Zenodo, Software Heritage and the Internet Archive; and the use of the URN PWID to reference web archive sources. Other ideas that arose during the conference were linked to quality assurance and analysis: the use of tools such as Screaming Frog by the team at the UK Government Web Archive, the WAVA tool (Web Visualisation and Analysis), developed by the team behind the Web Curator Tool, and the use of rubrics as demonstrated by the speakers from the Library of Congress. The team behind the End of Term web archive also talked about tools used by CommonCrawl that show promise for creating derivative datasets and enriched metadata.

Community coming together

These are only a few examples of the wealth of interesting ideas raised at this conference, but on top of that it was wonderful to catch up with other members of the web archiving community during the breaks. Over cups of coffee, delicious ciabatta and sweet pastries, topics of conversation ranged widely, from planned changes to national legislation, evolutions in Twitter collection policies and public tenders on one end of the seriousness spectrum, to the sock affinity of an awfully cute puppy and discussions about the best gifts to give to 6-month-old babies on the other end. Many thanks to the organisers of this year’s conference for such a great edition of the IIPC WAC. Needless to say, I’m already looking forward to next year’s edition on April 25-26, 2024 hosted by the BnF.

Web Archiving Conference: May 2023
Photo: Jacqueline van der Kort | Beeldstudio KB

Remembering Past Web Archiving Events With Library of Congress Staff

By Meghan Lyon, Digital Collection Specialist, Library of Congress and member of WAC 2022 Program Committee


Since joining the Library of Congress Web Archiving Program remotely in 2020, I have had the pleasure of participating in IIPC activities and getting to know the generous and hardworking members of this community. Although—due to Covid-19 restrictions—I have yet to meet many of my colleagues in person, I feel as though I’ve been wholeheartedly welcomed. It is a privilege to be a member of the Program Committee for the 2022 Web Archiving Conference and General Assembly, which will be hosted virtually by the Library of Congress.

Last year, I remember the tireless planning efforts of Senior Program Officer Olga Holownia as she and then-IIPC Chair, now Vice-Chair, Abbie Grotke and staff members from the National Library of Luxembourg (2021’s amazing virtual conference host) tested the virtual conference platform. They tested virtual tables and planned for post-session break-out chats where engaged members could continue discussions from the previous panel. The end result was engaging and exciting, especially for a virtual conference.

It was a pleasure to learn at that time about topics as diverse as the Frisian web (Kees Teszelszky, “Side fûn: mapping the Frisian web domain in the Netherlands”), Flash-capable browser emulation (Ilya Kreymer & Humbert Hardy, “Not gone in a Flash! Developing a Flash-capable remote browser emulation system”), and experimental methods of quality assurance for web archives (Brenda Reyes Ayala, James Sun, Jennifer McDevitt & Xiaohui Liu, “Detecting quality problems in archived website using image similarity”). Ayala et al.’s presentation led me to Dr. Ayala’s research, which has greatly impacted QA workflow development here at the LoC. Workflow development will be included in the panel “Advancing Quality Assurance for Web Archives: Putting Theory into Practice” in the upcoming 2022 Web Archiving Conference. If you missed the 2021 conference, you can still view selected talks and Q&A sessions on the IIPC YouTube channel.

IIPC 2021 Web Archiving Conference co-hosted with the National Library of Luxembourg.

With that, I’d like now to ask Abbie Grotke, my supervisor and Vice-Chair of the IIPC, as well as Grace Thomas, one of my teammates on the Web Archiving Team, some questions about their experience in the IIPC community:

Meghan Lyon: Give us a snapshot of your first experience — or of a memorable experience — at an IIPC WAC & GA of times past.

Grace Thomas (Senior Digital Collection Specialist, Web Archiving Team):

The first IIPC WAC I attended was Web Archiving Week 2017 in London. I had joined the Library of Congress Web Archiving Team less than a year earlier and I was still trying to figure out the extent of this new world. From what my seasoned coworkers said about the web archiving community, I knew it was small and geographically disparate – a modest group of faceless individuals shouldering the massive task of archiving the web – but the events in London showed me how kind, collaborative, and very real everyone is. We are all dealing with the exact same issues at different scales and, most importantly, I got the feeling that everyone was there because they wanted to carry on this work and find solutions to those problems together.

Archives Unleashed hackathon during the Web Archiving Week 2017 at the British Library.
Photo credit: Olga Holownia.

I also attended the 2018 WAC in Wellington, New Zealand, which provided me the opportunity of a grand adventure in a stunning locale! Even now, nearly four years later, I frequently recall Dr Rachael Ka’ai-Mahuta’s keynote about the archiving of Indigenous Peoples’ language, culture, and movement, which gave me an important framework for thinking about the ethics of cultural ownership. It was the farthest I had ever traveled from home, and being surrounded by Māori customs and artifacts that week further deepened these concepts. I’m grateful to have been in that place exactly at that time.

Although, I have to say the most memorable WAC experience was nearly missing my flight back to the US from London in 2017 and seeing Abbie’s face break into a relieved smile as I sprinted up to the gate at Heathrow! I guess I didn’t want to leave the WAC… and who would?

Keynote by Dr Rachael Ka’ai-Mahuta titled Te Māwhai – te reo Māori, the Internet, archiving, and trust issues. Photo credit: Mark Beatty.

Abbie Grotke: Besides reliving that moment of almost losing Grace in London (oh my!), my first IIPC memories were from way back in the beginning of the consortium. There was talk of this international group forming, and although I did not get to go to an early meeting in Rome, Italy, I attended another very early-days discussion called “National Libraries Web Archiving Consortium” which was held at the Library of Congress in March 2003. It was there that (besides colleagues at Internet Archive) I first met fellow web archiving colleagues from the British Library, Bibliothèque nationale de France, National and University Library of Iceland, and National Library of Canada (now Library and Archives Canada). These, along with LC and Internet Archive and a number of other institutions, were the early founders of the consortium, and many became good friends and colleagues for many years. I couldn’t have imagined then that I would still be involved in this community all these years later! A lot of those folks have moved on or retired, but our institutions still work closely together to this day.

One of my most memorable experiences was when I was communications officer, supporting the Steering Committee of the IIPC. We had been in Oslo for a meeting of the Access Working Group where we were hashing out requirements for an access tool. We all hopped on a plane to Trondheim, then a puddle jumper plane from Trondheim to Mo i Rana, where the other National Library buildings were, for a Steering Committee meeting up there. Gildas Illien (the IIPC technical lead at the time) and I were in the very back of the plane, which had a row entirely across the back, looking straight down the middle aisle. Most of the Steering Committee was on the plane, which was having some horrible turbulence. Even though we were terrified by the flight, I just remember laughing so much (coping mechanism!) with Gildas about the fact that if the plane went down, the consortium would be over. We also couldn’t stop laughing at the “barf bags” in front of us, which said “uuf da” – which I now say ALL the time and always think back to that moment. We of course landed safely. That was also the meeting where a colleague from New Zealand was calling in for the entire two days of meetings despite the time difference, and at dinner we started talking to a plant in the middle of the table as if it were him. Good times!


Meghan Lyon: Tell us one thing you love or appreciate about being a part of the international web archiving community.

Abbie Grotke: It truly is the most supportive community and I am forever grateful for the opportunity to meet and know so many helpful colleagues from across the globe. And there is nothing like a conference in a beautiful library in an unfamiliar city with the smartest experts in web archiving in the world. I’ve forged some wonderful friendships over the years. While a virtual meeting is not quite the same and I can’t wait until we can meet again in person, I’ve been amazed at how we’ve adapted as a community to an entirely virtual event. In many ways it’s allowed for a richer experience – more people who might not have been able to travel to the conference and meetings can participate, and in my mind that’s always a benefit! I hope we can continue to keep a blend of in person and virtual events in the future. Come join us!

WAC 2022

Registration is now open. Register separately for each day you plan to attend—May 23, 24, and 25 for the WAC, May 17-19 for the General Assembly. View the schedule and abstracts, and learn more about the Conference and GA sessions on the IIPC Website: 2022 Web Archiving Conference!


IIPC General Assembly & Web Archiving Conference 

IIPC GA & WAC collection at UNT Digital Library

2021 Web Archiving Conference: presentations & recordings

The 2021 Web Archiving Conference in Luxembourg

By Ben Els, Digital Curator at the National Library of Luxembourg and Chair of the 2021 General Assembly (GA) and Web Archiving Conference (WAC) Organising Committee

In collaboration with the IIPC, the National Library of Luxembourg (BnL) had the honour of hosting the 2021 edition of the Web Archiving Conference. As a virtual event, this conference brought together experts and researchers from 39 countries, to present and evaluate the latest developments in the world of web archiving. The last edition of this conference took place in 2019 in Zagreb and since many web archiving institutions haven’t had the opportunity for a local exchange with fellow practitioners, this international meeting plays a vital role in the concerted efforts in Internet preservation.

From in-person to virtual

The preparations for this year’s conference started in June 2019 and, initially, the 2021 conference was meant to take place in our new library building. As you can imagine, the ups and downs of the past 1 ½ years had a significant influence on the preparations for this conference: Will people be able to travel? What will the safety measures be? Should we plan for a hybrid solution? Eventually, we decided that a hybrid solution would hardly be feasible from an organisational standpoint and would likely also disadvantage participants attending the live event. When the decision was made to go for an online event, we faced another set of questions: how to combine the advantages of a virtual meeting with the indispensable aspects of a physical conference? In other words, what are the most valuable experiences that people would like to take away from a real-life meeting?

Our conclusion was to aim our efforts at enabling lively discussions, to focus on Q&A sessions and networking, which would normally happen during coffee breaks or social events. We spent several months researching and testing different video-conference and virtual event platforms. Finally, we decided to abandon the idea of 8-10 hour Zoom calls and moved to a different format, using the relatively new platform Remo. We also asked the speakers and panelists to make their presentations available ahead of the conference as pre-recorded videos. This way, participants were able to watch the videos they were interested in beforehand, so that during the conference, we could jump into the Q&A part right away. This format allowed for a more lively experience, with more engaging discussions. This was illustrated by the fact that many participants stayed with the event from 08:00 in the morning until midnight!

Customising the online experience 

As Olga explained during the General Assembly: “online doesn’t mean less work”. We realised that Remo is not a perfect platform and that there was an adaptation phase for first-time users. Therefore, a lot of work went into organising training events during the three weeks before the conference. We made sure that all speakers, session chairs and panelists took part in at least one of these familiarisation sessions, which helped get a lot of technical and organisational questions out of the way ahead of time. Moreover, we had a number of volunteers on board, making sure that the program of the conference would run smoothly and all technical difficulties could be dealt with as quickly as possible. The team, composed of five core members of the organising committee and the conference “super elves”, operated like a well-oiled machine, and did so over the course of three days from 08:00 in the morning until midnight.

Online and all time zones

The second challenge of a virtual event is the difference in time zones, when people want to follow the discussions from their homes on the other side of the world. For this reason, we arranged the conference schedule in a way that would allow participants from all time zones to follow at least 6 hours of program during their normal working hours. This inclusive approach has proven successful: registrations and attendance surpassed the previous records. It is safe to say that the 2021 edition of the Web Archiving Conference has reached more people than ever before.

Virtually in Luxembourg  

The third challenge in an online event: how to highlight the character of the hosting institution, since the conference venue doesn’t really matter on the Internet? In collaboration with our sponsors and partners, including the National Research Fund and the Luxembourg – Let’s make it happen initiative, we tried to represent the BnL and the country of Luxembourg in a virtual space. On the customised floorplan in Remo, we highlighted our partners and included hints at cultural, historical and culinary Luxembourg landmarks. If you would like to learn more about the Emoxie icons and their stories, we invite you to a virtual visit to Luxembourg. Please don’t forget to stop by at the National Library! 

Lessons learnt and raising awareness

Before WAC 2021, the BnL didn’t have a lot of experience with hosting larger conferences and even less experience with online events. Although the time commitment should not be underestimated, the whole process was, at the same time, an incredibly valuable learning experience (not to mention how much fun we had during preparation calls and all throughout the conference). Hosting the Web Archiving Conference has also pushed the BnL to get to know all parts of the inner workings of the IIPC and to get in contact with many member institutions. Locally, we were able to draw attention to the role of the National Library as a frontrunner in digital preservation in Luxembourg (Mois des archives: Web Archive & Mir brauchen dréngend e Bachelor an den Informatiounswëssenschaften). We were also able to organise a shared panel with the University of Luxembourg, to highlight local efforts in documenting the Covid-19 pandemic.


From the incredibly generous feedback, we also learned that the attention to detail and thoughtful planning have not gone unnoticed by the participants. For that part, the BnL can only accept a fraction of the praise: without Olga’s and Robin’s tireless commitment and expertise, we never could have reached the goals that were set up at the beginning. Therefore, next year’s hosts should be reassured to have both of them on board and set the bar for 2022 even higher.

Collaborative collecting at webarchive.lu

By Ben Els, Digital Curator at the National Library of Luxembourg & the Chair of the Organising Committee for the 2021 IIPC General Assembly and the Web Archiving Conference

Our previous blog post from the Luxembourg Web Archive focused on the typical steps that many web archiving initiatives take at the start of their program: to gain first experience with event-based crawls. Elections, natural disasters, and events of national importance are typical examples of event collections. These temporary projects have occupied our crawler for the past 3 years (and continue to do so for the Covid-19 collection), but we also feel that it’s about time for a change of scenery on our seed lists.

How it works

Domain crawl

Aside from following the news on elections and Covid-19, we also operate 2 domain crawls a year, where basically all websites from the “.lu” top level domain are captured. We use the research from the event collections to expand the seed list for domain crawls and, therefore, also add another layer of coverage to those events. However, the captures of websites from the event collections remain very selective and are usually not revisited, once discussions around the event are over. This is why we plan to focus our efforts in the near future on building thematic collections. As a comparison:

Event collections | Thematic collections
Temporary | Evolving
Multifaceted coverage of one topic or event | Focus on one subject area

The idea is that event collections serve as a base to extract the subject areas for thematic collections. In turn, the thematic collections will serve as a base to start event collections, and save time on research. In time, event collections will help with a more intense coverage for the subjects of thematic collections and the latter will capture information before and after the topic of an event collection. For example, the seed list from an election crawl can serve as a basis for the thematic collection “Politics & Society”. The continued coverage and expansion from this collection will serve as an improved basis for a seed list, once the next election campaign comes around. Moreover, both types of collections will help in broadening the scope of domain crawls and achieve better coverage of the Luxembourg web.

Collaboration with subject experts

Special Collections at webarchive.lu

During election crawls, it has always been important for us to invite the input from different stakeholders, to make sure that the seed list covers all important areas surrounding the topic. The same principle has to be applied to the thematic collections. No curator can become an expert in every field and our web archiving team will never be able to research and find all relevant websites in all domains and all languages from all corners of the Luxembourg web. Therefore, the curator’s job has to be focused on finding the right people, who know the web around their subject, experts in their field and representatives of their communities, who can help to build and expand seed lists over time. This means relying on internal and external subject experts, who are familiar with the principles of web archiving and incentivised to offer their help in contributing to the Luxembourg web archive.

While, technically, we haven’t tested the idea of this collaborative Lego-tower in reality, here are some of the challenges we would like to tackle this year:

  • The workflows and platform used to collect the experts’ contributions need to be as easy to use as possible. Our contributors should not require hours of training and tutorials to get started, and the platform should be intuitive enough to pick up working on a seed list after not having looked at it for several months.

  • Subject experts should be able to contribute in the way that best fits their work rhythm: a quick and easy option to add single seeds spontaneously when coming across an interesting website, as well as a way to dive deeper into research and add several seeds at a time.

  • We are going to ask for help, which means additional work for contributors inside and outside the library. This means that we need to motivate the subject experts and convince them that a working and growing web archive represents a benefit for everybody and that their input is indispensable.

Selection criteria for special collections

Next steps

As a first step, we would like to set up thematic collections with BnL subject experts, to see what the collaborative platform should look like and what kind of work input can be expected from contributors in terms of initial training and regular participation. The second stage will be to involve contributors from other heritage institutions who already provided lists to our domain crawls in the past. After that, we count on involving representatives of professional associations, communities or other organisations interested in seeing their line of business represented in the web archive.

On an even larger scale, the Luxembourg Web Archive will be open to contributions from students and researchers, website owners, web content creators and archive users in general, which is already possible through the “Suggest a website” form on webarchive.lu. While we haven’t received as many submissions as we would like, there have been very valuable contributions of websites that we would perhaps never have found otherwise. We also noticed that it helps to raise awareness through calls for participation in the media. For instance, we received very positive feedback for our Covid-19 collection. If we are able to create interest on a larger scale, we can get many more people involved and improve the services provided by the Luxembourg Web Archive.

Call for participation in the Covid-19 collection on RTL Radio

Save the date!

While we work on putting the pieces of this puzzle together, we are also moving closer and closer to the 2021 General Assembly and Web Archiving Conference. It’s been two years since the IIPC community was able to meet for a conference, and surely you are all as eager as we are to catch up, to learn and to exchange ideas about problems and projects. So, if you haven’t done so already, please save the date for a virtual trip to Luxembourg from 14th to 16th June.

IIPC Content Development Group’s activities 2019-2020

By Nicola Bingham, Lead Curator Web Archives, British Library and Co-Chair of the IIPC Content Development Working Group

Introduction

I was delighted to present an update on the Content Development Group’s (CDG) activities at the 2020 IIPC General Assembly (GA) on behalf of myself, Alex and the curators that have worked so hard on collaborative collections over the past year.

Socks, not contributing to Web Archiving

Although it was disappointing not to have been in Montreal for the GA and Web Archiving Conference (WAC), there are many advantages to attending a conference remotely. Apart from cost and time savings, it meant that many more staff members from our organisations could attend. I liked the fact that I could see many “old” web archiving friends online and it did feel like the same friendly, enthusiastic, innovative environment that is normally fostered at IIPC events. I was also delighted to see some of the attendees’ pets on screen, although it did highlight that other people’s cats are generally much more affectionate than my own, who has, I have to say, contributed little to the field of web archiving over the years, although he did show a mild interest in Warcat.

Several things become clear when tasked with pre-recording a presentation with a time limit of 2 to 3 minutes. Firstly, it is extremely difficult to fit everything you need to say into such a short space of time; secondly, what you do want to say must be tightly scripted – although this does have the advantage that there is no room for pauses or “errs” in a way that can sometimes pepper my in-person presentations. Thirdly, recording even a two-minute video calls for a surprising number of retakes, taking many hours for no apparent reason. Fourthly, naively explaining these facts to the Programme and Communications Officer leads quite seamlessly to the suggestion of writing a blog post in order that one can be more expansive on the points bulleted in the two-minute presentation….

CDG Collection Update

Since our last General Assembly in Zagreb, in June 2019, the CDG has continued working on several established, and two new collections:

  • The International Cooperation Organizations Collection was initiated in 2015 and is led by Alex Thurman of Columbia University Libraries. It previously consisted of all known active websites in the .int top-level domain (available only to organizations created by treaties), but was expanded to include a large group of similar organizations with .org domain hosts, and renamed Intergovernmental Organizations this year. This increased the collection from 163 to 403 intergovernmental organizations, all of which will continue to be crawled each year.
  • The National Olympic and Paralympic Committees collection, led by Helena Byrne of the British Library, was initiated in 2016 and consists of websites of national Olympic and Paralympic committees and associations, as identified from the listings of these groups found on the official sites http://www.olympic.org and http://www.paralympic.org.
  • Online News Around the World is led by Sabine Schostag of the Royal Danish Library. This collection of seeds was first crawled in October 2018 to document a selection of online news from as many countries as possible. It was crawled again in November 2019. The collection was promoted at the Third RESAW Conference, “The web that was: archives, traces, reflections” in Amsterdam in June 2019 and at the IFLA News Media Conference at Universidad Nacional Autónoma de México, Mexico City in March 2020.
  • New in 2019, the CDG undertook a Climate Change Collection, led by Kees Teszelszky of the National Library of the Netherlands. The first crawl took place in June, with a final crawl shortly after the UN Climate summit in September 2019.
  • New in 2019, a collection on Artificial Intelligence was undertaken between May and December, led by Tiiu Daniel (National Library of Estonia), Liisi Esse (Stanford University Libraries) and Rashi Joshi (Library of Congress).

Coronavirus (Covid-19) Collection

The main collecting activity in 2020 has been around the Covid-19 Global pandemic. This has involved a huge effort by IIPC members with contributions from over 30 members as well as public nominations from over 100 individuals/institutions.

We have been very careful with scoping rules so that we are able to collect a diverse range of content within the data budget – and Archive-It generously increased the data limit for this collection to 5TB. Collecting will continue to run, budget permitting, while the event is of global significance.

Publicly available CDG collections can be viewed on the Archive-It website: https://archive-it.org/home/IIPC. An overview of the collection statistics can be seen below.

CDG Collection statistics. Figures correct as of 15th June 2020. Slide presented at IIPC GA 17th June 2020.

Researcher-use of Collections

The CDG has worked closely with the Research Working Group co-chairs to promote and facilitate use of the CDG collections, which are now available through the Archives Unleashed Cloud thanks to the Archives Unleashed project. The collections have been analysed and a large number of derivatives are available to researchers at IIPC-led events and/or research projects. For more information about how to access these collections please refer to the guidelines.

Next Steps/Getting in touch

We would very much welcome new members to the CDG. We will be having an online meeting in the next couple of months which would be an excellent opportunity to find out more. In the meantime, any IIPC member is welcome to suggest and/or lead on possible 2021 collaborative collections. For more information please contact the co-chairs or the Programme and Communications Officer.

Nicola Bingham & Alex Thurman CDG co-chairs

The CDG Working Group at the 2019 IIPC General Assembly in Zagreb.

Luxembourg Web Archive – Coronavirus Response

By Ben Els, Digital Curator, The National Library of Luxembourg

The National Library of Luxembourg has been harvesting the Luxembourg web under the digital legal deposit since 2016. In addition to the large-scale domain crawls, the Luxembourg Web Archive also operates targeted crawls, aimed at specific subjects or events. During the past weeks and months, the global Coronavirus pandemic has confronted society with unprecedented challenges. While large parts of our professional and social lives had to move even further online, the need to capture and document the implications of this crisis on the Internet has seen enormous support in all domains of society. While it is safe to admit that web archiving is still a relatively unknown concept to most people in Luxembourg (and probably also in other countries), it is also safe to say that we have never seen a better case to illustrate the necessity of web archiving and to ask for support in this overwhelming challenge.

webarchive.lu

Media and communities

At the National Library, we started our Coronavirus collection on March 16th, when there were 81 known cases in Luxembourg. While we have been harvesting websites in several event crawls for the past 3 years, it was clear from the start that the amount of information to be captured would surpass any other subject by a great deal. Therefore, we decided to ask for support from the Luxembourg news media, by asking them to send us lists of related news articles from their websites. This appeal to editors quickly evolved into a call for participation to the general public, asking all communities, associations, and civil interest groups to share their responses and online information about the crisis. Addressing the news media in the first place gave us great support in spreading the word about the collection. Part of our approach to building an event collection is to follow the news and take in information about new developments and publications of different organisations and persons of interest. As the flow and high-paced rhythm of new public information and support was vital to many communities, we also had to try and keep up with new websites, support groups and solidarity platforms being launched every day. However, many of these initiatives are not covered equally in the news or social media, a situation made even more complicated by Luxembourg’s multilingual makeup. We learned about the challenges faced by the government and administrations in conveying important and urgent information in 4 or 5 languages at a time: Luxembourgish, French, German, English and Portuguese. The same goes for news and social media, and as a result, for the Luxembourg Web Archive. Therefore, we were grateful to receive contributions from organisations which we would not have thought of including ourselves, and which were not talked about as much in the news.

© The Luxembourg Government

Effort and resources

While the need and support for web archiving exploded during March and April, it was also clear that the standard resources allocated to the yearly operations of the web archive would not suffice in responding to the challenge in front of us. The National Library was able to increase our efforts by securing additional funding, which allowed us to launch an impromptu domain crawl and to expand the data budget on Archive-It crawls. We are all aware of the uphill battle in communicating the benefits of archiving the web. There is a feeling that, while people generally agree on the necessity of preserving websites, in most cases there is little sense of urgency or immediate requirement, since after all, most everyday changes are perceived as corrections of mistakes, or improvements on previous versions. In my opinion, the case of Coronavirus-related websites made the idea of web archiving as a service and obligation to society much clearer and easier to convey.

© Ministry of Health

Private and public

The Web offers many spaces and facets for personal expression and communication. While social media have played a crucial part in helping people to deal with the crisis, web archives face some of their biggest challenges in harvesting and preserving social media. Alongside the technical difficulties and enormous related costs, there is the question of ethics in collecting content which is not 100% private, but also not 100% public. For instance, in Luxembourg, many support groups launched on Facebook, where people could ask their questions about the current situation and new developments in terms of what is allowed, and find help and comfort in their uncertainty. There are several active groups in every language, even some dedicated to districts of the city, with neighbours looking after each other. While it is important to try to capture all facets of an event (especially if this information is unique to the Internet), I am uncertain whether it is ethical to capture the questions, comments and conversations of people in vulnerable situations. Even though there are sometimes thousands of members per group and pretty much everyone can join, they are not fully open to the public.

Collecting and sharing

covidmemory.lu

Besides the large-scale crawls and Archive-It collection, we also contributed part of our seed list to the IIPC’s collaborative Novel Coronavirus collection, led by the Content Development Working Group. Of course, the National Library did not limit its response to archiving websites. With our call for participation, we also received a variety of physical and digital documents, mainly from municipalities and public administrations, who submitted numerous documents which were issued to the public in relation to the reorganisation of public services and the temporary restrictions on social life.

We also received some unexpected contributions, in the form of poems, essays and short diary entries written during confinement, describing and reflecting upon the current situation from a very personal angle. Likewise, a researcher shared his private bibliometric analysis of scientific literature about the Coronavirus. Furthermore, the University of Luxembourg’s Centre for Contemporary and Digital History has launched the sharing platform covidmemory.lu, enabling ordinary people living or working in Luxembourg to share their photos, videos, stories and interviews related to COVID-19.

Web Archiving Week 2021

Since the 2021 edition of the IIPC Web Archiving Conference will be part of the Web Archiving Week, in partnership with the University of Luxembourg and the RESAW network, I am not going to spoil too much about the program by saying that we will continue exploring these shared efforts and responses during the week of June 14th – 18th 2021. We are looking forward to welcoming you all to Luxembourg!

Let’s time travel with the IIPC!

IIPC has been organising its annual meetings for over 15 years. The first full Steering Committee meeting and the meetings of working groups were held in Canberra in 2004. The most recent General Assembly (GA) and Web Archiving Conference (WAC) were held in Zagreb in June 2019. What started as a small get-together of web archiving enthusiasts from a dozen national libraries and the Internet Archive has gradually become an important fixture in the web archiving calendar. We have been very fortunate that our members have volunteered to host the events in Singapore, The Hague, Washington D.C., Ljubljana, Stanford, Reykjavík, London, Wellington, Zagreb and Ottawa. The GA also returned to Canberra in 2008.

 

Due to Covid-19, this year we will not meet in person but we can time travel! While preparing for the next annual event hosted by the National Library of Luxembourg (15-18 June 2021), we will be trawling through the history of the GA and the WAC. We will be collecting, publishing and archiving memories from past events in a variety of formats, ranging from tweets, blog posts to a GA and WAC digital repository and bibliography. All new and older posts will be available in the “GAWAC” archive.


We are starting from 2019, which was the first GA for Friedel Geeraert of KBR, The Royal Library of Belgium. This was also the first GA for the British Library web archivists Helena Byrne and Carlos Rarugal, the organisers of a workshop called “Reflecting on how we train new starters in web archiving”.

Abstracts from the 2019 presentations and slides are available on the conference website. You can also watch the keynote speeches and panel discussions on our YouTube Channel and browse through the photos on the IIPC Flickr. The 2019 GA and WAC were hosted by the National and University Library in Zagreb. The Croatian Web Archive (HAW), which last year celebrated its 15th anniversary, launched its new interface earlier this year. You can browse the archive and the thematic collections at https://haw.nsk.hr/en.

Photo: Tibor God.

Discovering the web archiving community at the IIPC events in Zagreb

By Friedel Geeraert, Scientific Assistant Web Archiving, KBR – Royal Library of Belgium

Last year I had the privilege of participating in the IIPC General Assembly and Web Archiving Conference in Zagreb for the first time as the representative of KBR (the Belgian Royal Library), which was at that time the youngest IIPC member. Last year KBR was involved in a research project called PROMISE that studied the question of web archiving at the federal level in Belgium.

The General Assembly provided good insight into the workings of IIPC as an organisation. It was very interesting to participate in the reflection about the future form of IIPC during the General Assembly. According to member institutions, the top three priorities for the coming years should be: 1) community-led tools, 2) providing platforms for sharing knowledge and 3) networking and support for innovation in research on the archived web. Furthermore, the reports of the Treasurer and the Programme and Communications Officer indicated the different possibilities of engaging with the organisation and other IIPC members: TSS (Technical Speaker Series) and RSS (Research Speaker Series) Webinars, Online Hours, the different working groups (Content Development, Training Working Group, Preservation, Research Working Group) and the Discretionary Funding Programme. I took part in the workshops of the Preservation, Training and Research Working Groups, which allowed me to discover different initiatives launched within web archiving institutions all over the world.

The Web Archiving Conference brought a plethora of developments within web archiving to light. A lot of focus was on outreach and on how to promote web archives (via library labs for example). Another theme was researcher interaction with web archives and opening up access to complementary files such as crawl and access logs, derivative files or documentation about curatorial decisions and Heritrix settings. The use of machine learning on archived web material was another recurring theme. From a curatorial perspective, trending collection themes are minorities, emerging formats such as interactive fiction, and retrospective web archiving. It was also stressed that divergent opinions should feature in a web archive in order to avoid curatorial bias. Furthermore, even though I don’t have a technical background, it was fascinating to discover new developments such as size reduction of indexes, Browsertrix or automated quality assurance.

On top of all that rich information, the networking possibilities were fantastic. Within the PROMISE project, we did an extensive literature review concerning web archiving initiatives in Europe and Canada. It was a wonderful opportunity to meet some of the web archivists and researchers I admire in person. It is safe to say that I came back inspired and with a head full of ideas for the Belgian web archive. I’m already looking forward to the next edition.
