Collaborative collecting at webarchive.lu

By Ben Els, Digital Curator at the National Library of Luxembourg & the Chair of the Organising Committee for the 2021 IIPC General Assembly and the Web Archiving Conference

Our previous blog post from the Luxembourg Web Archive focused on the typical steps that many web archiving initiatives take at the start of their program: to gain first experience with event-based crawls. Elections, natural disasters, and events of national importance are typical examples of event collections. These temporary projects have occupied our crawler for the past 3 years (and continue to do so for the Covid-19 collection), but we also feel that it’s about time for a change of scenery on our seed lists.

How it works

Domain crawl

Aside from following the news on elections and Covid-19, we also operate 2 domain crawls a year, where basically all websites from the “.lu” top level domain are captured. We use the research from the event collections to expand the seed list for domain crawls and, therefore, also add another layer of coverage to those events. However, the captures of websites from the event collections remain very selective and are usually not revisited, once discussions around the event are over. This is why we plan to focus our efforts in the near future on building thematic collections. As a comparison:

Event collections | Thematic collections
Temporary | Evolving
Multifaceted coverage of one topic or event | Focus on one subject area

The idea is that event collections serve as a base to extract the subject areas for thematic collections. In turn, the thematic collections will serve as a base to start event collections and save time on research. Over time, event collections will provide more intense coverage of the subjects of thematic collections, and the latter will capture information before and after the topic of an event collection. For example, the seed list from an election crawl can serve as a basis for the thematic collection “Politics & Society”. The continued coverage and expansion of this collection will serve as an improved basis for a seed list once the next election campaign comes around. Moreover, both types of collections will help to broaden the scope of domain crawls and achieve better coverage of the Luxembourg web.
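To make the reuse of seed lists concrete, here is a minimal sketch in Python of how an event seed list could be folded into a thematic one without duplicates. It is an illustration only, not the workflow used at webarchive.lu: the function names `normalize` and `merge_seeds`, the normalisation rules and the example URLs are all assumptions.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a comparison key for a seed (assumed rules: ignore scheme,
    a leading 'www.' and a trailing slash)."""
    parts = urlsplit(url if "://" in url else "https://" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_seeds(*seed_lists):
    """Merge seed lists in order, keeping the first spelling of each seed."""
    merged = {}
    for seeds in seed_lists:
        for url in seeds:
            merged.setdefault(normalize(url), url)
    return list(merged.values())


# Hypothetical seeds: an election crawl feeding "Politics & Society".
election_seeds = [
    "https://www.example-party.lu/",
    "https://example-news.lu/elections",
]
expert_seeds = ["http://example-party.lu", "https://example-ngo.lu"]

politics_seeds = merge_seeds(election_seeds, expert_seeds)
```

The same merged list could then be passed on as input for the next election crawl or added to the seed list of a domain crawl.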

Collaboration with subject experts

Special Collections at webarchive.lu

During election crawls, it has always been important for us to invite the input from different stakeholders, to make sure that the seed list covers all important areas surrounding the topic. The same principle has to be applied to the thematic collections. No curator can become an expert in every field and our web archiving team will never be able to research and find all relevant websites in all domains and all languages from all corners of the Luxembourg web. Therefore, the curator’s job has to be focused on finding the right people, who know the web around their subject, experts in their field and representatives of their communities, who can help to build and expand seed lists over time. This means relying on internal and external subject experts, who are familiar with the principles of web archiving and incentivised to offer their help in contributing to the Luxembourg web archive.

While, technically, we haven’t tested the idea of this collaborative Lego-tower in reality, here are some of the challenges we would like to tackle this year:

  • The workflows and platform used to collect the experts’ contributions need to be as easy to use as possible. Our contributors should not require hours of training and tutorials to get started, and the platform should be intuitive enough to pick up working on a seed list after not having looked at it for several months.

  • Subject experts should be able to contribute in the way that best fits their work rhythm: a quick and easy option to add single seeds spontaneously when coming across an interesting website, as well as a way to dive in deeper into research and add several seeds at a time.

  • We are going to ask for help, which means additional work for contributors inside and outside the library. This means that we need to motivate the subject experts and convince them that a working and growing web archive represents a benefit for everybody and that their input is indispensable.

Selection criteria for special collections

Next steps

As a first step, we would like to set up thematic collections with BnL subject experts, to see what the collaborative platform should look like and what kind of work input can be expected from contributors in terms of initial training and regular participation. The second stage will be to involve contributors from other heritage institutions who already provided lists to our domain crawls in the past. After that, we count on involving representatives of professional associations, communities or other organisations interested in seeing their line of business represented in the web archive.

On an even larger scale, the Luxembourg Web Archive will be open to contributions from students and researchers, website owners, web content creators and archive users in general, which is already possible through the “Suggest a website” form on webarchive.lu. While we haven’t received as many submissions as we would like, there have been very valuable contributions of websites that we would perhaps never have found otherwise. We also noticed that it helps to raise awareness through calls for participation in the media. For instance, we received very positive feedback for our Covid-19 collection. If we are able to create interest on a larger scale, we can get many more people involved and improve the services provided by the Luxembourg Web Archive.

Call for participation in the Covid-19 collection on RTL Radio

Save the date!

While we work on putting the pieces of this puzzle together, we are also moving closer and closer to the 2021 General Assembly and Web Archiving Conference. It’s been two years since the IIPC community was able to meet for a conference, and surely you are all as eager as we are to catch up, to learn and to exchange ideas about problems and projects. So, if you haven’t done so already, please save the date for a virtual trip to Luxembourg from 14th to 16th June.
