Help Identify News Sites for the IIPC Online News Around the World Project!

By Sabine Schostag, Statsbiblioteket, Aarhus


The IIPC’s Content Development Working Group, which is leading an effort to build collaborative, global web archives on a variety of topics of interest to our members, is kicking off a new project that we are calling “Online News Around the World: A Snapshot in Time”. Our goal is to document online news websites from ALL of the countries in the world during one week of the year.


You read that right – ONLINE NEWS FROM ALL OF THE COUNTRIES IN THE WORLD. We never said IIPC members were entirely sane, did we? We know this is a lofty goal, but we have a few reasons for doing this:

  • raise global awareness of the critical need for web archiving and of the importance of preserving born-digital news;
  • create a cohesive and comprehensive collection that will engage researchers;
  • archive content from countries and regions not currently being archived by IIPC members.


“Week 46”: November 14–20, 2016. A strange idea? Maybe not… Week 46 was designated the “ordinary news week” in the late 1990s by Anker Brink Lund, PhD, professor at Copenhagen Business School. He wrote in 2014:

The News Week project has its origin in an old dream I had tried to realize for many years. My burning desire was not only to be able to analyze spectacular news cases, but also to map the journalistic food chain in general by registering ALL news in a specific period. Projects of this kind need a lot of money and many expert resources. In autumn 1999, I was given both, because the newly established journalism programme in Odense needed trainee places for its students and because a parliamentary analysis group on political power wanted to know more about journalistic power in Denmark. Ever since, I have used week 46 for all kinds of media analyses, together with national and international research colleagues…1

The World Wide Web is more than twenty years old. The IIPC thinks it is time to include web news in this “extraordinary ordinary news week.”


We know we might not reach our lofty goal instantly, and that it will take some time to identify news sites from around the world. We plan to start gradually, focusing at first on news sites from IIPC member countries. This is where you fit in: the Content Development Group needs your help! Please nominate 10 news sites from your country via our nomination tool. Once we receive nominations, the Content Development Group will review the list to determine which sites will be archived.

For more information about the project and how to help, please contact the Project Team or reply to this blog post with your questions!


1 Citation from: Anker Brink Lund, “Analysis – An extraordinary ordinary news week”, 2014-11-14.

Is my web archive usable?

By Fernando Melo and Daniel Gomes

The Portuguese Web Archive has preserved Web pages archived since 1996 and provides a public search service to access them. Fulfilling its goals relies on continuous quality assurance of the service it provides.

Figure 1: Desktop version of the ReplayBar user interface of the Portuguese Web Archive.

A new version of the Portuguese Web Archive was released in May 2016. This release was based on the pywb wayback software and included major design changes to the user interface for replaying archived Web pages (the ReplayBar), namely:

  • a drill-down navigation bar that enables fast browsing across several archived versions of a given web page;
  • a top banner that enables users to take full-page screenshots and to print and share archived Web pages.

We conducted usability testing sessions for quality assurance. The results showed an average effectiveness rate of 91% and an average satisfaction rate of 80%. Participants received the new user interfaces well, and the testing enabled us to identify and prioritise issues when planning future developments.


The usability testing methodology was based on observing and recording user behaviour while participants performed designated tasks. A pre-test was first conducted to identify issues with the task descriptions.

Sessions were run by a facilitator, and notes were recorded by an observer. The facilitator began each session by explaining its purpose and asked the user to perform the tasks as naturally as possible while thinking aloud.

The users were asked to perform the tasks on their own favourite device (e.g. tablet, smartphone, laptop). The objective was to maximise the detection of unforeseen compatibility issues and to prevent difficulties stemming from unfamiliarity with a test device.

Figure 2: Tablet version of the ReplayBar user interface of the Portuguese Web Archive.
Figure 3: Smartphone version of the ReplayBar user interface of the Portuguese Web Archive.

The observer noted the users’ difficulties and comments and rated the effectiveness of each task as follows:

  • 0: The user could not complete the task;
  • 1: The user completed the task with difficulties or help;
  • 2: The user completed the task without any difficulty.
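The post does not say how these 0–2 ratings were turned into the percentages reported below. The following Python sketch shows one plausible calculation. It assumes the effectiveness rate is the sum of the scores divided by the maximum possible score (2 per rating). The function name and the sample scores are hypothetical and are not the study’s data.

```python
def effectiveness_rate(scores):
    """Return the total score as a percentage of the maximum possible score.

    Assumption (not stated in the post): each rating is 0, 1 or 2, and the
    rate is sum(scores) / (2 * number of ratings), expressed in percent.
    """
    if not scores:
        raise ValueError("no scores given")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("scores must be 0, 1 or 2")
    return 100 * sum(scores) / (2 * len(scores))


# Hypothetical ratings for one task across five users (illustrative only)
example = [2, 2, 1, 2, 2]
print(f"{effectiveness_rate(example):.0f}%")  # prints 90%
```

Under this assumption, a user who completes a task "with difficulties or help" counts as half a success. This is why a single score of 1 pulls the example down from 100% to 90%.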

At the end of each session, the users were asked to fill in an anonymous satisfaction questionnaire.

Devices and Tasks

The users chose to use the following combinations of device, operating system and web browser:

  • Laptop w/ Windows 8 + Chrome Web Browser
  • iPhone 6 Plus w/ iOS + Safari Web Browser
  • Tablet w/ Android + Chrome Web Browser
  • Laptop w/ Windows 10 + Chrome Web Browser
  • Laptop w/ MacOS 10 + Chrome Web Browser

The users were asked to execute the following tasks:

  1. Find, in the Portuguese Web Archive, the website of the school/university where you studied.
  2. Check if there is an archived version of the site from when you attended the school/university. Write the date of this version.
  3. Write the date of the earliest archived version of the website of your school/university.
  4. Share this version with your former colleagues.
  5. Print an archived Web page of your choice to show at home.
  6. You have been asked to write an article about your old school/university. Save the home page of that site as an image to include in your article.
  7. Use the Portuguese Web Archive freely and search for something you find interesting.

Effectiveness and Satisfaction results

Table 1. Tasks and average effectiveness rate for their completion.

Our results show that the users were able to browse and perform the defined tasks with an average effectiveness rate of 91% (Table 1). Each usability session ended with the completion of a satisfaction questionnaire, where the users were asked to assess the degree of agreement with the following statements:

Table 2. Results of the Satisfaction Questionnaire.

The overall average satisfaction was 80%. The statements users most agreed with were “The information provided for the system was easy to understand” (86%) and “This system has all the functions and capabilities I expect it to have” (86%). The statement “The organization of information on the system screens was clear” had the lowest agreement (69%).

Lessons learned

The results helped us identify issues to address in future development. For instance, regarding the new ReplayBar user interface:

  • the meaning of the magnifying glass icon seems to confuse users;
  • the size and contrast of some texts need to be increased;
  • a progress indicator after user actions is crucial in low-bandwidth conditions;
  • users should be able to select the file location of the generated screenshots;
  • the “Search in other archives” function, which links to the Memento Time Travel portal when users reach content that was not preserved by the Portuguese Web Archive, needs to be made more prominent.
Figure 4. The Portuguese Web Archive links to the Memento Time Travel portal when users browse unavailable content, so that they can find it in other web archives.

Allowing users to perform the usability tasks on their own devices was very important, because it enabled us to test a wider variety of Web browsers and operating systems and to detect unforeseen compatibility issues. For instance, we found that the new ReplayBar used a CSS element that prevented it from working on any iOS device. This was a major drawback that had gone undetected until usability testing. We could not anticipate which archived pages users would access. Even so, defining broad but realistic tasks for the sessions was fundamental. It helped us avoid influencing users’ answers, observe unexpected behaviours and understand users’ expectations and difficulties.

Once again, the results showed that usability testing is a mandatory step in the quality assurance of a web archive. Skipping usability testing is an unnecessary risk that jeopardises the significant investments required to build a web archive.

Find out more

Visit us at the Web Archiving Conference 2017 in Lisbon!

Wanted: New Leaders for OpenWayback

By Kristinn Sigurðsson, National and University Library of Iceland

The IIPC is looking for one or two people to take on a leadership role in the OpenWayback project.

The OpenWayback project is responsible not only for the widely used OpenWayback software, but also for the underlying webarchive-commons library. In addition, the OpenWayback project has been working to define access-related APIs.

The OpenWayback project thus plays an important role in the IIPC’s efforts to foster the development and use of common tools and standards for web archives.

Why now?

The OpenWayback project is at a crossroads. The IIPC first took on this project three years ago with the initial objective of making the software easier to install, run and manage. This included cleaning up the code and improving documentation.

Originally this work was done by volunteers in our community. About two years ago the IIPC decided to fund a developer to work on it. The initial funding was for 16 months. With this we were able to complete the task of stabilizing the software as evidenced by the release of OpenWayback 2.0.0 through 2.3.0.

We then embarked on a somewhat more ambitious task: improving the core of the software. That work is now concluding with a significant milestone, the introduction of a new ‘CDX server’, or resource resolver. You can read more about that here.

This marks the end of the paid position (at least for the time being). The original 16 months wound up being spread over a somewhat longer time frame, but they are now exhausted. Currently, the National Library of Norway (which hosted the paid developer) is contributing, for free, the work to finalize the new resource resolver.

I’ve been guiding the project over the last year since the previous project leader moved on. While I was happy to assume this role to ensure that our funded developer had a functioning community, I felt like I was never able to give the project the kind of attention that is needed to grow it. Now it seems to be a good time for a change.

With the end of the paid position, we have reached a point where the project either needs a significant transformation or will likely die away, bit by bit. That would be a shame, given the significance of the project to the community and the time already invested in it.

Who are we looking for?

While a technical background is certainly useful, it is not a primary requirement for this role. As you may have surmised from the above, building up this community will definitely be part of the job. Being a good communicator, manager and organizer may be far more important at this stage.

Ideally, I’d like to see two leads with complementary skill sets, technical and communications/management. Ultimately, the most important requirement is a willingness and ability to take on this challenge.

You will not be alone: aside from your prospective co-lead, there is an existing community to build on, notably when it comes to the technical aspects of the project. You can get a feel for the community on the OpenWayback Google Group and the IIPC GitHub page.

It would be simplest if the new leads were drawn from IIPC member institutions. We may, however, be willing to consider a non-member, especially as a co-lead, if they are uniquely suited for the position.

If you would like to take up this challenge and help move this project forward, please get in touch. My email is kristinn (at) landsbokasafn (dot) is.

There is no deadline, as such, but ideally I’d like the new leads to be in place prior to our next General Assembly in Lisbon next March.

IIPC Hackathon at the British Library: Laying a New Foundation

By Tom Cramer, Stanford University

This past week, 22-23 September 2016, members of the IIPC gathered at the British Library for a hackathon focused on web crawling technologies and techniques. The event saw 14 technologists from 12 institutions near (the UK, Netherlands, France) and far (Denmark, Iceland, Estonia, the US and Australia). The event provided a rare opportunity for an intensive, two-day, uninterrupted deep dive into how institutions are capturing web content, and to explore opportunities for advancing the state of the art.

I was struck by the breadth and depth of topics. In particular…

  • Heritrix nuts and bolts. Everything from small tricks and known issues for optimizing captures with Heritrix 3, to how people were innovating around its edges, to the history of the crawler, to a wishlist for improving it (including better documentation).
  • Brozzler and browser-based capture. Noah Levitt from the Internet Archive, and the engineer behind Brozzler, gave a mini-workshop on the latest developments, and how to get it up and running. This was one of the biggest points of interest as institutions look to enhance their ability to capture dynamic content and social media. About ⅓ of the workshop attendees went home with fresh installs on their laptops. (Also note, per Noah, pull requests welcome!)
  • Technical training. Web archiving is a relatively esoteric domain without a huge community; how have institutions trained new or fractionally assigned staff to engage effectively with web archiving systems? This appears to be a major, common need, and also one that is approachable. Watch this space for developments…
  • QA of web captures: as Andy Jackson of the British Library put it, how can we tip the scales from mostly manual QA with some automated processes to mostly automated QA with some manual training and intervention?
  • An up-to-date registry of web archiving tools. The IIPC currently maintains a list of web archiving tools, but it’s a bit dated (as such lists tend to become). To get the list into a place where tool users and developers can update it, a working copy is now in the IIPC GitHub organization. Importantly, the group decided that it might be just as valuable to create a list of dead or deprecated tools, as these can often be dead ends for new adopters. See (and contribute to) the working copy; updates welcome!
  • System & storage architectures for web archiving. How institutions are storing, preserving and computing on the bits. There was a great diversity of approaches here, and this is likely good fodder for a future event and more structured knowledge sharing.

The biggest outcome of the event may have been the energy and inherent value in having engineers and technical program managers spending lightly structured face time exchanging information and collaborating. The event was a significant step forward in building awareness of approaches and people doing web archiving.

IIPC Hackathon, Day 1.

This validates one of the main focal points for the IIPC’s portfolio on Tools Development, which is to foster more grassroots exchange among web archiving practitioners.

The participants committed to keeping the dialogue going and to expanding the number of participants within and beyond the IIPC. Slack is emerging as one of the main channels for technical communication; if you’d like to join in, let us know. We also expect to run several smaller face-to-face events in the next year: 3 in Europe and another 2-3 in North America, with several delving into APIs, archiving time-based media, and access. (These are all in addition to the IIPC General Assembly and Web Archiving Conference on 27-30 March 2017 in Lisbon.) If you have an idea for a specific topic or would like to host an event, please let us know!

Many thanks to all the participants at last week’s hackathon, and to the British Library (especially Andy Jackson and Olga Holownia) for hosting it. It provided exactly the kind of forum the web archiving community needs to share knowledge among practitioners and to advance the state of the art.

Web Archiving Rio 2016: The Story So Far

By Helena Byrne, Assistant Web Archivist, The British Library

The IIPC Content Development Group (CDG) has been busy archiving the trials and tribulations of the Rio 2016 Summer Olympic and Paralympic Games. The Olympics might be over but in just a few days the Paralympics will begin and fans will be glued to their screens again.

This project is collecting public platforms such as websites, articles, news reports, blogs and social media about Rio 2016. You can follow updates on Twitter using the collection hashtag #Rio2016WA. The CDG has been more active on Twitter and hosted a Twitter chat on 10th August 2016 to give an insight into what’s involved in web archiving the Olympics. The chat was based on set questions published in an IIPC blog post, with a Q&A session and some time for live nominations. It was an international chat; even though it was small, it helped us make connections with a wider audience. The chat was added to Storify as well as to the final archived collection of the Games.

So far the Rio 2016 Collection has over 4,000 nominations from IIPC members and the general public. The nominations so far come from seventy-six countries across the world. However, as you can see from the Google Map, there are still many countries that have not been covered. Can you help fill the void?

The majority of the public nominations cover Ireland, the Pacific Islands & South Korea and are in a range of languages such as English, Korean, Dutch, Georgian & French, to name but a few. Some countries on the map have only one site nominated, while others have many. Even if you see nominations from your country, the web pages you are looking at might not be in the collection. There is still time for you to get involved in web archiving the Olympics and Paralympics: the public nomination form will be open until 21st September 2016. If you would like to make a nomination, you can follow these guidelines. This is your chance to be part of the Games!

On Your Marks, Get Set, Go!

By Helena Byrne, Assistant Web Archivist, The British Library

The Rio 2016 Olympic and Paralympic Games are nearly underway and for the next few weeks sports fans will be glued to the events. As with all major sporting events so much happens on and off the playing field.

When we look back at these events, what do we look at? Archives play an essential role in collecting these snapshots of our lives. As we live in a digital world, web archives play a central role in this process. The IIPC Content Development Group curated three large Summer and Winter Olympics collections (2010, 2012 and 2014) and is now archiving the events both on and off the playing field in Rio.

Now it’s your opportunity to have your say about what goes into this collection. The IIPC CDG is calling on you to get involved through the public nomination form. As you can see from our map we still have large parts of the world that aren’t represented in the collection. Do you know of any Olympic or Paralympic websites from these countries?

If you want to find out more about what’s involved in documenting Rio 2016, why not join our Twitter chat and help us archive Rio 2016?

When: Wednesday 10th August at 3pm GMT
Where: At your desk
How: Using Twitter hashtag #Rio2016WA and our previous blog post
Audience: Librarians, Archivists, Sports Researchers and anyone with an interest in web archiving.
Hosts: Nicola Bingham and Helena Byrne, British Library; Eilidh MacGlone, National Library of Scotland

Chat Programme

  1. Introductions
  2. Questions on selecting websites
  3. Instructions on how you can select sites
  4. Add web selections to the public nomination form
  5. Wrap up

Chat Questions

  1. What Olympic collections are available online or in libraries and museums?
    • Are they physical or digital collections?
    • Do you have a favourite go-to collection that you like to use?
  2. What’s involved in selecting websites or web pages for the collection?
    • Sourcing, appraising, selecting
  3. What types of resources do researchers like to use most when researching sport?
    • If you could only choose one resource what would it be?
  4. Questions and answers from the audience about the Rio 2016 Collection.

Don’t forget to use the collection hashtag #Rio2016WA when answering the questions. So on your marks, get set, go!

A map of the nominations so far. There are still some parts of the world not covered in this collection. However, all of the National Olympic and Paralympic Committees from around the world are archived in a separate collection.

2016 Rio Games Collection – How to Get Involved!

By Helena Byrne, Assistant Web Archivist, The British Library

The International Internet Preservation Consortium (IIPC) would like your help to archive websites from around the world related to the Olympic and Paralympic Games.

The IIPC has members in 33 countries, but there are over 200 countries competing in the Games, and we need your help to ensure that these countries are represented in the collection.

IIPC World Map

What we want to collect:

Public platforms in various formats such as:

  • Websites
  • Articles
  • News Reports
  • Blogs
  • Facebook
  • Twitter

The subjects covered on these sites can include:

  • Sports Events
  • Athletes/Teams
  • Doping/Cheating and Corruption
  • Olympic/Paralympic Venues
  • Gender
  • Fandom
  • Environmental Issues
  • Zika Virus
  • General News/Commentary
  • Computer Games (eGames)
  • Other

How to get involved:

Once you have selected the web pages you would like to see in the collection, it takes less than 5 minutes to fill in the submission form.