Once Upon a Time in the Web: A Conference Throughout 20 Years of Web Archiving

by Ariane Bouchard, The Bibliothèque nationale de France

2016 was a year of many anniversaries for web archiving: 25 years of the World Wide Web, 20 years of web archiving by the Internet Archive and, in France, the 10th anniversary of the DADVSI law (which created legal deposit of the Internet) and the 5th anniversary of the decree implementing it. To celebrate these anniversaries, the French National Library (BnF) and the National Audiovisual Institute (INA), the two institutions in charge of legal deposit of the Internet in France, organized a conference in November 2016 that was open to all and brought together researchers, librarians and web pioneers. The event, called “Once upon a time in the web, 20 years of web archiving”, had three main objectives: to provide an overview of web archiving in France, to give demonstrations of discovery tools for web archives, and to present researchers’ works based on corpora from these collections.

The day started with a demonstration of the BnF and INA discovery tools, on which much work has recently been carried out. On the BnF side, the main evolution is a full-text search prototype for the “Incunabula of the web” (web archives between 1996 and 2000), a tool developed in cooperation with research teams from France’s National Scientific Research Center (CNRS). INA demonstrated its search interface and its visualization and research tools. Speakers and the audience agreed that these kinds of tools, especially those enabling full-text searches, are key to wider and better use of the archives.

This idea was confirmed even more by presentations from researchers who have actually used the tools to explore web archives. In recent years, INA and BnF have developed partnerships with several research teams or institutions, and the conference was the opportunity to share methodologies and needs. Among the panellists were Valérie Beaudouin, Dana Diminescu for E diaspora, Valérie Schafer and Francesca Musiani for Web 90 and ASAP, and Sophie Gebeil for her work on memory of migrations on the web. These presentations were much appreciated by the librarians in charge of building collections, as a rewarding return on their day to day work.

Topics of the day also included the harvesting of complex objects (such as videos, books and newspapers), the notion of territory (a notion induced by the scope of French law on legal deposit, although it seems somewhat paradoxical when it comes to the Internet), and legal issues regarding digital heritage.

Participants rewound time throughout the proceedings, from the present day to the beginning of the web. The last part of the conference thus gave room to pioneers and precursors. Julien Masanès and Bruno Bachimont told a lively story of how web archiving was established at the BnF and INA in the early 2000s. Loïc Damidaville from AFNIC (the registry for the .fr extension) and several web producers went back to the early days of the web in the late 1990s. They all conveyed the feeling that web archiving was – and still is – a great adventure.

In the end, the event attracted 200 people, and more visitors on Twitter (#20ansDLweb). But it reached a much larger audience thanks to a series of radio broadcasts, articles in national newspapers (for example, in Le Monde in October 2016), and of course coverage on the web. From the point of view of France, 2016 certainly was a breakthrough year for web archiving, as it gained higher visibility. We hope that 2017 will be another great year.

Video recordings and proceedings of the conference will soon be available online.

Call to Serve on the IIPC Steering Committee

IIPC is actively seeking member organisations interested in serving on the IIPC Steering Committee. The Steering Committee is the executive body of the IIPC and allows 15 member organisations to take a leadership role in the high-level strategic planning, development and management of programs, policy creation, overall administration, and contribution to IIPC Portfolios and other activities. The current leadership group is documented at: http://www.netpreserve.org/leadership

This year, five seats on the Steering Committee will become vacant. We strongly encourage any IIPC member interested in serving on the Steering Committee to nominate themselves for election.

Who can run for election?

Serving on the Steering Committee is open to any current IIPC member. There are no core or founding members that own their seat on the SC. The institutions that are hosting the ‘Officers’ (Programme Officer, Communications Officer and Treasurer) are members of the Steering Committee as long as they host those roles.

What is at stake?

Serving on the Steering Committee is a chance for motivated members to help guide the IIPC’s mission of improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage. Steering Committee members are expected to take an active role in leadership, contribute to SC and/or Portfolio activities, and help guide and administer the organisation.

Steering Committee members are elected for three years. They meet in person twice a year, once during the General Assembly and once in September, and two or more additional times by teleconference.

How to run for election?

All nominees, both new nominees and existing members whose term is expiring but who are interested in continuing to serve, are asked to write a short statement (max 200 words) outlining their vision for how they would contribute to IIPC by serving on the Steering Committee. Statements can point to past contributions to the IIPC or the SC, relevant experience or expertise, new ideas for advancing the organisation, or any other relevant information.

All statements will be posted online and emailed to members prior to the election, with ample time for review by all members.


What happens next?

  • 1 December to 20 March: Members are invited to nominate themselves by sending an email to the IIPC Programme and Communications Officer.
  • 3 April to 7 April: Nominees’ statements are published on the Netpreserve Blog and Members mailing list. Nominees are encouraged to campaign through their own networks.
  • 10 April to 10 May: Members are invited to vote online. An online voting tool will be used to conduct the vote. The PCO will monitor the vote, ensuring that each organisation votes only once for all nominated seats and that the vote is cast by the organisation’s official representative. People will be encouraged to cast their vote before, during, and after the GA.
  • 10 May: Voting ends.
  • 15 May: The results of the vote are announced officially on the Netpreserve blog and Members mailing list.
  • 1 June: End/start of SC members’ terms. The newly elected SC members start their term on 1 June and are invited to attend a first meeting (by teleconference) by the end of June. The next face-to-face SC meeting will take place in Ottawa in September 2017.

If you have any questions, feel free to contact the IIPC PCO.

New OpenWayback lead

By Lauren Ko, University of North Texas Libraries

In response to IIPC’s call, I have volunteered to take on a leadership role in the OpenWayback project. Having been involved with web archives since 2008 as a programmer at the University of North Texas Libraries, I expect my experience working with OpenWayback, Heritrix, and WARC files, as well as writing code to support my institution’s broader digital library initiatives, to aid me in this endeavor.

Over the past few years, the web archiving community has seen much development in access-related projects such as pywb, Memento, ipwb, and OutbackCDX, to name a few. There is great value in a growing number of available tools written in different languages and running in different environments. In line with this, we would like to keep the OpenWayback project’s development moving forward while it remains of use. Further, we hope to facilitate development of access-related standards and APIs, interoperability of components such as index servers, and compatibility of formats such as CDXJ.

Moving OpenWayback forward will take a community. With Kristinn Sigurðsson soon relinquishing his leadership position, we are seeking a co-leader for the OpenWayback project. We also continue to need people to contribute code, provide code review, and test deployments. I hope this community will continue not only to develop access tools, but also access to those tools, encouraging and supporting newcomers via mailing lists and Slack channels as they begin building and interacting with web archives.

If your institution uses OpenWayback, please consider:

If you are interested in taking a co-leadership role in this project or are otherwise interested in helping with OpenWayback and IIPC’s access-related initiatives, even if you don’t know how that might be, please contact me as lauren.ko on the IIPC Slack or email me at lauren.ko@unt.edu.

Rio 2016 Round Up

By Helena Byrne, Assistant Web Archivist, The British Library

The IIPC Content Development Group (CDG) 2016 Summer Olympic and Paralympic Games collection is now live: http://archive-it.org/collections/7235

The collection period ran from June to October 2016 and covered events on and off the playing field. The CDG used a combination of collaborative tools during this project as well as input from the general public.

Collection Fast Facts:

Final Number of Nominations:

In total, 4,817 seeds were nominated: 4,642 from CDG members and 176 from the public nomination form.

Countries:

125 countries are covered in the collection, but the number of nominations varies between countries: it ranges from 1 to 5 seeds up to several hundred. The top 5 countries covered were France (681), Brazil (553), Japan (447), Great Britain (341) and Canada (327).

Languages:

34 different languages were recorded.

What’s Next?:

Quality Assurance:

Now that the collection phase of the project is over, we hope to carry out some Quality Assurance (QA) on the archived nominations. Criteria on how to evaluate an archived website can be found here. There are two ways this will be done: the first is through the crawl reports generated by the Archive-It account, while the second is through a visual inspection of the website. The second option can be done by anyone using the collection, whether they are IIPC members or individuals interested in the web archiving process. As there are a large number of sites to look through, this will require input from people outside the CDG. Can you help us do QA on this collection?

Report an issue with the collection:

If you would like to flag any issues with the content while using the collection, you can fill in this Google Form: https://goo.gl/forms/utvyE8FztZdjFSaB3

Guidelines:

The CDG will publish a ‘Best Practice for Developing Collaborative Collections’ on the IIPC website. This will not only form the guidelines for future CDG collections but will hopefully be of use for anyone working on a collaborative project.

Target Audience:

This collection will be invaluable for web archives researchers in terms of data mining as well as researchers who focus on sports and Olympic events.

Thank you for contributing to this project. You can keep up to date with any further developments through the collection hashtag #Rio2016WA.


Collection timelines and updates:

Help Identify News Sites for the IIPC Online News Around the World Project!

By Sabine Schostag, Statsbiblioteket, Aarhus

What?

The IIPC’s Content Development Working Group, which is leading an effort to build collaborative, global web archives on a variety of topics of interest to our members, is kicking off a new project that we are calling “Online News Around the World: A Snapshot in Time”. Our goal is to document online news websites during one week of the year from ALL of the countries in the world.


Why?

You read that right – ONLINE NEWS FROM ALL OF THE COUNTRIES IN THE WORLD. We never said IIPC members were entirely sane, did we? We know this is a lofty goal, but we have a few reasons for doing this:

  • raise global awareness of the critical need for web archiving;
  • raise awareness of the importance of preserving born-digital news;
  • create a cohesive and comprehensive collection that will engage researchers;
  • archive content from countries and regions not currently being archived by IIPC members.

When?

“Week 46”: November 14, 2017 – November 20, 2017. Strange idea? Maybe not… Week 46 was designated an “ordinary news week” at the end of the 1990s by Anker Brink Lund, doctor of philosophy and professor at Copenhagen Business School. He wrote in 2014:

The project News week has its origin in an old dream I tried to realize for many years. My burning desire was not only to be able to analyze spectacular news cases, but also to map the journalistic feeding chain in general by registering ALL news in a specific period. This kind of project needed lots of money and many expert resources. In autumn 1999, I was given both, because the newly opened journalist studies in Odense needed trainee places for their students and because a parliamentary analysis group on the political power wanted to know more about the journalistic power in Denmark. Ever since, I have used week 46 for all kinds of media analyses, together with national and international research colleagues…1

The World Wide Web is more than twenty years old. The IIPC thinks it is time to include web news in this “extraordinary ordinary news week.”

Who?

We know we might not reach our lofty goal instantly, and that it will take some time to identify news sites from around the world. We plan to start gradually, at first with a goal of news sites from IIPC member countries. But here is where you fit in. The Content Development Group needs your help! Please nominate 10 news sites from your country in our nomination tool: http://digital2.library.unt.edu/nomination/iipc-news/. Once we receive nominations, the Content Development Group will review the list to determine what set will be archived.

For more information about the project and to find out more about how to help, please contact the Project Team at online-news-project@iipc.simplelists.com or reply to this blog post with your questions!

References:

1 Citation from: Anker Brink Lund: Analysis – An extraordinary ordinary news week. In: Journalisten.dk, 2014-11-14.

Is my web archive usable?

by Fernando Melo and Daniel Gomes, Arquivo.pt – the Portuguese Web Archive

Arquivo.pt preserves pages archived from the Web since 1996 and provides a public search service to access this information. Fulfilling Arquivo.pt’s goals relies on continuous quality assurance of the service provided.

Figure 1: Desktop version of the ReplayBar User Interface of Arquivo.pt

A new version of Arquivo.pt was released in May 2016. This release was based on the pywb wayback software and included major design changes in the user interfaces for the replay of archived Web pages (ReplayBar), namely:

  • a drill-down navigation bar that enables fast browsing across several archived versions of a given web page;
  • a top banner that enables the users to take full-page screenshots, print and share archived Web pages.

We conducted usability testing sessions for quality assurance. The results showed an average effectiveness rate of 91% and a satisfaction rate of 80%. The new user interfaces were well accepted by the test participants, and the sessions enabled us to identify and prioritise issues in order to plan future developments.

Methodology

The methodology applied for usability testing was based on the observation and registration of user behaviours while performing designated tasks. A pre-test was first conducted to identify issues with task descriptions.

Sessions were conducted by a facilitator, and notes were taken by an observer. Each session began with an explanation of its purpose by the facilitator, who also asked the user to perform the tasks in the most natural way possible while thinking aloud.

The users were asked to perform the tasks using their own favourite device (e.g. tablet, smartphone, laptop). The objective was to maximise the detection of unforeseen compatibility issues and to prevent difficulties stemming from the unfamiliarity with the test device.

Figure 2: Tablet version of the ReplayBar User Interface of Arquivo.pt
Figure 3: Smartphone version of the ReplayBar User Interface of Arquivo.pt

The observer took notes of the difficulties and comments from the users. Rates were assigned to the effectiveness while performing each task:

  • 0: The user could not complete the task;
  • 1: The user completed the task with difficulties or help;
  • 2: The user completed the task without any difficulty.
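
The ratings above can be turned into a percentage for each task. The following is a minimal sketch, assuming that the effectiveness rate is the mean rating divided by the maximum rating of 2; the article does not state the exact formula, and the function name `effectiveness_rate` and the sample ratings are hypothetical, not data from the study.

```python
from statistics import mean

MAX_SCORE = 2  # highest rating on the 0-2 scale listed above


def effectiveness_rate(scores):
    """Return the mean rating as a percentage of the maximum rating.

    Assumption: rate = mean(scores) / MAX_SCORE * 100. The source does
    not spell out how its percentages were derived.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("scores must be 0, 1 or 2")
    return 100 * mean(scores) / MAX_SCORE


# Hypothetical ratings for one task across five sessions (not real data)
example_scores = [2, 2, 1, 2, 2]
print(f"{effectiveness_rate(example_scores):.0f}%")
```

Averaging the per-task rates computed this way would give an overall figure comparable in form to the one reported in this article, under the same assumption.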

At the end of each session, the users were asked to fill in an anonymous satisfaction questionnaire.

Devices and Tasks

The users chose to use the following combinations of device, operating system and web browser:

  • Laptop w/ Windows 8 + Chrome Web Browser
  • iPhone 6 Plus w/ iOS + Safari Web Browser
  • Tablet w/ Android + Chrome Web Browser
  • Laptop w/ Windows 10 + Chrome Web Browser
  • Laptop w/ MacOS 10 + Chrome Web Browser

The users were asked to execute the following tasks:

  1. Find in Arquivo.pt the website of the school/university where you studied.
  2. Check if there is an archived version of the site when you attended the school/university. Write the date of this version.
  3. Write the date of the earliest archived version of the Website of your school/university.
  4. Share this version with your former colleagues.
  5. Print an archived Web page of your choice to show at home.
  6. You were asked to write an article about your old school/university. Save the home page of that site as an image to be included in your article.
  7. Use Arquivo.pt freely and search for something you find interesting.

Effectiveness and Satisfaction results

Table 1. Tasks and average effectiveness rate for their completion.

Our results show that the users were able to browse and perform the defined tasks with an average effectiveness rate of 91% (Table 1). Each usability session ended with the completion of a satisfaction questionnaire, where the users were asked to assess the degree of agreement with the following statements:

Table 2. Results of the Satisfaction Questionnaire.

The overall average satisfaction was 80%. The statements with which users most agreed were “The information provided for the system was easy to understand” (86%) and “This system has all the functions and capabilities I expect it to have” (86%). The statement “The organization of information on the system screens was clear” had the lowest agreement percentage (69%).

Lessons learned

The results helped us identify the issues to be addressed in future developments. For instance, regarding the new ReplayBar user interface:

  • the meaning of the magnifying glass icon seems confusing to the users;
  • the size and contrast of some texts need to be increased;
  • a progress indicator after user actions becomes crucial in low-bandwidth conditions;
  • users should be able to select the file location of the generated screenshots;
  • the function “Search in other archives”, which links to the Memento Time Travel portal when users reach content that was not preserved by Arquivo.pt, needs to be emphasised.
Figure 4. Arquivo.pt links to the Memento Time Travel portal when users browse unavailable content so that they can find it in other web archives.

It was very important to allow users to perform the usability tasks with their own devices because it enabled us to test a wider variety of Web browsers and operating systems and to detect unforeseen compatibility issues. We identified, for instance, that the new ReplayBar used a CSS element that prevented it from working on any iOS device, which was a major drawback that had remained undetected until usability testing. Defining broad but realistic tasks for the usability sessions, even though we could not anticipate the archived pages that the users would access, was fundamental to avoid influencing user answers, to observe unexpected behaviours, and to understand users’ expectations and difficulties.

Once again, the results showed that usability testing is a mandatory step in the quality assurance of a web archive. Not performing usability testing is an unnecessary risk that jeopardises the success of the significant investments required to build a web archive.

Find out more

Visit us at the Web Archiving Conference 2017 in Lisbon!

Update:

The new version of Arquivo.pt, named Hercules, presents design improvements and addresses the usability issues outlined above. For more information, see: http://sobre.arquivo.pt/news-1/news/arquivo.pt-new-version

Wanted: New Leaders for OpenWayback

By Kristinn Sigurðsson, National and University Library of Iceland

The IIPC is looking for one or two people to take on a leadership role in the OpenWayback project.

The OpenWayback project is responsible not only for the widely used OpenWayback software, but also for the underlying webarchive-commons library. In addition, the OpenWayback project has been working to define access-related APIs.

The OpenWayback project thus plays an important role in the IIPC’s efforts to foster the development and use of common tools and standards for web archives.

Why now?

The OpenWayback project is at a crossroads. The IIPC first took on this project three years ago with the initial objective of making the software easier to install, run and manage. This included cleaning up the code and improving documentation.

Originally this work was done by volunteers in our community. About two years ago the IIPC decided to fund a developer to work on it. The initial funding was for 16 months. With this we were able to complete the task of stabilizing the software, as evidenced by the releases of OpenWayback 2.0.0 through 2.3.0.

We then embarked on a somewhat more ambitious task: improving the core of the software. A significant milestone in that work is now being completed as a new ‘CDX server’, or resource resolver, is introduced. You can read more about that here.

This marks the end of the paid position (at least for the time being). The original 16 months wound up being spread over a somewhat longer time frame, but they are now exhausted. Currently, the National Library of Norway (which hosted the paid developer) is contributing, for free, the work to finalize the new resource resolver.

I’ve been guiding the project over the last year since the previous project leader moved on. While I was happy to assume this role to ensure that our funded developer had a functioning community, I felt like I was never able to give the project the kind of attention that is needed to grow it. Now it seems to be a good time for a change.

With the end of the paid position we are now at a point where there either needs to be a significant transformation of the project or it will likely die away, bit by bit, which is a shame bearing in mind the significance of the project to the community and the time already invested in it.

Who are we looking for?

While a technical background is certainly useful it is not a primary requirement for this role. As you may have surmised from the above, building up this community will definitely be a part of the job. Being a good communicator, manager and organizer may be far more important at this stage.

Ideally, I’d like to see two leads with complementary skill sets, technical and communications/management. Ultimately, the most important requirement is a willingness and ability to take on this challenge.

You will not be alone: aside from your prospective co-lead, there is an existing community to build on, notably when it comes to the technical aspects of the project. You can get a feel for the community on the OpenWayback Google Group and the IIPC GitHub page.

It would be simplest if the new leads were drawn from IIPC member institutions. We may, however, be willing to consider a non-member, especially as a co-lead, if they are uniquely suited for the position.

If you would like to take up this challenge and help move this project forward, please get in touch. My email is kristinn (at) landsbokasafn (dot) is.

There is no deadline, as such, but ideally I’d like the new leads to be in place prior to our next General Assembly in Lisbon next March.