We want YOUR ideas for the IIPC General Assembly 2016

You will be pleased to hear that preparations for the IIPC General Assembly 2016 in Reykjavik, Iceland (11-15 April) are under way and we are aiming to make it the best one yet.

The program team have been hard at work looking at potential themes, topics and areas for discussion and debate. We would, however, love to have your input into this too!

So far, we’ve outlined the following areas:

  • Nuts and bolts of web archiving (management, metrics, organisation, programs)
  • De-duplication 
  • Researcher use cases (of web archives)
  • Big Data usage and potential
  • Web Archiving policies and frameworks / Preservation policies, Collection policies 
  • APIs
  • Web Archiving Tool development 
  • Legal deposit, copyright, data protection (EU wide perspective?)

What have we missed, what should we focus on, what would YOU like to see and hear about?

Please use the comments below to tell us what you would like from the conference. This will help frame the call for papers, due to go out at the end of October.

Thank you.

Jason Webber, IIPC Program and Communications Officer

Open letter by IIPC Chair

Greetings IIPC Members,

I hope that your summer is going very well and that you are all able to take some time off to recharge and spend time with family and friends. It is hard to believe that more than three months have passed since many of us were together at Stanford University in Palo Alto for our 2015 General Assembly (GA)!

I want to take this opportunity to say once again how impressed I was by the quality of the event. Everything from the organization of the week to the excellent interactions that our members engaged in brought significant value.

I want to focus on the Members' Day that we had at the Internet Archive offices. At one point in the day, you were asked to break off into groups to discuss some of the important issues and challenges facing the IIPC in the near future. The Steering Committee met on the Saturday following the GA to discuss how we can better serve you – our members – and to ensure that we focus our limited resources on what brings the greatest value to the global web archiving community. I want to assure you that YOUR feedback was taken very seriously, and thanks to the leadership of Birgit Nordsmark Henriksen (Netarchive.dk) and Barbara Sierman (National Library of the Netherlands), the Steering Committee was able to distill your comments and input into four manageable work packages:

  1. Researcher Involvement
  2. Tools
  3. Connectedness
  4. Practicalities

Work on each of these elements has begun (thanks to dedicated teams looking at each individual area), and each group will come prepared to our upcoming in-person Steering Committee meeting in September. I will update you right after that meeting to let you know what you can expect from the IIPC in the coming year(s).

What I can tell you is that you can count on the IIPC continuing to be a robust and vibrant community, and that your contributions will become even more important as we move forward. Your Steering Committee remains committed to ensuring the value of your membership in the Consortium.

I welcome any comments or questions at paul.wagner@bac-lac.gc.ca

Stay tuned for more updates in September.

Paul N. Wagner, Chair, IIPC

Senior Director General & CIO, Innovation and Chief Information Officer Branch

Library and Archives Canada / Government of Canada

What do the New York Times, Organizational Change, and Web Archiving all have in common?

The short answer is Matthew Weber. Matthew is an Assistant Professor at Rutgers in the School of Communication and Information. His research focus is on organizational change; in particular, he has been looking at how traditional organizations, such as companies in the newspaper business, have responded to major technological disruptions such as the Internet or mobile phone applications.

In order to study this type of phenomenon, you need web archives. Unfortunately, however, using web archives as a source for research can be challenging. This is where high performance computing (HPC) and big data come into the picture.

Rutgers HPC resources: https://oirt.rutgers.edu/research-computing/hpc-resources/

Luckily for Matthew, Rutgers has HPC, and lots of it. He is working with powerful computer clusters running complex Java and Hadoop code to crack into Internet Archive (IA) data. Matthew first started working with the IA in 2008 through a summer research institute at Oxford University. More recently, Matthew, working with colleagues at the Internet Archive and Northeastern University, received funding from the National Science Foundation to build tools that enable research access to Internet Archive data.

When Matthew says he works with big data, he means really big data, like 80 terabytes big. Matthew works in close partnership with PhD students in the computer science department who maintain the backend that allows him to run complex queries. He is also training PhD students in Communication and other social science disciplines to work with the Rutgers HPC system. In addition, Matthew has taught himself basic Pig, or more exactly Pig Latin, a programming language for running queries on data stored in Hadoop.

Intimidated yet? Matthew says don’t be. A researcher can learn some basic tech skills and do quite a bit on his or her own. In fact, Matthew would argue that researchers must learn these skills because we are a long way off from point-and-click systems where you can find exactly the data you want. But there is help out there.

For example, IA's Senior Data Engineer, Vinay Goel, has provided materials from a recent workshop that walk you through setting up and doing your own data analysis. Also, Professors Ian Milligan and Jimmy Lin from the University of Waterloo have pulled together some useful code and commentary that is relatively easy to follow. Finally, a good basic starting point is Codecademy.
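As a small taste of what those basic tech skills make possible, here is a minimal sketch (not Matthew's actual workflow, and the file name is hypothetical) that reads a single WARC file with the open-source warcio library and counts captured pages per domain:

```python
# Minimal sketch (illustrative only): count captured pages per domain in one
# WARC file using the warcio library.
from collections import Counter
from urllib.parse import urlparse

from warcio.archiveiterator import ArchiveIterator

counts = Counter()
with open('example.warc.gz', 'rb') as stream:       # hypothetical file name
    for record in ArchiveIterator(stream):
        if record.rec_type == 'response':           # only archived HTTP responses
            url = record.rec_headers.get_header('WARC-Target-URI')
            if url:
                counts[urlparse(url).netloc] += 1

for domain, n in counts.most_common(10):
    print(domain, n)
```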

Challenges Abound

Even though Matthew has access to HPC and is handy with basic Pig, there are still plenty of challenges.

Metadata

One major challenge is metadata; simply put, there isn't enough of it. In order to draw valid conclusions from data, researchers need a wealth of contextual information, such as the scope of the crawl, how often it was run, why those sites were chosen and not others, etc. They also need the metadata to be complete and consistent across all of the collections they're analyzing.

As a researcher conducting quantitative analysis, Matthew has to make sure he’s accounting for any and all statistical errors that might creep into the data. In his recent research, for example, he was seeing consistent error patterns in hyperlinks within the network of media websites. He now has to account for this statistical error in his analysis.

To begin to tackle this problem, Matthew is working with researchers and web curators from a group of institutions, including Columbia University Libraries & Information Service's Web Resources Collection Program, the California Digital Library, the International Internet Preservation Consortium (IIPC), and the University of Waterloo, to create a survey asking researchers across a broad spectrum of disciplines which metadata elements are essential to them. Matthew intends to share the results of this survey broadly with the web archiving community.

The Holes

Related to the metadata issues is the need for better documentation of missing data.

Matthew would love to have complete archives (along with complete descriptions). He recognizes, however, that there are holes in the data, just as there are with print archives. The difference is that holes in a print archive are easier to identify and define; with web archive data, the holes usually have to be inferred.
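One illustrative (and entirely hypothetical, not Matthew's) way to infer such holes is to look at the capture timestamps for a URL and flag unusually long gaps; the capture list would typically come from an archive's CDX index:

```python
# Hypothetical sketch: flag suspiciously long gaps between captures of a URL,
# given a list of 14-digit capture timestamps (e.g. extracted from a CDX index).
from datetime import datetime

captures = ['20090115120000', '20090213090500', '20091120101500']  # made-up data

def find_gaps(timestamps, max_gap_days=60):
    """Return pairs of consecutive captures more than max_gap_days apart."""
    dates = sorted(datetime.strptime(t, '%Y%m%d%H%M%S') for t in timestamps)
    gaps = []
    for earlier, later in zip(dates, dates[1:]):
        if (later - earlier).days > max_gap_days:
            gaps.append((earlier, later))
    return gaps

for start, end in find_gaps(captures):
    print(f'Possible hole: no captures between {start:%Y-%m-%d} and {end:%Y-%m-%d}')
```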

The Issue of Size

Matthew explained that for a recent study of news media from 1996 to 2000, you start by transferring the data – and one year of data from the Internet Archive took three days to transfer. You then need another two days to process it and run the code. That's a five-day investment just to get data for a single year. And then you discover that you need another data point, so it starts all over again.

To help address this issue at Rutgers, and to provide training datasets that help graduate students get started, they are creating and sharing derivative datasets. They have taken large web archive datasets, extracted small subsets (e.g., U.S. Senate data from the last five sessions), processed them, and produced smaller datasets that others can easily download for their own analysis. This is essentially a public repository of data for reuse!
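To make the idea concrete, here is a hedged sketch of how such a derivative dataset might be produced: it filters one WARC file down to responses from a chosen set of domains and writes them to a new, much smaller WARC. The file names and domain list are illustrative, not the actual Rutgers datasets.

```python
# Illustrative sketch of building a derivative dataset: keep only responses
# from selected domains and write them out as a smaller WARC file.
from urllib.parse import urlparse

from warcio.archiveiterator import ArchiveIterator
from warcio.warcwriter import WARCWriter

KEEP_DOMAINS = {'senate.gov'}                      # hypothetical selection

with open('full_crawl.warc.gz', 'rb') as source, \
     open('senate_subset.warc.gz', 'wb') as target:
    writer = WARCWriter(target, gzip=True)
    for record in ArchiveIterator(source):
        if record.rec_type != 'response':
            continue
        url = record.rec_headers.get_header('WARC-Target-URI') or ''
        host = urlparse(url).netloc
        if any(host == d or host.endswith('.' + d) for d in KEEP_DOMAINS):
            writer.write_record(record)
```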

A Cool Space to Be In

As tools and collections develop, more and more researchers are starting to realize that web archives are fertile ground for research. Even though challenges remain, there’s clearly a shift toward more research based on web archives.

As Matthew put it, “Eight years ago when I started nobody cared… and now so many scholars are looking to ask critical questions about the way the web permeates our day-to-day lives… people are realizing that web archives are a key way to get at those questions. As a researcher, it’s a cool space to be in right now.”


By Rosalie Lack, Product Manager, California Digital Library

This blog post is the first in an upcoming series of interviews with researchers to learn about their research using web archives, and about the challenges and opportunities they encounter.

So You Want to Get Started in Web Archiving?

The web archiving community is a great one, but it can sometimes be a bit confusing to enter. Unlike communities such as the Digital Humanities, which has developed aggregation services like DH Now, the web archiving community is a bit more dispersed. But fear not: there are a few places to visit to get a quick sense of what's going on.

Social Media

A substantial amount of web archiving scholarship happens online. I use Twitter (I'm at @ianmilligan1), for example, as a key way to share research findings and ideas as my project comes together. I usually try to tag them with #webarchiving, which means that all tweets people tag with #webarchiving show up in that timeline. For best results, using a Twitter client like Tweetdeck, Tweetbot, or Echofon can help you keep apprised of things. There may be Facebook groups – I actually don't use Facebook (!) so I can't provide much guidance there. On LinkedIn there are a few relevant groups: IIPC, Web Archiving, and the Portuguese Web Archive.

Blogs

I'm wary of listing blogs, because I will almost certainly leave some out. Please accept my apologies in advance and add yours in the comments below! But a few are on my recurring must-visit list (in addition to this one, of course!):

  • Web Archiving Roundtable: Every week, they have a “Weekly web archiving roundup.” I don’t always have time to keep completely caught up, but I visit roughly weekly and once in a while make sure to download all the linked resources. Being included here is an honour.
  • The UK Web Archive Blog: This blog is a must-have on my RSS feed, and it keeps me posted on what the UK team is doing with their web archive. They do great things, from inspiring outreach, to tools development (e.g. Shine), to researcher reflections. A lively cast of guest bloggers and regulars.
  • Web Science and Digital Libraries Research Group: If you use web archiving research tools, chances are you’ve used some stuff from the WebSciDL group! This fantastic blog has a lively group of contributors, showcasing conference reports, research findings, and beyond. Another must visit.
  • Web Archives for Historians: This blog, written by Peter Webster and myself, aims to bring together scholarship on how historians can use web archives. We have guest posts as well as cross-posts from our own sites.
  • Peter Webster’s Blog: Peter also has his own blog, which covers a diverse range of topics including web archives.
  • Ian Milligan’s Blog: It feels weird including my own blog here, but what the heck. I provide lots of technical background to my own investigations into web archives.
  • The Internet Archive Blog: Almost doesn’t need any more information! It’s actually quite a diverse blog, but a go-to place to find out about cool new collections (the million album covers for example) or datasets that are available.
  • The Signal: Digital Preservation Blog: A diverse blog that occasionally covers web archiving (you can actually find the subcategory here). Well worth reading – and citing, for that matter!
  • Kris’s Blog: Kristinn Sigurðsson runs a great technical blog here, very thought provoking and important for both those who create web archives as well as those who use them.
  • DSHR’s Blog: David Rosenthal’s blog on digital preservation has quite a bit about web archiving, and is always provocative and mind expanding.
  • Andy Jackson’s blog  – Web Archiving Technical Lead at the British Library
  • BUDDAH project – Big UK Domain Data for the Arts and Humanities Research Project
  • Dépôt légal web BnF
  • Stanford University Digital Library blog
  • Internet Memory Foundation blog
  • Toke Eskildsen blog – IT developer at the National Library of Denmark.

Again, I am sure that I have missed some blogs so please accept my sincerest apologies.

In-Person Events

The best place to learn is at in-person events, of course, which are often announced in places like this blog or via many of the channels above! I hope that the IIPC blog can become a hub for these sorts of announcements.

Conclusions

I hope this is helpful for people who are starting out in this wonderful field. I've just provided a small slice: I hope that in the comments below people can give other suggestions to help us all out!

By Ian Milligan (University of Waterloo)

LANL’s Time Travel Portal, Part 2

Architecturally, the Time Travel portal operates in a manner similar to a distributed search. Hence, it faces challenges related to query routing, response time optimization, and response freshness. The new infrastructure includes some rule-based mechanisms for intelligent routing, but a thorough solution is being investigated in the IIPC-funded Web Archive Profiling project. A background cache continuously fetches TimeMap information from distributed archives that are compliant with the Memento protocol either natively or by proxy. Its collection consists of a seed list of popular URIs augmented with URIs requested by Memento clients. Whenever possible, responses are delivered from a front-end cache that remains in sync with the background cache using the ResourceSync protocol. If a request cannot be delivered from cache, because cached content is unavailable or stale, realtime TimeGate requests are sent to Memento-compliant archives only. This setup achieves a satisfactory balance between response times, response completeness, and response freshness. If needed, the front-end cache can be bypassed and a realtime query can be initiated explicitly using the regular browser refresh approach, e.g. Shift-Reload in Chrome.
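For readers unfamiliar with the protocol machinery mentioned above, the sketch below shows what a TimeGate request looks like at the HTTP level, using Python's requests library; the endpoint form and target page are assumed examples rather than a definitive reference.

```python
# Hedged example: a Memento TimeGate request. The client asks for the version
# of a page closest to a desired datetime by sending an Accept-Datetime header.
import requests

target = 'http://apple.com'                                        # example original URI
timegate = 'http://timetravel.mementoweb.org/timegate/' + target   # assumed endpoint form

response = requests.get(
    timegate,
    headers={'Accept-Datetime': 'Fri, 28 Nov 2008 23:08:27 GMT'},
    allow_redirects=False,                    # inspect the redirect itself
)

print(response.status_code)                   # typically a redirect status
print(response.headers.get('Location'))       # URI of the selected Memento
print(response.headers.get('Vary'))           # should include accept-datetime
```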

The Time Travel logo that can be used to advertise the portal.

The development of the Time Travel portal was also strongly motivated by the desire to lower the barrier for developing Memento related functionality, especially at the browser side. Memento protocol information is – appropriately – communicated in HTTP headers. However, browser-side scripts typically do not have header access. Hence, we wanted to bring Memento capabilities within the realm of browser-side development. To that end, we introduced several RESTful APIs.
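As an illustration of what such header-free, browser-friendly access can look like, the sketch below queries a JSON Memento API and reads the closest Memento from the response body rather than from HTTP headers. The endpoint pattern and the JSON field names shown here are assumptions for illustration, not a definitive description of the portal's APIs.

```python
# Hedged sketch: query a JSON Memento API for Mementos of a page.
# The endpoint pattern /api/json/<datetime>/<uri> and the response fields
# are assumed here for illustration.
import requests

uri = 'http://apple.com'
timestamp = '20081128230827'
api_url = f'http://timetravel.mementoweb.org/api/json/{timestamp}/{uri}'

data = requests.get(api_url).json()
closest = data.get('mementos', {}).get('closest', {})

print('Closest Memento:', closest.get('uri'), 'at', closest.get('datetime'))
```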

We are thrilled by the continuous growth in the usage of these APIs and would be interested to learn what kinds of applications people out there are building on top of our infrastructure. We know that the new version of the Mink browser extension uses the new APIs. The Time Travel portal's Reconstruct service, based on pywb, also leverages our own APIs. Memento for Chrome now obtains its list of archives from the Archive Registry. The Robust Links approach to combat reference rot is likewise based on API calls, but that will be the subject of another blog post.

IIPC members that operate public web archives that are not yet Memento compliant are reminded that OpenWayback and pywb natively support Memento. From the perspective of the Time Travel portal, compliance means that we don't have to operate a Memento proxy, that archive holdings can be included in realtime queries, and that both Original URIs and Memento URIs can be used with Find/Reconstruct. From a broader perspective, it means that the archive becomes a building block in a global, interoperable infrastructure that provides a time dimension to the web.

By Herbert Van de Sompel, Digital Library Researcher at Los Alamos National Laboratory

LANL’s Time Travel Portal, Part 1

In early February 2015, we launched the Time Travel portal, which provides cross-system discovery of Mementos.

The design and development of the Time Travel portal was a significant investment and took about a year from conception to release. It involved work directly related to the portal itself, but also a fundamental redesign of the Memento Aggregator, the introduction of several RESTful APIs, the transfer of the Memento infrastructure from LANL’s network to the Amazon cloud, and operating the new environment as an official service of the LANL Research Library.

The team that designed and implemented the Time Travel portal, from left to right: Lyudmila Balakireva, Harihar Shankar, Martin Klein, Ilya Kremer, James Powell, and Herbert Van de Sompel

A major motivation for the development of the new portal was to lower the barrier for experiencing Memento’s web time travel. Our flagship Memento for Chrome extension remains the optimal way to experience cross-system time travel. But, we wanted some of the power of Memento to be accessible without the need for an extension.

The Time Travel portal has a basic interface that allows entering a URI and a datetime. It offers a Find and a Reconstruct service:

  • The Find service looks for Mementos in systems covered by the Memento Aggregator. For each archive that holds Mementos for the requested URI, the Memento that is temporally closest to the submitted date-time is listed, with a clear indication of the archive's name. Results are ordered by temporal proximity to the requested date-time. For each archive, the first/last/previous/next Mementos are also shown when that information is available. For all listed Mementos, a link leads straight into the holding archive. A Find URI can also be constructed; its syntax follows the convention introduced by Wayback software, e.g. http://timetravel.mementoweb.org/list/20081128230827/http://apple.com.
  • The Reconstruct service reassembles a page using the best Mementos from various Memento-compliant archives. Here, “best” means temporally closest to the requested date-time. Hence, in a Reconstruct result page, the archived HTML, images, style sheets, JavaScript, etc. can originate from different archives. The assembled pages often look more complete, and the temporal spread of their components is smaller, than corresponding pages in any single archive. As such, the Reconstruct service provides a nice illustration of the cross-archive interoperability introduced by the Memento protocol. A Reconstruct URI is available using the same Wayback URI convention, e.g. http://timetravel.mementoweb.org/reconstruct/20081128230827/http://apple.com (both URI conventions are illustrated in the short sketch after this list).
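As a small illustration of the Wayback-style URI convention used by both services, the sketch below builds Find and Reconstruct URIs from a target URL and a Python datetime; the example URL mirrors the one above.

```python
# Build Time Travel Find and Reconstruct URIs using the Wayback-style
# 14-digit timestamp convention (YYYYMMDDhhmmss).
from datetime import datetime

def timetravel_uris(url: str, when: datetime) -> dict:
    stamp = when.strftime('%Y%m%d%H%M%S')
    base = 'http://timetravel.mementoweb.org'
    return {
        'find': f'{base}/list/{stamp}/{url}',
        'reconstruct': f'{base}/reconstruct/{stamp}/{url}',
    }

print(timetravel_uris('http://apple.com', datetime(2008, 11, 28, 23, 8, 27)))
# {'find': 'http://timetravel.mementoweb.org/list/20081128230827/http://apple.com', ...}
```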

While the Time Travel portal has been received enthusiastically, usage remains modest. Since its launch, we have seen about 4,000 unique visitors and 7,000 visits per month. We have capacity for much more and would appreciate some promotion of our service by IIPC members. We are also very open to suggestions about additional portal functionality. For example, we have reached out to IIPC members that operate dark archives because we are interested in including their holdings information in Time Travel responses, in order to increase response completeness and to make the existence of these archives more visible. As a first step in that direction, we have proposed Memento-based access to dark archive holdings information as a new functionality for OpenWayback.

By Herbert Van de Sompel, Digital Library Researcher at Los Alamos National Laboratory

Non-print Legal Deposit Law approved in Spain

By Mar Pérez Morillo
Head of the Online Publications Deposit Management Area, Biblioteca Nacional de España

Last Friday the Spanish Council of Ministers approved the royal decree to regulate the legal deposit of online publications.

The legal deposit law of 2011 made online documents objects of legal deposit for the first time in Spain.

The variety and complexity of these publications led to the drafting of a legal text (a royal decree) that develops the law and regulates the procedures and details for managing their legal deposit.

In the current technological environment, with the World Wide Web being the main channel for the dissemination of information, national libraries and archives, along with university libraries and research institutions all over the world, have for years been preserving the huge documentary heritage that lives on the internet. Legal deposit has been the instrument used over the centuries to build this documentary heritage in physical formats. For years now, many countries have legislated on the legal deposit of online publications, considering them part of this heritage to be preserved.

Given their special characteristics, their huge volume, and the resulting impossibility of capturing, storing and preserving them exhaustively, the royal decree just approved in Spain introduces some important differences from print legal deposit:

  • Publishers do not deposit the publications themselves; instead, the deposit libraries request the publications to be deposited from publishers.
  • No legal deposit number will be assigned to online publications.
  • The main method of deposit is the automated crawling of the web.
  • When the information is not publicly available online, but is part of a database or protected by a username and password, the curator centres – the deposit libraries (national and regional libraries with legal deposit competence) – will request publishers to deliver the publications.

In anticipation of this, the National Library of Spain crawled and archived the Spanish web from 2009 to 2013 thanks to a contract with the Internet Archive. The results were eight .es domain crawls and two selective crawls, on the Humanities and on the 2011 General Elections. In 2014, the Library adopted and installed NetarchiveSuite as its web archiving tool, and since then several selective crawls have been run on historical and cultural events in Spain, such as the death of President Suárez, the abdication of King Juan Carlos I, the proclamation of Felipe VI, the European Elections in 2014 and the regional and local elections in May 2015, among others.

Although this was possible under the umbrella of the previous legal deposit law (1957), the royal decree now approved specifically empowers the regional deposit libraries and the National Library of Spain to crawl the web and to request every online publication considered part of the Spanish documentary heritage, in fulfilment of their mission of preserving it for future generations.
This is the end of a long and winding road, since the first version of the royal decree was drafted in 2012. Since then, many governmental institutions, publishers, individuals and all the sectors involved have sent their comments and submissions on the text.

This would not be a reality today without the support of all of them, but especially the public entity Red.es and the Secretary of State for Telecommunications in Spain, and the IIPC and the NAS community.

This is also the beginning of a long road (hopefully not winding). The success of our preserving mandate relies greatly on the collaboration between libraries and all the stakeholders.

BNE blog post

Internet Memory Research launches MemoryBot

Internet Memory Research is pleased to announce that our new crawler, MemoryBot, has now reached full maturity.

With it, we completed a first map of the Web, and we would like to share some early results of this experiment.

With only a few small servers and four weeks' time, we were able to crawl more than 2 billion resources with the objective of discovering as many domains as possible. Overall, more than 60 million domains were discovered, representing about half of the active domains in the world (the rest is mostly composed of parking sites and other types of empty domains).

In addition, we were able to run several types of analysis on this material thanks to the Hadoop- and Flink-based architecture of our archive.
Among other things, we used machine learning to classify domains by type or genre (News, Forums, Blogs, E-commerce, etc.).
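As a purely illustrative sketch of this kind of classification (not IMR's actual pipeline), one could train a text classifier on homepage text with scikit-learn, roughly as follows; the training data shown is invented.

```python
# Illustrative sketch (not IMR's pipeline): classify domains by genre based on
# homepage text, using a TF-IDF + logistic regression model from scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy training data: (homepage text, genre label).
texts = [
    'breaking news politics world economy headlines',
    'add to cart checkout free shipping customer reviews',
    'new thread reply members forum latest posts',
    'my thoughts on yesterday subscribe comments archive',
]
labels = ['News', 'E-commerce', 'Forums', 'Blogs']

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(['flash sale discount order now free returns']))  # expect ['E-commerce']
```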

The other good news is that, thanks to many improvements in both overall efficiency and stability, the cost of such crawls has been halved. Combined with falling storage costs, global crawls are becoming much more affordable, and we hope this will benefit more and more institutions.

More details on this will be published later, but we wanted to share this early update with the web archiving community.

By Chloé Martin (COO) of Internet Memory Research

10-year anniversary of the Netarchive (Netarkivet), the Danish national web archive

The Royal Library in Copenhagen and the State and University Library in Aarhus are happy to announce the 10th anniversary of the Netarchive (Netarkivet), the Danish national web archive.


In July 2005, a new legal deposit law came into force: materials “published in electronic communication networks” became subject to legal deposit; that is to say, collecting and preserving “the Danish part of the Internet” was now mandated by law. In the same year, the Netarchive joined the IIPC.

In the early years of the Netarchive, we focused on collection building and strategies: how to manage four broad crawls a year and how to choose about 100 sites to be harvested selectively. At the end of 2005, we finished our first broad crawl – it took almost a year. In 2007, our first systematic set of selective crawls was in place, we had a first dialogue with Facebook about harvesting Danish open profiles, we released NetarchiveSuite as an open-source web curator tool, and we gave the first researchers access to the archived material.

In 2008, we started harvesting e-books, and the French National Library  and the Austrian National Library joined the NetarchiveSuite development project. In 2009, the first Ph.D. student graduated with a project based on the Netarchive. In 2010, we participated in the first IIPC collaborative collection (Winter Olympic Games). In 2011, we established access through the Wayback Machine and started a special collection on online games.

In 2012, we fulfilled our objective of carrying out four broad crawls a year and began a special collection of YouTube videos. In 2013, we established on-premises access for eligible master's students in their final year, and we developed a solution that makes selected electronic publications from ministries and official agencies accessible to the public via persistent links from The Administrative Library's catalogue. In 2014, we started indexing the whole archive for full-text search and performed our largest event harvest ever, of the Eurovision Song Contest hosted in Denmark.

Erland Kolding Nielsen (director of the Royal Library) cutting the ribbon on full-text search

The birthday gift to our users and to ourselves is the full text searchable archive!

Netarkivet has celebrated its 10th birthday (Netarkivet har fejret 10 års fødselsdag)

Thank you for your cooperation and feedback over all these years.

On behalf of the Netarchive Team

Sabine Schostag, Web Curator, Netarchive, State and University Library

Results of the Web Archiving API Survey of IIPC Members

If you attended the recent GA, or read some of the many blog posts about it, you probably heard about the potential benefits of standardized web archiving APIs. This was a common theme that came up in multiple presentations and informal discussions. During a conversation over lunch mid-week, one person suggested that the IIPC form a new working group to focus on web archiving APIs. Clearly some institutions were interested in this, but how many? And are they interested enough to participate in a new working group? A group of us at Harvard decided to find out. We developed a short survey and advertised it on the IIPC mailing list.

The survey was open from May 14 through June 1 and was filled out 18 times, by 17 different institutions from 8 different countries.

Country: Institutions
Czech Republic: National Library of the Czech Republic
Denmark: Netarkivet.dk
France: Bibliothèque nationale de France
Iceland: National and University Library of Iceland
New Zealand: National Library of New Zealand
Spain: National Library of Spain
United Kingdom: The British Library, The National Archives
United States: Stanford University Libraries, Old Dominion University, Internet Archive, LANL, California Digital Library, Harvard Library, Library of Congress, UCLA, University of North Texas

Table 1: The institutions that responded to the survey

The survey asked “Is the topic of web archiving APIs of interest to your institution?” The answer was overwhelmingly “Yes”. All 17 institutions are interested in web archiving APIs. Personally this was the first unanimous survey question I have ever seen.

Figure 1: A rare unanimous response

When asked “Why are web archiving APIs of interest to your institution?” the responses (see Figure 2) were varied but had common themes. Many of the reasons were from the perspective of an institution providing or maintaining web archiving programs or infrastructure, for example:

  • “The sustainability of our program depends on the web archiving community as a whole better aligning itself to collaboratively maintain and augment a core set of interoperable systems…”
  • “…appreciate how this would reduce our technical spend in the long term…”
  • “… APIs should ease the maintenance and evolution of the complex set of tools we are using to complete the document cycle: selection, collect, access and preservation.”

Another common response was from the perspective of providing a better service for researchers, for example:

  • “…a common/standard API would make it easier for researchers to work with multiple web archives with standard methodologies.”
  • “To help researchers explore our collection, including within our catalogue system, to link with other web archive collections and potentially to interface with different components of our infrastructure.”
  • “We often do aggregation and want to have a way to archive resources of interest with the help of scripts, in both of these cases an API would be ideal.”

Figure 2: A word cloud generated from the free-text responses about why institutions are interested in web archiving APIs [created with the Word Cloud Generator by Jason Davies]

The respondents were asked “If we organized a new working group within the IIPC to work on web archiving APIs would your institution be willing to participate?” All but one institution said “Yes”. The institution that said “No” explained that it was interested but currently lacked the staff resources to participate actively.

Figure 3: Most of the institutions are willing to participate in the new working group.

We asked “In what specific ways could your institution participate? Please select all that apply.” The results are shown in Table 2. Most of the respondents would like to help define the functional requirements, but a good number would also like to contribute use cases and help design the technical details. Importantly, there are institutions willing to help run the meetings.

Specific way: % of respondents (count)
Help define the functional requirements for a web archiving API: 94% (15)
Contribute curatorial, researcher or management requirements and use cases: 81% (13)
Help design the technical details of a web archiving API: 69% (11)
Help schedule and run the working group meetings: 19% (3)
Other*: 6% (1)

Table 2: The specific ways institutions would participate in the working group
* One institution said that they would be willing to implement and test web archiving APIs where appropriate and aligned with local needs

So the answer to our original question is a clear YES! There are enough IIPC institutions interested and willing to participate in meaningful ways in this new working group. Stay tuned while we work through the logistics of how to start. One of the first steps will be to identify co-chairs for the group. If you are interested in this, please let me know! And thanks to everyone for taking the time to fill out this survey.

By Andrea Goethals, Manager of Digital Preservation and Repository Services, Harvard Library