Web Archiving at the National Library of Ireland

National Library of Ireland Reading Room © National Library of Ireland.

The National Library of Ireland has a long-standing tradition of collecting, preserving and making accessible the published and printed output of Ireland. The library is over 140 years old and we now also have rich digital collections concerning the political, cultural and creative life of Ireland. The NLI has been archiving the Irish web on a selective basis since 2011. We have over 17 TB of data in the selective web archive, openly available for research through our website. A particular strength of our web archive is the coverage of Irish politics, including a representation of every election and referendum since 2011. The web archive is no longer in its infancy, and the NLI has made some exciting developments in recent years. This year we have begun working with the Internet Archive for our selective web archive and are looking forward to the new opportunities that this partnership will bring. We have also begun working closely with an academic researcher from a Higher Education institute in Ireland, who is carrying out network analysis on a portion of our selective data.

In 2007 and 2017, the NLI undertook domain crawling projects and there is now over 43 TB of data archived from these crawls. The National Library of Ireland is a legal deposit library, entitling it to a copy of everything published in Ireland. However, unlike many countries in Europe, legal deposit legislation does not currently extend to online material so we cannot make these crawls available. Despite these barriers, the library remains committed to preserving the online story of Ireland in whatever way we can.

Revisions to the legislation are currently before the Irish parliament and, if passed, will result in the addition of e-publications, such as e-books and journals, to the material covered by legal deposit. Extending this to websites is currently being considered.

In 2017, the National Library of Ireland became a member of the IIPC and we are excited to be attending our first General Assembly in Wellington. While we had anticipated talking about our newly available domain web archive portal and how this had impacted our selective crawls, we are instead looking forward to discussing the challenges we continue to face, including with Legal Deposit, and how we are developing the web archive as a whole. We also hope to be able to give an update on progress with the legislative framework. We look forward to seeing you there in Wellington!

Human scale web collecting for individuals and institutions (Webrecorder workshop)

By Anna Perricci, Rhizome

Web archiving ‘at scale’ is usually equated with collecting with automated software (a web crawler), but the assumption that more information means more value is not always right, especially with web archives. A massive scope or scale isn’t required to make meaningful, useful web archives. Collecting at a ‘human scale’ can be as good or better for forming certain collections.

Webrecorder is a free, easy-to-use, browser-based web archiving tool set provided by Rhizome. Rhizome, an affiliate of the New Museum in New York City, champions born-digital art and culture through commissions, exhibitions, digital preservation, and software development. Webrecorder’s development has been generously supported by the Andrew W. Mellon Foundation.

With Webrecorder you can make high fidelity interactive captures of web content as you browse web pages. A “high fidelity capture” means that from a user’s perspective there is a complete or high level of similarity between the original web pages and the archived copies, including the retention of important characteristics and functionality such as: video or audio that requires a user to press ‘play’, or resources that require entry of login credentials for access (e.g. social media accounts). Webrecorder can capture most types of media files, JavaScript and user-triggered actions, which are things that most crawlers struggle with or are unable to obtain.

Workshop attendees will be given an overview of Webrecorder’s features, then engage in hands-on activities and discussions. Further instruction will alternate with opportunities for participants to use the tools introduced and share their thoughts or questions. Instructions on how to manage the collected materials, download them (as a WARC file), and open a local copy offline using Webrecorder Player will also be covered in this workshop.

Human scale web collecting with Webrecorder is not expected to meet all the requirements of a large web archiving program but can satisfy many needs of researchers or smaller web collecting initiatives. Webrecorder can be a great tool for personal digital archiving projects as well. Larger web archiving programs can benefit from using Webrecorder to capture dynamic content and user-triggered behaviors on websites. The WARC files created with Webrecorder can be downloaded and ingested to join WARCs that have been created using crawler-based systems.
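To give a sense of what is inside a downloaded WARC file, here is a minimal sketch in Python that lists the records in an uncompressed WARC stream. It is an illustration only, not part of Webrecorder: the helper names `iter_warc_records` and `build_record` are made up for this example, the sample records are simplified (real records carry further mandatory headers such as a record ID and date), and real downloads are often gzip-compressed, so a dedicated WARC library is the safer choice for actual work.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of the stream
        if line.strip() == b"":
            continue  # blank lines separate one record from the next
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length says exactly how many payload bytes follow.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def build_record(record_type, uri, body):
    """Build a simplified WARC record (hypothetical helper, for the demo only)."""
    head = (
        "WARC/1.0\r\n"
        f"WARC-Type: {record_type}\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


# Two made-up records standing in for a downloaded file.
sample = build_record("response", "http://example.org/", b"hello") + build_record(
    "resource", "http://example.org/a.png", b"\x00\x01"
)
summary = [
    (headers["WARC-Type"], headers["WARC-Target-URI"], len(payload))
    for headers, payload in iter_warc_records(io.BytesIO(sample))
]
print(summary)
```

Because every record states its own type, target URI and length, WARCs from different tools can be read the same way, which is what makes combining them practical.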

With a tool like Webrecorder anyone can get started with web archiving quickly at no cost, which is empowering for both information professionals and their stakeholders.

On November 14th you can also learn more about Webrecorder in an afternoon session entirely focused on Webrecorder and high fidelity web archiving. The session will start with a 30-minute presentation on Python Wayback (pywb), a core component of Webrecorder, by pywb’s creator and Webrecorder’s lead developer, Ilya Kreymer. Then there will be a one-hour panel on capturing complex websites and publications using Webrecorder with Jasmine Mulliken, Sumitra Duncan, Nicole Coleman, and me (Anna Perricci).

Whether you are a seasoned expert or newer to web archiving I hope you will be able to join us for the session and this workshop on November 14th at the IIPC WAC. The limit on the number of workshop attendees has been removed so please feel welcome to register.

IIPC Steering Committee Election 2019

The nomination process for IIPC Steering Committee is now open.

The Steering Committee is the executive body of the IIPC. It currently comprises 15 member organisations that take a leadership role in the high-level strategic planning, development and management of programs, policy creation, overall administration, and contribution to IIPC Portfolios and other activities.

What is at stake?

Serving on the Steering Committee is an opportunity for motivated members to help guide the IIPC’s mission of improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage. Steering Committee members are expected to take an active role in leadership, contribute to SC and Portfolio activities, and help guide and administer the organisation.

Who can run for election?

Serving on the Steering Committee is open to any current IIPC member and we strongly encourage any organisation interested in serving on the Steering Committee to nominate themselves for election. SC members are elected for 3 years and meet twice a year in person (once during the General Assembly and once in September), plus two or more additional times by teleconference.

Please note that the nomination should be on behalf of an organisation, not an individual. Once elected, the member organisation designates a representative to serve on the Steering Committee. The list of current SC member organisations is available on the IIPC website.

How to run for election?

All nominee institutions, both new members and existing members whose term is expiring but who are interested in continuing to serve, are asked to write a short statement (max 200 words) outlining their vision for how they would contribute to IIPC via serving on the Steering Committee. Statements can point to past contributions to the IIPC or the SC, relevant experience or expertise, new ideas for advancing the organisation, or any other relevant information.

All statements will be posted online and emailed to members prior to the election with ample time for review by all membership. The results will be announced in mid-May and the three-year term on the Steering Committee will start on 1 June.

Below you will find the election calendar. We are very much looking forward to receiving your nominations. If you have any questions, please contact the IIPC PCO.

Election Calendar

  • 12 November to 1 March: Members are invited to nominate themselves by sending an email including a statement to the IIPC Programme and Communications Officer.
  • 1 April: Nominees’ statements are published on the Netpreserve Blog and Members mailing list. Nominees are encouraged to campaign through their own networks.
  • 1 April to 30 April: Members are invited to vote online. An online voting tool will be used to conduct the vote. The PCO will monitor the vote, ensuring that each organisation votes only once for all nominated seats and that the vote is cast by the organisation’s official representative. People will be encouraged to cast their vote before, during, and after the GA.
  • 30 April: Voting ends.
  • 1 May: The results of the vote are announced officially on the Netpreserve blog and Members mailing list.
  • 1 June: End/start of SC members’ terms. The newly elected SC members start their term on the 1st of June and are invited to attend a first meeting (by teleconference) by the end of June. The next face-to-face SC meeting will take place in Zagreb in June 2019.

 

Online Hours: Supporting Open Source

By Andrew Jackson, Web Archiving Technical Lead at the British Library

At the UK Web Archive, we believe in working in the open, and that organisations like ours can achieve more by working together and pooling our knowledge through shared practices and open source tools. However, we’ve come to realise that simply working in the open is not enough – it’s relatively easy to share the technical details, but less clear how to build real collaborations (particularly when not everyone is able to release their work as open source).

To help us work together (and maintain some momentum in the long gaps between conferences or workshops), we were keen to try something new, and hit upon the idea of Online Hours. It’s simply a regular web conference slot (organised and hosted by the IIPC, but open to all) which can act as a forum for anyone interested in collaborating on open source tools for web archiving. We’ve been running for a while now, and have settled on a rough agenda:

Full-text indexing:
– Mostly focussing on our Web Archive Discovery toolkit so far.

Heritrix3:
– including Heritrix3 release management, and the migration of Heritrix3 documentation to the GitHub wiki.

Playback:
– covering e.g. SolrWayback as well as OpenWayback and pywb.

AOB/SOS:
– for Any Other Business, and for anyone to ask for help if they need it.

This gives the meetings some structure, but is really just a starting point. If you look at the notes from the meetings, you’ll see we’ve talked about a wide range of technical topics, e.g.

  • OutbackCDX features and documentation, including its API;
  • web archive analysis, e.g. via the Archives Unleashed Toolkit;
  • summary of technologies so we can compare how we do things in our organisations, to find out which tools and approaches are shared and so might benefit from more collaboration;
  • coming up with ideas for possible new tools that meet a shared need in a modular, reusable way, and identifying potential collaborative projects.

The meeting is weekly, but we’ve attempted to make the meetings inclusive by alternating the specific time between 10am and 4pm (GMT). This doesn’t catch everyone who might like to attend, but at the moment I’m personally not able to run the call at a time that might tempt those of you on Pacific Standard Time. Of course, I’m more than happy to pass the baton if anyone else wants to run one or more calls at a more suitable time.
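To make the time difference concrete, here is a small sketch that converts the two alternating slots into Pacific Standard Time. It assumes a fixed UTC-8 offset for PST and ignores daylight saving; the date used and the function name `slot_in_pst` are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
# Assumption: PST treated as a fixed UTC-8 offset, with no daylight saving.
PST = timezone(timedelta(hours=-8), "PST")


def slot_in_pst(hour_gmt):
    """Return the PST wall-clock time for a call starting at hour_gmt (GMT)."""
    call = datetime(2018, 1, 8, hour_gmt, 0, tzinfo=GMT)  # arbitrary date
    return call.astimezone(PST).strftime("%H:%M")


slots = {f"{hour:02d}:00 GMT": slot_in_pst(hour) for hour in (10, 16)}
print(slots)
```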

If you can’t make the calls, please consider:

My thanks go to everyone who has come along to the calls so far, and to IIPC for supporting us while still keeping it open to non-members.

Maybe see you online?

Web Archivists, Assemble!

By Alex Thurman, Columbia University Libraries, Member of the IIPC Steering Committee and the WAC Program Committee (2016-2018), Co-Chair of the Content Development Group

The IIPC General Assembly & Web Archiving Conference is the professional gathering I anticipate most eagerly each year. In an energizing atmosphere of international cooperation, web curators, librarians, archivists, tool developers, computer scientists, and academic researchers from member organizations and beyond meet to share experiences and best practices and plan projects to tackle the collective challenge of preserving web resources.

I’ve had the good fortune of attending each year since 2012, and for the past three years I’ve also had the rewarding experience of serving on the program committees planning these events. As we look forward to the exciting upcoming 2018 conference in Wellington, New Zealand, here is some background on the recent evolution of the GA/WAC and the work of the 2018 WAC Program Committee.

Recent background

2018 marks the fifteenth anniversary of the IIPC, and the twelfth consecutive year that members of the IIPC will come together in an annual General Assembly. The IIPC Steering Committee has striven to cycle (loosely, as dependent on members volunteering to host the event) the venue of the GA/WAC in alternate years between Europe, North America and Australasia. And from the start, the GA event programs have combined days reserved for IIPC members (focused on Consortium planning and working group activities) with one or more open days to welcome the perspectives and expertise of the wider web archiving community and of researchers.

To emphasize this aspect of outreach to researchers and promoting awareness of web archiving, the Steering Committee has in recent years opted to formalize the “open days” as a distinct event—the IIPC Web Archiving Conference. The 2016 event was the first to thus distinguish the General Assembly from the Web Archiving Conference, and thereafter, at the suggestion of that PC’s Chair (Kristinn Sigurðsson, National and University Library of Iceland), planning responsibility for the different event components became more distributed: the GA program would be determined by the Steering Committee Officers and Portfolio Leads and the Working Group Chairs; a mostly local Organizing Committee would see to the logistical planning of securing a venue and catering and possible sponsors; and the Web Archiving Conference program would be developed by a Program Committee. The 2017 Program Committee (chaired by Nicholas Taylor, Stanford University) was the first to include some non-IIPC members, and their CFP was the first to attract more relevant submissions than we had space to accept, a milestone in the maturation of the conference.

Work of the 2018 Program Committee

Co-chaired by Jan Hutař (Archives New Zealand) and Paul Koerbin (National Library of Australia), the 12-member 2018 Program Committee started work in November 2017. Our first task was drafting a call for papers, which involved first discussing whether the conference would have a stated theme and the types (presentations, panels, workshops, tutorials) of submission proposals we’d ask for and the nature of the submission (abstracts? full papers?). We needed a flexible theme that would acknowledge the IIPC’s milestone 15th anniversary and the value of our collective work preserving the web so far, while embracing creative new approaches to the evolving challenges we face. In his draft CFP, Paul Koerbin hit on “Web Archiving Histories and Futures” and we ran with that. And as the Wellington event will be the first GA/WAC held in Australasia in 10 years, we especially encouraged submissions related to Asia/Pacific web archiving activities.

To encourage submissions from all types of web archiving practitioners and users, in the CFP we further listed some suggested topics, under the rubrics of “building web archives,” “maintaining web archive content and operations,” “using and researching web archives,” and “web archive histories and futures.” And we opted to ask applicants to submit abstracts only rather than full papers, both to lower the barriers to application in order to get more submissions, and to allow all Program Committee members to consider (and vote on) all submissions, rather than assigning reviewers to specific papers. Once the CFP was ready, PC members worked hard to distribute it to a wide selection of mailing lists, reaching beyond IIPC members and other cultural memory institutions to also get submissions from independent researchers.

This strategy worked (boosted no doubt by the intrinsic appeal of visiting Wellington!), as we received a record number of submissions for the WAC, submitted through EasyChair. The breadth and depth of interesting submissions allowed us to build a strong program – while unfortunately having to reject some relevant proposals. Each committee member read all the submitted abstracts and rated each one on a 3-point scale, yielding cumulative point averages for each submission from which the committee could decide which submissions would be accepted for the conference. In order to know how many submissions could be accepted we first had to consider how much conference schedule time we had available, which would depend in part on whether we would have multiple tracks.
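For readers curious about the arithmetic, here is a minimal sketch of that kind of scoring: every reviewer rates every abstract from 1 to 3, the ratings are averaged, and the submissions are ranked. The proposal names, ratings, number of slots and the function `rank_submissions` are all invented for illustration; the averages only informed the committee's discussion and did not replace it.

```python
from statistics import mean


def rank_submissions(ratings, slots):
    """Average each submission's ratings and return the top `slots` titles."""
    averages = {title: mean(scores) for title, scores in ratings.items()}
    # Highest average first; ties are broken alphabetically by title.
    ranked = sorted(averages, key=lambda title: (-averages[title], title))
    return ranked[:slots], averages


# Hypothetical ratings on a 3-point scale, one score per committee member.
ratings = {
    "Proposal A": [3, 3, 2],
    "Proposal B": [1, 2, 2],
    "Proposal C": [3, 2, 2],
}
accepted, averages = rank_submissions(ratings, slots=2)
print(accepted)
```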

We decided the program would have a mix of plenary talks and usually two tracks of presentations or workshops, and Olga Holownia (IIPC Program & Communication Officer) provided a range of detailed schedule templates for us to use to figure out how many individual presentations, panels, and workshops we’d have room for. We then began grouping accepted proposals into thematic sessions, loosely conceived as more-technical and less-technical tracks, in order to reduce (though not eliminate) the frustration of attendees wishing they could be in both tracks at once. Committee members then divided up the responsibility of serving as session chairs, to introduce the speakers and keep the sessions running on time.

Between the tasks of preparing the CFP and evaluating the submissions and shaping them into a program, the committee had the additional enjoyable responsibility of brainstorming possible keynote speaker candidates. Committee members suggested over two dozen possible keynoters, voted on them, and eventually submitted a few outstanding candidates to the Organizing Committee for their consideration. The Organizing Committee took these suggestions and added others based on their familiarity with the Australasian digital library and academic scene and delivered two exciting keynote speakers – Wendy Seltzer (World Wide Web Consortium) and Rachael Ka’ai-Mahuta (Te Ipukarea, the National Māori Language Institute, Auckland University of Technology) – and an additional plenary talk from Vint Cerf (Google). With these and many other talented contributors from within and beyond IIPC member institutions, the 2018 IIPC Web Archiving Conference looks to be a rich and stimulating event.

Register now!

Serving on the WAC Program Committee is a great opportunity to work directly with IIPC colleagues and other web archiving enthusiasts. And the work continues – you can volunteer now to serve on the Program Committee and start shaping the 2019 IIPC WAC.

A personal reflection on the IIPC WAC

By Gillian Lee, Coordinator, Web Archives at the National Library of New Zealand, Member of the IIPC Steering Committee and the WAC Program Committee

This year I’ve had the privilege of being part of the programme committee for IIPC WAC. Reading through the abstracts that many of you sent in gave me a real sense of excitement about the work that we are all involved in. That caused me to reflect on the benefits of the IIPC conference and what it means to us as members. Some of you might attend these conferences on a regular basis, others may never have had that opportunity.

I’ve been web archiving for 11 years and have been fortunate to attend 3 IIPC conferences during that time. It’s rare for me to attend a conference that’s actually about the work I do, so I really value those times! It’s an opportunity to finally meet people who were formerly just names on mailing lists and blog posts. Getting together with other web archivists is invaluable, whether it’s talking to someone who is just starting out in the web archiving world, sharing the struggles of budget constraints, or learning more about what members are doing. You can’t beat that!

Even in this digital age it’s easy to feel isolated here in New Zealand when we hear so much about web archiving developments, especially in Europe and the States. There’s only so much you can learn from emails, blog posts and the odd webinar that’s not scheduled for 2am NZ time!!

Despite the distance, we have collaborated with other IIPC members over the years. Back in 2006 the National Library of New Zealand worked with the British Library to build the Web Curator Tool (WCT). The BL have moved on and developed other tools since then, and this year we’ve collaborated with the National Library of the Netherlands on a major upgrade to WCT. Kees Teszelszky blogged about this recently. You can find out more about it during the IIPC conference in Wellington in November.

We’ve also been involved with the Content Development Working Group by submitting seed lists to collaborative collections such as the Olympic Games, World War One Commemoration and the News around the World project. If you’re new to IIPC, do consider getting involved in one of the IIPC groups.

We’re really excited to be hosting IIPC this year and look forward to meeting you all in person! A number of my colleagues have never had the chance to attend an IIPC conference, so they’re in for a treat! See you soon!

National Library of New Zealand, Photo by Mark Beatty / CC BY-NC 3.0 NZ.

Welcome to WAC in Wellington

By Peter McKinney, Digital Preservation Policy Analyst at the National Library of New Zealand and the Chair of the IIPC 2018 General Assembly and Web Archiving Conference Organising Committee

National Library of New Zealand Te Puna Mātauranga o Aotearoa.

I remember my first time in New Zealand. It was wonderful. But I do remember commenting to my partner, as we sat on the tarmac in Auckland, that I couldn’t live here as it was too far away from anything (I lived in Scotland at the time).

Just over a year later I moved to Wellington.

I’m not sure whether this shows my unerring ability to change my mind at a whim, or the strength of what I found over here. I hope the latter. The travel for visitors is well worth it. Wellington and New Zealand are amazing. And while the work of the National Library has attracted a number of us to come and live our lives here, it is the country that makes it home.

It is therefore my great honour to be part of the team that is welcoming you here. The National Library of New Zealand feels greatly privileged to be hosting this year’s IIPC General Assembly and Web Archiving Conference. The Library has received great benefit from being a member of the IIPC over the years and to be able to entice members and the wider web archiving community all the way down to the South Pacific is an amazing opportunity for us. We can open up participation to those who just have not been able to travel those distances up to the northern hemisphere. It is also a great chance for us to show off what we have down here.

I have two primary responsibilities in my role as Chair of the Organising Committee. The first is to ensure that IIPC members have a productive week. This means providing a comfortable environment where members can get their business done and enjoy everything Wellington has to offer. My second responsibility is to ensure that “locals” (New Zealanders and our Pacific neighbours) are able to take advantage of the experience and expertise that will be converging at the Library; this is a precious opportunity that will not come round again in the foreseeable future.

The website has a host of information about the GA and WAC, and I encourage you to check it out (and get in touch if you need more information). Alex Thurman has written about the work of the programme committee pulling together what is a brilliant selection of papers, panels, posters, tutorials and workshops. Gillian Lee has also covered off what it means to staff in the National Library to be able to have the IIPC event down here in Wellington.

Personally, I can’t wait to hear from our keynote speakers (Rachael Ka’ai-Mahuta and Wendy Seltzer). They have been asked to challenge us and make us pause and consider what the future of web archiving may look like. Vint Cerf needs no introduction and we are incredibly grateful that he has accepted our invitation to share his current thinking with us. We’re also having a public event on Tuesday, which we will be announcing in the next few weeks.

The week will be busy and, hopefully, productive and inspiring. I also can’t encourage you enough to explore Wellington and beyond if you have time. There is, of course, plenty of time to sleep on the plane on the way back!