Results of the Steering Committee Election 2019

The following IIPC member organisations have been elected to serve for a period of three years starting on 1st of June 2019 –

On behalf of the membership I would like to thank all of those who have taken part in this election.

IIPC PCO

 

IIPC Steering Committee Election 2019: nomination statements

The Steering Committee is the executive body of the IIPC, currently comprising 15 member organisations. This year five seats are up for election/re-election. In response to the call for nominations to serve on the IIPC Steering Committee for a three-year term commencing 1 June 2019, seven IIPC member organisations have put themselves forward:

An election will be held from 3 March to 31 March. The IIPC designated representatives from all member organisations will receive an email with instructions on how to vote. Each member will be asked to cast five votes. The representatives should ensure that they read all the nomination statements before casting their votes. The results of the vote will be announced on the Netpreserve blog and Members mailing list on 1 April. The first Steering Committee meeting will be held before the General Assembly in Zagreb, on 4 June.

If you have any questions, please contact the IIPC Programme and Communications Officer.


Nomination statements in alphabetical order:

Deutsche Nationalbibliothek / German National Library

As a member of the IIPC since 2007, the German National Library has always been particularly interested in preservation aspects and the representative Tobias Steinke is co-lead of the Preservation Working Group. The selective web archive of the German National Library started in 2012. Its workflow is based on a co-operation with the service provider oia and does not include the common open source tools, which could give the IIPC a different perspective and help to represent the various members.

 

Internet Archive

Internet Archive seeks to continue its role on the IIPC Steering Committee. As the oldest and largest publicly-available web archive in the world, a creator and ongoing developer of many of the core technologies used in web archiving, and an original founding member of the IIPC, Internet Archive plays a key role in advancing web archiving and fostering broad community participation in preserving and providing access to the web-published records that document our shared cultural heritage. Internet Archive has also served in a variety of leadership and program roles within the Steering Committee since IIPC’s formation. In continuing this active role on the IIPC Steering Committee, Internet Archive will contribute to furthering the IIPC’s strategic initiatives building a collaborative framework to advance web archiving and grow and diversify the IIPC’s membership. The web is the most significant communication platform of our era; it is also one that can only be preserved and made accessible through broad-based, multi-institutional efforts led by organizations such as the IIPC. By extending our role on the IIPC Steering Committee, Internet Archive will continue its participation in the knowledge-sharing and leadership that supports the IIPC and the broader community in its ongoing efforts to preserve the web.


 

Landsbókasafn Íslands – Háskólabókasafn / National and University Library of Iceland

The National and University Library of Iceland is interested in serving another term on the IIPC Steering Committee. The library has had an active web archiving effort for nearly two decades. Our participation in the IIPC has been instrumental in its success.

As one of the IIPC’s smaller members, we are keenly aware of the importance of collaboration to this specialized endeavor. The knowledge and tools that this community has given us access to are priceless.

We believe that in this community active engagement ultimately brings the greatest rewards. As such we have participated in projects, including Heritrix and OpenWayback. We have hosted IIPC events, including the 2016 GA/WAC and an upcoming hackathon in April. And we have provided leadership in various areas, including working groups and the SC chair role (2008); our SC representative is currently in charge of the tools portfolio.

If re-elected to the SC, we will aim to continue on in the same spirit.


 

Library of Congress

The Library of Congress (LC) has been involved in web archiving for almost 20 years, building a variety of thematic and event-based collections for its web archives. LC has worked collaboratively with national and international organizations on collections, preservation tools and workflow processes, while developing in-house expertise and curatorial tools to enable effective collection and management of over 1.7 petabytes of web content collected to date. As a founding member of IIPC, LC has served in a variety of leadership roles, currently as SC member, Preservation WG and Training WG co-chair, and in prior years as SC Chair, Communications Officer, Content Development Group co-chair, and on the Membership Engagement portfolio, and helped secure a new fiscal agent. If re-elected, the LC looks forward to continuing to focus on developing a web archiving training program, encouraging new opportunities for membership engagement and funding opportunities for member projects. We will continue to participate in discussions around preservation, tools, and processes that will enable us all to work more efficiently and collaboratively as a community, and look forward to engaging in activities and discussions that will help strengthen the IIPC for the future and next membership agreement.


 

National Library of Australia

The National Library of Australia (NLA) was a founding IIPC member and Steering Committee member until 2009, hosting the second general committee meeting in Canberra in 2008. In 2004 the NLA organized the first major international conference on web archiving for cultural institutions. The NLA’s experience and leadership in web archiving goes back to 1996 with the establishment of PANDORA, one of the first collaborative web archiving programs.  The NLA has been a continuous IIPC member and has actively contributed expertise to the preservation working group.

The NLA’s strengths include operational maturity, sustainability and open access through its web archiving program, which embraces selective, domain and bulk collecting methods. The NLA has a strong commitment to, and experience with, collaborative web archiving through PANDORA. The NLA has a demonstrated record of innovation, building the first selective web archiving workflow system (PANDAS) and, more recently, the ‘outbackCDX’ tool, which makes index management more efficient. In March 2019 the NLA launched the Australian Web Archive, which made the whole .au web archive fully accessible and openly searchable in Trove. The NLA believes it is time for Australia to rejoin the IIPC leadership, adding southern hemisphere representation and experience to the Steering Committee.


 

National Library of New Zealand / Te Puna Mātauranga o Aotearoa

National Library of New Zealand’s mandate to preserve New Zealand’s social and cultural history includes:

  1. A legal mandate to perform web harvests (under the National Library of New Zealand Act 2003)
  2. A social responsibility to develop collections (including digital collections) reflecting the social, cultural, economic and other endeavours of New Zealanders.

The Library has a programme of selective web harvesting and has conducted eight whole of domain ‘snapshots’ since 2008. We are also experimenting with Twitter, focusing on hashtag crawls of major NZ events or activities considered culturally important (e.g. Kaikoura Earthquake, GE2017, Moko Kauae, Grace Millane, Te Matatini, Nelson Fires). The Library is also collaborating with the National Library of the Netherlands on the ongoing enhancement and development of the Web Curator Tool.

The National Library has been a continuous member of IIPC since 2007 and has previously been a member of the IIPC Steering Committee. Having recently appointed a dedicated web archiving role to the Library’s digital preservation team, we now feel that we are able to contribute more fully to the work of the IIPC, and we feel that membership of the IIPC Steering Committee is one of the ways that we can contribute.


 

Stanford University Libraries

We have concluded our three-year term on the Steering Committee and appreciate your consideration for serving another term. IIPC has progressed notably in these three years. Our private, member-focused GA has been eclipsed by an increasingly visible and rigorously-curated WAC. IIPC as an organization has befittingly matured as well, re-administering itself under CLIR’s fiscal sponsorship. These changes reflect opportunities to continue to evolve IIPC from its start as a largely inward-looking, homogeneous cadre of collaborating member institutions to a professionalized organization more keenly focused on the diversification of participating stakeholders and advancement of web archiving practice broadly.

We are interested in continuing to move IIPC in this direction, in keeping with the vision presented by Jefferson Bailey as outgoing Chair. As a consistent contributor to IIPC activities and goals, we can be counted on to “do the work.” Our tangible contributions to date include serving as Treasurer, serving as Training Working Group co-chair, chairing the 2017 WAC Program Committee, organizing and co-hosting the 2015 GA and WAC, and serving on every WAC Program Committee since 2015.

A New Release of Heritrix 3

By Andy Jackson, Web Archiving Technical Lead at the British Library

One of the outcomes of the Online Hours meetings has been an increase in activity around Heritrix 3. Most of us rely on Heritrix to carry out our web crawls, but recognise that to keep this large, complex crawler framework sustainable we need to try and get more people to use the most recent versions, and make it easier for new users to get on board.

The most recent ‘formal’ release of Heritrix 3 was version 3.2.0 back in 2014, but a lot has happened since then. Numerous serious bugs have been discovered and resolved, and some new features added, but only those of us running the very latest code were able to take advantage of these changes.

Those of us who prefer to base our crawling on a software release rather than build from source have been relying on the stable releases built by Kristinn Sigurðsson, and hosted on the NetarchiveSuite Maven Repository. This worked well for ‘those in the know’, but did little to make things easier for new users.

In an attempt to resolve this, and in coordination with the Internet Archive, we have started releasing ‘formal’ versions of Heritrix, culminating in the 3.4.0-20190207 Interim Release. This new release is believed to be stable, and is recommended over previous releases of Heritrix 3. As well as being released on GitHub, it is also available through the Maven Central Repository, which should make it easier for others to re-use Heritrix.

You may notice we’ve added a date to the version tag. Traditionally, Heritrix 3 has used a tag of the form “X.X.X”, which gives the impression we are using a form of Semantic Versioning. However, that does not reflect how Heritrix is evolving. Heritrix is a broad framework of modules for building a crawler, and has lots of different components of different ages, at different levels of maturity and use. Given there are only a small number of developers working on Heritrix, we don’t have the resources to guarantee that a breaking change won’t slip into a minor release, so it’s best not to appear to be promising something we cannot deliver.
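To make the tag format concrete, here is a minimal Python sketch that splits a tag such as 3.4.0-20190207 into its numeric part and its date suffix, and sorts tags on that basis. It is an illustration only, not part of Heritrix: the names `ReleaseTag`, `parse_heritrix_tag` and `sort_key` are hypothetical, the pattern assumes the suffix is always a YYYYMMDD date, and the tag 3.4.0-20190101 is made up for the example.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class ReleaseTag(NamedTuple):
    numbers: tuple            # e.g. (3, 4, 0)
    released: Optional[date]  # None for older tags without a date suffix


# Assumed pattern: "X.X.X", optionally followed by "-YYYYMMDD".
_TAG = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(\d{8}))?$")


def parse_heritrix_tag(tag: str) -> ReleaseTag:
    """Split a tag into its numeric part and optional release date."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    major, minor, patch, stamp = match.groups()
    released = datetime.strptime(stamp, "%Y%m%d").date() if stamp else None
    return ReleaseTag((int(major), int(minor), int(patch)), released)


def sort_key(tag: str):
    """Order by numbers first, then by date; undated tags sort first."""
    parsed = parse_heritrix_tag(tag)
    return (parsed.numbers, parsed.released or date.min)


# "3.4.0-20190101" is a made-up tag, used only to show the ordering.
tags = ["3.4.0-20190207", "3.2.0", "3.4.0-20190101"]
print(sorted(tags, key=sort_key))
```

Sorting like this only tells you which release is newer. In line with the caveat above, it says nothing about whether an upgrade is free of breaking changes.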

This means that, when you are upgrading your Heritrix 3 crawler, we recommend that you thoroughly test each release using your configuration (your ‘crawler beans’ in Heritrix 3 jargon) under a realistic workload. If you can, please let us know how this goes, to help us understand how reliable the different parts of Heritrix 3 are.

As well as making new releases, we have also moved the Heritrix 3 documentation over to GitHub to populate the Heritrix3 wiki, and shifted the API documentation to a more modern platform. We hope this will help those who have been frustrated by the available documentation, and we encourage you to get in touch with any ideas for improving the situation, particularly when it comes to helping new users get on board.

If you want to know more, please drop into the Online Hours calls or use the archive-crawler mailing list or IIPC Slack to get in touch. To join IIPC Slack, submit a request through this form.

Passing the Torch

By Jefferson Bailey, Internet Archive

Dear IIPC Community,

As of January 1, 2019, my term as Chair of the IIPC came to a close. Having served as Chair since September 2017 and, prior to that, as Vice Chair from April 2016 (during the excellent leadership of Emmanuelle Bermès of BNF), I have seen the IIPC continue to grow and evolve. It has been a privilege to serve in these roles during this exciting time. While I will continue to serve as a regular Steering Committee (SC) member, I wanted to take this transitional moment to reflect on the successes and ongoing work of both the SC and the IIPC. The centrality of the web as a communication and publication platform only increases by the day and the work of the IIPC and its members becomes ever more critical in documenting history, preserving knowledge, and interrogating privilege and power. There is always more work to be done.

Before reflecting on recent progress and future directions, I want to give a big thanks to my co-Officers. Vice Chair Sylvain Bélanger of Library and Archives Canada and Treasurer Tom Cramer of Stanford University Libraries both worked to advance IIPC’s mission and operations. As well, Program and Communications Officer Olga Holownia worked, and continues to work, tirelessly to support the overall activities of the consortium. Thanks go as well to the SC members that volunteer their time and to the many regular members that actively contribute to Working Groups (WGs), committees, portfolios, etc, and who keep the IIPC a dynamic forum for sharing knowledge and practices. Lastly, I look forward to the great team of new SC Officers, Chair Hansueli Locher of the Swiss National Library, Vice Chair Mark Phillips of University of North Texas, and Sylvain serving as Treasurer. The near-term future of IIPC is in good hands.

In my time as Vice Chair and Chair, IIPC has continued to add new members and expand its activities. Here is my reflection on areas of recent progress and further effort:

Areas of Recent Progress

A Maturing Organization

It is well known that IIPC faced many financial and operational difficulties related to the unforeseen inability of BNF to continue to provide financial and accounting support for IIPC in 2017, after many years of admirably providing this service for IIPC without recompense. We all owe thanks to British Library and to Olga for enabling the 2017 conference to happen, even in a moment of financial uncertainty. From crisis came positive change, as Abbie Grotke of Library of Congress and I were able to arrange an agreement with the Council on Library and Information Resources (CLIR) to provide professional fiscal sponsor services for IIPC. CLIR is a wonderful supporter of the library community and has proven an excellent fiscal agent; we are excited to establish this relationship and expect it to be a foundation for further collaborations.

Much work was also done by Officers to implement a suite of protocols and procedures around invoicing, member onboarding, financial tracking, vendor and expense payments, and other basic budgeting and organization management. Many of these processes were previously unenforced or nonexistent and caused a notable strain on IIPC’s limited staffing. Professionalization of finances and many operations should allow IIPC to focus more on its core mission – delivering member value and advancing preservation of the internet!

Premier Events

The past few years also featured improvements in the planning and management of the GA and WAC conference, including more seamless planning workflows, more budgetary autonomy for hosts, the exploration of sponsorships, registration fees, and event planning services, and other efficiency and sustainability approaches. The IIPC WAC continues to be the premier event for web archiving, and many attendees noted that the 2018 GA/WAC hosted by National Library of New Zealand was one of the best conferences so far. Proposal submissions, sessions, and attendance all continue to grow and the quality of the event remains superlative. The 2019 event at the National and University Library in Zagreb, Croatia will continue the trend. Other workshops, forums, and programming also continued IIPC’s essential role in providing the best venue for discussion of web preservation and access issues.

Member Activity

A number of new initiatives, as well as growth in existing projects, signaled that member engagement and contribution remains high. From the new Training Working Group, to an extensive Member Engagement Survey, to the growing collaborative collections of the Content Development Group, to many other formal and informal activities, IIPC members remain active in the organization. We are hoping the stability mechanisms of the past few years have enabled even more ways for members to participate and contribute.

Areas of Further Effort

Organizational Maturity

Though, per above, great strides were made in professionalizing many activities, other areas of operations also need to evolve to account for IIPC’s growth and strategic aspirations. The challenges related to the fiscal agent transition illuminated broader circumstances related to IIPC’s growth over the years – namely that critical operational and administrative functions can no longer be dependent on the unpredictable contributions or internal decisions of individual member institutions. The model of member-contributed operational support made sense when IIPC was one or two dozen members. With over 50 members, a growing portfolio of activities, and nearly 200,000 EUR in annual member dues, IIPC has outgrown such an arrangement. All core functions of IIPC – from finances to operations to staffing – need to operate autonomously and independent of individual members to ensure a successful, ongoing provision and continuation of services and obviate conflicts of interest. There are many arrangements that can be pursued to support this self-sufficiency and IIPC is blessed with a large financial reserve that can help advance this effort. Work to achieve this self-reliance will no doubt be a focus of the SC in the coming years.

Scaling Participation

As I noted in my Chair’s address at the General Assembly, IIPC is poised to pivot to focusing on resiliency, member benefits, and strategic investment. I had fantastic conversations over the years with members about ideas for IIPC to deliver value to members via new activities and investments. As part of these conversations, I devised, with feedback from the SC, a “Discretionary Funding Program” (see link above) to invest a significant portion of IIPC’s reserve funds to support member-proposed and member-managed projects. Expect more news about this program soon.

IIPC also needs to invest resources to encourage a broader involvement of members in leadership positions. There is very little turnover in institutional representation on the SC. Officer roles have historically been held by an even smaller number of institutions, and there was no self-nomination for Chair during this year’s nomination period (thanks go to Hansueli for stepping up after the year started with the role vacant). To remain vibrant and reflective of its community, representation of more members is needed at the Steering, Officer, Working Group chairs, and other elected and self-nominated positions. Term limits, limitations on consecutive terms served by an institution, leadership stipends, more clearly defined expectations of service, or other formal or informal inducements are ideas that could bring fresh perspectives and new ideas to SC or WG leadership roles. As with operations, IIPC’s governance needs to evolve and adapt to introduce new voices and vibrancy to our growing organization.

Member Diversity

The web is a global and, in some ways, borderless phenomenon. Yet one need only look at the IIPC membership map to recognize that vast portions of the globe are underrepresented in IIPC and, likely, in the global web collection we are all working to build. As well, web preservation is increasingly a concern of institutions beyond just national libraries and research universities. There is surely momentum and engagement to be found in scaling IIPC membership and activities both vertically (inclusive of organizations of differing size, mandates, and missions) and horizontally (inclusive of underrepresented regions and nations). Building a truly global organization, as well as a diverse, inclusive preserved record of the web, will require participation far beyond North America and Europe. Subsidized membership rates, diversity scholarships or travel funding, and targeted partnerships, outreach, or participation tools are just some ideas or activities that can be started or scaled. I am excited to work with colleagues over the coming year to propose a number of ideas for improving diversity within IIPC.

In summary, I had four goals when assuming SC Officer roles: get IIPC’s house in order, improve operations, scale support of member-driven projects, and diversify membership and leadership. I think notable progress was made on three of those four and more time will allow diversity initiatives to gain traction. While some of this progress was behind-the-scenes or is soon-to-be-released, hopefully it has helped IIPC grow and thrive. The new leadership team will no doubt continue this trend.

Keep on crawlin’!

Jefferson Bailey
Internet Archive
IIPC Steering Committee

IIPC – Meet the Officers, 2019

The IIPC is governed by the Steering Committee, formed of representatives from fifteen Member Institutions who are each elected for three-year terms. The Steering Committee designates the Chair, Vice-Chair and the Treasurer of the Consortium. Together with the Programme and Communications Officer (PCO, based at the British Library), the Officers are responsible for dealing with the day-to-day business of running the IIPC.

The Steering Committee has designated Hansueli Locher, Swiss National Library, to serve as Chair, Mark Phillips, University of North Texas, to serve as Vice-Chair and Sylvain Bélanger, Library and Archives Canada, to serve as Treasurer in 2019. CLIR (the Council on Library and Information Resources) remains the Consortium’s fiscal host.

The Members and the Steering Committee of the IIPC would like to thank Jefferson Bailey (IIPC Chair, September 2017 – January 2019 and Vice-Chair, April 2016 – September 2017), Internet Archive, Sylvain Bélanger (IIPC Vice-Chair, September 2017 – January 2019) and Tom Cramer (IIPC Treasurer, September 2017 – January 2019), Stanford University Libraries, for contributing their time and expertise to support the Consortium during their extended terms of office.

The nomination process for the IIPC Steering Committee is still open and five seats will become available as of June 1st 2019. IIPC Members are invited to nominate themselves by sending an email including a statement to the IIPC Programme and Communications Officer by March 1st 2019.


IIPC CHAIR

After being a teacher for several years, Hansueli Locher decided to turn his hobby – computer science – into his profession. He worked at the Swiss Federal Statistical Office where he was responsible for database supported evaluations of statistical data. He also developed a library system and supervised information projects with strong IT-links. Since 2000 he has been working at the National Library. As Project Manager “Archiving” he was responsible for the technical aspects of long-term preservation of digital objects. As Head of ICT Services he is now responsible for IT at the Swiss National Library and the Federal Office of Culture. Swiss National Library joined IIPC in 2007 and Hansueli has represented the Library in the Steering Committee since 2013. Hansueli has also been the Lead and is now one of the Co-Leads of the IIPC Partnerships and Outreach Portfolio and was on the Organising Committee for the General Assembly and the Web Archiving Conference in Wellington, New Zealand.

 

IIPC VICE-CHAIR

Mark Phillips is Associate Dean for Digital Libraries at the University of North Texas (UNT) in Denton, Texas. Mark has been involved with all stages in the development of the digital library access and preservation infrastructure at the UNT Libraries. The UNT Libraries’ Digital Collection manages over 2.5 million digital resources made available through the interfaces of The Portal to Texas History, the UNT Digital Library, and the Gateway to Oklahoma History. In addition to digital library infrastructure development, Mark has been involved in the web archiving activities at the UNT Libraries since 2004 including the 2008, 2012, and 2016 End of Term Web Archive activities and the development of the URL Nomination Tool. He has been active in IIPC since the UNT Libraries joined the Consortium in 2008 and has served as the UNT representative for the IIPC Steering Committee since 2015. Mark has been one of the Co-Leads of the IIPC Partnerships and Outreach Portfolio and is currently the Portfolio’s Lead.

 

IIPC TREASURER

Sylvain Bélanger has been Director General of the Digital Operations and Preservation Branch for Library and Archives Canada since February 2014. In this role Sylvain is responsible for leading and supporting LAC’s digital business operations, and all aspects of preservation for digital and analog collections. Prior to accepting this role, Sylvain had been Director of the Holdings Management Division since 2010, and previously Corporate Secretary and Chief of Staff for Library and Archives Canada. Library and Archives Canada is one of the founding members of the IIPC.

Collaborate to develop web archive collections with Cobweb!

By Kathryn Stine, Manager, Digital Content Development and Strategy at the California Digital Library

Cobweb is a recently launched collaborative collection development platform for web archives, now available for anyone to use to establish and participate in web archiving collecting projects at https://cobwebarchive.org. A cross-institutional team from UCLA, the California Digital Library (CDL), and Harvard University has developed Cobweb, which was made possible in part by funding from the United States Institute of Museum and Library Services and is initially hosted by CDL. We’ve been encouraged by the enthusiasm and engagement that’s met Cobweb and look forward to supporting a range of collaborative and coordinated web archiving collecting projects with this new platform.

Peter Broadwell & Kathryn Stine introducing CobWeb at the Web Archiving Conference in Wellington (slides).

At the 2018 IIPC Web Archiving Conference in New Zealand, Cobweb tutorial attendees played with Cobweb functionality and provided useful feedback and ideas for platform refinements and future feature options. Thank you to all who have shared their suggestions for advancing Cobweb! A number of demonstration projects are now on the platform that showcase how Cobweb supports web archiving collection development activities, including nominating web resources to a project and claiming intentions for, and following through with, archiving nominated web content. Additionally, the extensive Archive of the California Government Domain (CA.gov) has been established as a Cobweb collecting project and the CA.gov team is considering how to integrate Cobweb into its collection development workflows.
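The nominate, claim and archive steps described above can be pictured as a small data model. The Python sketch below is purely illustrative and does not reflect Cobweb's actual schema or API: the class, field and stage names are hypothetical, and the rule that an organisation must claim a nomination before marking it archived is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    NOMINATED = "nominated"  # someone has suggested the resource
    CLAIMED = "claimed"      # an organisation intends to archive it
    ARCHIVED = "archived"    # a claimant has followed through


@dataclass
class Nomination:
    """One web resource nominated to a collecting project (hypothetical model)."""
    url: str
    nominated_by: str
    claimed_by: set = field(default_factory=set)
    archived_by: set = field(default_factory=set)

    def claim(self, organisation: str) -> None:
        # Several organisations may state an intention to archive the same URL.
        self.claimed_by.add(organisation)

    def mark_archived(self, organisation: str) -> None:
        # Assumption for this sketch: only a claimant can report a capture.
        if organisation not in self.claimed_by:
            raise ValueError(f"{organisation} has not claimed {self.url}")
        self.archived_by.add(organisation)

    @property
    def stage(self) -> Stage:
        if self.archived_by:
            return Stage.ARCHIVED
        if self.claimed_by:
            return Stage.CLAIMED
        return Stage.NOMINATED


nomination = Nomination("https://example.org/", nominated_by="subject selector")
print(nomination.stage.name)
nomination.claim("Example Library")
nomination.mark_archived("Example Library")
print(nomination.stage.name)
```

Keeping claims and captures as separate sets reflects the coordination the platform is meant to support: partners can see what others intend to collect before anyone has captured it.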

Cobweb centralizes the often distributed activities that go into developing web archive collections, allowing for multiple contributors and organizations to work together towards realizing common collecting goals. The coordinated activities that result in rich, useful web archive collections can draw upon distinct areas of expertise or capacity including subject specialization, technical facility with content capture, and resources for storing and managing content. The Cobweb platform is well-suited to supporting curated and crowdsourced collection building, from complex, multi-partner initiatives to local efforts that require coordination, such as that between digital archivists and library subject selectors.

If you have web archiving collecting goals that can benefit from engaging in collaborative and/or coordinated participation, learn more about getting started with Cobweb by visiting https://cobwebarchive.org/getting_started, checking out the Cobweb presentation from the IIPC WAC, or by emailing cobwebarchive[at]gmail.com.

Web Archiving Down Under: Relaunch of the Web Curator Tool at the IIPC conference, Wellington, New Zealand

Kees Teszelszky, Curator Digital Collections at the National Library of the Netherlands/Koninklijke Bibliotheek (with input of Hanna Koppelaar, Jeffrey van der Hoeven – KB-NL, Ben O’Brien, Steve Knight and Andrea Goethals – National Library of New Zealand)

Hanna Koppelaar, KB & Ben O’Brien, NLNZ. IIPC Web Archiving Conference 2018. Photo by Kees Teszelszky

The Web Curator Tool (WCT) is a globally used workflow management application designed for selective web archiving in digital heritage collecting organisations. Version 2.0 of the WCT is now available on GitHub. This release is the product of a collaborative development effort started in late 2017 between the National Library of New Zealand (NLNZ) and the National Library of the Netherlands (KB-NL). The new version was previewed during a tutorial at the IIPC Web Archiving Conference on 14 November 2018 at the National Library of New Zealand in Wellington, New Zealand. Ben O’Brien (NLNZ) and Hanna Koppelaar (KB-NL) presented the new features of the WCT to an audience of more than 25 and showed how two teams can work collaboratively from opposite sides of the world.

The tutorial highlighted that part of our road map for this version has been dedicated to improving the installation and support of WCT. We recognised that the majority of requests for support were related to database setup and application configuration. To improve this experience we consolidated and refactored the setup process, correcting ambiguities and misleading documentation. Another component to this improvement was the migration of our documentation to the readthedocs platform (found here), making the content more accessible and the process of updating it a lot simpler. This has replaced the PDF versions of the documentation, but not the Github wiki. The wiki content will be migrated where we see fit.

A guide on how to install WCT can be found here; a video can be found here.

1) WCT Workflow

One of the objectives in upgrading the WCT was to raise it to a level where it could keep pace with the requirements of archiving the modern web. The first step in this process was decoupling the integration with the old Heritrix 1 web crawler, and allowing the WCT to harvest using the more modern Heritrix 3 (H3) version. This work started as a proof-of-concept in 2017, which did not include any configuration of H3 from within the WCT UI. A single H3 profile was used in the backend to run H3 crawls. Today H3 crawls are fully configurable from within the WCT, mirroring the existing profile management that users had with Heritrix 1.

2) 2018 Work Plan Milestones

The second step in this process of raising the WCT up is a technical uplift. For the past six or seven years, the software has fallen into a period of neglect, with mounting technical debt. The tool is sitting atop outdated and unsupported libraries and frameworks, two of which are Spring and Hibernate. The feasibility of upgrading them has been explored through a successful proof-of-concept. We also want to make the WCT much more flexible and less coupled by exposing each component via an API layer. To make that API development much easier, we are looking to migrate the existing SOAP API to REST and to change components so they are less dependent on each other.

Currently the Web Curator Tool is tightly coupled with the Heritrix crawler (H1 and H3). However, other crawl tools exist and the future will bring more. The third step is re-architecting WCT to be crawler agnostic. The abstracting out of all crawler-specific logic allows for minimal development effort to integrate new crawling tools. The path to this stage has already been started with the integration of Heritrix 3, and will be further developed during the technical uplift.
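As a rough illustration of what "crawler agnostic" means in practice, the sketch below puts a small interface between workflow code and any particular crawler. It is a hedged Python sketch, not WCT code (WCT itself is a Java application): `HarvestJob`, `CrawlerAdapter`, `InMemoryCrawler` and `HarvestCoordinator` are hypothetical names, and the in-memory adapter merely stands in for a real integration with Heritrix 3 or another tool.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class HarvestJob:
    """Crawler-neutral description of one harvest (hypothetical fields)."""
    job_id: str
    seeds: list
    profile: dict = field(default_factory=dict)


class CrawlerAdapter(ABC):
    """The only operations the workflow layer needs from a crawler."""

    @abstractmethod
    def start(self, job: HarvestJob) -> None: ...

    @abstractmethod
    def state(self, job_id: str) -> str: ...


class InMemoryCrawler(CrawlerAdapter):
    """Stand-in adapter; a real one would drive an actual crawling tool."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._jobs = {}

    def start(self, job: HarvestJob) -> None:
        self._jobs[job.job_id] = "RUNNING"

    def state(self, job_id: str) -> str:
        return self._jobs.get(job_id, "UNKNOWN")


class HarvestCoordinator:
    """Workflow code that depends on CrawlerAdapter, never on a concrete crawler."""

    def __init__(self) -> None:
        self._adapters = {}

    def register(self, key: str, adapter: CrawlerAdapter) -> None:
        self._adapters[key] = adapter

    def launch(self, key: str, job: HarvestJob) -> str:
        adapter = self._adapters[key]
        adapter.start(job)
        return adapter.state(job.job_id)


coordinator = HarvestCoordinator()
coordinator.register("heritrix3", InMemoryCrawler("heritrix3"))
coordinator.register("other-crawler", InMemoryCrawler("other-crawler"))
print(coordinator.launch("heritrix3", HarvestJob("job-1", ["https://example.org/"])))
```

With this shape, supporting a new crawling tool means writing one more adapter class. The coordinator and everything above it stay unchanged, which is the "minimal development effort" the re-architecture aims for.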

More detail about future milestones can be found in the Web Curator Tool Developer Guide in the appropriately titled section Future Milestones. This section will be updated as development work progresses.

3) Diagram showing the relationships between different Web Curator Tool components

We are conscious that there are long-time users on various old versions of WCT, as well as regular downloads of those older versions from the old Sourceforge repository (soon to be deactivated). We would like to encourage users of older versions to start using WCT 2.0 and to reach out for support in upgrading. The primary channels for contact are the WCT Slack group and the GitHub repository. We hope that WCT will be widely used by the web archiving community in future and will have a large development and support base. Please contact us if you are interested in cooperating! See the Web Curator Tool Developer Guide for more information about how to become involved in the Web Curator Tool community.

WCT facts

The WCT is one of the most common open-source enterprise solutions for web archiving. It was developed in 2006 as a collaborative effort between the National Library of New Zealand and the British Library, initiated by the International Internet Preservation Consortium (IIPC), as can be read in the original documentation. Since January 2018 it has been upgraded through collaboration with the Koninklijke Bibliotheek – National Library of the Netherlands. The WCT is open-source and available under the terms of the Apache License, Version 2.0. The project was moved in 2014 from Sourceforge to GitHub. The latest release of the WCT, v2.0, is available now. It has an active user forum on GitHub and Slack.
