Contribute to CDG’s AI Collection!

By Tiiu Daniel, Web Archive Leading Specialist, National Library of Estonia

“Trurl” by Daniel Mróz, from The Cyberiad by Stanisław Lem (Wydawnictwo Literackie, Kraków, 1972). Illustration copyright © 1972 Daniel Mróz. Reprinted by permission.

After significant breakthroughs at the end of the 20th century and the beginning of the 21st, artificial intelligence (AI) has come to play an ever greater role in our daily lives. Although AI has a huge positive impact on a variety of fields, such as manufacturing, healthcare, art, transportation and retail, the use of these new technologies also raises ethical issues and security risks. One critical and hotly debated issue is the impact of ongoing automation on labor markets, including changing educational requirements for jobs, job elimination, and various models for managing these transitions.

The IIPC Content Development Group invites curators and web archivists around the world to contribute websites to a new “Artificial Intelligence” web collection.

The purpose of this collection is to bring together and record web content related to the use of AI and its impact on any aspect of life, reflecting attitudes and thoughts towards it, predictions about its future, etc.

The content can be in any language, and can focus on specific countries or cultures or have a global scope.

We especially welcome contributions from underrepresented countries, cultures, languages and other groups, or those countries without IIPC members. Curators currently building AI-related collections at their own institutions are welcome to contribute their seeds (matching the criteria below) to aid in the development of a collection with an international perspective.

The collection aims to cover the following subtopics:

  • Machine learning, natural language processing, robotics, automation;
  • AI in literature, visual arts (e.g. ceramics, drawing, painting, sculpture, design, photography, filmmaking, architecture) and performing arts (e.g. theater, public speaking, dance, music, etc.); AI in emerging art forms;
  • AI and law/legislation;
  • Social and economic impact (e.g. impact on behavior/interaction, bias in AI, unemployment, inequality, changes in labor markets);
  • Ethical issues (e.g. weaponization of AI, security, robot rights);
  • Future predictions/scenarios concerning AI.

Types of web content to include are personal forms such as blogs, forum posts, and artist websites, as well as trend reports, statements, and analyses (e.g. from government agencies, NGOs, scientific or academic institutions, advocacy groups, businesses).

Time frame covered by content: from the 1990s onwards.

Out of scope are: full social media feeds and channels (Facebook, Twitter, Instagram, YouTube, WhatsApp), users’ video channels (YouTube, Vimeo), apps and other content which is difficult or impossible to crawl.

That said, if you locate individual social media posts of unique value, such as an Instagram post by a bot or a particularly relevant and ephemeral individual video, please submit them for consideration.

Nominations can be submitted using the following form.

The call for nominations will close on the 30th of June 2019. Crawls will be run during summer 2019. The collection will be made available at the end of 2019.

For more information about this collection, contact Tiiu Daniel (tiiu.daniel[at]nlib.ee).


Lead Curators of the CDG Artificial Intelligence Collection
Tiiu Daniel, Web Archive Leading Specialist, National Library of Estonia
Liisi Esse, Ph.D., Associate Curator for Estonian and Baltic Studies, Stanford University Libraries
Rashi Joshi, Reference Librarian/Collections Specialist, Library of Congress

CDG Co-Chairs
Nicola Bingham, Lead Curator Web Archiving, British Library
Alex Thurman, Web Resources Collection Coordinator, Columbia University Libraries

Contribute to CDG’s Climate Change Collection!

By Kees Teszelszky, Curator Digital Collections, Koninklijke Bibliotheek – National Library of The Netherlands and Lead Curator, CDG Climate Change Collection

Climate change is one of the most urgent and hotly debated issues on the web in recent years. The IIPC Content Development Group is inviting all curators and web archivists from around the world to contribute websites to a new collaborative “Climate Change” collection.

Breiðamerkurlón, Iceland

In recent decades there has been strong evidence that the earth is experiencing rapid climate change, characterized by global temperature rise, warming oceans, shrinking ice sheets, glacial retreat, decreased snow cover, sea level rise, declining Arctic sea ice, extreme weather events, and ocean acidification. Ninety-seven percent of climate scientists agree that these climate-warming trends over the past century are very likely due to human activities, and most of the leading scientific organizations worldwide have issued public statements endorsing this position (source: climate.nasa.gov/evidence). Global and local action to mitigate this crisis has been complicated by political, economic, technical, cultural, and religious debates.

Many people feel the urge to reflect on this topic on the web. We would like to take an international snapshot of born-digital culture relating to the documentation of, and social debate on, the challenging issue of climate change. You can contribute to this collection by nominating web content about any aspect of climate change. The content can focus on specific countries or cultures or have a global focus, and can be in any language.

We especially welcome contributions from underrepresented countries, cultures, languages and other groups, or those countries without IIPC members. Curators currently building climate change related collections at their own institutions are welcome to contribute their seeds (matching the criteria below) to help us build a collection with an international perspective.

Examples of subtopics might include climatology, climate change denial, climate refugees, religious reflections on climate change, etc. Eligible types of web content include organizational reports or statements (e.g. from government agencies, NGOs, scientific or academic institutions, advocacy groups, political parties/platforms, businesses, religious groups) or more personal forms such as blogs or artistic projects.

Out of scope are: social media feeds (Facebook, Twitter, Instagram, YouTube channels, WhatsApp), video (YouTube, Vimeo), apps and other content which is difficult or impossible to crawl.

Seed collection started on 1 April 2019, and further nominations can be added to this spreadsheet. Crawls will be run during the summer of 2019, to conclude shortly after the upcoming UN Climate Action Summit on 23 September 2019.

Organized by the IIPC and supported by web archivists around the world, the special web collection ‘Climate Change’ is one of the ways the IIPC helps raise awareness of the strategic, cultural and technological issues which make up the web archiving and digital preservation challenge.

For more information about this collection, contact Kees Teszelszky: kees.teszelszky[at]kb.nl

IIPC Content Development Group: 2019 collections

By Nicola Bingham, Lead Curator Web Archiving, British Library and Co-Chair of the Content Development Working Group

During 2019, the Content Development Group (CDG) will continue to work on several established collections: 

New for 2019, the CDG is undertaking a Climate Change Collection, led by Kees Teszelszky of the National Library of the Netherlands. The first crawl will take place before the General Assembly & the Web Archiving Conference in June, with a final crawl shortly after the UN Climate Action Summit in September. This collection has sparked a lot of interest on the CDG mailing list, and many curators have offered to contribute.

We are also planning an Artificial Intelligence Collection, led by Tiiu Daniel of the National Library of Estonia, Liisi Esse of Stanford University Libraries and Rashi Joshi of the Library of Congress. The details are still to be firmed up.

We are planning to crawl one of our collections, or a subset of a collection, so that it can be used by researchers.

Results of the Steering Committee Election 2019

The following IIPC member organisations have been elected to serve for a period of three years starting on 1st of June 2019 –

On behalf of the membership I would like to thank all of those who have taken part in this election.

IIPC PCO

IIPC Steering Committee Election 2019: nomination statements

The Steering Committee is the executive body of the IIPC, currently comprising 15 member organisations. This year five seats are up for election/re-election. In response to the call for nominations to serve on the IIPC Steering Committee for a three-year term commencing 1 June 2019, seven IIPC member organisations have put themselves forward:

An election will be held from 3 March to 31 March. The IIPC designated representatives from all member organisations will receive an email with instructions on how to vote. Each member will be asked to cast five votes. The representatives should ensure that they read all the nomination statements before casting their votes. The results of the vote will be announced on the Netpreserve blog and Members mailing list on 1 April. The first Steering Committee meeting will be held before the General Assembly in Zagreb, on 4 June.

If you have any questions, please contact the IIPC Programme and Communications Officer.


Nomination statements in alphabetical order:

Deutsche Nationalbibliothek / German National Library

As a member of the IIPC since 2007, the German National Library has always been particularly interested in preservation, and its representative, Tobias Steinke, is co-lead of the Preservation Working Group. The selective web archive of the German National Library started in 2012. Its workflow is based on a cooperation with the service provider oia and does not use the common open source tools; this could give the IIPC a different perspective and help it represent the variety of its members.

Internet Archive

Internet Archive seeks to continue its role on the IIPC Steering Committee. As the oldest and largest publicly available web archive in the world, a creator and ongoing developer of many of the core technologies used in web archiving, and an original founding member of the IIPC, Internet Archive plays a key role in advancing web archiving and fostering broad community participation in preserving and providing access to the web-published records that document our shared cultural heritage. Internet Archive has also served in a variety of leadership and program roles within the Steering Committee since IIPC’s formation. In continuing this active role on the IIPC Steering Committee, Internet Archive will contribute to furthering the IIPC’s strategic initiatives to build a collaborative framework for advancing web archiving and to grow and diversify the IIPC’s membership. The web is the most significant communication platform of our era; it is also one that can only be preserved and made accessible through broad-based, multi-institutional efforts led by organizations such as the IIPC. By extending our role on the IIPC Steering Committee, Internet Archive will continue its participation in the knowledge-sharing and leadership that supports the IIPC and the broader community in its ongoing efforts to preserve the web.

Landsbókasafn Íslands – Háskólabókasafn / National and University Library of Iceland

The National and University Library of Iceland is interested in serving another term on the IIPC Steering Committee. The library has had an active web archiving effort for nearly two decades. Our participation in the IIPC has been instrumental to the success of that effort.

As one of the IIPC’s smaller members, we are keenly aware of the importance of collaboration to this specialized endeavor. The knowledge and tools that this community has given us access to are priceless.

We believe that in this community active engagement ultimately brings the greatest rewards. As such, we have participated in projects including Heritrix and OpenWayback. We have hosted IIPC events, including the 2016 GA/WAC and an upcoming hackathon in April. And we have provided leadership in various areas, including working groups and the SC chair (2008), and our SC representative is currently in charge of the tools portfolio.

If re-elected to the SC, we will aim to continue on in the same spirit.

Library of Congress

The Library of Congress (LC) has been involved in web archiving for almost 20 years, building a variety of thematic and event-based collections for its web archives. LC has worked collaboratively with national and international organizations on collections, preservation tools and workflow processes, while developing in-house expertise and curatorial tools to enable effective collection and management of over 1.7 petabytes of web content collected to date. As a founding member of IIPC, LC has served in a variety of leadership roles: currently as SC member and Preservation WG and Training WG co-chair, and in prior years as SC Chair, Communications Officer, Content Development Group co-chair, and on the Membership Engagement portfolio; it also helped secure a new fiscal agent. If re-elected, LC looks forward to continuing to focus on developing a web archiving training program, and on encouraging new opportunities for membership engagement and funding opportunities for member projects. We will continue to participate in discussions around preservation, tools, and processes that will enable us all to work more efficiently and collaboratively as a community, and look forward to engaging in activities and discussions that will help strengthen the IIPC for the future and for the next membership agreement.

National Library of Australia

The National Library of Australia (NLA) was a founding IIPC member and a Steering Committee member until 2009, hosting the second general committee meeting in Canberra in 2008. In 2004 the NLA organized the first major international conference on web archiving for cultural institutions. The NLA’s experience and leadership in web archiving go back to 1996 with the establishment of PANDORA, one of the first collaborative web archiving programs. The NLA has been a continuous IIPC member and has actively contributed expertise to the Preservation Working Group.

The NLA’s strengths include operational maturity, sustainability and open access through its web archiving program, which embraces selective, domain and bulk collecting methods. The NLA has a strong commitment to, and experience with, collaborative web archiving through PANDORA. The NLA has a demonstrated record of innovation, building the first selective web archiving workflow system (PANDAS) and, more recently, the ‘outbackCDX’ tool, which makes index management more efficient. In March 2019 the NLA launched the Australian Web Archive, which made the whole .au web archive fully accessible and openly searchable in Trove. The NLA believes it is time for Australia to rejoin the IIPC leadership, adding southern hemisphere representation and experience to the Steering Committee.

National Library of New Zealand / Te Puna Mātauranga o Aotearoa

National Library of New Zealand’s mandate to preserve New Zealand’s social and cultural history includes:

  1. A legal mandate to perform web harvests under the National Library of New Zealand Act 2003.
  2. A social responsibility to develop collections (including digital collections) reflecting the social, cultural, economic and other endeavours of New Zealanders.

The Library has a programme of selective web harvesting and has conducted eight whole-of-domain ‘snapshots’ since 2008. We are also experimenting with Twitter, focusing on hashtag crawls of major NZ events or activities considered culturally important (e.g. Kaikoura Earthquake, GE2017, Moko Kauae, Grace Millane, Te Matatini, Nelson Fires). The Library is also collaborating with the National Library of the Netherlands on the ongoing enhancement and development of the Web Curator Tool.

The National Library has been a continuous member of the IIPC since 2007 and has previously served on the IIPC Steering Committee. Having recently added a dedicated web archiving role to the Library’s digital preservation team, we now feel able to contribute more fully to the work of the IIPC, and we see membership of the IIPC Steering Committee as one of the ways we can do so.

Stanford University Libraries

We have concluded our three-year term on the Steering Committee and appreciate your consideration for serving another term. IIPC has progressed notably in these three years. Our private, member-focused GA has been eclipsed by an increasingly visible and rigorously-curated WAC. IIPC as an organization has befittingly matured as well, re-administering itself under CLIR’s fiscal sponsorship. These changes reflect opportunities to continue to evolve IIPC from its start as a largely inward-looking, homogeneous cadre of collaborating member institutions to a professionalized organization more keenly focused on the diversification of participating stakeholders and advancement of web archiving practice broadly.

We are interested in continuing to move IIPC in this direction, in keeping with the vision presented by Jefferson Bailey as outgoing Chair. As a consistent contributor to IIPC activities and goals, we can be counted on to “do the work.” Our tangible contributions to date include serving as Treasurer, serving as Training Working Group co-chair, chairing the 2017 WAC Program Committee, organizing and co-hosting the 2015 GA and WAC, and serving on every WAC Program Committee since 2015.

A New Release of Heritrix 3

By Andy Jackson, Web Archiving Technical Lead at the British Library

One of the outcomes of the Online Hours meetings has been an increase in activity around Heritrix 3. Most of us rely on Heritrix to carry out our web crawls, but recognise that to keep this large, complex crawler framework sustainable we need to try to get more people to use the most recent versions, and make it easier for new users to get on board.

The most recent ‘formal’ release of Heritrix 3 was version 3.2.0 back in 2014, but a lot has happened since then. Numerous serious bugs have been discovered and resolved, and some new features added, but only those of us running the very latest code were able to take advantage of these changes.

Those of us who would rather base our crawling on a software release than build from source have been relying on the stable releases built by Kristinn Sigurðsson and hosted on the NetarchiveSuite Maven Repository. This worked well for ‘those in the know’, but did little to make things easier for new users.

In an attempt to resolve this, and in coordination with the Internet Archive, we have started releasing ‘formal’ versions of Heritrix, culminating in the 3.4.0-20190207 Interim Release. This new release is believed to be stable, and is recommended over previous releases of Heritrix 3. As well as being released on GitHub, it is also available through the Maven Central Repository, which should make it easier for others to re-use Heritrix.

You may notice we’ve added a date to the version tag. Traditionally, Heritrix 3 has used a tag of the form “X.X.X”, which gives the impression we are using a form of Semantic Versioning. However, that does not reflect how Heritrix is evolving. Heritrix is a broad framework of modules for building a crawler, and has lots of different components of different ages, at different levels of maturity and use. Given there are only a small number of developers working on Heritrix, we don’t have the resources to guarantee that a breaking change won’t slip into a minor release, so it’s best not to appear to be promising something we cannot deliver.
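To make the new tag format concrete, here is a minimal Python sketch that splits a tag such as 3.4.0-20190207 into its numeric prefix and its release date. The layout X.Y.Z[-YYYYMMDD] is an assumption inferred from the two release names mentioned in this post, and the names `HeritrixTag` and `parse_tag` are hypothetical; none of this is part of Heritrix itself.

```python
from datetime import date, datetime
from typing import NamedTuple, Optional, Tuple


class HeritrixTag(NamedTuple):
    """A release tag split into its numeric prefix and optional release date."""
    base: Tuple[int, int, int]
    released: Optional[date]


def parse_tag(tag: str) -> HeritrixTag:
    """Split a tag of the assumed form X.Y.Z[-YYYYMMDD].

    Older tags such as '3.2.0' carry no date suffix, so `released` is None.
    Note: the numeric prefix is NOT a Semantic Versioning promise, so any
    upgrade should still be tested against your own crawler beans.
    """
    base, _, stamp = tag.partition("-")
    major, minor, patch = (int(part) for part in base.split("."))
    released = datetime.strptime(stamp, "%Y%m%d").date() if stamp else None
    return HeritrixTag((major, minor, patch), released)


print(parse_tag("3.4.0-20190207"))
# HeritrixTag(base=(3, 4, 0), released=datetime.date(2019, 2, 7))
```

The date suffix is what identifies a specific build; the sketch deliberately draws no compatibility conclusions from the numeric prefix, in line with the advice to test every release.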

This means that, when you are upgrading your Heritrix 3 crawler, we recommend that you thoroughly test each release using your configuration (your ‘crawler beans’ in Heritrix 3 jargon) under a realistic workload. If you can, please let us know how this goes, to help us understand how reliable the different parts of Heritrix 3 are.

As well as making new releases, we have also moved the Heritrix 3 documentation over to GitHub to populate the Heritrix3 wiki, and shifted the API documentation to a more modern platform. We hope this will help those who have been frustrated by the available documentation, and we encourage you to get in touch with any ideas for improving the situation, particularly when it comes to helping new users get on board.

If you want to know more, please drop into the Online Hours calls or use the archive-crawler mailing list or IIPC Slack to get in touch. To join IIPC Slack, submit a request through this form.

Passing the Torch

By Jefferson Bailey, Internet Archive

Dear IIPC Community,

As of January 1, 2019, my term as Chair of the IIPC came to a close. Having served as Chair since September 2017 and, prior to that, as Vice Chair from April 2016 (under the excellent leadership of Emmanuelle Bermès of BNF), I have seen the IIPC continue to grow and evolve. It has been a privilege to serve in these roles during this exciting time. While I will continue to serve as a regular Steering Committee (SC) member, I wanted to take this transitional moment to reflect on the successes and ongoing work of both the SC and the IIPC. The centrality of the web as a communication and publication platform only increases by the day, and the work of the IIPC and its members becomes ever more critical in documenting history, preserving knowledge, and interrogating privilege and power. There is always more work to be done.

Before reflecting on recent progress and future directions, I want to give a big thanks to my co-Officers. Vice Chair Sylvain Bélanger of Library and Archives Canada and Treasurer Tom Cramer of Stanford University Libraries both worked to advance IIPC’s mission and operations. As well, Program and Communications Officer Olga Holownia worked, and continues to work, tirelessly to support the overall activities of the consortium. Thanks go as well to the SC members who volunteer their time and to the many regular members who actively contribute to Working Groups (WGs), committees, portfolios, etc., and who keep the IIPC a dynamic forum for sharing knowledge and practices. Lastly, I look forward to the great team of new SC Officers: Chair Hansueli Locher of the Swiss National Library, Vice Chair Mark Phillips of the University of North Texas, and Sylvain serving as Treasurer. The near-term future of IIPC is in good hands.

In my time as Vice Chair and Chair, IIPC has continued to add new members and expand its activities. Here is my reflection on areas of recent progress and further effort:

Areas of Recent Progress

A Maturing Organization

It is well known that IIPC faced many financial and operational difficulties related to the unforeseen inability of BNF to continue to provide financial and accounting support for IIPC in 2017, after many years of admirably providing this service for IIPC without recompense. We all owe thanks to the British Library and to Olga for enabling the 2017 conference to happen, even in a moment of financial uncertainty. From crisis came positive change, as Abbie Grotke of the Library of Congress and I were able to arrange an agreement with the Council on Library and Information Resources (CLIR) to provide professional fiscal sponsorship services for IIPC. CLIR is a wonderful supporter of the library community and has proven an excellent fiscal agent; we are excited to have established this relationship and expect it to be a foundation for further collaborations.

Much work was also done by Officers to implement a suite of protocols and procedures around invoicing, member onboarding, financial tracking, vendor and expense payments, and other basic budgeting and organization management. Many of these processes were previously unenforced or nonexistent and caused a notable strain on IIPC’s limited staffing. Professionalization of finances and many operations should allow IIPC to focus more on its core mission – delivering member value and advancing preservation of the internet!

Premier Events

The past few years also featured improvements in the planning and management of the GA and WAC conference, including more seamless planning workflows, more budgetary autonomy for hosts, the exploration of sponsorships, registration fees, and event planning services, and other efficiency and sustainability approaches. The IIPC WAC continues to be the premier event for web archiving, and many attendees noted that the 2018 GA/WAC hosted by the National Library of New Zealand was one of the best conferences so far. Proposal submissions, sessions, and attendance all continue to grow, and the quality of the event remains superlative. The 2019 event at the National and University Library in Zagreb, Croatia will continue the trend. Other workshops, forums, and programming also continued IIPC’s essential role in providing the best venue for discussion of web preservation and access issues.

Member Activity

A number of new initiatives, as well as growth in existing projects, signaled that member engagement and contribution remain high. From the new Training Working Group, to an extensive Member Engagement Survey, to the growing collaborative collections of the Content Development Group, to many other formal and informal activities, IIPC members remain active in the organization. We are hoping the stability mechanisms of the past few years have enabled even more ways for members to participate and contribute.

Areas of Further Effort

Organizational Maturity

Though, per above, great strides were made in professionalizing many activities, other areas of operations also need to evolve to account for IIPC’s growth and strategic aspirations. The challenges related to the fiscal agent transition illuminated broader circumstances related to IIPC’s growth over the years: namely, that critical operational and administrative functions can no longer be dependent on the unpredictable contributions or internal decisions of individual member institutions. The model of member-contributed operational support made sense when IIPC had one or two dozen members. With over 50 members, a growing portfolio of activities, and nearly 200,000 EUR in annual member dues, IIPC has outgrown such an arrangement. All core functions of IIPC, from finances to operations to staffing, need to operate autonomously and independently of individual members to ensure the successful, ongoing provision and continuation of services and to obviate conflicts of interest. There are many arrangements that can be pursued to support this self-sufficiency, and IIPC is blessed with a large financial reserve that can help advance this effort. Work to achieve this self-reliance will no doubt be a focus of the SC in the coming years.

Scaling Participation

As I noted in my Chair’s address at the General Assembly, IIPC is poised to pivot to focusing on resiliency, member benefits, and strategic investment. I had fantastic conversations over the years with members about ideas for IIPC to deliver value to members via new activities and investments. As part of these conversations, I devised, with feedback from the SC, a “Discretionary Funding Program” (see link above) to invest a significant portion of IIPC’s reserve funds to support member-proposed and member-managed projects. Expect more news about this program soon.

IIPC also needs to invest resources to encourage broader involvement of members in leadership positions. There is very little turnover in institutional representation on the SC. As well, Officer roles have historically been held by an even smaller number of institutions, and there was no self-nomination for Chair during this year’s nomination period (thanks go to Hansueli for stepping up after the year started with the role vacant). To remain vibrant and reflective of its community, the IIPC needs representation of more members in Steering Committee, Officer, and Working Group chair roles, and in other elected and self-nominated positions. Term limits, limitations on consecutive terms served by an institution, leadership stipends, more clearly defined expectations of service, or other formal or informal inducements are ideas that could bring fresh perspectives and new ideas to SC or WG leadership roles. As with operations, IIPC’s governance needs to evolve and adapt to introduce new voices and vibrancy to our growing organization.

Member Diversity

The web is a global and, in some ways, borderless phenomenon. Yet one need only look at the IIPC membership map to recognize that vast portions of the globe are underrepresented in IIPC and, likely, in the global web collection we are all working to build. As well, web preservation is increasingly a concern of institutions beyond just national libraries and research universities. There is surely momentum and engagement to be found in scaling IIPC membership and activities both vertically (inclusive of organizations of differing size, mandates, and missions) and horizontally (inclusive of underrepresented regions and nations). Building a truly global organization, as well as a diverse, inclusive preserved record of the web, will require participation far beyond North America and Europe. Subsidized membership rates, diversity scholarships or travel funding, and targeted partnerships, outreach, or participation tools are just some of the ideas or activities that can be started or scaled. I am excited to work with colleagues over the coming year to propose a number of ideas for improving diversity within IIPC.

In summary, I had four goals when assuming SC Officer roles: get IIPC’s house in order, improve operations, scale support of member-driven projects, and diversify membership and leadership. I think notable progress was made on three of those four and more time will allow diversity initiatives to gain traction. While some of this progress was behind-the-scenes or is soon-to-be-released, hopefully it has helped IIPC grow and thrive. The new leadership team will no doubt continue this trend.

Keep on crawlin’!

Jefferson Bailey
Internet Archive
IIPC Steering Committee