Resilience, renewal and creating future pasts through web archiving: interview with Dr Paul Koerbin

Paul Koerbin is the Assistant Director of web archiving at the National Library of Australia (NLA). He is one of the pioneers of web archiving and has been involved in the IIPC since its inception in 2003, including the early meetings hosted in Canberra in 2004 and 2008 and the first Researchers Requirements Working Group meeting in London in 2004. He has represented NLA on the IIPC Steering Committee (SC) since 2018, served as Vice-Chair in 2020, chaired the WAC Program Committee (PC) in 2018 and represented the SC on the PC in the following years. Paul is also one of the custodians of the oral history of web archiving.


“Resilience is really fundamental to a sustainable web archiving programme. And then renewal. You know what’s important now, more than ever, is innovative approaches to the challenges we face whether that’s how programmes are organised within institutions or collaborations that we can form.”

From serials to social media

Olga Holownia: The pretext for this interview is your retirement and the IIPC 20th anniversary. You are one of the “oral historians” of web archiving. You have been involved in the web archiving program at the National Library of Australia since its inception in the mid-1990s and in the IIPC since its early days. If I remember correctly, the first web archivists at NLA were curators of the serials collections and this was one of the approaches used for the earliest web archive collections. In our anniversary video spotlight, you say that in the early days, web archiving was “simpler because the materials we were dealing with were simpler, the web was simpler.” As you look back, what would you consider to be the most significant moments in this journey from serials to social media?

Paul Koerbin: Yes, the web archiving team (at first known as the ‘electronic unit’ and later as ‘digital archiving section’) was established within the serials cataloguing team, where I began at the NLA. That was the ‘Australian Serials and Electronic Unit’. This was established before we actually did any harvesting. It started by selecting, cataloguing, and scoping future harvests (for when we had the infrastructure to do it). So, in this context, I would suggest that one of the most significant moments – although more than a single moment – was the first actual harvesting we did, getting that very early content. But perhaps more important, since it goes to the whole idea of resilience and sustainability, was building the specifications and then the application that became our web archiving workflow management tool (PANDAS). This, I believe, was one of the first such bespoke workflow tools for web archiving, if not the first. The significance of this was to turn web archiving, quite quickly, into an operational activity, not simply a project. A quarter of a century on, the renewal of our web archiving workflow system as the fourth-generation PANDAS in the past three years is, I think, just as significant for our web archiving operation, making the tool ever more adaptable, sustainable and fit for purpose.

On the matter of archiving being ‘simpler’ in the early days: I think I mean it was less complex, not necessarily easier. We were starting from ground zero then, building workflows, policy, and infrastructure. And the harvesting tools were obviously not as sophisticated. So, that was not simple, but a website in the early days could in many practical respects be treated like publications we were already familiar with, so perhaps what we had to deal with was conceptually simpler. However, with Web 2.0 and, of course, the later emergence of social media, the target medium has become so much more complex and both conceptually and practically more challenging to archive. Add to that the issues of preserving, managing, and making accessible huge complex collections, and I think in many ways the past does look ‘simpler’. I would add that these changes are what has kept many of us involved in this enterprise for so long. It is constantly challenging and never dull!

This is how PANDORA was described in a page from February 1998. https://webarchive.nla.gov.au/awa/19980205191902/http://www.nla.gov.au/pandora/

Resilience and renewal

OH: The Australian Web Archive is one of the oldest in the world and it’s listed in the Australian Register of the UNESCO Memory of the World Program. What would you consider to be the crucial decision in the development and renewal of the web archiving programme?

PK: I think a crucial decision made at the very beginning of the NLA’s web archiving initiative was to treat it as a programme rather than a project. There was a six-month period at the very beginning when it was considered a scoping project, but by the end of 1996, it was a programme incorporated into the operation of the NLA’s core business to comprehensively collect Australia’s cultural heritage. I think that was visionary for the time and a crucial decision. The other decision I would mention, perhaps more a way of working than a stated decision, was to take a radical incremental approach to the development of infrastructure. By that I mean we did not try to solve all the problems and issues before building the applications to get operational. They were not perfect but they gave us experience and, importantly, allowed us to collect material from as early as late 1996. I think this approach built renewal – by increments – and resilience – by experience – into our web archiving programme.

OH: What have been the most rewarding experiences for you in your tenure in web archiving?

PK: Of course so much of the experience of being part of a small team over the years building the programme and building the collection of web content has been immensely rewarding. But I will highlight a couple of things: firstly, I was very involved in the framing of legislation extending legal deposit to online materials that came into force in 2016. This was the culmination of 20 years of the NLA trying to get this amendment to the legislation. The final days of preparing the bill with drafting experts in our portfolio department were exhilarating. What we emerged with was, on the whole, a broadly applicable, workable, and very successful piece of legislation. The passing of that legislation led in turn to the NLA reviewing its management of risk and the opening of access to our entire web archiving collection in 2019 as the Australian Web Archive through the Trove discovery service.

The other personally rewarding experience I would highlight would be the opportunity and privilege I have had in representing the NLA’s achievements in web archiving to the international community, through the IIPC. The NLA was a founding member of the IIPC and, despite Australia’s remoteness from the majority of the web archiving activities in the northern hemisphere, we have tried to maintain an international presence. I have found it very rewarding to have been given the responsibility by the NLA over the past 20 years to explain, promote, and represent our web archiving achievements and experiences both internationally and in Australia.

“We can’t solve all the problems alone nor recognise all the opportunities by ourselves”

OH: You mentioned the importance for NLA to be part of the international community right from the start. The IIPC has also greatly benefited from NLA’s engagement, not least with respect to creating the model for a curator workflow tool in PANDAS in the early years and more recently with open source tools, notably OutbackCDX developed by Alex Osborne. In what ways did the international collaboration contribute to the advancement of the web archiving programme in Australia?

PK: I think being part of the international web archiving community is important for the renewal of our web archiving program. At one time the NLA was a world leader and at other times we could see that there were areas where we were, perhaps, not doing so well. We can’t solve all the problems alone nor recognise all the opportunities by ourselves. I think this is the greatness of the IIPC forum. It is perhaps less about adopting tools others have developed or contributing tools to the community, though there is that too; rather it is so much about the sharing of and awareness of the diversity of approaches and the opportunities and ideas we can bring back to our own organizations and programmes. Personally, I have felt that international engagement, and seeing the achievements of others, has helped to keep me enthusiastic and driven over a long period of time to improve and promote our activities.

IIPC Steering Committee meeting, Canberra, 15 November 2004. Clockwise from right: Julien Masanes (Bibliothèque nationale de France); Caroline Wiegandt (Bibliothèque nationale de France); Pam Gatenby (NLA); Margaret Phillips (NLA); Hans Kristian Mikkelsen (Royal Danish Library); Martha Anderson (Library of Congress); Yvette Hackett (Library and Archives Canada); Svein Arne Solbakk (National Library of Norway) and Mark Middleton (British Library).
Photo: Damian McDonald, National Library of Australia

“Web archiving and its processes can be understood as a methodically and purposefully constructed taphonomy of the web.”

Future pasts

OH: “Future pasts” was one of the topics you suggested for the 2022 Web Archiving Conference. Let me ask you your own question: how are web archives framing future perceptions of the past?

PK: Web archiving is very much an activity of taking a snapshot of the dynamic, ephemeral and relentless ‘now’ that is the world wide web, and undertaking to preserve that for a future we are yet to know. Those who will come after us will only have those fragments of the web that we have archived to understand that past. Web archiving and its processes can be understood as a methodically and purposefully constructed taphonomy of the web. The nature of the web is that it is not going to leave a trace of itself at any given time without this intervention. That is how important our work is. So much of our social discourse happens online and so much of the important ‘grey literature’ that forms the basis of social policy is published online. Without access to this, those who come after us will have no historical perspective. So, of course, this is also our tremendous responsibility, since what we choose to collect and what we are able to collect and preserve – let us remember the persistent technical, legal and resource constraints that continue to limit our activities – will frame how the future looks at the past through what we have collected and how we have preserved and provided access to it.

IIPC Web Archiving Conference at the National Library of New Zealand. Wellington, 13 November 2018. Keynote by Dr Rachael Ka’ai-Mahuta titled Te Māwhai – te reo Māori, the Internet, archiving, and trust issues.
Photo: Mark Beatty.

OH: Thank you for taking the time to answer my questions today and over the past 7 years. I have always regarded you as one of the oral historians of web archiving and you’ve always helped me fill the gaps in the early history of the IIPC. To finish off on a less sombre note, does your retirement mean that you will now have more time to dedicate to composing music for “Gumboots and Consequences”?

PK: I deny everything! (Note to readers: you had to be involved in organizing the 2018 GA-WAC in Wellington to understand the reference here. Vale the great John Clarke.) On a serious note, however, I do hope that in the near future a major IIPC event can again be hosted in Australia. The major international conference that the NLA organized in 2004 was an important milestone, and the General Assembly and Conference in Wellington in 2018 (which the NLA co-sponsored and which I was involved with as co-chair of the programme committee) was a great success for the region, bringing the international web archiving community to the southern hemisphere. I am hopeful that there will be another IIPC event in this region sooner rather than later.

Paul Koerbin’s closing remarks at the 2018 General Assembly hosted by the National Library of New Zealand. Wellington, 12 November 2018.
Photo: Olga Holownia, IIPC.
