From pilot to portal: a year of web archiving in Hungary

The National Széchényi Library started a web archiving pilot project in 2017. The aim of the pilot was to identify the requirements for establishing the Hungarian Internet Archive. In the two years of the pilot phase, around a hundred cultural and scientific websites were selected and published with the owners’ permission. The Hungarian Web Archive (MIA) was officially launched in 2017. The Library joined the IIPC in 2018, and the Hungarian Web Archive was first introduced at the General Assembly in Wellington in 2018. The achievements of the project were presented at the Web Archiving Conference (WAC) in Zagreb in June 2019. This blog post summarises some key developments since the 2019 conference.


By Márton Németh, Digital librarian at the National Széchényi Library, Hungary

In just about a year, we moved from a pilot project to officially launching our web archive, running a comprehensive crawl and creating special collections. In May 2020, the Hungarian parliament passed modifications to the Cultural Law which allow the Library to run web archiving activities as a part of its basic service portfolio. Over the past year we have also organised training and participated in various collaborative initiatives.

Conferences and collaborations

In the summer, just after the Zagreb conference, we exchanged experiences with our Czech and Slovak colleagues about the current status and major development points of web archiving projects in the Czech Republic, Slovakia and Hungary at the Visegrad 4 Library Conference in Bratislava. Our presentation is available from here. In the autumn, at the annual international conference on digital preservation in Bratislava, we elaborated on our basic thoughts about the potential use of microdata in a library environment. The presentation can be downloaded from here.

At the Digital Humanities 2020 conference in Budapest, Hungary, we organized a full web archiving session with presentations and panel discussions. The participants were Marie Haskovcová from the Czech National Library, Kees Teszelszky from the National Library of the Netherlands, Balázs Indig from the Digital Humanities Research Centre of Loránd Eötvös University and Márton Németh from the National Széchényi Library. The main aim was to put a spotlight on Digital Humanities research activities in the web archiving context. Our presentation is available from here.

Training

Our annual workshop at the National Széchényi Library focused on the metadata enrichment of web archives, crawling and managing local web content in university library and city library environments, crawling and managing online newspaper articles, and setting the limits of web archiving in research library environments.

We also ran several accredited training courses for Hungarian librarians and summarized our experiences in the field of web archiving education in an article published by Emerald. Membership in the IIPC Training Working Group has given us valuable experience in this field.

Domain crawl and new portal

We ran our second comprehensive harvest of a large segment of the Hungarian web domain at the end of 2019. The crawler started from 246,819 seed addresses and crawled 110 million URLs in less than eight days, using 6.4 TB of storage.
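To put these figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted above and makes two assumptions that are not in the crawl report: that the crawl took the full eight days (so the throughput is a lower bound, since the crawl finished in less than that), and that 1 TB means 10^12 bytes. The function name `crawl_stats` is purely illustrative.

```python
# Figures quoted in the post for the late-2019 domain crawl.
SEEDS = 246_819          # seed addresses
URLS = 110_000_000       # URLs crawled
MAX_DAYS = 8             # upper bound: the crawl took "less than eight days"
STORAGE_TB = 6.4         # storage used, assumed to be decimal terabytes


def crawl_stats(seeds, urls, max_days, storage_tb):
    """Derive rough averages from the headline crawl figures."""
    seconds = max_days * 24 * 60 * 60
    storage_bytes = storage_tb * 10**12  # assumption: 1 TB = 10^12 bytes
    return {
        # Average number of URLs discovered per seed address.
        "urls_per_seed": urls / seeds,
        # Lower bound on throughput, because max_days is an upper bound.
        "min_urls_per_second": urls / seconds,
        # Average stored size per crawled URL, in kilobytes (10^3 bytes).
        "avg_kb_per_url": storage_bytes / urls / 1000,
    }


if __name__ == "__main__":
    for name, value in crawl_stats(SEEDS, URLS, MAX_DAYS, STORAGE_TB).items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the crawl averaged roughly 446 URLs per seed, at least about 159 URLs per second, and about 58 KB stored per URL. These are averages only; they say nothing about how sizes or crawl depth were distributed across sites.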

Our original project website was the first repository of resources related to web archiving in Hungarian. In 2019 we built a new portal, which serves as a knowledge base for the field of web archiving in Hungary. Beyond the introduction to the web archive and to the project, separate groups of resources (information materials, documents etc.) are available for everyday users, for content owners, for professional experts and for journalists. It is available at https://webarchivum.oszk.hu.

[Figure: webarchivum.oszk.hu]

In 2019-2020 we created a new sub-collection on the Francis II Rákóczi Memorial Year at the National Széchényi Library (NSZL), within the framework of the Public Collection Digitization Strategy. Its primary goal was to present the technology of web archiving and the integration of the web archive with other digital collections through a demo application. The content focuses on the webpages and websites related to the Memorial Year, to the War of Independence, to the Prince and to his family. It also contains born-digital or digitized books from the Hungarian Electronic Library, articles from the Electronic Periodical Archives, and photos, illustrations and other visual documents from the Digital Archive of Pictures. The service is available at the following address: http://rakoczi2019.webarchivum.oszk.hu.

[Figure: rakoczi2019.webarchivum.oszk.hu]

Legislation and new collections

In May 2020 the Hungarian parliament passed modifications to the Cultural Law that entitle the National Széchényi Library to run web archiving activities as a part of its basic service portfolio. Legal deposit of web materials will also be established. The corresponding governmental and ministerial decrees will appear soon. All the law modifications and decrees will be in effect from 1 January 2021.

We carried out our first experiment in harvesting materials from Instagram, capturing 700 pages with more than 100,000 posts using the Webrecorder software. We are also running event-based harvests on COVID-19, the Summer Olympic Games and the Paris Peace Conference (1919-1920), and we are joining the corresponding international IIPC collaborative collection development projects.

Next steps

With the support of the Public Collection Digitization Strategy framework, we have been able to start developing a collaboration network with various regional libraries in Hungary in order to collect local materials for the Hungarian Web Archive. We hope to summarize our first experiences at our next annual workshop in the autumn and to develop our joint collection activities further.
