Non-print Legal Deposit Law approved in Spain

By Mar Pérez Morillo
Head of the Online Publications Deposit Management Area, Biblioteca Nacional de España (National Library of Spain)

Last Friday the Spanish Council of Ministers approved the royal decree to regulate the legal deposit of online publications.

The Legal Deposit Law of 2011 treated online documents as subject to legal deposit for the first time in Spain.

The variety and complexity of this kind of publication led to the drafting of a legal text (a royal decree) that develops the law and regulates the procedures and details for managing their legal deposit.

In the current technological environment, with the World Wide Web as the main channel for disseminating information, national libraries and archives, along with university libraries and research institutions all over the world, have for years been preserving the huge documentary heritage found on the internet. Legal deposit has been the instrument used over the centuries to build this documentary heritage in physical formats. For some years now, many countries have legislated on the legal deposit of online publications, considering them part of the heritage to be preserved.

Given their special characteristics, their sheer volume, and the resulting impossibility of capturing, storing and preserving them exhaustively, the royal decree just approved in Spain introduces some important differences from print legal deposit:

  • Publishers do not deposit the publications themselves; instead, the deposit libraries request from publishers the publications to be deposited.
  • No legal deposit number will be assigned to online publications.
  • The main method of deposit is the automated crawl of the web.
  • When the information is not publicly available online, because it is part of a database or is protected by a username and password, the curator centres (the deposit libraries, that is, national and regional libraries with competence in legal deposit) will ask publishers to deliver the publications.

Ahead of the decree, the National Library of Spain crawled and archived the Spanish web from 2009 to 2013 under a contract with the Internet Archive. The results were eight .es domain crawls and two selective crawls, on the Humanities and on the 2011 General Elections. In 2014, the Library adopted and installed NetarchiveSuite as its web archiving tool, and since then several selective crawls have been run on historical and cultural events in Spain, such as the death of President Suárez, the abdication of King Juan Carlos I, the proclamation of Felipe VI, the European Elections in 2014 and the regional and local elections in May 2015, among others.

Although this was possible under the umbrella of the previous legal deposit law (1957), the royal decree now approved specifically empowers the regional deposit libraries and the National Library of Spain to crawl the web and to request any online publication considered part of the Spanish documentary heritage, in order to fulfil their mission of preserving it for future generations.

This is the end of a long and winding road, since the first version of the royal decree was drafted in 2012. Since then, many governmental institutions, publishers, individuals and all the sectors involved have sent their comments and formal submissions on the text.

This would not be a reality today without the support of all of them, and especially that of the public entity Red.es and the Secretary of State for Telecommunications in Spain, as well as the IIPC and the NetarchiveSuite (NAS) community.

This is also the beginning of a long road (hopefully not a winding one). The success of our preservation mandate depends greatly on collaboration between libraries and all stakeholders.

BNE blog post

Internet Memory Research launches MemoryBot

Internet Memory Research is pleased to announce that our new crawler, MemoryBot, has now reached full maturity.

With it, we have completed a first map of the Web, and we would like to share some early results of this experiment.

With only a few small servers and four weeks' time, we were able to crawl more than 2 billion resources with the aim of discovering as many domains as possible. Overall, more than 60 million domains were discovered, which represents about half of the active domains in the world (the rest consists mostly of parking sites and other kinds of empty domains).

In addition, we were able to run several types of analysis on this material thanks to the current Hadoop- and Flink-based architecture of our archive.
Among other things, we used machine learning to classify domains by type or genre (News, Forums, Blogs, E-commerce, etc.).

The other good news is that, thanks to many improvements in both overall efficiency and stability, the cost of such crawls has been halved. Combined with falling storage costs, this makes global crawls much more affordable, and we hope it will benefit more and more institutions.

More details on this will be published later, but we wanted to share this early update with the web archiving community.

By Chloé Martin, COO of Internet Memory Research

10th anniversary of the Netarchive (Netarkivet), the Danish national web archive

The Royal Library in Copenhagen and the State and University Library in Aarhus are happy to announce the 10th anniversary of the Netarchive (Netarkivet), the Danish national web archive.

In July 2005, a new legal deposit law came into force: materials “published in electronic communication networks” became subject to legal deposit, which means that collecting and preserving “the Danish part of the Internet” was now mandated by law. In the same year, the Netarchive joined the IIPC.

In the early years of the Netarchive, we focused on collection building and strategies: how to manage four broad crawls a year and how to choose about 100 sites to be harvested selectively. At the end of 2005, we finished our first broad crawl; it had taken almost a year. In 2007, our first systematic set of selective crawls was in place; we had a first dialogue with Facebook about harvesting Danish open profiles; we released NetarchiveSuite as an open source web curator tool; and we gave the first researchers access to the archived material.

In 2008, we started harvesting e-books, and the French National Library and the Austrian National Library joined the NetarchiveSuite development project. In 2009, the first Ph.D. student graduated with a project based on the Netarchive. In 2010, we participated in the first IIPC collaborative collection (Winter Olympic Games). In 2011, we established access through the Wayback Machine and started a special collection on online games.

In 2012, we fulfilled our objective of carrying out four broad crawls a year and began a special collection of YouTube videos. In 2013, we established on-premises access for eligible master's students in their final year, and we developed a solution that makes selected electronic publications from ministries and official agencies accessible to the public via persistent links from The Administrative Library’s catalogue. In 2014, we started indexing the whole archive for full-text search and performed our largest event harvest ever, of the Eurovision Song Contest, hosted in Denmark.

Erland Kolding Nielsen (director of the Royal Library), cutting the ribbon to the full text search

Our birthday gift to our users and to ourselves is the full-text searchable archive!

Netarkivet has celebrated its 10th birthday

Thank you for your cooperation and feedback over all these years.

On behalf of the Netarchive Team

Sabine Schostag, Web Curator, Netarchive, State and University Library