Non-print Legal Deposit Law approved in Spain

By Mar Pérez Morillo
Head of the Online Publications Deposit Management Area, Biblioteca Nacional de España

Last Friday the Spanish Council of Ministers approved the royal decree to regulate the legal deposit of online publications.

The legal deposit law of 2011 made online documents subject to legal deposit for the first time in Spain.

The variety and complexity of these publications called for a further legal text (a royal decree) to develop the law and regulate the procedures and details for managing their legal deposit.

In the current technological environment, with the World Wide Web the main channel for disseminating information, national libraries and archives, along with university libraries and research institutions all over the world, have for years been preserving the huge documentary heritage found on the internet. Legal deposit has been the instrument used over the centuries to build this documentary heritage in physical formats. For years now, many countries have legislated on the legal deposit of online publications, considering them part of the heritage to be preserved.

Given the special characteristics of online publications, their sheer volume and the resulting impossibility of capturing, storing and preserving them exhaustively, the royal decree just approved in Spain introduces some important differences from print legal deposit:

  • Publishers do not deposit the publications themselves; instead, the deposit libraries request from publishers the publications to be deposited.
  • No legal deposit number will be assigned to online publications.
  • The main method of deposit is automated crawling of the web.
  • When the information is not publicly available online, but is part of a database or is protected by a username and password, the curator centres (the deposit libraries: national and regional libraries with competence on legal deposit) will request that publishers deliver the publications.

Ahead of the decree, the National Library of Spain crawled and archived the Spanish web from 2009 to 2013 under a contract with the Internet Archive. The results were eight .es domain crawls and two selective crawls, on the Humanities and on the General Elections in 2011. In 2014, the Library adopted and installed NetarchiveSuite as its web archiving tool, and since then several selective crawls have been run on historical and cultural events in Spain, such as the death of President Suárez, the abdication of King Juan Carlos I, the proclamation of Felipe VI, the European Elections in 2014 and the regional and local elections in May 2015, among others.

Although this was possible under the umbrella of the previous legal deposit law (1957), the royal decree now approved specifically empowers the regional deposit libraries and the National Library of Spain to crawl the web and to request every online publication considered part of the Spanish documentary heritage, in order to fulfil their mission of preserving it for future generations.

This is the end of a long and winding road that began when the first version of the royal decree was drafted in 2012. Since then, many governmental institutions, publishers, individuals and all the sectors involved have sent their comments and submissions on the text.

This would not be a reality today without the support of all of them, especially the public entity Red.es and the Secretary of State for Telecommunications in Spain, as well as the IIPC and the NAS community.

This is also the beginning of a long road (hopefully not a winding one). The success of our preservation mandate relies greatly on the collaboration between libraries and all the stakeholders.

BNE blog post

IIPC GA2015 – Videos of the Conference

All of the presentations from the Open Conference (Mon 27 Apr 2015) and most from the Open Workshop (Tue 28 Apr 2015) are available on the IIPC YouTube channel.

The Opening Keynote by Vinton Cerf, Chief Internet Evangelist, Google and Mahadev Satyanarayanan, Carnegie Group Professor, Carnegie Mellon University.

Note: We were unable to capture a few of the sessions at the Open Workshop due to sound issues.

The History of the IIPC, through Web Archives

By Nicholas Taylor, Web Archiving Service Manager, Stanford University

Web archives have now been around long enough that the web content they’ve preserved may never have been previously experienced by full-grown adults today; to this cohort, some websites were only ever “historical.” Web archives represent an increasingly vital and singular body of cultural heritage and a tool for understanding both the past and social phenomena. They’re also a handy tool for understanding the evolution of the IIPC itself.

home page of the IIPC website, 16 March 2015

While I trust that our own programmatic record-keeping would be sufficient to reconstruct some of the following findings, they would also be thankfully self-evident to a future historian (one unusually interested in the history of the history of the Web) from the web archives themselves. Consulting the UK Web Archive front-end for the IIPC-funded, LANL-developed and -hosted Memento Aggregator shows that Internet Archive has the greatest number of snapshots of the entire history of the IIPC’s web presence.

Here’s some of what I learned, exploring the timeline:

home page of the IIPC website, 3 June 2004

I imagine that these latter three points especially will be interesting to consider in the context of our forthcoming discussions for a new membership agreement to replace the one expiring this year (PDF) and to inform refined IIPC mission and goals. Here’s hoping that the most exciting history of the history of the Web is still ahead of us!

What’s Next for OpenWayback

By Kristinn Sigurðsson, Head of IT at the National and University Library of Iceland. Cross-posted from his own blog.

About one month ago, OpenWayback 2.1.0 was released. This was mostly a bug-fix release with a few new features merged in from Internet Archive’s Wayback development fork. For the most part, the OpenWayback effort has focused on ‘fixing’ things: making sure everything builds and runs nicely and is better documented.

I think we’ve made some very positive strides.

Work is now ongoing for version 2.2.0. Finally, we are moving towards implementing new things! 2.2.0 still has some fixing to do. For example, localization support needs to be improved. But we’re also planning to implement something new: support for internationalized domain names.

We’ve tentatively scheduled the 2.2.0 release for “spring/early summer”.

After 2.2.0 is released, the question will be which features or improvements to focus on next. The OpenWayback issue tracker on GitHub has (at the time of writing) about 60 open issues in the backlog (i.e. not assigned to a specific release).

We’re currently in the process of trying to prioritize these. Our current resources are nowhere near sufficient to resolve them all. Prioritization will involve several aspects, including how difficult the issues are to implement, how popular they are and, not least, how clearly they are defined.

This is where you, dear reader, can help us out by reviewing the backlog and commenting on issues you believe to be relevant to your organization. We also invite you to submit new issues if needed.

It is enough to just leave a comment that this is relevant to your organization. Even better would be to explain why it is relevant (this helps frame the solution). Where appropriate we would also welcome suggestions for how to implement the feature, notably in issues like the one about surfacing metadata in the interface.

If you really want to see a feature happen, the best way to make it happen is, of course, to pitch in.

Some of the features and improvements we are currently reviewing are:

  • Enable users to ‘diff’ different captures of an HTML page. Issue 15.
  • Enable search results with a very large number of hits. Issue 19.
  • Surface more metadata. Issues 28 and 29.
  • Enable time ranged exclusions. Issue 212.
  • Create a revisit test dataset. Issue 117.
  • Use CDX indexing as the default instead of the BDB index. Issue 132.

As I said, these are just the ones currently being considered. We’re happy to look at others if there is someone championing them.

If you’d like to join the conversation, go to the OpenWayback issue tracker on GitHub and review issues without a milestone.
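
For readers who prefer to pull the backlog programmatically, the following is a minimal sketch of how such a query could be built against the GitHub REST API, whose issue-listing endpoint accepts `state=open` and `milestone=none`. The repository path used here is an assumption for illustration (the post only refers to "the OpenWayback issue tracker on GitHub"), and `backlog_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: illustrative repository path; substitute the actual
# OpenWayback repository if it differs.
ASSUMED_REPO = "iipc/openwayback"


def backlog_url(repo: str = ASSUMED_REPO, per_page: int = 100) -> str:
    """Build a GitHub REST API URL listing open issues that have no milestone."""
    query = urlencode({"state": "open", "milestone": "none", "per_page": per_page})
    return f"https://api.github.com/repos/{repo}/issues?{query}"


if __name__ == "__main__":
    # Only prints the URL; fetching it is left to the reader's HTTP client.
    print(backlog_url())
```

Note that this endpoint also returns pull requests alongside issues, so results may need filtering before review.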

If you’d like to submit a new issue, please read the instructions on the wiki. The main thing to remember is to provide ample details.

We only have so many resources available. Your input is important to help us allocate them most effectively.

IIPC Technical Training Workshop – 14th – 16th January 2015

The idea of running a training workshop focusing on technical matters was formed during the 2014 IIPC General Assembly in Paris. It became apparent that there is a great deal of transferable experience among the members and that some institutions are more advanced than others in using the key software for web archiving. Having a forum to exchange ideas and discuss common issues would be extremely useful and welcome.

Consortium of memory organisations

In his blog, Kristinn Sigurðsson gave an accurate account of how the idea developed from a thought, to exciting sessions of discussion, and eventually to a proposal supported by the IIPC Steering Committee. Staff development and training is one of the key areas of work for the IIPC. For a consortium of memory organisations sharing the mission of preserving the Internet for posterity, there is great advantage in collaborating, helping each other and not reinventing the wheel. The IIPC has an Education and Training Programme and each year allocates a certain amount of funding for collective learning and development. The National Library of France, for example, organised a week-long workshop in 2012 to offer training for organisations planning to embark on web archiving.

Joint expertise

The joint training workshop of the British Library and the National and University Library of Iceland was the first one dedicated to technical issues, covering the three key applications for web archiving: Heritrix, OpenWayback and Solr. The speakers mainly came from both libraries’ capable technical teams, including Kristinn Sigurðsson, Andy Jackson, Roger Coram and Gil Hoggarth. Their expertise was strengthened by Toke Eskildsen of the State and University Library in Denmark, who has worked extensively on the Danish Web Archive’s large-scale Solr index. Toke also reported on his visit to the British Library in his blog, describing his experience of “being embedded in tech talk with intelligent people for 5 days” as “exhausting and very fulfilling”. The British Library also took advantage of Toke’s presence and picked his brain on performance issues related to Solr, a perfect example of the other good things that can come out of putting techies together.

For the future

Evaluation of the workshop indicates overall satisfaction among the attendees. More people seemed to favour the presentations on day one and wanted more structure in the hands-on sessions on days two and three, with more real-world examples to be solved together. The presence of strong technical expertise and the opportunity to talk to peers were appreciated the most. From the organisers’ perspective, there are a few things we could have done better: software could have been pre-installed to avoid network congestion and save time; and for the catering we will remember for future occasions that brilliant minds need adequate and varied fuels to be kept well-oiled and running up to speed.

Training is vital for any organisation that aims to progress. It is not a cost but an investment which safeguards our continued ability to do our job. It is worth considering establishing technical training as a fixed element of the Education and Training Programme. The British Library’s Web Archiving crew are happy to contribute.

Helen Hockx-Yu, Head of Web Archiving, The British Library, 17th Feb 2015

IIPC – Meet the Officers, 2015

The IIPC is governed by the Steering Committee, formed by representatives of 15 member organisations who are each elected for three years.

The IIPC Officers include the Chair and Vice-Chair who are elected by the Steering Committee plus the standing officers of Treasurer and the Program and Communications team.

They invest their expertise and, more importantly, their time in dealing with the day-to-day business of running the IIPC. The IIPC secretariat – so to speak – is based at the British Library and the Bibliothèque nationale de France. At the BL the two Programme and Communication Officers ensure that the IIPC runs smoothly and that all of the projects and programmes are completed. The BnF is the treasurer of the IIPC and oversees all financial transactions. One of the main tasks each year for the secretariat is organising a successful annual General Assembly, this year hosted by Stanford University, California.

Chair

Paul N. Wagner, Senior Director General, Innovation & Chief Information Officer, Chief Information Officer Branch, Library and Archives Canada

Paul Wagner is the Senior Director General, Innovation and Chief Information Officer for Library and Archives Canada. In this role Paul provides the leadership for the Digital Agenda as it pertains to Canada’s Documentary Heritage.

Prior to this role Paul was Director General, Client Relationships and Business Intake Directorate, Projects and Client Relationships Branch, at Shared Services Canada (SSC). In this role, Paul built the first enterprise Partnership Management function for technology in the Government of Canada.

Paul joined SSC from the Department of Justice (DoJ), where he held the position of Chief Information Officer. As CIO for the department, he developed and led an aggressive IM/IT transformation program. Prior to that, Paul was the Chief Technology Officer at DoJ, where he was responsible for all technology operations. Paul also held several leadership positions at Services Canada, Human Resources and Skills Development Canada and the Department of Public Works and Government Services Canada in the areas of Business Planning, Relationship Management and IT Product/Service Management.

Paul holds a B.A. with a major in Economics from McGill University and an MBA from the University of Ottawa’s Executive MBA program.

Vice-Chair

Cathy Hartman is the Associate Dean of Libraries at the University of North Texas in Denton, Texas (University Profile). Her interests have long been in digital libraries, collection building, and digital preservation.

She first began capturing U.S. government websites in 1997 as government agencies closed and their websites were taken down. With this early start in web archiving, the University of North Texas (UNT) continued to capture such websites and joined the IIPC in 2007.

Hartman serves as the current Steering Committee co-chair, and served as chair of the IIPC Steering Committee in 2013. UNT participates in many IIPC initiatives including Steering Committee membership, the Access Working Group, the new Collaborative Collections group, and the Education Committee.

UNT’s Nomination Tool is offered for use by any IIPC member organization to support collaborative collection building, and UNT is currently contributing to the OpenWayback development effort.

Treasurer

Clément Oury is head of Digital Legal Deposit at the Bibliothèque nationale de France (BnF). This service is in charge of collecting and preserving a large part of BnF’s born-digital heritage: web archives, e-newspapers and e-books.

Clément Oury also serves as convenor of two ISO working groups (on the “WARC archiving file format” and on “Statistics and quality issues for web archiving”).

He is a graduate of the École nationale des Chartes and has a PhD in early modern history from the University of Paris-Sorbonne.

As Clément will be leaving the BnF and therefore the IIPC in 2015, the position of treasurer is in transition. To ease this situation Peter Stirling has agreed to be second in command and act as interim treasurer until the BnF has decided who is going to follow in Clément’s very competent footsteps.

Interim-Treasurer

Peter Stirling is a digital curator in the Digital Legal Deposit team at the BnF. He is responsible for services for users of the web archives, and is currently working on developing data mining services for researchers.

He also works on day-to-day web archiving activity and the international activity of the team in the context of the IIPC.

He holds an M.A. in English Literature and an M.Sc. in Information and Library Studies, and previously worked for an online information portal for health professionals in the UK and in online information monitoring for the French National Cancer Institute before joining the BnF in 2009.

Programme & Communication Officers

The PCOs both split their time evenly between Program and Communication for the IIPC and Engagement and Liaison for the UK Web Archive. 

Jason Webber is Web Archiving Engagement & Liaison Manager at the British Library in London. He is responsible for bringing the UK Web Archive to as wide an audience as possible as well as finding and maintaining partnerships and co-operation in research and technology.

Previously he worked on various collections-based digital projects at the Museum of London and as a Web Content Manager at the Natural History Museum, London.

Sabine Hartmann is Web Archiving Engagement & Liaison Officer at the British Library in London. During her career Sabine has worked in museums, archives and heritage organisations in Germany, Belgium and the Netherlands before moving to the UK in 2014.

With a Master’s degree in History of Art and Archaeology she has a keen interest in digital applications and research connecting history and ICT. Sabine has managed various heritage projects including geo-location apps and websites, oral history and other heritage websites.