Signing Off

Colleagues,

Today marks my final day as Chair of our Consortium. It has been an exciting and busy 17 months since I took on this role. I leave my post with a sense of accomplishment and pride in how the organization has evolved.

When I took over the role in January 2015, I made the commitment to work with the Steering Committee to ensure we modernized the governance and management structure of the IIPC to create a foundation that would allow us to grow and extend our reach.  I am happy to say that we have accomplished just that.

As most of you know, I am not a career Archivist or Librarian, but I have been privileged to work with and learn from professionals within my home organization (Library and Archives Canada) as well as many of you from across the globe. I am pleased to hand over the reins to Emmanuelle Bermès from the National Library of France. She will bring not only deep management and leadership skills to the role, but also (and maybe more importantly) significant experience in the business of the IIPC. I think this balance of experience and competencies is what we need now.

I had the privilege of being involved in three General Assemblies (GA) and the associated conferences. I was continuously amazed by the level of engagement and interaction between the members. Based on the feedback I have received, this last GA and WAC set the bar – this is due in no small part to the leadership of Kristinn Sigurðsson.

As with any organization, the goal is to keep that level of engagement going virtually after the face-to-face meetings have ended. We still have much work to do on that front, but I am pleased that our new portfolio structure ensures that there will be dedicated resources and leadership for Birgit Nordsmark Henriksen (Netarchive.dk) and the Membership and Engagement Portfolio.  Stay tuned for some steps to facilitate that year-long engagement.

The ecosystem that our respective organizations work in, and the one that the IIPC is trying to foster, is very complex and continues to include new players. Working alongside other organizations and associations will be key in delivering our mandate. Again, we have ensured that we leverage partnerships with complementary organizations. Listen out for more from Hansueli Locher (Swiss National Library) and the group supporting the Partnership and Outreach Portfolio.

One of the messages that we heard loud and clear was that our members wanted help with tools. At some point I am sure that there will be more and more commercially available solutions for Web harvesting and archiving, but for now it is up to us as a community to rally together to build the tools to support our work, led by Tom Cramer (Stanford University Libraries) and the Tools Development Portfolio.

I can say that one of the best ways to support our organization is to get involved. Whether you decide to apply for a position on the Steering Committee, or if you support one of the portfolios, or if you simply ‘lean in’ on some of the discussions that circulate via email –  the goal is the same: get involved!

I want to thank my colleagues on the Steering Committee for supporting me (and putting up with me) over the past year and a half. As IIPC members, you can be confident that you have a Steering Committee which has your best interests at heart. Many excellent and passionate discussions have brought us to where we are today.

I also want to thank the Program and Communications team. In particular, I want to thank Jason Webber from the British Library. He and I worked closely together and spoke almost weekly in an effort to move the agenda forward. Jason (and now Olga) are the glue between the various activities of the Steering Committee and it is often a thankless job.

Lastly, I want to thank all of you – from the emails I received to the one-on-one discussions, you have made sure that we heard your needs and expectations.

As they say, the best is yet to come… so let’s step forward together.

Regards.

PnW
Paul N. Wagner
Chair, International Internet Preservation Consortium

Five Takeaways from AOIR 2015

I recently attended the annual Association of Internet Researchers (AOIR) conference in Phoenix, AZ. It was a great conference that I would highly recommend to anyone interested in learning first hand about research questions, methods, and studies broadly related to the Internet.

Researchers presented on a wide range of topics, across a wide range of media, using both qualitative and quantitative methods. You can get an idea of the range of topics by looking at the conference schedule.

I’d like to briefly share some of my key takeaways. I apologize in advance for oversimplifying what was a rich and deep array of research work; my goal here is to provide a quick summary and not an in-depth review of the conference.

  1. Digital Methods Are Where It’s At

I attended an all-day, pre-conference digital methods workshop. As a testament to the interest in this subject, the workshop was so overbooked they had to run three concurrent sessions. The workshops were organized by Axel Bruns, Jean Burgess, Tim Highfield, Ben Light, and Patrik Wikstrom (Queensland University of Technology), and Tama Leaver (Curtin University).

Researchers are recognizing that digital research skills are essential. And, if you have some basic coding knowledge, all the better.

At the digital methods workshop, we learned about the “Walkthrough” method for studying software apps, tools for “web scraping” to gather data for analysis, Tableau to conduct social media analysis, and “instagrammatics,” analyzing Instagram.

FYI: The Digital Methods Initiative from Europe has tons of great information, including an amazing list of tools.

  2. Twitter API Is Also Very Popular

There were many Twitter studies, and they all used the Twitter API to download tweets for analysis. Although researchers are widely using the Twitter API, they expressed a lot of frustration over its limitations. For example, you can download only up to 1% of the total Twitter volume for free. If you’re studying something obscure, you are probably okay, but if you’re studying a topic like #jesuischarlie, you’ll have to pay to get the entire output. Many researchers don’t have the funds for that. One person pointed out that it would be ideal to have access to the Library of Congress’s Twitter archive. Yes, agreed!

  3. Social Media over Web Archives

Researchers presented conclusions and provided commentary on our social behavior through studies of social media such as Snapchat, Twitter, Facebook, and Instagram. There were only a handful of presentations using web archived materials. If a researcher used websites, they viewed them live or conducted “web scraping” with tools such as Outwit and Kimono. Many also used custom Python scripts to gather the data from the sites.
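To give a flavour of what such a custom script might look like, here is a minimal sketch using only Python's standard library. It is an illustration, not code shown at the conference: the `SAMPLE_HTML` string, the `LinkCollector` class and the `scrape` function are all made up for this example, and the page is parsed from a string rather than fetched from a live site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(html):
    """Return the title and outgoing links of one HTML page (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


# Made-up sample page standing in for a downloaded website
SAMPLE_HTML = """
<html><head><title>Example page</title></head>
<body><a href="https://example.org/a">A</a> <a href="/b">B</a> <a>no href</a></body></html>
"""

result = scrape(SAMPLE_HTML)
print(result)
```

A script like this only sees a page as it exists at the moment it runs, which is one reason a saved snapshot of the site is worth keeping alongside the extracted data.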

  4. Fair Use Needs a PR Movement

There’s still much misunderstanding about what researchers can and cannot do with digital materials. I attended a session where the presenter shared findings from surveys conducted with communication scholars about their knowledge of fair use. The results showed that there was (very!) limited understanding of fair use. Even worse, the findings showed that those scholars who had previously attended a fair use workshop were even less likely to understand fair use! Moreover, many admitted that they did not conduct particular studies because of a (misguided) fear of violating copyright. These findings were corroborated by the scholars from a variety of fields who were in the room.

  5. Opportunities for Collaboration

I asked many researchers if they were concerned that they were not saving a snapshot of websites or apps at the time of their studies. The answer was a resounding “yes!” They recognize that sites and tools change rapidly, but they are unaware of tools or services they can use and/or do not realize that their librarians/archivists have solutions.

Clearly there is room for librarians/archivists to conduct more outreach to researchers to inform them about our rich web archive collections and to talk with them about preservation solutions, good data management practices and copyright.

Who knew?

Let me end with sharing one tidbit that really blew my mind. In her research on “Dead Online: Practices of Post-Mortem Digital Interaction,” Paula Kiel presented on the “digital platforms designed to enable post-mortem interactions.” Yes, she was talking about websites where you can send posthumous messages via Facebook and email! For example, https://www.safebeyond.com/, “Life continues when you pass… Ensure your presence – be there when it counts. Leave messages for your loved ones – for FREE!”


By Rosalie Lack, Product Manager, California Digital Library

Web Archives: Preserving the Everyday Record

In talking with Ian Milligan, Assistant Professor of Digital and Canadian History at the University of Waterloo, you are immediately impressed by his excitement for web archives and how web archiving is fundamentally changing research.

Ian uses web archives for his historical research to demonstrate their relevance and importance. While he clearly sees the value of web archives, he also recognizes the need to improve access in order to increase usage. To that end, he recently launched Webarchives.ca, an archive dedicated to Canadian politics. Ian is also providing pedagogical support for students using digital materials, including web archives.

I interviewed Ian recently to get his thoughts about these and other web archiving topics.

Remembering Geocities: A Community on the Web

Among Ian’s research projects is the study of Geocities. Remember Geocities? It was a user-generated web-hosting community that flourished in the late 1990s and 2000s. Unlike other lost civilizations, we know the cause of Geocities’s demise – Yahoo shut it down in 2009. If it were not for the Internet Archive and Jason Scott’s Archive Team, Geocities would be lost forever.

For those who might ask if it was worth saving, Ian would offer a resounding YES! For Ian, Geocities provides a rich historical source for gaining insight into a pivotal moment in time. It is one of the first examples of democratized web access, when average people could reach bigger audiences than ever before. At its height, Geocities featured more than 38 million pages.

Source: Internet Archive’s Wayback Machine, December 1, 2009 capture

Some of the research questions Ian is asking about the Geocities corpus include:

  • How was community enacted?
  • How was community lived in a place like Geocities?
  • Was there actually a sense of community on the web?

While these questions might sound like standard research questions, they are only now being recast over “untraditional” sources, such as Geocities.

Archiving Politics

In an effort to improve access to web archives, Ian worked on a project to launch Webarchives.ca, a research corpus containing Canadian Political Parties and Political Interest Groups sites collected since 2005 by the University of Toronto using the Internet Archive’s Archive-It service. Ian teamed up with researchers from the University of Maryland, York University in Toronto, and Western University in London, Ontario to build this massive collection of more than 14 million “documents.”  To help navigate this large collection, UK Web Archive’s Shine front-end was implemented.

Once I got started looking at Webarchives.ca, I couldn’t stop myself from digging further into such a wealth of information. I particularly liked the graphing of terms over time feature, which allows you to see when terms go in and out of use by political parties.

In sharing his takeaways from working with these data, Ian observed that it is equally interesting to see when terms do not appear as when they do.
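The idea behind such a graph can be sketched in a few lines of Python. The example below is an illustration only, not the code behind Webarchives.ca or Shine: the `documents` list is made-up sample data and `term_share_by_year` is a hypothetical helper that reports, for each year, the share of documents mentioning a term.

```python
from collections import defaultdict


def term_share_by_year(documents, term):
    """Return {year: fraction of that year's documents containing term}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    needle = term.lower()
    for year, text in documents:
        totals[year] += 1
        # Simple case-insensitive substring match; real tools tokenize properly
        if needle in text.lower():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Made-up sample data: (capture year, page text)
documents = [
    (2006, "Our plan for child care and health care"),
    (2006, "Tax relief for families"),
    (2010, "Economic action plan and jobs"),
    (2010, "Jobs, growth and the economy"),
]

shares = term_share_by_year(documents, "jobs")
print(shares)
```

In a table like this, years where the share drops to zero are as visible as the peaks, which is the point Ian makes about terms that do not appear.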

A Pivotal Shift for Scholarship

Ian shared some concrete examples of how the rise of web archives represents a pivotal shift for scholarship. Let’s take, for instance, particular segments of the population, such as young people, who have traditionally been left out of the historical record.

When Ian was researching the 1960s in order to understand the voice of young activists, he found the sources to be scarce. Conversations among activists tended to happen in coffeehouses, bars, and other places where records were not kept. So, a historian can only hope that a young activist back then kept a diary and that it has survived, or she or he needs to find them and interview them.

Contrast this to today’s world. With the explosion of social media, young people are writing things down and leaving records that we never would have had in the past. Web archiving tools can capture this information, which is a very rich and exciting development for historians, but only if these important records of daily life have been archived.

Is More Better?

The increase in information can be a double-edged sword. As Ian says, “there used to be such a scarcity of historical sources, now we have more information than we know what to do with.”

Ian is concerned that digital and digitized materials will be privileged as sources and/or misinterpreted. In a study he conducted when materials were first digitized, he learned that scholars cited digital materials more often than analog ones. Basically, content that was more easily available online was getting used more.

Ian is also worried that there is not a deep understanding of how to critically use digital resources. Many are unaware, for example, of the limitations of simple keyword searching. Add to the mix web archives and you have increased the scale of the problem.

So Ian wrote a pedagogical book.

The Historian’s Macroscope: Exploring Big Historical Data, written along with Shawn Graham and Scott Weingart, will be out later this year. The book is a sort of toolbox for upper-division history undergraduates, teaching them how to think critically about digital resources and to avoid common pitfalls. It also includes “how to” information for analyzing data, such as basic data visualization and network analysis.

Always pushing the envelope, Ian and his co-authors wrote the first draft of their book online.

No “Do Overs”

Ian closed our interview by sharing a provocative statement that he made at the recent IIPC General Assembly. “You cannot study the history of the 90s unless you use web archives. It is a significant part of the record of the 1990s and 2000s for everyday people. When historians write the history of 9/11 or Occupy Wall Street, they are going to have to use web archives.”

As exciting as it is for historians to have access to these rich new resources, Ian also shared his biggest concern, which is that we need to ensure that we are saving websites. “Every day we are losing considerable amounts of our digital heritage. Gathering is critical. There are no ‘do overs.’”

This blog post is the second in a series of interviews with researchers to learn about their use of web archives.

By Rosalie Lack, Product Manager, California Digital Library

We want YOUR ideas for the IIPC General Assembly 2016

You will be pleased to hear that preparations for the IIPC General Assembly 2016 in Reykjavik, Iceland (11-15 April) are under way and we are aiming to make it the best one yet.

The program team have been hard at work looking at potential themes, topics and areas for discussion and debate. We would, however, love to have your input into this too!

So far, we’ve outlined the following areas:

  • Nuts and bolts of web archiving (management, metrics, organisation, programs)
  • De-duplication 
  • Researcher use cases (of web archives)
  • Big Data usage and potential
  • Web Archiving policies and frameworks / Preservation policies, Collection policies 
  • APIs
  • Web Archiving Tool development 
  • Legal deposit, copyright, data protection (EU wide perspective?)

What have we missed, what should we focus on, what would YOU like to see and hear about?

Please use the comments below to tell us what you would like from the conference. This will help frame the call for papers due to go out at the end of October.

Thank you.

Jason Webber, IIPC Program and Communications Officer

Open letter by IIPC Chair

Greetings IIPC Members,

I hope that your summer is going very well and that you are all able to take some time off to recharge and spend time with family and friends. It is hard to believe that more than 3 months have passed since many of us were together at Stanford University in Palo Alto for our 2015 General Assembly (GA)!

I want to take this opportunity to  once again say how impressed I was at the quality of the event.  Everything from the organization of the entire event to the excellent interactions that our members engaged in brought significant value to the week.

I want to focus in on the Members’ Day that we had at the Internet Archive offices. At one point in the day, you were asked to break off into groups to discuss some of the important issues and challenges facing the IIPC in the near future. The Steering Committee met on the Saturday following the GA to discuss how we can better serve you – our members – and to ensure that we focus our limited resources on what brings the greatest value to the global Web Archiving community. I want to assure you that YOUR feedback was taken very seriously and, thanks to the leadership of Birgit Nordsmark Henriksen (Netarchive.dk) and Barbara Sierman (National Library of the Netherlands), the Steering Committee was able to distill your comments and input into 4 manageable work packages:

  1. Researcher Involvement
  2. Tools
  3. Connectedness
  4. Practicalities

Work on each of these elements has begun (thanks to dedicated teams looking at each individual area) and each group is coming prepared to our upcoming in-person Steering Committee meeting in September.  I will update you right after that meeting to let you know what you can expect from the IIPC in the coming year(s).

What I can tell you is that you can count on the IIPC continuing to be a robust and vibrant community and that your contributions will become even more important as we move forward. Your Steering Committee remains committed to ensuring the value of your membership to the Consortium.

I welcome any comments or questions at paul.wagner@bac-lac.gc.ca

Stay tuned for more updates in September.

Paul N. Wagner, Chair, IIPC

Senior Director General & CIO, Innovation and Chief Information Officer Branch

Library and Archives Canada / Government of Canada

So You Want to Get Started in Web Archiving?

The web archiving community is a great one, but it can sometimes be a bit confusing to enter. Unlike communities such as the Digital Humanities, which has developed aggregation services like DH Now, the web archiving community is a bit more dispersed. But fear not, there are a few places to visit to get a quick sense of what’s going on.

Social Media

A substantial amount of web archiving scholarship happens online. I use Twitter (I’m at @ianmilligan1), for example, as a key way to share research findings and ideas that I have as my project comes together. I usually try to hashtag them with #webarchiving, which means that all tweets tagged “#webarchiving” show up in that specific timeline. For best results, using a Twitter client like Tweetdeck, Tweetbot, or Echofon can help you keep apprised of things. There may be Facebook groups – I actually don’t use Facebook (!) so I can’t provide much guidance there. On LinkedIn there are a few relevant groups: IIPC, Web Archiving, and Portuguese Web Archive.

Blogs

I’m wary of listing blogs, because I will almost certainly leave some out. Please accept my apologies in advance and add your name in the comments below! But a few are on my recurring must-visit list (in addition to this one, of course!):

  • Web Archiving Roundtable: Every week, they have a “Weekly web archiving roundup.” I don’t always have time to keep completely caught up, but I visit roughly weekly and once in a while make sure to download all the linked resources. Being included here is an honour.
  • The UK Web Archive Blog: This blog is a must-have on my RSS feed, and it keeps me posted on what the UK team is doing with their web archive. They do great things, from inspiring outreach, to tools development (e.g. Shine), to researcher reflections. A lively cast of guest bloggers and regulars.
  • Web Science and Digital Libraries Research Group: If you use web archiving research tools, chances are you’ve used some stuff from the WebSciDL group! This fantastic blog has a lively group of contributors, showcasing conference reports, research findings, and beyond. Another must visit.
  • Web Archives for Historians: This blog, written by Peter Webster and myself, aims to bring together scholarship on how historians can use web archives. We have guest posts as well as cross-posts from our own sites.
  • Peter Webster’s Blog: Peter also has his own blog, which covers a diverse range of topics including web archives.
  • Ian Milligan’s Blog: It feels weird including my own blog here, but what the heck. I provide lots of technical background to my own investigations into web archives.
  • The Internet Archive Blog: Almost doesn’t need any more information! It’s actually quite a diverse blog, but a go-to place to find out about cool new collections (the million album covers for example) or datasets that are available.
  • The Signal: Digital Preservation Blog: A diverse blog that occasionally covers web archiving (you can actually find the subcategory here). Well worth reading – and citing, for that matter!
  • Kris’s Blog: Kristinn Sigurðsson runs a great technical blog here, very thought provoking and important for both those who create web archives as well as those who use them.
  • DSHR’s Blog: David Rosenthal’s blog on digital preservation has quite a bit about web archiving, and is always provocative and mind expanding.
  • Andy Jackson’s blog  – Web Archiving Technical Lead at the British Library
  • BUDDAH project – Big UK Domain Data for the Arts and Humanities Research Project
  • Dépôt légal web BnF
  • Stanford University Digital Library blog
  • Internet Memory Foundation blog
  • Toke Eskildsen blog – IT developer at the National Library of Denmark.

Again, I am sure that I have missed some blogs so please accept my sincerest apologies.

In-Person Events

The best place to learn is in-person events, of course, which are often announced at places like this blog or in many of the above mediums! I hope that the IIPC blog can become a hub for these sorts of things.

Conclusions

I hope this is helpful for people who are starting out in this wonderful field. I’ve just provided a small slice: I hope that in the comments below people can give other suggestions which can help us all out!

By Ian Milligan (University of Waterloo)

Non-print Legal Deposit Law approved in Spain

By Mar Pérez Morillo
Head of the Online Publications Deposit Management Area, Biblioteca Nacional de España

Last Friday the Spanish Council of Ministers approved the royal decree to regulate the legal deposit of online publications.

The legal deposit law of 2011 treated online documents as objects of legal deposit for the first time in Spain.

The variety and complexity of these publications led to the drafting of a legal text (a royal decree) that develops the law and regulates the procedures and details for managing their legal deposit.

In the current technological environment, with the World Wide Web as the main channel for disseminating information, national libraries and archives, along with university libraries and research institutions all over the world, have for years been preserving the huge documentary heritage found on the internet. Legal deposit has been the instrument used over the centuries to build this documentary heritage in physical formats. For years now, many countries have legislated on the legal deposit of online publications, considering them part of this heritage to be preserved.

Given the special characteristics of these publications, their sheer volume, and the resulting impossibility of capturing, storing and preserving them exhaustively, the royal decree just approved in Spain introduces some important differences from print legal deposit:

  • Publishers do not deposit the publications on their own initiative; instead, the deposit libraries request from publishers the publications to be deposited.
  • No legal deposit number will be assigned to online publications.
  • The main way to deposit is the automated crawl of the web.
  • When the information is not publicly available online, but is part of a database or is protected by a username and password, the curator centres, that is, the deposit libraries (national and regional libraries with competence on legal deposit), will request publishers to deliver the publications.

In anticipation, the National Library of Spain crawled and archived the Spanish web from 2009 to 2013 under a contract with the Internet Archive. The results were eight .es domain crawls and two selective crawls on Humanities and General Elections in 2011. In 2014, the Library adopted and installed NetarchiveSuite as its web archiving tool, and since then several selective crawls have been run on historical and cultural events in Spain, like the death of President Suárez, the abdication of King Juan Carlos I, the proclamation of Felipe VI, the European Elections in 2014 and the regional and local elections in May 2015, among others.

Although this was possible under the umbrella of the previous legal deposit law (1957), the royal decree now approved specifically empowers the regional deposit libraries and the National Library of Spain to crawl the web and to request every online publication considered part of the Spanish documentary heritage, in order to fulfill their mission of preserving it for future generations.

This is the end of a long and winding road: the first version of the royal decree was drafted in 2012. Since then, many governmental institutions, publishers, individuals and all the sectors involved have sent their comments and submissions on the text.

This would not be a reality today without the support of all of them, especially the public entity Red.es and the Secretary of State for Telecommunications in Spain, as well as the IIPC and the NAS community.

This is also the beginning of a long road (hopefully not winding). The success of our preserving mandate relies greatly on the collaboration between libraries and all the stakeholders.

BNE blog post