Web Archives: Preserving the Everyday Record

In talking with Ian Milligan, Assistant Professor of Digital and Canadian History at the University of Waterloo, you are immediately impressed by his excitement for web archives and how web archiving is fundamentally changing research.

Ian uses web archives for his historical research to demonstrate their relevance and importance. While he clearly sees the value of web archives, he also recognizes the need to improve access in order to increase usage. To that end, he recently launched Webarchives.ca, an archive dedicated to Canadian politics. Ian is also providing pedagogical support for students using digital materials, including web archives.

I interviewed Ian recently to get his thoughts about these and other web archiving topics.

Remembering Geocities: A Community on the Web

Among Ian’s research projects is the study of Geocities. Remember Geocities? It was a user-generated web-hosting community that flourished in the late 1990s and 2000s. Unlike other lost civilizations, we know the cause of Geocities’s demise: Yahoo shut it down in 2009. If it were not for the Internet Archive and Jason Scott’s Archive Team, Geocities would have been lost forever.

For those who might ask if it was worth saving, Ian would offer a resounding YES! For Ian, Geocities provides a rich historical source for gaining insight into a pivotal moment in time. It is one of the first examples of democratized web access, when average people could reach bigger audiences than ever before. At its height, Geocities featured more than 38 million pages.

Source: Internet Archive’s Wayback Machine, December 1, 2009 capture

Some of the research questions Ian is asking about the Geocities corpus include:

  • How was community enacted?
  • How was community lived in a place like Geocities?
  • Was there actually a sense of community on the web?

While these questions might sound like standard research questions, they are only now being recast over “untraditional” sources, such as Geocities.

Archiving Politics

In an effort to improve access to web archives, Ian worked on a project to launch Webarchives.ca, a research corpus containing the Canadian Political Parties and Political Interest Groups sites collected since 2005 by the University of Toronto using the Internet Archive’s Archive-It service. Ian teamed up with researchers from the University of Maryland, York University in Toronto, and Western University in London, Ontario to build this massive collection of more than 14 million “documents.” To help users navigate this large collection, the team implemented the UK Web Archive’s Shine front-end.

Once I got started looking at Webarchives.ca, I couldn’t stop myself from digging further into such a wealth of information. I particularly liked the graphing of terms over time feature, which allows you to see when terms go in and out of use by political parties.

In sharing his takeaways from working with these data, Ian observed that it is equally interesting to see when terms do not appear as when they do.

A Pivotal Shift for Scholarship

Ian shared some concrete examples of how the rise of web archives represents a pivotal shift for scholarship. Let’s take, for instance, particular segments of the population, such as young people, who have traditionally been left out of the historical record.

When Ian was researching the 1960s in order to understand the voice of young activists, he found the sources to be scarce. Conversations among activists tended to happen in coffeehouses, bars, and other places where records were not kept. So a historian can only hope that a young activist back then kept a diary and that it has survived; otherwise, the historian needs to find the activists and interview them.

Contrast this to today’s world. With the explosion of social media, young people are writing things down and leaving records that we never would have had in the past. Web archiving tools can capture this information, which is a very rich and exciting development for historians, but only if these important records of daily life have been archived.

Is More Better?

The increase in information can be a double-edged sword. As Ian says, “there used to be such a scarcity of historical sources, now we have more information than we know what to do with.”

Ian is concerned that digital and digitized materials will be privileged as sources, misinterpreted, or both. He conducted a study of the period when materials were first digitized and learned that scholars cited digital materials more often than analog ones. Basically, content that was more easily available online was getting used more.

Ian is also worried that there is not a deep understanding of how to critically use digital resources. Many are unaware, for example, of the limitations of simple keyword searching. Add to the mix web archives and you have increased the scale of the problem.

So Ian wrote a pedagogical book.

The Historian’s Macroscope: Exploring Big Historical Data, written along with Shawn Graham and Scott Weingart, will be out later this year. The book is a sort of toolbox for upper-division history undergraduates to teach them how to think critically about digital resources and to avoid common pitfalls. It also includes “how to” information for analyzing data, such as basic data visualization and network analysis.

Always pushing the envelope, Ian and his co-authors wrote the first draft of their book online.

No “Do Overs”

Ian closed our interview by sharing a provocative statement that he made at the recent IIPC General Assembly. “You cannot study the history of the 90s unless you use web archives. It is a significant part of the record of the 1990s and 2000s for everyday people. When historians write the history of 9/11 or Occupy Wall Street, they are going to have to use web archives.”

As exciting as it is for historians to have access to these rich new resources, Ian also shared his biggest concern, which is that we need to ensure that we are saving websites. “Every day we are losing considerable amounts of our digital heritage. Gathering is critical. There are no ‘do overs.’”


This blog post is the second in a series of interviews with researchers to learn about their use of web archives.

By Rosalie Lack, Product Manager, California Digital Library
