IIPC Content Development Group: What’s on in 2018

by Nicola Bingham, Lead Curator, Web Archiving, British Library and IIPC CDG Co-Chair

The co-chairs of the IIPC Content Development Group (CDG) are pleased to submit the following update on the group’s activity so far this year and the major projects which will occupy the group going forward in 2018.

What do we do?

For those new to the IIPC or those who may be interested in either contributing to planned collections or thinking about submitting ideas for new ones, it is worth revisiting the CDG’s mandate.

The CDG was formed in 2014 and crawling began in early 2015. The Group is charged with building publicly accessible web collections on transnational themes or events. Collections are multinational, multilingual and cover a wide variety of perspectives. They are intended not only to be of particular value to researchers now and in the future, but also to promote awareness of web archiving globally, encouraging individuals and institutions that are not yet involved in web archiving, or that want to become involved, to find out more.

How to propose a collection?

New collections can be proposed on the CDG members’ mailing list. The CDG co-chairs and the group (sometimes in consultation with researchers and others) then develop a list of collections to pursue, in line with the pre-defined criteria in the collection policy and our capacity under the budget approved by the Steering Committee. Each collection is supported by the co-chairs, who serve as project admins, while a lead curator (often, but not necessarily, the person who proposed the collection) scopes the collection, determines the metadata, monitors the collection and leads on quality assurance. Each collection is open to contributions from all members. We strive to open up the nomination procedure as widely as possible, to non-members and members of the public, to elicit as wide a coverage of particular topics as possible.

Collections developed so far, via the IIPC Archive-It account, can be viewed here https://archive-it.org/home/IIPC

2018 collecting

So far in 2018 we have completed the 2018 Winter Olympics & Paralympics Collection, which contains nearly 1,500 seeds and 1.2TB of data. The collection covers 35 countries in 21 languages. The nominations came from a mix of IIPC members and a public nomination form that was made available through previous blog posts. For more information on this collection, see the blog posts by lead curator Helena Byrne.

In addition, we updated the National Olympic & Paralympic Committees collection with committees that were missing from the crawl in 2016. This collection was crawled again during the 2018 Winter Olympics & Paralympics. Not all National Committees have a website, but if you notice that we are missing any websites, please get in touch (2018-winter-olympics [at] iipc.simplelists .com).

We are now turning our attention to resuming the World War I Commemoration and the ‘Online News around the World’ collections.

The World War I Commemoration project, led by Peter Stirling, BnF, started in October 2015. It already includes over 2,000 seeds and covers a wide variety of websites, from official commemorations to amateur history websites and the reporting of the centenary in the media. Websites from several different countries and in many languages have been selected by the members of the IIPC. 2018 is an important year for this collection, as we will be looking to capture activity leading up to and during the centenary of the armistice in November.

The ‘Online News around the World’ collection, led by Sabine Schostag of the Royal Danish Library, has been several years in planning and will begin in earnest shortly. This ambitious project aims to document a selection of online news websites from as many countries in the world as possible during one week of the year (likely to be in November 2018). Once the metadata has been finalised, we will post details of how to nominate content for this collection. The IIPC has members in over 34 countries around the world, which is already a good starting point, but we hope to canvass much more widely than this to achieve our goal of global coverage!

This summer we will also be running new crawls of the seeds in the International Cooperation Organizations collection, led by Alex Thurman from Columbia University Libraries, which consists of all known active websites in the .int top-level domain (available only to organizations created by treaties). This collection was started in 2016 and includes important agencies in areas that require international cooperation, like environmental protection, economic development, and telecommunication.

In the meantime, we hope to see as many CDG members as possible for our session at the IIPC General Assembly on 12th November; more details to follow shortly.

World Wide Webarchiving: Upgrading the Web Curator Tool

by Kees Teszelszky, Curator digital collections, National Library of the Netherlands

The Web Curator Tool (WCT) is a workflow management application designed for selective web archiving. It was created for use in libraries and other digital heritage collecting organisations, and supports collection by non-technical users while still allowing complete control of the web harvesting process. It supports the selection, harvesting and quality assessment of online material by collaborating users in a library environment. The application is integrated with the existing Heritrix web crawler and supports key processes such as permissions, job scheduling, harvesting, quality review, and the collection of descriptive metadata. The WCT allows institutions to capture almost any online resource. These artefacts are handled with all possible care, so that their integrity and authenticity are preserved.

The WCT was developed in 2006 as a collaborative effort by the National Library of New Zealand (NLNZ) and the British Library (BL), initiated by the International Internet Preservation Consortium (IIPC), as can be read in the original documentation. The WCT is open source and available under the terms of the Apache License. The project was moved from SourceForge to GitHub in 2014. The latest ‘binary’ release of the WCT, v1.6.3, was published in July 2017 on the GitHub page of NLNZ. Even after 12 years, the WCT remains one of the most common open-source enterprise solutions for web archiving. It has an active user forum on GitHub and Slack.

From January 2018 onwards, NLNZ has been collaborating with the Koninklijke Bibliotheek, the National Library of the Netherlands (KB-NL), to upgrade the WCT and add new features that make the application future-proof. This involves learning the lessons from the previous development and recognising the advancements and trends occurring in the web archiving community. The objective is to get the WCT to a platform where it can keep pace with the requirements of archiving the modern web. Further, the Permission Request module will be extended to fit the Dutch situation, which lacks legal deposit for digital publications.

The first step in that process was decoupling the WCT from the old Heritrix 1.x web crawler, and allowing the WCT to harvest using the updated Heritrix 3.x version. A proof of concept for this change was successfully developed and deployed by the NLNZ, and has been the basis for a joint development work plan. The project will be extensively documented.

The NLNZ has been using the WCT for its selective web archiving programme since January 2007, and KB-NL since 2009. In 2008 NLNZ published an article describing its experience of using the WCT in a production environment. However, the software had fallen into a period of neglect, with mounting technical debt: most notably its tight integration with an outdated version of the Heritrix web crawler. While the last public release of the WCT is still used day-to-day in various institutions, this release has essentially reached its end-of-life as it has fallen further and further behind the requirements for harvesting the modern web. The community of users has echoed these sentiments over the last few years.

During 2016-2017 the NLNZ conducted a review of the WCT and how it fulfils business requirements, and compared the WCT to alternative software and services. The NLNZ concluded that the WCT was still the closest solution to meeting its requirements, provided the necessary upgrades could be done, namely a change to the modern Heritrix 3 web crawler. Through a series of fortunate conversations the NLNZ discovered that another WCT user, KB-NL, was going through a similar review process and had reached the same conclusions. This led to collaborative development between the two institutions to uplift the WCT technically and functionally so that it is a fit-for-purpose tool within these institutions’ respective web archiving programmes.

Who is involved:

National Library of New Zealand:

Steve Knight
Andrea Goethals
Ben O’Brien
Gillian Lee
Susanna Joe
Sholto Duncan

Koninklijke Bibliotheek:

Peter de Bode
Jeffrey van der Hoeven
Hanna Koppelaar
Tymen Kwant
Barbara Sierman
René Voorburg
Kees Teszelszky

IIPC Steering Committee Election 2018: nominations and results

The 2018 IIPC Steering Committee (SC) elections featured 3 vacant seats. The KB (Netherlands), BnF (France), and UNT (United States) all had reached the end of their prior three-year terms. The period for IIPC members to nominate themselves for election to the SC was opened on December 1, 2017 and ran until March 25, 2018. During the nomination period, three nominations were submitted, by KB, BnF, and UNT. Thus, unlike prior years, no election process is necessary since the expiring members were the only three to nominate to fill the three vacancies. Congratulations and thanks to KB, BnF, and UNT for their long service on the SC and their willingness to continue to serve another term. In 2019, the Steering Committee will have 5 (or potentially 6) spaces open up for election and we encourage any members interested in joining the SC for the first time and contributing to the management and strategic direction of the organization to nominate themselves. The SC meets in early April at DNB (Germany). Be on the lookout for reports on outcomes from that upcoming meeting.

Jefferson Bailey (current Chair, IIPC SC)


Nomination statements:

Bibliothèque nationale de France / The National Library of France

The National Library of France (BnF) started its web archiving programme in the early 2000s and now holds an archive of nearly a petabyte. We use and share expertise about key tools for IIPC members (Heritrix 3, OpenWayback, NetarchiveSuite, webarchive-discovery) and contribute to the development of several of them. We have developed BCweb, an application for seed selection and curation by librarians, which is being open sourced.

The BnF has been involved in IIPC since its beginning and remains firmly committed to the development of a strong community, in order to sustain these open source tools and to share experiences and practices. We have attended, and frequently actively contributed to, general assembly meetings, workshops and hackathons, and most IIPC working groups, in particular Preservation and Collections Development. We are also involved in the new Training working group. Finally, we have invested effort in making the WARC format an ISO standard and will continue to work on its evolution. Our participation in the steering committee, if continued, will be focused on making web archiving a thriving community, engaging researchers in the study of web archives and developing strong archiving strategies for all kinds of web content, including social media.

Koninklijke Bibliotheek / National Library of the Netherlands

The KB is currently a member of the Steering Committee and chair of the Membership Engagement Portfolio Group, and would like to nominate itself for election to a new term on the Steering Committee.

The Netherlands was one of the early adopters of the Internet: in fact, the 3rd website worldwide was from the Dutch National Institute for Subatomic Physics. In 2007 the KB started collecting websites based on selective harvesting. Currently we harvest around 13,000 websites. For copyright reasons, the websites can only be viewed on the premises. Collaboration with other Dutch organizations will improve the coverage of the preserved Dutch national web. In the nationwide Dutch “Network Digital Heritage” we work together on various projects with GLAM institutions as well as researchers and suppliers of web archiving services to improve the web archiving of the Dutch web. The KB is looking forward to bringing this experience to the IIPC and to developing plans to make new connections between the members of IIPC and with other organizations related to the field of creating web collections, web publications, researchers, tool development and digital preservation.

The University of North Texas Libraries 

The University of North Texas (UNT) Libraries is interested in serving another term on the IIPC Steering Committee. As a library that serves a Tier One university with a student population of 38,000, we are committed to providing a wide range of resources to researchers. Of these resources, we believe that the preservation of and access to Web archives is an important component. We began capturing websites in 1997 and joined the IIPC in 2007. We find great benefit in participating with an international community dedicated to preserving the Web.

In the last decade, we participated in working groups and served on the steering committee for a number of years. We actively participated in such projects as tool development and maintenance for OpenWayback and Heritrix, with UNT Libraries serving as project lead for the OpenWayback project. We participated in collaborative archiving projects, including development of the URL Nomination Tool, and served as Steering Committee officers when requested.

If elected, the UNT Libraries will strive to collaborate with our fellow members and represent the best interests of the IIPC community to continue to move forward the preservation of the Web.

Archiving the Croatian web: has it been fourteen years already?

The National and University Library in Zagreb has been an IIPC member since 2008. The Croatian Web Archive (Hrvatski arhiv weba, HAW), established in 2004, is open access. Current projects include delivering metadata to Europeana, implementation of the URN:NBN persistent identifier, migration to OpenWayback, development of a new user interface and integration with the Digital Library portal. The Web Archiving Team has also been involved in introducing librarians, archivists and researchers to web archiving and to using HAW resources.


By Ingeborg Rudomino, Croatian Web Archive, National and University Library in Zagreb and Karolina Holub, Croatian Digital Library Development Centre, Croatian Institute for Librarianship, National and University Library in Zagreb

About HAW

The National and University Library in Zagreb (NUL) in collaboration with the University Computing Centre in Zagreb (Srce) established the Croatian Web Archive (Hrvatski arhiv weba, HAW) in 2004 and started to acquire, catalogue and archive online publications according to the legal deposit provisions of the Library Act from 1997. Due to the well-known characteristics of web resources, the NUL started to archive selectively and established selection criteria.

Fig. 1. Croatian Web Archive Homepage.

We use several methods to identify a web resource for cataloguing and archiving: the HAW team searches and browses the web, website owners or content providers fill out the Registration form, or we receive notifications from the ISSN Centre for Croatia.

After identification, every resource is catalogued in the library system and automatically transferred into our custom-built archiving system, where the archiving process starts. Our long-standing experience in cataloguing this type of resource has shown the process to be very challenging: describing this dynamic and variable content requires daily interventions in the bibliographic records. Because of that, we created cataloguing guidelines with a variety of examples. Our goal has been to preserve the original websites (their look and feel) as much as possible. In order to achieve quality, each resource is approached individually during the archiving process. The DAMP software, developed by the University Computing Centre in Zagreb, was built especially for this purpose. The workflow of processing web resources is integrated within the organisational structure of the Library.

We are proud of the quantity and quality of web resources stored in the Croatian Web Archive, some of which are websites of institutions, associations, clubs, research projects, news media, portals, blogs, official websites of counties, cities, journals and books. Special attention is given to news media websites/portals, which are archived daily, weekly or monthly.

Access and the first full domain crawl

This selective approach ensures quality and provides full control over the management of web resources. So far, over 6,700 titles have been archived and almost all are publicly available. All content is full text searchable, and it’s possible to search by any word in the title, URL or keywords. Advanced search is available as well. Users can browse the HAW alphabetically and through subject categories, which are extracted from the UDC field in the catalogue.

Fig. 2. Screenshots of archived Croatian websites.

To secure permanent access to archived web resources, we have recently implemented the URN:NBN persistent identifier and have assigned it to archived titles and all archived instances (Fig. 3).

Fig. 3. Screenshot of archived instances with URN:NBN.

Since 2013, the metadata from HAW has been delivered to Europeana through HAW’s OAI-PMH interface.
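To give a feel for what harvesting over OAI-PMH involves, here is a minimal sketch in Python. It only builds a standard ListRecords request and reads titles out of a response; the endpoint address and the sample response are invented for illustration and are not HAW’s actual interface or data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint, for illustration only (not HAW's real address).
BASE_URL = "https://example.org/oai"

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH a resumptionToken is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = next((el.text for el in record.iter("{%s}title" % DC_NS)), None)
        pairs.append((identifier, title))
    return pairs


# Invented sample response with a single Dublin Core record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example archived title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL))
print(parse_titles(SAMPLE))
```

A real harvester would fetch the URL, follow resumption tokens until the list is exhausted, and map the fields to the aggregator’s own schema.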

To overcome the limitations of selective archiving, the first harvest of the whole .hr domain was conducted in 2011 with the Heritrix web crawler. Since then, we have been harvesting the .hr domain annually. The collected content is publicly available via HAW’s website through the OpenWayback access interface (Fig. 4). To date, we have conducted 7 .hr domain harvests.

Fig. 4. Screenshot of harvested website in OpenWayback.

Thematic crawls

In 2011, we also started to periodically harvest websites related to topics and events of national importance, again using Heritrix and OpenWayback. Nine thematic collections have been created, mainly related to themes such as presidential, parliamentary or local elections, accession to the EU and the flood in Croatia. Each collection is described by several metadata elements: title, size, number of seeds/URLs and description.

Training and outreach

Twice a year, we organize a workshop within the Centre of Continuing Education for Librarians. With the main goal of introducing web archiving to library professionals and students, the workshop focuses on learning how to recognize online materials that should be preserved according to the existing criteria for cataloguing and archiving Croatian web resources. The participants are also introduced to the workflow of selective archiving, .hr harvests, the process of selecting materials for thematic collections and different ways of browsing the archived content.

Drawing on the experience we have gained over the years, we are happy to share our knowledge and expertise on web archiving and to support all those interested. To increase awareness of HAW and web archiving among librarians, archivists and the wider community, we take every opportunity to do so, for example by presenting at national and international conferences and giving lectures to students and researchers.

A few thoughts for the future

The Croatian Web Archive currently holds more than 40 TB of content. We are working on a web interface that will offer new functionalities and features, including full-text search for the domain harvests and news sections for the web archiving community and researchers. The plan is also to integrate HAW’s metadata into the Digital Library portal in order to have a single access point for all digital collections.

By combining all three approaches and using different software, the Library will attempt to cover, to the greatest extent possible, the contemporary part of Croatian cultural and scientific heritage.

Visit us: http://haw.nsk.hr/en

Announcing the IIPC Technical Speaker Series

By Jefferson Bailey, Director, Web Archiving (Internet Archive) & IIPC Chair

The IIPC is excited to announce a call for presenters in a new online series, the IIPC Technical Speaker Series. The goal of the IIPC Technical Speaker Series (TSS) is to facilitate knowledge sharing and foster conversations and collaborations among IIPC members around web archiving technical work.

The TSS will feature 30-60 minute online presentations or demonstrations related to tool development, software engineering, infrastructure management, or other specific technology projects. Presentations can take any format, including prepared slides, open conversations, or live demonstrations via screen sharing. Presentations will be from employees at IIPC member organizations and attendance will be open to all IIPC members. The TSS is intended to be informational, not a formal training or education program, and to provide an open venue for knowledge exchange on technical issues. The series will also give IIPC members the chance to demo and discuss technical work (including R&D, prototype, or early-stage work) taking place in member institutions that may have no other venue for presentation or discussion.

If you are interested in presenting, please fill out the short application form.

Details on applying:

  • Applicants must be employed by an IIPC member institution in good standing
  • Access to an online webinar system (WebEx, Zoom, etc) will be provided
  • Presentations will be scheduled for 60 minutes, but can be shorter and should allow time for questions and discussion
  • Small stipends are available to presenters, if needed or if helpful in getting managerial approval to participate.

We aim to have 2-3 TSS events per quarter, scheduled at a time amenable to as many time zones as possible. Details on upcoming speakers and registration will be shared via the normal IIPC communication channels (listservs, blog, Slack, Twitter). This project is funded as part of IIPC’s 2018 suite of projects, including work by IIPC Portfolios and Working Groups, as well as other forthcoming member services. The TSS is currently administered by the IIPC Steering Committee Chair (jefferson@archive.org) and the IIPC Program and Communications Officer (Olga.Holownia@bl.uk). Contact either or both with any questions.

Please apply and present to the IIPC community all the excellent technical work taking place at your organization!

How Can We Use Web Archive?: A Brief Overview of WARP and How It Is Used

By Naotoshi Maeda, National Diet Library of Japan

As we all know, the use of web archives has recently become a hot topic in the web archive community. At the 14th iPRES, held in Kyoto, the National Diet Library of Japan (NDL) took part in some sessions and presented some examples of how web archives can be used. Here, I share the poster and revisit those topics.

Fig. 1: The poster about use cases of web archive presented in iPRES 2017 (pdf)

Overview of WARP

Since 2002, the NDL has been operating the web archive called WARP. It has been harvesting websites under two different frameworks. The first is Japan’s Legal Deposit system and the second is with the permission of the copyright holder. The National Diet Library Law allows the NDL to harvest websites of public agencies, including those of the national government, municipal governments, public universities, and independent administrative agencies. On the other hand, legal deposit does not allow the NDL to harvest websites of private organizations, so the NDL needs to receive permission from the copyright holder beforehand. At present, WARP archives roughly 1 petabyte of data, comprising 5 billion files from 130,000 captures.

Fig. 2: Statistics of archived content in WARP

85% of the archived websites can be accessed via the Internet based on permissions of rights holders, and WARP provides a variety of search methods, including URL, full text, metadata, and by category.

WARP uses standard web archiving technologies, such as Heritrix for web-crawling, WARC file format for storage, OpenWayback for playback, and Apache Lucene Solr for full text search.

Linking from live websites

Given this background, here I show some examples of how WARP can be used.

The first use case is linking from live websites. As mentioned above, WARP comprehensively harvests and archives the websites of public agencies under the legal deposit system. A significant quantity of content is posted, updated, and deleted on these websites every day. Many of these agencies use WARP as a backup database. Before deleting content from their websites, they add a link to content that is archived by WARP. Doing this enables these websites to keep archived content seamlessly available while also reducing the operating costs of their own web servers.

Fig. 3: Linking from live websites to WARP.

Analysis and Visualization

The graphs below present the results of some analysis of content archived in WARP. The first circular graph illustrates link relations between websites in Japan’s 47 prefectures, thereby showing the extent of their interconnection on the Web. The second graph shows the percent of URLs on websites of the national government that were live in 2015, and indicates that 60% of the URLs that existed in 2010 gave 404 errors during 2015. The third bubble chart shows the relative size of data accumulated from each of the 10,000 websites archived in WARP. Thus, you can see at a glance what websites and how much data are archived in WARP.

Fig. 4: Link relations between websites in Japan’s 47 prefectures.
Fig. 5: The percent of URLs on websites of the national government that were live in 2015.
Fig. 6: Relative size of data accumulated from each of the 10,000 websites archived in WARP.
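The kind of figure behind the second graph can be illustrated with a few lines of Python. This is only a sketch with made-up sample data: it assumes a URL counts as live when a later check returns a 2xx or 3xx status, which is an assumption for the example and not necessarily the definition used in the NDL analysis.

```python
def percent_live(status_by_url):
    """Percentage of URLs whose later check returned a 2xx or 3xx status."""
    if not status_by_url:
        raise ValueError("no URLs to evaluate")
    live = sum(1 for status in status_by_url.values() if 200 <= status < 400)
    return 100.0 * live / len(status_by_url)


# Made-up sample: five URLs seen in an earlier crawl and the HTTP status
# each one returned when checked again later.
sample = {
    "http://example.org/a": 200,
    "http://example.org/b": 404,
    "http://example.org/c": 404,
    "http://example.org/d": 301,
    "http://example.org/e": 404,
}

print(percent_live(sample))  # 40.0
```

In this invented sample, three of the five URLs return 404, so 40% are still live and 60% are lost.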

Curation

The next use case shows how WARP can be used for curation. Curators can use a variety of search methods to find content of interest archived in WARP, but it is not easy for them to gauge the full extent of archived content. The NDL curates archived content for a variety of subjects and provides visual representations that could lead curators to unexpected discoveries. Here are two examples: a search by region for obsolete websites of defunct municipalities and the 3D wall for the collection of the Great East Japan Earthquake in 2011.

Fig. 7: Search by region for obsolete websites of defunct municipalities.
Fig. 8: 3D wall for the collection of the Great East Japan Earthquake in 2011.

Extracting PDF Documents

The fourth use case is extracting PDF documents. The websites that are archived in WARP contain many PDF files of books and periodical articles. We search for these online publications and add metadata to those that are considered significant. These PDF files with metadata are then stored in the “NDL Digital Collections” as the “Online Publications” collection. Furthermore, the metadata are harvested using OAI-PMH by “NDL Search”, an integrated search service covering the catalogs of libraries, archives, museums and academic institutes in Japan, so that curators can find PDF files using conventional search methods. 1,400,000 PDF files cataloged in 1,000,000 records are already available online. The NDL launched a new OPAC in January 2018, which implements a similar integrated search.

Fig. 9: Extracting PDF documents from archived websites.

Future challenges

I want to conclude this post with a short discussion of future challenges, which have also been the subject of lively discussion among IIPC members.

Web archives have tremendous potential for use in big data analysis, which could be used to uncover how human history has been recorded in cyberspace. The NDL also needs to study how to make data sets suitable for data mining and how to promote engagement with researchers.

Another challenge is the development of even more robust search engine technology. WARP provides full-text search with Apache Lucene Solr, and has already indexed 2.5 billion files in the creation of indexes totaling 17 terabytes. But we are not satisfied with the search results, which contain duplicate material archived at different times and other noise. We need to develop a robust and accurate search engine specialized for web archives that uses temporal elements.
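One way to picture the duplicate problem is to collapse search hits that point to the same content captured at different times. The Python sketch below is purely illustrative and is not how WARP works: the `Hit` fields, in particular a per-capture content `digest`, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str
    timestamp: str  # 14-digit capture time, e.g. "20150401000000"
    digest: str     # assumed content hash of the captured payload
    score: float    # relevance score from the search engine


def collapse_hits(hits):
    """Group hits by (url, digest) and report each version once."""
    groups = {}
    for hit in hits:
        groups.setdefault((hit.url, hit.digest), []).append(hit)

    collapsed = []
    for group in groups.values():
        # Fixed-width timestamps sort chronologically as plain strings.
        group.sort(key=lambda h: h.timestamp)
        collapsed.append({
            "url": group[0].url,
            "first_seen": group[0].timestamp,
            "last_seen": group[-1].timestamp,
            "captures": len(group),
            "score": max(h.score for h in group),
        })
    # Most relevant version first.
    collapsed.sort(key=lambda r: r["score"], reverse=True)
    return collapsed


# Invented sample: the same page captured three times, changed once.
hits = [
    Hit("http://example.org/", "20100101000000", "sha1:AAA", 1.2),
    Hit("http://example.org/", "20150101000000", "sha1:AAA", 1.5),
    Hit("http://example.org/", "20160101000000", "sha1:BBB", 0.9),
]

for row in collapse_hits(hits):
    print(row)
```

Each distinct version then appears once, with the period over which it was captured, rather than once per capture.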

The Study of punk culture through the Portuguese Web Archive

In the third guest blog post presenting the results of Investiga XXI, Diogo Duarte introduces his study of the emergence in Portugal of Straight Edge, a drug-free punk subculture, which was carried out using the web pages preserved by Arquivo.pt. As an international and informal suburban culture, Straight Edge found in the Internet one of the drivers of its expansion in the second half of the nineties. This text presents a first approach to building the history of the Straight Edge culture.


Since its eruption in the second half of the 1970s, punk was characterized by a multiplicity of derived experiences and expressions that defied the simplistic and sensationalist picture often portrayed of a self-destructive movement (due to the drug and alcohol excesses of some of its members). One of those expressions with a significant growth and impact was Straight Edge.

Sober punk: “I’ve got better things to do”

Born at the beginning of the 1980s in Washington D.C., U.S.A., through the voice of one of the most emblematic bands in punk-hardcore history, Minor Threat, Straight Edge was one of the answers to that self-destructive spiral. Besides the refusal to consume addictive substances, vegetarianism and animal rights have been strongly associated with the Straight Edge lifestyle since its beginning.

Minor Threat’s lyrics quickly found an echo among individuals who identified with punk’s rebelliousness and the raw energy of its loud, fast music but who were not attracted to some of its common behaviors. Before long, Straight Edge was claimed as an identity by a growing number of bands and individuals all over the United States.

The explosion of Straight Edge in Portugal

In Portugal, this punk subculture started to explode at the beginning of the 1990s, with X-Acto, the first Straight Edge band, appearing in 1991. Throughout the decade, Straight Edge never stopped growing, with more and more bands and individuals adopting its principles to guide their lives.

Fig. 1: X-Acto website preserved by Arquivo.pt

In the second half of the 1990s, the Internet became one of the main platforms of communication within the Straight Edge community. By making it easier to spread its ideas and events among a larger audience, the Internet created a new space of sociability complementary to concerts and other meeting spaces.

The growth of the Straight Edge culture reflected some of the social and political dynamics that emerged in Portuguese society during the 1990s, but it also contributed to accelerating those changes, particularly through its interventionist and strongly politicized character.

Anti-consumption, anti-capitalism, anti-racism, feminism, ecology and, especially, veganism and animal rights were some of the causes most actively promoted by Straight Edge followers.

As a predominantly suburban culture, informal and lacking any institutional structure, based on the punk Do It Yourself ethic, Straight Edge remained underground, without any media or public visibility. Information circulated through concerts, through independent distributors and, with the Internet, online through forums, websites or blogs.

The importance of web archives to the study of popular subcultures

Fig. 2: Founded in 1998, StraightEdge.pt was the most important website of the Straight Edge community in Portugal.

With the slowing down of the movement during the early 2000s, much of the information available online that documented the existence of this culture disappeared (in some cases irretrievably) without having been preserved in traditional archives and without leaving a trace in institutional media.

Thus, the possibilities of studying the Straight Edge culture and its impact on Portuguese society were severely reduced. Arquivo.pt recovered and archived many of those pages and re-opened the possibility of studying them.

The websites preserved by Arquivo.pt were the basis of this research. Through them, we observed Straight Edge’s eruption, expansion, consolidation and decline in Portugal and analyzed the changes that occurred in its internal dynamics, in its main concerns and the splits that traversed it (firstly, in its relation to punk culture in general, and then inside the Straight Edge scene itself).

This study provided a glimpse into the potential that web archives offer for the study of almost any contemporary culture, providing a new source of information for social groups and events that are usually underrepresented in traditional archives.

Without web archives, the study of the eruption of the Straight Edge culture in Portugal would have been impossible, just a few years after it happened.

In the Internet age, the same applies to a lot of different phenomena, even to those widely studied. Undeniably, research using web archives implies new methodological and epistemological challenges, but the main challenge is also an opportunity to find new perspectives and new study objects.

About the author:

Diogo Duarte is a researcher at the Institute of Contemporary History (FCSH-NOVA) and a doctoral student at the same institute, with a thesis on the history of anarchism in Portugal.