Human scale web collecting for individuals and institutions (Webrecorder workshop)

By Anna Perricci, Rhizome

Web archiving ‘at scale’ is usually equated with collecting by automated software (a web crawler), but the assumption that more information means more value is not always right, especially with web archives. A massive scope or scale isn’t required to make meaningful, useful web archives. Collecting at a ‘human scale’ can be as good as or better than crawling for forming certain collections.

Webrecorder is a free, easy-to-use, browser-based web archiving tool set provided by Rhizome. Rhizome, an affiliate of the New Museum in New York City, champions born-digital art and culture through commissions, exhibitions, digital preservation, and software development. Webrecorder’s development has been generously supported by the Andrew W. Mellon Foundation.

With Webrecorder you can make high fidelity interactive captures of web content as you browse web pages. A “high fidelity capture” means that, from a user’s perspective, there is a complete or high level of similarity between the original web pages and the archived copies, including the retention of important characteristics and functionality such as video or audio that requires a user to press ‘play’, or resources that require entry of login credentials for access (e.g. social media accounts). Webrecorder can capture most types of media files, JavaScript and user-triggered actions, which most crawlers struggle with or are unable to capture.

Workshop attendees will be given an overview of Webrecorder’s features, then engage in hands-on activities and discussions. Further instruction will alternate with opportunities for participants to use the tools introduced and share their thoughts or questions. Instructions on how to manage the collected materials, download them (as a WARC file), and open a local copy offline using Webrecorder Player will also be covered in this workshop.

Human scale web collecting with Webrecorder is not expected to meet all the requirements of a large web archiving program but can satisfy many needs of researchers or smaller web collecting initiatives. Webrecorder can be a great tool for personal digital archiving projects as well. Larger web archiving programs can benefit from using Webrecorder to capture dynamic content and user-triggered behaviors on websites. The WARC files created with Webrecorder can be downloaded and ingested alongside WARCs that have been created using crawler-based systems.

With a tool like Webrecorder anyone can get started with web archiving quickly and at no cost, which is empowering both to information professionals and to their stakeholders.

On November 14th you can also learn more about Webrecorder in an afternoon session entirely focused on Webrecorder and high fidelity web archiving. The session will start with a 30-minute presentation on Python Wayback (pywb), a core component of Webrecorder, by pywb’s creator and Webrecorder’s lead developer, Ilya Kreymer. Then there will be a one-hour panel on capturing complex websites and publications using Webrecorder with Jasmine Mulliken, Sumitra Duncan, Nicole Coleman, and me (Anna Perricci).

Whether you are a seasoned expert or newer to web archiving, I hope you will be able to join us for the session and this workshop on November 14th at the IIPC WAC. The limit on the number of workshop attendees has been removed, so please feel welcome to register.