Contribute to CDG’s AI Collection!

By Tiiu Daniel, Web Archive Leading Specialist, National Library of Estonia

“Trurl” by Daniel Mróz, from The Cyberiad by Stanisław Lem (Wydawnictwo Literackie, Kraków, 1972). Illustration copyright © 1972 Daniel Mróz. Reprinted by permission.

After significant breakthroughs at the end of the 20th century and the beginning of the 21st, artificial intelligence (AI) has come to play a greater role in our daily lives. Although AI has a huge positive impact on a variety of fields such as manufacturing, healthcare, art, transportation and retail, the use of new technologies also raises ethical issues and security risks. One critical and hotly debated issue is the impact of ongoing automation on labor markets, including changing educational requirements for jobs, job elimination, and various models for transitions.

The IIPC Content Development Group invites curators and web archivists around the world to contribute websites to a new “Artificial Intelligence” web collection.

The purpose of this collection is to bring together and record web content related to the use of AI and its impact on every aspect of life, reflecting attitudes and thoughts towards it, future predictions, etc.

The content can be in any language and can focus on specific countries or cultures or have a global scope.

We especially welcome contributions from underrepresented countries, cultures, languages and other groups, or from countries without IIPC members. Curators currently building AI-related collections at their own institutions are welcome to contribute their seeds (matching the criteria below) to aid in the development of a collection with an international perspective.

The collection aims to cover the following subtopics:

  • Machine learning, natural language processing, robotics, automation;
  • AI in literature, visual arts (e.g. ceramics, drawing, painting, sculpture, design, photography, filmmaking, architecture) and performing arts (e.g. theater, public speech, dance, music, etc.); AI in emerging art forms;
  • AI and law/legislation;
  • Social and economic impact (e.g. impact on behavior/interaction, bias in AI, unemployment, inequality, changes in labor markets);
  • Ethical issues (e.g. weaponization of AI, security, robot rights);
  • Future predictions/scenarios concerning AI.

Types of web content to include are personal forms such as blogs, forum posts, and artist websites, as well as trend reports, statements, and analyses (e.g. from government agencies, NGOs, scientific or academic institutions, advocacy groups, businesses).

Time frame covered by content: from the 1990s onwards.

Out of scope are: full social media feeds and channels (Facebook, Twitter, Instagram, YouTube, WhatsApp), users’ video channels (YouTube, Vimeo), apps and other content which is difficult or impossible to crawl.

That said, if you locate individual social media posts of unique value, such as an Instagram post by a bot or a particularly relevant and ephemeral individual video, please submit them for consideration.

Nominations are welcome via the following form.

The call for nominations will close on the 30th of June 2019. Crawls will be run during summer 2019. The collection will be made available at the end of 2019.

For more information about this collection, contact Tiiu Daniel (tiiu.daniel[at]nlib.ee).


Lead Curators of CDG Artificial Intelligence Collection
Tiiu Daniel, Web Archive Leading Specialist, National Library of Estonia
Liisi Esse, Ph.D., Associate Curator for Estonian and Baltic Studies, Stanford University Libraries
Rashi Joshi, Reference Librarian/Collections Specialist, Library of Congress

CDG Co-Chairs
Nicola Bingham, Lead Curator Web Archiving, British Library
Alex Thurman, Web Resources Collection Coordinator, Columbia University Libraries

Contribute to CDG’s Climate Change Collection!

By Kees Teszelszky, Curator Digital Collections, Koninklijke Bibliotheek – National Library of The Netherlands and Lead Curator, CDG Climate Change Collection

Climate change is one of the most urgent and hotly debated issues on the web in recent years. The IIPC Content Development Group is inviting all curators and web archivists from around the world to contribute websites to a new collaborative “Climate Change” collection.

Breiðamerkurlón, Iceland

In recent decades there has been strong evidence that the earth is experiencing rapid climate change, characterized by global temperature rise, warming oceans, shrinking ice sheets, glacial retreat, decreased snow cover, sea level rise, declining arctic sea ice, extreme weather events, and ocean acidification. Ninety-seven percent of climate scientists agree that these climate-warming trends over the past century are very likely due to human activities, and most of the leading scientific organizations worldwide have issued public statements endorsing this position (source: climate.nasa.gov/evidence). Global and local action to mitigate this crisis has been complicated by political, economic, technical, cultural, and religious debates.

Many people feel the urge to reflect on this topic on the web. We would like to take an international snapshot of born-digital culture relating to the documentation of, and social debate on, the challenging issue of climate change. You can contribute to this collection by nominating web content about any aspect of climate change. The content can focus on specific countries or cultures or have a global scope, and can be in any language.

We especially welcome contributions from underrepresented countries, cultures, languages and other groups, or from countries without IIPC members. Curators currently building climate change-related collections at their own institutions are welcome to contribute their seeds (matching the criteria below) to help us build a collection with an international perspective.

Examples of subtopics might include climatology, climate change denial, climate refugees, religious reflections on climate change, etc. Eligible types of web content include organizational reports or statements (e.g. from government agencies, NGOs, scientific or academic institutions, advocacy groups, political parties/platforms, businesses, religious groups) or more personal forms such as blogs or artistic projects.

Out of scope are: social media feeds (Facebook, Twitter, Instagram, YouTube channels, WhatsApp), video (YouTube, Vimeo), apps and other content which is difficult or impossible to crawl.

Seed collection started on 1 April 2019, and further nominations can be added to this spreadsheet. Crawls will be run during the summer of 2019, concluding shortly after the upcoming UN Climate Action Summit on 23 September 2019.

Organized by the IIPC and supported by web archivists around the world, the special web collection ‘Climate Change’ is one of the ways the IIPC helps raise awareness of the strategic, cultural and technological issues which make up the web archiving and digital preservation challenge.

For more information about this collection, contact Kees Teszelszky: kees.teszelszky[at]kb.nl

IIPC Content Development Group: 2019 collections

By Nicola Bingham, Lead Curator Web Archiving, British Library and Co-Chair of the Content Development Working Group

During 2019, the Content Development Group (CDG) will continue to work on several established collections: 

New for 2019, the CDG is undertaking a Climate Change Collection, led by Kees Teszelszky of the National Library of the Netherlands. The first crawl will take place before the General Assembly & the Web Archiving Conference in June, with a final crawl shortly after the next UN climate summit in September. This collection has sparked a lot of interest on the CDG mailing list, and many curators have expressed an interest in contributing.

We are also planning an Artificial Intelligence Collection, led by Tiiu Daniel of the National Library of Estonia, Liisi Esse of Stanford University Libraries and Rashi Joshi of the Library of Congress. The details are still to be firmed up.

We are planning to crawl one of our collections, or a subset of a collection, so that it can be used by researchers.

Collaborate to develop web archive collections with Cobweb!

By Kathryn Stine, Manager, Digital Content Development and Strategy at the California Digital Library

Cobweb is a recently launched collaborative collection development platform for web archives, now available for anyone to use to establish and participate in web archiving collecting projects at https://cobwebarchive.org. A cross-institutional team from UCLA, the California Digital Library (CDL), and Harvard University developed Cobweb, which was made possible in part by funding from the U.S. Institute of Museum and Library Services and is initially hosted by CDL. We’ve been encouraged by the enthusiasm and engagement that have met Cobweb and look forward to supporting a range of collaborative and coordinated web archiving collecting projects with this new platform.

Peter Broadwell & Kathryn Stine introducing Cobweb at the Web Archiving Conference in Wellington (slides).

At the 2018 IIPC Web Archiving Conference in New Zealand, Cobweb tutorial attendees played with Cobweb functionality and provided useful feedback and ideas for platform refinements and future feature options. Thank you to all who have shared their suggestions for advancing Cobweb! A number of demonstration projects are now on the platform that showcase how Cobweb supports web archiving collection development activities, including nominating web resources to a project and claiming intentions for, and following through with, archiving nominated web content. Additionally, the extensive Archive of the California Government Domain (CA.gov) has been established as a Cobweb collecting project and the CA.gov team is considering how to integrate Cobweb into its collection development workflows.

Cobweb centralizes the often distributed activities that go into developing web archive collections, allowing for multiple contributors and organizations to work together towards realizing common collecting goals. The coordinated activities that result in rich, useful web archive collections can draw upon distinct areas of expertise or capacity including subject specialization, technical facility with content capture, and resources for storing and managing content. The Cobweb platform is well-suited to supporting curated and crowdsourced collection building, from complex, multi-partner initiatives to local efforts that require coordination, such as that between digital archivists and library subject selectors.

If you have web archiving collecting goals that can benefit from engaging in collaborative and/or coordinated participation, learn more about getting started with Cobweb by visiting https://cobwebarchive.org/getting_started, checking out the Cobweb presentation from the IIPC WAC, or by emailing cobwebarchive[at]gmail.com.

The study of punk culture through the Portuguese Web Archive

In the third guest blog post presenting the results of Investiga XXI, Diogo Duarte introduces his study of the emergence of Straight Edge, a drug-free punk subculture, in Portugal, carried out using web pages preserved by Arquivo.pt. As an international, informal and suburban culture, Straight Edge owed part of its expansion in the second half of the nineties to the internet. This text presents a first approach to building the history of the Straight Edge culture.


Since its eruption in the second half of the 1970s, punk was characterized by a multiplicity of derived experiences and expressions that defied the simplistic and sensationalist picture of a self-destructive movement often painted of it (due to the drug and alcohol excesses of some of its members). One of those expressions, which saw significant growth and impact, was Straight Edge.

Sober punk: “I’ve got better things to do”

Born in the early 1980s in Washington, D.C., U.S.A., through the voice of one of the most emblematic bands in punk-hardcore history, Minor Threat, Straight Edge was one of the responses to that self-destructive spiral. Besides the refusal to consume addictive substances, vegetarianism and animal rights became strongly associated with the Straight Edge lifestyle from its beginning.

Minor Threat’s lyrics quickly resonated with individuals who identified with punk rebelliousness and the raw energy of its loud and fast music but who were not attracted to some of its common behaviors. Within a short time, Straight Edge was claimed as an identity by a growing number of bands and individuals all over the United States.

The explosion of Straight Edge in Portugal

In Portugal, this punk subculture started to explode in the early 1990s, with X-Acto, the first Straight Edge band, appearing in 1991. Throughout this decade, Straight Edge never stopped growing, with more and more bands and individuals adopting its principles to guide their lives.

Fig. 1: X-Acto website preserved by Arquivo.pt

In the second half of the 1990s, the Internet became one of the main platforms of communication within the Straight Edge community. By making it easier to spread ideas and events to a larger audience, the internet created a new space of sociability complementary to concerts and other meeting spaces.

The growth of the Straight Edge culture reflected some of the social and political dynamics of Portuguese society that emerged during the 1990s, but it also helped to accelerate those changes, particularly through its interventionist and strongly politicized character.

Anti-consumption, anti-capitalism, anti-racism, feminism, ecology and, especially, veganism and animal rights were some of the causes most actively promoted by Straight Edge followers.

As a predominantly suburban culture, informal and lacking any institutional structure, based on the punk Do It Yourself ethic, Straight Edge remained underground, without any media or public visibility. Information circulated at concerts, through independent distributors and, with the arrival of the Internet, online through forums, websites and blogs.

The importance of web archives to the study of popular subcultures

Fig. 2: Founded in 1998, StraightEdge.pt was the most important website of the Straight Edge community in Portugal.

As the movement slowed down during the early 2000s, much of the information available online that documented the existence of this culture disappeared, in some cases irretrievably, without being preserved in traditional archives or leaving a trace in institutional media.

Thus, the possibilities of studying the Straight Edge culture and its impact on Portuguese society were severely reduced. Arquivo.pt recovered and archived many of those pages and reopened the possibility of studying them.

The websites preserved by Arquivo.pt were the basis of this research. Through them, we observed Straight Edge’s eruption, expansion, consolidation and decline in Portugal and analyzed the changes that occurred in its internal dynamics, in its main concerns and the splits that traversed it (firstly, in its relation to punk culture in general, and then inside the Straight Edge scene itself).

This study provided a glimpse into the potential that web archives offer for the study of almost any contemporary culture, providing a new source of information for social groups and events that are usually underrepresented in traditional archives.

Without web archives, studying the eruption of the Straight Edge culture in Portugal would have become impossible just a few years after it happened.

In the Internet age, the same applies to many different phenomena, even widely studied ones. Undeniably, research using web archives raises new methodological and epistemological challenges, but the main challenge is also an opportunity to find new perspectives and new objects of study.


About the author:

Diogo Duarte is a researcher at the Institute of Contemporary History (FCSH-NOVA) and a PhD student at the same institute, with a thesis on the history of anarchism in Portugal.

Today’s news to be forgotten tomorrow?

Research financed by Fundação para a Ciência e a Tecnologia, SFRH/BGCT/135017/2017

A study of the transformations of newspaper websites can only be carried out because there are web archives preserving materials that the newspapers themselves do not preserve or provide. In the second guest blog post in the series showcasing Investiga XXI, Diogo Silva da Cunha, University of Lisbon, presents the results of his project on transformations of this kind in four Portuguese newspapers, using Arquivo.pt.


The transition to what is referred to as the Digital Age and the Information Society brought a great transformation which continues to take place at several levels. Professionals across the various communication sectors are at the forefront, confronting new conditions for performing their work.

An important change occurred in the medium carrying journalistic messages. Since the 1990s, newspapers have been translating their printed editions into online editions.

Fig. 1: Detail of preserved version of Diário de Notícias website in the Arquivo.pt graphical interface, October 13, 1996.

At the end of the 90s, great importance was given to online editions, with part of the newsroom workflow focused on updating them 24/7, an approach known as “web-first” or “online first”. Something was happening. Born-digital content has become an integral part of today’s journalism, with some of this content published exclusively in newspapers’ online editions.

The disappearance of born-digital newspaper materials

In Communication, Media and Journalism Studies, it is now common to assume that the structure of online newspaper websites can accumulate journalistic materials, which journalists and readers can then consult over the long term using search filters specific to that structure.

Journalists and other professionals linked to newspapers and media companies seem to hold similar expectations. The existence of such expectations was confirmed in the present research on Portuguese newspaper websites.

But, as Web Archiving Studies have been showing, there is a general trend for websites to be deeply modified or disappear within a year. In the case of newspaper websites, the problem is aggravated by the fact that they are updated at least daily and their structure as a whole, from its URL to its layout, also undergoes changes, although this happens over a longer period of time. So, although the news content produced by journalists may remain on the newspaper websites for a while, these websites end up with missing elements or they just disappear.

The transformations of Portuguese newspaper websites: a case study

Web archives can be seen as an alternative in terms of public, direct and interactive access to born-digital journalistic materials that are not preserved or that are not publicly provided by newspapers and their media companies. In this sense, a web archive becomes an information technology structure which functions as a ‘source’ in the conventional, historiographical sense of the term.

The research on the transformations of Portuguese newspaper websites, carried out using Arquivo.pt, was a longitudinal study (1996-2016) of the structure of the websites of four weekly and daily newspapers: Correio da Manhã, Diário de Notícias, Expresso and Público.

The process of describing and comparing the preserved versions of those newspapers’ homepages in Arquivo.pt enabled us to reconstruct the development trends between the different layouts and the different web addresses of these pages. From this work, we drew the following general conclusions:

  • Websites are increasingly extensive and vertically oriented;
  • Websites gradually become aesthetically cohesive, consolidating the newspaper’s visual identity;
  • Changes are increasingly less noticeable as they tend to be on the “micro” rather than the “macro” level (see Fig. 2);
Fig. 2: Detail of preserved versions of Expresso website, 2008, 2011 and 2012, respectively.
  • More embedded images and videos are used, often framed in galleries; the number of links, buttons, menus and scroll bars has also increased over time;
  • The visual changes, along with the changes of web addresses, are sometimes shaped by the relationships of the media companies with audiovisual and telecommunications companies. For example, in the versions shown in Fig. 3, the names, colors and/or symbols of these companies appear in the newspaper’s user interface: in 2007 we see the Clix logo at the top left and a pink button in the top right corner, while in the 2012 capture they are replaced by the AEIOU logo.
Fig. 3: Detail of preserved versions of Expresso website, 2007 and 2012, respectively.

Future work

It is now possible to propose at least three ways of building on the developments listed above:

  • using digital tools for detailed analysis of changes in layouts at the level of information design,
  • extending the scope of the study to the websites of other newspapers (e.g. other countries, other companies, other types of social institutions, etc.),
  • widening the scope of the study even more to confront the lines of development discovered with web publishing models beyond the spectrum of journalism (e.g. blogs).

It is also essential to develop a systematic reflection on web archives as such, perceiving them not only as informatic structures but also as ‘research infrastructures’, with their own professional and epistemic cultures. In terms of research on web archives, the work of Niels Brügger seems to offer an excellent starting point. However, it will be crucial to consider web archives in the context of Big Data discussions around reductionist and empiricist trends in the social sciences.

A reflection of this kind would integrate web archives in discussions about ontology, epistemology, methodology, culture, economy and politics. The question would be to think of web archives not only as instruments of access to the world, not only as windows to the digital recent past, but as devices that are part of the constitution of the world, as mediating technologies with their own implications in retrospective placement, themselves part of the digitalization process.

As outlined above, it is equally important to foster dialogue between researchers, journalists and newspaper editorial staff. The general problem of digital preservation, which is especially complicated in the field of media and journalism, makes clear the need to establish digital preservation guides for journalists and editors and to promote joint discussion of information curation initiatives, if we don’t want today’s news to be forgotten tomorrow.




About the author:

Diogo Silva da Cunha is a PhD student in Philosophy of Science and Technology, Art and Society at the University of Lisbon. His major fields of interest are the epistemology of the social sciences and communication, media and journalism studies. He recently participated in a study on the digitalization process in Portuguese journalism promoted by the national media regulatory entity. Last year, he took part in the Arquivo.pt research project, in the context of which he proposed, developed and applied a model for analyzing journalistic material available in web archives.

Memory of the online presence of a Faculty: an exhibition

In 2017 Arquivo.pt launched Investiga XXI, a project that aims to promote the use of the Portuguese Web Archive as a research tool and resource. In this first guest blog post introducing the Portuguese initiative, Ricardo Basílio presents a collaboration between Arquivo.pt and researchers from the Faculty of Social and Human Sciences of Universidade Nova de Lisboa (FCSH-UNL), which resulted in an online exhibition that illustrates a use case for the historical information preserved by Arquivo.pt. The exhibition highlights extracts of institutional online memories.


FCSH: 40 years of lifetime, 20 years online

FCSH was founded in 1977 and is part of Universidade Nova de Lisboa. Since 1997, FCSH websites have served as communication interfaces with its community of teachers, researchers and students.

Arquivo.pt preserves web content published since 1996. Therefore, the web content preserved by Arquivo.pt covers 20 years of the institutional online memory of FCSH, that is, half of the Faculty’s lifetime.

Fig. 1: The first version of the FCSH website preserved by Arquivo.pt

In the early years of the Web, the FCSH website mostly replicated printed information. However, it gradually became a comprehensive portal to academic life at the Faculty, also including news, lists of researchers, research programs and access points to services.

Research centers are important entities in the Faculty’s ecosystem. In 1997 there were 30 small research centers, but in 2016 they were merged into 16 larger ones.

The research centers are autonomous, manage their own projects and organize specific events. This autonomy resulted in the creation of over 100 additional related websites serving various purposes, such as institutional communication, project descriptions and event promotion.

The online exhibition aimed to create an institutional memory through a chronological narrative built from past web pages preserved by Arquivo.pt.

Synthesizing 20 years of memories into a single page

The project began by inventorying a large number of current websites related to the Faculty’s activities. We subsequently narrowed our scope to include only the institutional websites, leaving the others (e.g. projects and events) for future work. All the identified websites were flagged for preservation by Arquivo.pt.

Fig. 2: Table with elements for collecting information about the preserved websites of a given organizational entity.

The data collection was performed manually through the Arquivo.pt search interfaces. We mainly searched for each hostname and analyzed the corresponding version history, noting its main content changes and references to external websites of events and projects. The data was collected, selected and recorded on one page per organizational unit (see Fig. 2).

Some research centers adopted multiple hostnames over time. Conversely, the institutional identity may also have changed due to organizational mergers, name changes or different institutional frameworks. For example, CHAM “Centro de Humanidades” (in 2017) had two previous names: “Centro d’Além Mar” in 2002, which then changed to “Centro d’Aquém e d’Além Mar” in 2013-2014, when it merged with “Centro de História da Cultura – CHC”, “Centro de Estudos Históricos – CEH” and “Instituto Oriental – IO”. However, the hostname of the website never changed: cham.fcsh.unl.pt.

Sometimes it was not straightforward to conclude whether we were facing the same organizational entity after a merger, even when the website kept the same title, hostname and URL. It is hard, after all, to imagine that an entity has changed if everything else remains the same. Therefore, our conclusions were validated through interviews with current and former staff of the Faculty and research centers. This shows the importance of institutional support and direct interaction with the entities.

Designing a time travel to the past

Fig. 3: Homepage of the online exhibition with images taken from the websites.

The objective was to create a clean-looking website that was easy to browse. We anchored its navigation on suggestive images extracted from preserved web pages, to reinforce that it is an exhibition about online memory rather than about current information available on the live web.

Thus, the homepage of the online exhibition presents a collection of preserved web images from old websites of organizational units that belonged to FCSH.

The chosen publishing platform was the free version of WordPress.com, so that anyone can create a similar project, even without financial resources.

By clicking on each image, the user is taken to a page that describes the online memory of each entity of the Faculty. It presents the following elements: featured image, brief synopsis, list of addresses over time and a selection of memorable moments.

The description of each entity has a maximum length of 150 words and includes links to versions preserved on Arquivo.pt. This interaction between the online exhibition and the web archive aims to give users the experience of browsing an institutional memory.

Fig. 4: Description page of an entity of the Faculty.

The exhibition is complemented with frequently asked questions and tutorials related to digital preservation.

Future work, because a website is never finished

The next step is to promote this exhibition through the institutional communication channels of the Faculty (e.g. institutional website, mailing lists).

The exhibition still has plenty of room to be complemented with additional entities that could be aggregated in collections organized by topic or scientific area.

Direct interaction with research centers is essential, as is organizing training courses on web preservation and research to raise awareness of the importance of web archiving.

Conclusions

This project was developed in just 3 months, between May and July 2017. This short time span forced us to focus and prioritize the most important issues. Had we had more time, we might still be lost choosing plug-ins; but would the extra plug-ins actually have been needed to accomplish the objectives? Users don’t seem to miss them in the exhibition.

We aimed to demonstrate that anyone could develop a similar exhibition to preserve the online memory of an organization without requiring significant financial resources or technical skills.

We hope that this project will encourage librarians and archivists to create ways of preserving the online memory of their institutions.

If we did it, you can also do it.



About the author:

Ricardo Basílio has a Master’s in Documentation and Information Sciences and was a librarian at the Faculty of Social and Human Sciences of Universidade Nova de Lisboa and at the Art Library of the Fundação Calouste Gulbenkian, where he worked on the digital collections about Portuguese tiles (the “DigiTile” project). His areas of interest are digital preservation, digital libraries and technologies that support information. He created and manages a website in Portuguese about digital preservation (Digital Preservation Guide).