DIY Web Archiving

[Illustration for the "web archiving" column]

Archiving media arts is complex. Film and video benefit from a network of institutions (notably through the International Federation of Film Archives) that are dedicated to their preservation and that have developed common practices and vocabularies as well as a good knowledge of the properties of the media. In the digital arts, practices are characterized by plurality and the rapid evolution of devices, media and formats. Initiatives for the preservation of media artworks have been developed by several groups (Variable Media Network, DOCAM research alliance, ALN/NT2, Initiative for Indigenous Futures, Rhizome, V2_Lab for the Unstable Media).

In this column, I approach web archiving from a very practical (DIY) perspective by sharing my experience with Conifer and ArchiveWeb.page, two tools developed by Rhizome (a New York organization dedicated to promoting and preserving net art) and Webrecorder. Artists and arts organizations can use these tools to archive any type of web content: official websites, web art, virtual exhibitions, interventions on social networks, etc.

Archiving Lab à lab

At CQAM, we archived the Labàlab wiki, a project initiated by the digital arts committee between 2013 and 2015. The methodology, values and ideas conveyed through Lab à lab are still relevant in 2021 and, even though we wanted to stop hosting the site on the internet, it was important for us to keep a substantial archive of the project.

To do this, we tested Conifer, a tool that allows interactive capture of web pages in WARC, a format standardized by ISO (ISO 28500). This type of archive has the advantage of preserving the style and interactions built into the web pages, giving the viewer an experience close to the original. A WARC web archive can be viewed online, either through Conifer’s hosting service (free accounts currently come with 5 GB of storage space) or through web applications such as ReplayWeb.page. Finally, the capture tool can run remotely on older browsers, which is very useful for preserving web pages that rely on technologies no longer supported by current browsers (e.g. Flash).
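To give an idea of what a WARC file contains, here is a minimal sketch in Python, using only the standard library. It assumes an uncompressed WARC stream, and the sample record is deliberately simplified (a real record also carries fields such as a record ID and a date). The function name read_warc_records is my own, not part of Conifer or ArchiveWeb.page.

```python
import io


def read_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        version = stream.readline()
        if not version:
            break                      # end of the stream
        if not version.strip():
            continue                   # blank lines separating two records
        if not version.startswith(b"WARC/"):
            raise ValueError("not a WARC record: %r" % version)

        # Named header fields, one per line, until an empty line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# A simplified, hand-written record for illustration only.
body = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Lab</h1>"
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    b"\r\n" + body + b"\r\n\r\n"
)

for headers, payload in read_warc_records(io.BytesIO(sample)):
    print(headers["WARC-Type"], headers["WARC-Target-URI"], len(payload))
```

Each record stores the captured HTTP response as-is, next to the URL it came from. This is what lets a replay tool serve the page back with its original styles and scripts.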

First experience with web archiving

Preliminary readings

Before I started archiving the Lab à lab website, I read the entire Conifer user guide. This allowed me to understand the scope and limitations of the tool, its interface and some of its features.

Then, I had the chance to consult the brand new one-page guide on the subject prepared by Hélène Brousseau, digital collection and systems librarian at Artexte. This guide, available in Artexte’s Toolbox, presents a set of questions and issues to consider when planning a web archiving session.

Test 1: Creating a Conifer Account

Once I was ready to start, I created a free Conifer account for CQAM. I was then able to run a few capture tests to familiarize myself with the tool.

These initial tests quickly revealed a problem, which a re-reading of the Conifer guide helped me identify. The Conifer web application runs on servers located in the United States, and this caused problems when capturing and displaying several pages of the Lab à lab site (for example: http://labalab.ca/À+propos+of+LabàLab). My hypothesis, which I have not been able to confirm, is that the accented characters in several of the website’s URLs were giving the tool trouble. To work around the server location issue, the guide suggests using a local application, the ArchiveWeb.page desktop app, which lets us make our captures from our own location.
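To illustrate why accented characters can be fragile in URLs, here is a small Python sketch using only the standard library. It shows that the same path can be percent-encoded in two different ways depending on the character encoding a tool assumes. This is only an illustration of my hypothesis; I do not know whether this is what actually happened inside Conifer.

```python
from urllib.parse import quote, unquote

# Path taken from the example URL above.
path = "/À+propos+of+LabàLab"

# Most current tools encode non-ASCII characters as UTF-8 bytes.
as_utf8 = quote(path, safe="/+")
print(as_utf8)    # /%C3%80+propos+of+Lab%C3%A0Lab

# An older tool or server might assume Latin-1 instead.
as_latin1 = quote(path, safe="/+", encoding="latin-1")
print(as_latin1)  # /%C0+propos+of+Lab%E0Lab

# Decoding with the matching encoding gives back the original path...
print(unquote(as_utf8) == path)    # True
# ...but decoding the Latin-1 form as UTF-8 (the default) does not.
print(unquote(as_latin1) == path)  # False
```

If the capture tool and the website disagree on the encoded form, the archive can end up storing a page under an address that the replay tool never looks up.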

Test 2: Downloading ArchiveWeb.page

So I downloaded the application, consulted the user guide (the features are very similar to Conifer’s because the two tools share a common ancestor: Webrecorder) and ran a second series of tests, which fortunately proved conclusive! The local application, running on my own computer and my own internet connection, was able to capture web pages whose URLs contain accented characters without any problem.

Planning

With the help of Artexte’s guide, I planned my capture sessions according to the content we wanted to preserve. Since Lab à lab is a website with few pages, we decided to archive them all. However, we could have selected only the most significant sections or, on the contrary, followed the hyperlinks pointing to external resources in order to archive those too. Clearly defining the boundaries of our archive helped me plan the capture sessions and make the checks and adjustments needed to meet our objective.

Creating the Collection + Capture Sessions

I created the “Projet Lab à Lab (2013-2015)” collection in my capture tool and associated all my subsequent capture sessions with it. I recommend working through the captures systematically, section by section and page by page within each section. Note that capturing audio-visual content, such as video, requires the content to be played in its entirety, which can take a considerable amount of time. The work of capturing is obviously somewhat repetitive (meditative?) and requires a good dose of patience. A reasonable level of concentration also helps you keep track of where you are.

Review and patches

When the capture sessions are completed, it is a good idea to view the entire collection to make sure that all the desired pieces are there. In my case, several pages were missing. I can’t say whether the tool failed during my capture sessions or whether I went too quickly over some pages, without giving the tool time to capture all of their elements.

Once the missing pages (or elements) were identified, I ran a few targeted capture sessions. It is possible to capture one page at a time and then use the collection’s search tool to check whether the corresponding URL is there. I ended my capture sessions when all the pages to be archived were found in my collection.
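For a larger site, this check could be partly automated. The sketch below, in Python with only the standard library, compares a list of planned URLs with a list of captured URLs and reports what is missing. It assumes you have both lists at hand (for instance copied by hand from your plan and from the collection). The function names and the example.org addresses are hypothetical, and this is not a feature of Conifer or ArchiveWeb.page.

```python
from urllib.parse import unquote, urlsplit


def normalize(url):
    """Reduce a URL to a comparable (host, path) pair.

    Percent-encoded and plain accented characters are treated as equal,
    and a trailing slash is ignored.
    """
    parts = urlsplit(url)
    path = unquote(parts.path).rstrip("/") or "/"
    return (parts.netloc.lower(), path)


def missing_pages(planned, captured):
    """Return the planned URLs that have no match among the captured ones."""
    captured_keys = {normalize(url) for url in captured}
    return [url for url in planned if normalize(url) not in captured_keys]


# Hypothetical lists, for illustration only.
planned = [
    "http://example.org/",
    "http://example.org/À+propos",
    "http://example.org/projets/",
]
captured = [
    "http://example.org/",
    "http://example.org/%C3%80+propos",
]

print(missing_pages(planned, captured))  # ['http://example.org/projets/']
```

The normalization step matters here because the same page may be listed with accented characters in one place and in percent-encoded form in another.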

Uploading to Conifer

As our intention was to keep a public archive of Labàlab, we chose, for the moment at least, to host it on Conifer. Since I used a local application to produce the archive, I first had to export it in WARC format to my computer, and then upload the file to the collection of the same name in CQAM’s Conifer account. In any case, it is recommended that you keep a copy of the WARC file in your own archives, whether on your computer, an external hard drive and/or a cloud-based storage system.
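If you keep several copies of the WARC file, one simple way to check that a copy is still identical to the original is to compare checksums. This is a suggestion of mine rather than a step from the Conifer guide. The Python sketch below uses only the standard library, and the file names are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """Return True when both files have the same checksum."""
    return sha256_of(original) == sha256_of(copy)


# Demonstration with small temporary files standing in for real WARC files.
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "collection.warc")
    backup = os.path.join(folder, "collection-backup.warc")

    with open(original, "wb") as f:
        f.write(b"WARC/1.0\r\n")
    shutil.copyfile(original, backup)
    same = copies_match(original, backup)

    # Simulate a damaged copy by appending a stray byte.
    with open(backup, "ab") as f:
        f.write(b"x")
    damage_detected = not copies_match(original, backup)

print(same, damage_detected)
```

Running the comparison from time to time on your backups can reveal a corrupted copy while you still have a good one to replace it with.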

Organization

By default, Conifer attempts to locate URLs that correspond to web pages among all captured items. This automatic indexing is recorded in a list called “Pages detected”. Lists provide different access points to the archived web pages. You are free to organize your archive with one, several or no lists. If you don’t prepare any lists, people who want to browse your collection will be able to navigate directly through all the resources. For the Labàlab collection, we prepared a few lists that correspond to the main sections of the website menu.

Finally, you can edit the title of your collection and its description in order to contextualize the archive you have built.

Publication

The last step, if that is your intention, is to make your collection public and share the URL! You can find the Labàlab archive at this address: https://conifer.rhizome.org/CQAM/projet-lab-à-lab-2013-2015

Verdict

Conifer and ArchiveWeb.page are two tools that, while imperfect and limited in some respects, address an important issue for the arts community with a certain simplicity. The ease with which they allow artists and small organizations to preserve web pages in all their complexity (and in a standardized format) is, in my opinion, quite remarkable. There are other tools and other methods for archiving the web, but these two options are probably the most accessible for our community.

If you wish to discuss web archiving further, you are welcome to contact me 🙂

Best,

Isabelle L’Heureux