
Error 404? The horror, The horror!

Tuesday, 14 August 2012

Have your say on Web Archiving at the National Library of Ireland

by Catherine Ryan, Digital Collections Student

While it probably hasn’t quite made it into the Oxford English Dictionary yet, the 404 File Not Found web error message has certainly moved beyond the realm of the technical and of technical slang, and has now come to mean that someone is stupid, clueless, lost or ineffectual. ‘You’re so 404’ is probably not a very nice thing to say to someone, but it’s a phrase that stands a greater chance of being heard now than it did a few years ago, thanks to the ubiquity of the Web and the ever-dreaded 404 error.

One of the more imaginative 404 pages from the folks at Lileks.com

But what exactly is this 404 error message, I hear some ask? You are the lucky ones! Simply put, it is the message that you get when the item you were looking for on the Internet is no longer in the place where it used to be. In some cases it can be found under a different URL; in others it has disappeared from the Internet entirely, becoming permanently unavailable. This disappearance of so much online material is, at its core, the reason for web archiving, and web archiving is becoming more and more important as the volume of information published on the Web continues to rise. As Kalev H. Leetaru of the University of Illinois put it on a Library of Congress blog:

"... We’ve reached an incredible point in society. Every single day a quarter-billion photographs are uploaded to Facebook, 300 billion emails are sent and 340 million tweets are posted to Twitter. There are more than 644 million websites with 150,000 new ones added each day, and upwards of 156 million blogs. Even more incredibly, the growth rate of content creation in the digital world is exploding..."

It is an incredible record of our times and much of it disappears as quickly as it goes up. The disappearance of these materials is not always signalled by a 404 error message. Some sites simply update their materials, replacing the old with the new. Other websites vanish from the Internet altogether: around 2003, the average lifespan of a website was estimated at only 100 days. To have any hope of preserving even a part of this record, we need to select, collect and archive what we can now, before it vanishes forever.
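For the technically curious, the 404 message described above is one of a family of HTTP status codes that a web server sends back with every response. The short Python sketch below shows one way to sort those codes into rough categories such as "not found" or "moved". It is only an illustration: the function names `classify_status` and `check_url` and the category labels are our own inventions for this post, not part of any archiving tool, and `check_url` needs a live Internet connection to run.

```python
from http import HTTPStatus
import urllib.error
import urllib.request


def classify_status(code: int) -> str:
    """Map an HTTP status code to a rough availability label (labels are illustrative)."""
    if code == HTTPStatus.NOT_FOUND:   # 404: nothing at this address
        return "not found"
    if code == HTTPStatus.GONE:        # 410: the server says it was removed for good
        return "gone"
    if 300 <= code < 400:              # 3xx: the item lives at a different URL
        return "moved"
    if 200 <= code < 300:              # 2xx: the request succeeded
        return "available"
    return "other"


def check_url(url: str, timeout: float = 10.0) -> str:
    """Request a URL and classify the answer (hypothetical helper, needs network access).

    urllib follows redirects automatically, so a page that has moved will
    usually be reported as "available" at its new address.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # Error statuses such as 404 arrive as exceptions carrying the code.
        return classify_status(err.code)
    except urllib.error.URLError:
        # No HTTP answer at all, e.g. the whole site has vanished.
        return "unreachable"
```

Note that a page whose content has simply been replaced still answers as "available", which is exactly why a 404 check alone cannot tell us how much has been lost.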

To tackle this problem and try to preserve what we could, we initiated two web archiving pilot projects here at the National Library in 2011 in partnership with the Internet Memory Foundation. The first project was based on the General Election of 2011 and you can read about that here.

The President-elect, now President, Michael D. Higgins celebrates his election victory on Monday, 14 November 2011 - from TheJournal.ie

Our second project was based on the 2011 Presidential Election. Just as with the General Election Web Archive, we looked at candidate sites, political party sites and official government sites, as well as online commentary such as blogs and forums. We also tried to capture online news sites such as TheJournal.ie, RTE.ie and regional newspapers, along with some sites relating to the Oireachtas Inquiries and Judicial Pay referenda. The archive itself is now accessible and can be viewed on the Internet Memory Foundation’s website.

Not to be outdone, 2012 is also seeing its fair share of web archiving here as we've been working on an archive for the Fiscal Treaty Referendum in addition to a few other events and anniversaries such as the Titanic, James Joyce and Bloomsday, and of course, the football. This archive of 40 sites will be ready for viewing over the next few weeks.

Archiving the Presidential Election 2011

And we’re not done yet! We have been planning a general archive for sites of Irish interest for a little while now and hope to start crawling the sites over the next few weeks. Our problem? We don’t know everything about everything out there. We’re only human after all and, while we have received some excellent site suggestions from staff, we are fairly sure that something, somewhere on the Web has escaped our attention. Probably more than a few things if we are to be honest! So we want you to have your say as well. What blogs do you enjoy reading? Are there any local events in your area that have a good website that you would like to see kept? Or what are your personal interests? What part of the Web do you hang out in that you want to see recorded for future generations? The websites should be of Irish interest and subjects can range from well-known to lesser-known bloggers, from architecture to Irish language sites, poetry, politics and the media, and from cultural, literary or sporting events to websites with visual materials of note. The list is not prescriptive as the Web contains so much of interest.

We can’t archive everything, so we hope you can help us decide what to keep by entering your suggestions in our survey before 5 p.m. this Friday, 17 August!

Alas, as with any project, there are a few limitations to what we can do so we can’t guarantee that every suggestion will make it into the next archive. Please do bear the following in mind if your suggestion doesn’t appear in the archive:

  • Copyright: legal deposit in Ireland does not extend to any digital publications, so we must seek permission from every site owner in order to archive their site. Refusals have been rare so far, but they may say no!
  • A greater problem for us to date is the ability of the crawler to capture more sophisticated and dynamic web content. If a site is full of JavaScript, if it is made up of databases that need to be queried, or if it is full of audio, RSS feeds, YouTube or Flash elements, then we won’t be able to capture that site very well, if at all. As each site costs money, we may decide not to risk a failed capture.
  • We may have received a number of submissions on the same topic and decide to leave a particular site out for the moment.
  • You may have submitted a website on a topic that we are planning for a later themed archive. We have tentatively pencilled in archives on Irish writers, local history and genealogy, Irish theatre, and the Irish economy among others. If you want to add a suggestion now, it will be saved for future consideration.

So, with all that said, if you think you know of a suitable website, please do fill out our survey by Friday (which really means you have the rest of the week to get your thinking caps on!) and have your say on web archiving at your National Library!