Sound advice - blog

Tales from the homeworld

Fri, 2007-Jan-05

Death and Libraries

Have you ever wondered what will happen to your blog when you die? Perhaps it is the influence of parenthood on my life, but I have been thinking about the topic of late. If a part of your legacy is in your blog, what will your legacy be? Perhaps libraries have a role in guaranteeing the future of today's web.

I suspect that most bloggers haven't really thought about the problem. How long will your blog or web site last? Only as long as the money does. The monthly internet bill needs paying, or the annual web hosting fee if your hosting occurs externally. Even if you have that covered, your domain registration will be up for renewal in less than two years. Perhaps you don't have a vanity domain. Maybe you are registered with Blogger. This kind of blog is likely to last a lot longer, but for how long? Will your great grandchildren be able to read your blog? Will their great grandchildren? Will your great great great grandchildren be able to leave new comments on your old material?

Blogs are collections of resources. Resources demarcate state, and return representations of that state. These representations are documents in particular formats, such as HTML4. So in addition to the question of whether the resources themselves will be durable we must consider how durable the document formats used will be. We may even have to look at whether HTTP/1.1 and TCP/IPv4 will be widely understood a hundred years from now.

The traditional way to deal with these sorts of longevity problems is to produce hard copies of the data. You could print off a run of 1000 bound copies of your blog to be distributed amongst interested parties. These parties might be your descendants, historians who think you have some special merit in the annals of mankind, and perhaps most universally, librarians who wish to maintain a collection of past and present thought.

We could attempt the same thing with the web. However, the web maps poorly to the printed word, given the difficulty of providing appropriate hyperlinks. The hard-copy approach also rests on the notion that the person interested in a particular work is geographically close to the place where it is housed, and can find it through appropriate means. Let us consider another possibility in the future networked world: that those with an interest in the works host the works from their own servers.

Consider the cost of running a small library today. If all data housed in the library eventually became digital data, that data could be distributed anywhere in the world for a fraction of the cost of running a library today. We already see sites like the Wayback Machine attempting to record the web of yesteryear, or the Google cache trying to ensure that today's content is reliably available. Perhaps the next logical step is for organisations to start hosting the resources of the original site directly. After all, there is often as much value in the links between resources as there is in the resource content itself. Maintaining the original URLs is important. Perhaps web sites could be handed over to these kinds of institutions to avoid falling off the net. Perhaps these institutions could work to ensure the long survival of the resources.

The technical challenges of long-term data hosting are non-trivial. A typical web application consists of application-specific state, some site-specific code such as a PHP application, a web server application, an operating system, physical hardware, and a connection to an ISP. Just to start hosting the site would likely require a normalisation of software and hardware. Perhaps an application that simply stores the representations of each resource and returns them to its clients could replace most of the software stack. The connection to the ISP is likely to be different, and will have to change over time. The application protocols will change over the years as IPv6 replaces IPv4 and WAKA replaces HTTP (well, maybe). The data will have to hop from hardware platform to hardware platform to ensure ongoing connectivity, and from software version to software version.
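
To make the idea of an application that simply stores representations and returns them a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the SNAPSHOT contents, the paths in it, the lookup and serve names, and the port number are all hypothetical, and a real archive would load its representations from durable storage rather than from a dictionary in memory.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical snapshot of a site: original request target -> (media type, stored bytes).
# The paths and bodies below are made up for illustration.
SNAPSHOT = {
    "/": (
        "text/html; charset=utf-8",
        b'<html><body><a href="/2007/01/05">Death and Libraries</a></body></html>',
    ),
    "/2007/01/05": (
        "text/html; charset=utf-8",
        b"<html><body><h1>Death and Libraries</h1></body></html>",
    ),
}


def lookup(snapshot, target):
    """Return (status, media_type, body) for an exact request target.

    Keys are matched exactly, so the original URLs (and the links
    between resources) keep working unchanged.
    """
    entry = snapshot.get(target)
    if entry is None:
        return 404, "text/plain; charset=utf-8", b"Not archived\n"
    media_type, body = entry
    return 200, media_type, body


class ArchiveHandler(BaseHTTPRequestHandler):
    """Serves stored representations; no site-specific code is executed."""

    def do_GET(self):
        status, media_type, body = lookup(SNAPSHOT, self.path)
        self.send_response(status)
        self.send_header("Content-Type", media_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the archive server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), ArchiveHandler).serve_forever()
```

Calling serve() would start answering GET requests from the snapshot. The point of the sketch is that the PHP application, its database and most of the original stack are replaced by a lookup table, which is far easier to carry from platform to platform than the software that first produced the pages.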

If all of this goes to plan your documents will still be network accessible long after your bones have turned to dust. However, this still assumes the data formats of today can be understood in the future, or at worst translated into an equivalent form. I suggest that we have already travelled well for a decade and a half with HTML, and that we will travel well for another few decades. We can still read the oldest documents on the web. With good standards management it is likely this will still be the case in 100 years. Whether the document paradigm that HTML sits in will still exist in 100 years is another question. We may find that these flat documents have to be mapped into some sort of immersive virtual environment in that time. The librarians will have to keep up to date with these trends to ensure ongoing viability of the content.

I see the role of librarian and of system administrator as becoming more entwined in the future. I see the librarian as a custodian for information that would otherwise be lost. Will today's libraries have the foresight to set the necessary wheels in motion? How much information will be lost before someone steps in and takes over the registration and service of discontinued domains?

Benjamin