Mark Boulton on the digital preservation of the web’s first steps
Today marks the twentieth anniversary of CERN’s published statement that made the World Wide Web technology available on a royalty-free basis, enabling the web to flourish. The system’s simplicity ensured it rapidly eclipsed rival information-retrieval systems, such as WAIS and Gopher.
The first site, now returned to the first active URL, was hosted on Tim Berners-Lee's NeXT computer and described the basic features of the web. However, Mark Boulton (MB) spoke to .net to explain that the project is about far more than uploading some files. It’s about ensuring the story behind the birth of the web is made available to future generations.
.net: Why are you embarking on this project? Why is it important?
MB: For future generations, to try to demonstrate the significance of the World Wide Web project, the circumstances under which it was proposed, and what it was for. It had very specific needs because it was all to do with projects at the time where the amount of data that was going to be collected, shared and discussed was too much for paper to handle. We also want to demonstrate the very early experience of what browsing the web was like when there were only 30 websites in the world.
There’s also a preservation aspect. We all thought it was crazy that the first ever website was no longer online at its original address given its huge cultural significance. And there’s also the hardware side of things: we’ll be restoring the original NeXT machines, ensuring there are spares to preserve them.
.net: So this isn’t just about getting some files and bunging them online — it’s a much bigger project!
MB: We’re all web nerds, so we know this is an important thing to do, and it’s tempting to think that finding and uploading the files is trivial. But actually, it’s not, because as soon as you upload files via FTP, it changes the time stamps. So they have to be put up in a particular way to preserve the time stamps. Also, the first website that’s going online is actually from around 1992. We want to discover earlier versions of the website and reinstate them on the URL as we go, and to document that process.
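To illustrate the time-stamp problem Boulton describes, here is a minimal Python sketch of a copy that carries the source file's access and modification times over to the new file. It is not the project's actual workflow, which the interview does not detail; the function name `copy_preserving_times` is hypothetical and chosen only for this example.

```python
import os
import shutil


def copy_preserving_times(src: str, dst: str) -> None:
    """Copy src to dst, then restore the source's time stamps on the copy.

    Hypothetical helper for illustration; not taken from the CERN project.
    """
    # copyfile transfers the bytes only, so dst starts with a fresh
    # modification time, much like a plain upload would.
    shutil.copyfile(src, dst)

    # Read the original access and modification times (nanosecond precision)
    # and apply them to the copy.
    st = os.stat(src)
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

For a local copy, the standard library's `shutil.copy2` attempts the same thing in one call. Over a network transfer, whether time stamps survive depends on the tool and server, so an explicit step like the one above is one way to make the intent visible.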
.net: So is this really about authenticity — making it seem as though the site had never been removed in the first place?
MB: Putting that website on that very first URL is to preserve the experience in as pure a form as we can, and then it’s all the other things around it, to enable that, and to document and tell the story to different age groups and audiences, so they can learn from that in future generations. Kids are now growing up in a world where they’ve always known the internet; they never knew a time when it didn’t exist. We need to preserve the stories so they don’t just disappear.
.net: Digital preservation is now becoming a big thing. It’s sad how much history has been lost and how transient web content has been.
MB: It’s the physicality of it. There’s no waste. With old TV and film, there was physical waste after showing something: a tape that was stored in a warehouse. We never had that, but we do have old machines and servers whose magnetic media will degrade. Once they’re ruined, it’s all gone forever, and so we need to rescue this stuff before it’s too late.
There’s also the team side of things. All the people originally involved with the web’s earliest days are getting older and moving on, no longer wanting to be involved with these sorts of discussions. We need to talk to them, get information from them and preserve it. We want to preserve the stories, the hardware and the software, and even reinstate the original infrastructure.
.net: How far do you plan to go in recreating the original experience of the web?
MB: Eventually, we’d like to recreate everything from the web’s earliest days, with its few sites and rudimentary browser with key commands. You had an 80-by-23-character space that you could use. It was a very limited window! We’re looking at possible emulation to recreate that kind of browsing experience.
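To give a feel for how limited that window was, the following Python sketch wraps plain text to 80 columns and splits it into screens of 23 rows, the dimensions quoted above. It only models the size constraint; it is not the original line-mode browser's code, and the `paginate` function is a hypothetical name used for illustration.

```python
import textwrap

COLS, ROWS = 80, 23  # window size quoted in the interview


def paginate(text: str, cols: int = COLS, rows: int = ROWS) -> list[list[str]]:
    """Wrap text to `cols` characters and group the lines into screens of `rows`.

    Hypothetical helper for illustration only.
    """
    lines: list[str] = []
    for paragraph in text.splitlines():
        # textwrap.wrap returns [] for an empty string, so keep blank
        # lines explicitly to preserve paragraph spacing.
        lines.extend(textwrap.wrap(paragraph, width=cols) or [""])

    # Slice the wrapped lines into consecutive screens.
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]
```

Each inner list is one screenful, so a reader would step through the result one screen at a time, which is roughly what paging through a document in such a small window involves.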
.net: It’s also a timely project, what with the anniversary of CERN’s document regarding the web being in the public domain.
MB: Today is a very significant date, but people should also never forget the significance of the document. There were lots of discussions at the time about whether the web could become a profitable business, but it was released into the public domain when CERN decided it should remain focused on the science.
Really, that’s a broader goal that’s not stated in the project: to show the fundamental research CERN does every single day. It’s of huge human consequence, and projects like the World Wide Web regularly arrive, from medical research to climate-change analysis. All kinds of things impact us in incredible ways, and they come from scientific research. That work must carry on.
.net: With the web now so ubiquitous, it’s easy to forget it was invented. People just type ‘www’ and away they go…
MB: And the fact ‘www’ in the URL was a mistake is hilarious. It was never supposed to be there! There were two websites created at CERN. Information about the company was supposed to be on info.cern.ch and the World Wide Web project site was to be www.cern.ch. They got mapped to the wrong URLs and it stuck, with ‘www’ becoming part of the URL structure. There are so many of these stories around the very first days of the web that we’ve started uncovering, and they’re going to be lost unless they’re captured, documented and told in the right way.
.net: How do you see the project evolving?
MB: It’s a big project, and we’ve stated some aims to provide a real sense of focus. However, we don’t know the scale of some of the problems yet, such as reinstating original infrastructure, preserving all of the data, and so on.
An important part of this will definitely be community involvement. We’re opening an IRC channel, and there’s Twitter, email and a survey to find out a little more about those who are visiting. This will then be broadened so more people can get directly involved. For example, there’ll be experts out there who know the hardware inside-out, and that expertise is no longer at CERN. We could hack together a line-mode browser, but what if we could get the original scientists who worked on it to assist, or people who studied with them? That’d really help the project move along!