Wednesday, 27 March, 2002, 12:46 GMT
The internet. Volume One
Library shelf
Books are one thing, but try cataloguing the net
By Jonathan Duffy
BBC News Online
Millions of web pages disappear every year, as sites are updated or go out of business. Now there is a project to save a slice of Britain's internet heritage.
The internet is truly an awesome beast. In the UK alone, some 60,000 new domain names are registered every month, and that's just those with a ".uk" suffix.

Many thousands of sites go up and come down, and are hardly even noticed. The average webpage disappears after just 60 days, according to the Digital Preservation Society.

Stephen Bury
Stephen Bury has an overwhelming task
To a librarian, whose whole role in life is to preserve information, this is the stuff of sleepless nights.

But now the British Library is having a stab at preserving websites, as it has done with books for about 200 years.

Since June last year, the library has been running a pilot project to archive 100 British websites. Now it wants to expand the scheme, called Domain UK, to permanently keep tabs on 10,000 sites of social and historical importance.

Tidal wave of data

The digital age has been a big headache to librarians.

The internet archive
The British Library is seeking copyright clearance to make its archive available to the public
"It feels like swimming against an incoming tide," says the British Library's Clive Field.

Before computers, the job of archiving information was relatively straightforward. Publishers of books, magazines, periodicals and newspapers sent copies of everything they produced to the British Library, which filed them away for future reference.

But no such obligation has been established for digital information.

The result is that thousands of web pages and websites, which one day could be seen as invaluable documents of our time, are disappearing.

Lost forever

"It's hard to know which sites we've missed, because of course we don't know about them," says Dr Field. "But some of the early pages put up after 11 September, those of bereaved relatives and friends, have gone already.

The British Library
Keeping up with the age: The British Library
"To future historians these could have been fascinating for the accounts of authentic, personal experiences of all the people who went through that."

The British Library had been talking for some time about the need to save web material, but it finally sprang into action during last year's general election campaign.

"We found 79 sites dedicated to the election and we thought we've got to hold on to these somehow," says Stephen Bury, who runs the Domain UK pilot project.

Rich variety

"The debate that goes on during an election will always be historically and culturally important, and increasingly this is going on on the net."

The archive started during the general election campaign
Next time Tony Blair goes to the country, Mr Bury expects there to be about 5,000 dedicated sites.

Mr Bury was in charge of selecting the 100 sites that have been monitored for the pilot project, which helps explain the presence of the internet home of his favourite football team, Blackburn Rovers.

Others selected include a site dedicated to a London taxi which used to double up as a mobile art gallery, and a literary site.


"We were looking for a cross section of British sites, ones that represented a wide variety of interests and subjects. So, for example, we chose the Soil Association site and also the Monsanto site, to see how the debates on GM foods matched up."

So far the project has been fairly manageable, says Mr Bury, with regular "snapshots" of the sites being taken to ensure all pages are preserved.

But if the expansion plans go ahead - this depends on a £600,000 grant from the government - he could have his work cut out.

As well as regularly archiving 10,000 sites, the plan is to take half-yearly snapshots of the whole ".uk" domain - which currently runs to about 25 million web pages.

The challenge then is to catalogue them all so future generations will know how to navigate this mass of information.
