Black Holes in Cyberspace - The Invisible web

By the BBC's Annabel Colley and Matthew McDonnell

The internet is a fantastic search tool that has changed people's lives, but its vast potential remains untapped for most of us.

Incredibly, the total amount of information stored on the internet is about 500 times greater than what is accessible using search engines like Google or Hotbot.

For every one billion "visible" web pages, there are another 500 billion hidden below. It is all legal, publicly available information - it is just a case of finding it.
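To put those figures in proportion, here is a rough back-of-the-envelope calculation. It is only a sketch that takes the article's numbers at face value; the variable names are illustrative.

```python
# Figures quoted in the article, taken at face value.
visible_pages = 1_000_000_000        # "one billion visible web pages"
hidden_pages = 500 * visible_pages   # "another 500 billion hidden below"

# Share of all pages that a general search engine could reach.
searchable_share = visible_pages / (visible_pages + hidden_pages)
print(f"{searchable_share:.2%}")
```

On these numbers, a general search engine would reach roughly one page in every 500.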

The internet company Brightplanet estimates that information in the invisible web is growing ten per cent faster than information on the commonly searchable web.

Brightplanet carried out the first big study into the invisible web, or the "deep web" as they prefer to call it, and concluded that the information it contains is often highly relevant.

According to Brightplanet: "Information held in the deep web is up to two thousand times better quality than the information easily retrieved by the search engines from the 'surface web'."

So why can we not search this invisible web with the search engines?

The missing links

Search engines use programs called spiders that 'crawl' the web, skimming between web pages via text links.

As they go, they index any text and code they come across, but they miss out on a vast reservoir of valuable, interconnected databases and documents.
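The crawling process described above can be sketched in a few lines of Python. This is a minimal illustration rather than how any real search engine works: the three-page `SITE` dictionary, the `/results?q=whales` address and the names `LinkAndTextParser` and `crawl` are all invented for the example, and the pages are held in memory instead of being fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects the href targets of <a> tags and the words on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


# Hypothetical site held in memory (an assumption for illustration only).
SITE = {
    "/": '<p>Welcome</p><a href="/about">About</a><a href="/search">Search</a>',
    "/about": "<p>About this site</p>",
    "/search": '<form action="/results"><input name="q"></form>',
    # Produced only when someone submits the form; no page links to it.
    "/results?q=whales": "<p>Whale migration records</p>",
}


def crawl(start):
    """Follow text links breadth-first; return (word index, pages visited)."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkAndTextParser()
        parser.feed(SITE[url])
        parser.close()
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for link in parser.links:
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
    return index, seen


index, seen = crawl("/")
print(sorted(seen))
print("whale" in index)
```

The spider reaches the search form at `/search` because a link points to it, but the records behind the form never enter the index, since no link leads to them.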

That is because a lot of the best quality information is held in subject databases and the search engines cannot get into these.

It may be because a password is needed to get at these databases.

Or it could be that you need to register to get at the information, which is often free of charge but inaccessible unless you have signed up.

It could also be that the search engine does not recognise code contained within the database. This means that using most search engines, you are likely to miss out on a whole host of information such as:

  • Images and audio files
  • Flash or other animated pages
  • Non-HTML files such as Word or Excel documents
  • Newsgroup and chat room posts

    Chris Sherman is co-author, with Gary Price, of "The Invisible Web: Uncovering Information Search Engines Can't See"*.

    Chris explains: "When an indexing spider comes across a database, it's as if it has run smack into the entrance of a massive library with securely bolted doors.

    "Spiders can record the library's address, but can tell you nothing about the books, magazines or other documents it contains."

    Finding the hidden treasure

    Paul Pedley is head of research at the Economist Intelligence Unit in London. He runs courses for the Association for Information Management (Aslib).

    He teaches researchers how to retrieve information held in the invisible web and has written a book on the subject - "The Invisible Web"*. He has seen an unprecedented demand for his course this year.

    "As far as the Invisible Web is concerned, there is not one handy solution", says Paul Pedley.

    Search engines trying to keep up

    The general search engines, like Google, remain essential searching tools so long as you know that they will retrieve only a very small percentage of information on the internet.

    Many of the search engines are aware of the problem and Google is among those introducing specialist tools that will search newsgroups, PDF documents and images.

    But they have trouble keeping up. The internet grows at the rate of 7.3 million new pages a day - and the deep web or invisible web is the fastest growing area.

    Information registered with search engines often takes up to six weeks to start appearing in searches.

    "We're basically expanding people's brains with all the world's information", says Sergey Brin, co-founder of Google.

    There is a growing number of specialised invisible web tools which will take your query and run it on thousands of online databases.

    These include:

  • Copernic
  • Brightplanet's LexiBot
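The idea behind such tools, taking one query and running it against many databases, can be sketched as follows. This is only an illustration of the general approach and does not describe how Copernic or Brightplanet's software works: the three small record lists and the names `make_source`, `SOURCES` and `federated_search` are hypothetical, and a real tool would send the query to each database's own search form over the network.

```python
# Hypothetical stand-ins for subject databases (assumptions for illustration).
LAW_REPORTS = ["Copyright ruling on database rights", "Data protection appeal"]
NEWS_ARCHIVE = ["Database rights debated in parliament", "Web growth slows"]
PHOTO_LIBRARY = ["Aerial photo of the Thames"]


def make_source(records):
    """Wrap a list of records as the search function of one database."""
    def search(query):
        terms = query.lower().split()
        # A record matches when it contains every term in the query.
        return [r for r in records if all(t in r.lower() for t in terms)]
    return search


SOURCES = {
    "law": make_source(LAW_REPORTS),
    "news": make_source(NEWS_ARCHIVE),
    "photos": make_source(PHOTO_LIBRARY),
}


def federated_search(query, sources=SOURCES):
    """Run one query against every source and label each hit with its origin."""
    hits = []
    for name, search in sources.items():
        for record in search(query):
            hits.append((name, record))
    return hits


for source, record in federated_search("database rights"):
    print(f"{source}: {record}")
```

Labelling each hit with its source lets the searcher see which specialist database a result came from.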

    The right tool for the job

    The future may be digital, but it seems that one solution to the invisible web is to return to relying on human searching behaviour rather than on computers.

    As a searcher you should think hard about your behaviour and perhaps develop a new approach to online searching.

    Use the right tool for the job. Do not use a general search engine when you know there is one that specialises in a specific subject, for example the law, newspaper articles or photographs.

    Gary Price, one of the world's leading experts on the invisible web, explains why it pays to know your specialist sources.

    "A good librarian would not start looking for a phone number by searching the Encyclopaedia Britannica", says Price.

    "Both professional and casual searchers should at least be aware that they could be missing some information or wasting time finding what could be found more easily".

    "This is very similar to a good reference librarian knowing the major reference tools in his or her collection."

    Knowing your sources

    So how do you get to know about new sources that may be hidden in the invisible web?

    Librarians want you to use their expertise. Some of the more savvy ones are now re-marketing themselves as freelance information professionals or online information brokers.

    But there are just as many free newsletters and mailing lists, sometimes compiled by information professionals, to keep you updated on internet sources that the search engines may not always find. Try one of these:

  • Resource Shelf
  • Freepint
  • Research Buzz

    *The Invisible Web
    by Paul Pedley, published June 2001 by Aslib publications.

    *The Invisible Web: Uncovering Information Search Engines Can't See
    by Chris Sherman and Gary Price (to be published in the USA by Cyber Age Books in July).
