
Wednesday, July 7, 1999, Published at 18:05 GMT (19:05 UK)


Sci/Tech

Web engines could do better

How to find your target in 800 million pages

The performance of search engines used to find material on the Web is deteriorating, new research has suggested.

A study conducted by Dr Steve Lawrence and Dr C. Lee Giles shows that no engine is now logging more than about 16% of the publicly indexable Web.

The estimated combined coverage of the 11 engines used in the study is 335 million pages, or 42% of the estimated total number of pages.
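As a quick check of these figures (an illustration, not from the article), 16% of an 800-million-page Web is roughly 128 million pages for the best-placed single engine, while the combined 335 million pages works out to the 42% quoted:

    # Back-of-the-envelope check of the coverage figures quoted above.
    total_pages = 800e6                 # estimated size of the indexable Web
    best_single = 0.16 * total_pages    # no engine logs more than about 16%
    combined = 335e6                    # combined coverage of the 11 engines

    print(f"best single engine: ~{best_single / 1e6:.0f} million pages")
    print(f"combined coverage: {combined / total_pages:.0%} of the Web")
    # best single engine: ~128 million pages
    # combined coverage: 42% of the Web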

This performance is "substantially" worse than when the researchers last did a survey in December 1997.

The researchers also accuse the engines of a bias towards US websites and towards sites that have more links pointing to them - in other words, the more 'popular' sites. The engines are also more likely to index commercial sites than educational sites, they say.

Lawrence and Giles believe this bias has damaging and divisive consequences.

Unnecessary duplication

"Search engine indexing and ranking may have economic, social, political, and scientific effects," they say.

"For example, indexing and ranking of online stores can substantially effect economic viability; delayed indexing of scientific research can lead to the duplication of work or slower progress; and delayed or biased indexing may affect social or political decisions."

Lawrence and Giles, of NEC Research in Princeton, New Jersey, publish their research in the latest edition of the science journal Nature.

They estimate there are now around 800 million pages on the Web, encompassing about 15 terabytes of data (about 6 terabytes of textual content, after removing HTML tags, comments, and extra whitespace); the Web also contains about 180 million images (three terabytes).
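Dividing those estimates out gives a feel for the average page (a back-of-the-envelope calculation, not in the article): roughly 19 kilobytes of raw HTML per page, of which about 7.5 kilobytes is text, and about 17 kilobytes per image.

    # Average sizes implied by the study's estimates (rough arithmetic only).
    pages, images = 800e6, 180e6
    total_bytes, text_bytes, image_bytes = 15e12, 6e12, 3e12

    print(f"average page (raw HTML): ~{total_bytes / pages / 1e3:.1f} KB")
    print(f"average page (text only): ~{text_bytes / pages / 1e3:.1f} KB")
    print(f"average image: ~{image_bytes / images / 1e3:.1f} KB")
    # average page (raw HTML): ~18.8 KB
    # average page (text only): ~7.5 KB
    # average image: ~16.7 KB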

About 83% of sites contain commercial content and 6% contain scientific/educational content. Only 1.5% of sites contain pornographic content.

Multiple searchers

The researchers say that greater attention should be paid to the accessibility of information on the Web, in order to minimise unequal access to information and to maximise the benefits of the Web for society.

Because the overlap between the engines remains relatively low, the researchers recommend using metasearch engines such as MetaCrawler, which combine the results of multiple searches.
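In essence, a metasearch engine forwards one query to several engines and merges the result lists, discarding duplicate URLs; because each engine indexes largely different pages, the merged list covers far more of the Web than any single engine can. A minimal sketch, using hypothetical stand-in engines rather than the real 1999-era query interfaces:

    # A minimal metasearch sketch: query every engine, keep the first
    # occurrence of each URL. The toy 'engines' below are hypothetical
    # stand-ins, not real search engine APIs.

    def metasearch(query, engines):
        seen = set()
        merged = []
        for engine in engines:
            for url in engine(query):   # each engine returns a list of URLs
                if url not in seen:     # drop duplicates across engines
                    seen.add(url)
                    merged.append(url)
        return merged

    engine_a = lambda q: ["http://a.example", "http://b.example"]
    engine_b = lambda q: ["http://b.example", "http://c.example"]

    print(metasearch("web coverage", [engine_a, engine_b]))
    # ['http://a.example', 'http://b.example', 'http://c.example']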


The study found the Northern Light search engine to have the greatest coverage.




Internet Links


Nature

NEC Research Institute

Northern Light


The BBC is not responsible for the content of external internet sites.



