BBC News
Last Updated: Tuesday, 8 June, 2004, 09:28 GMT 10:28 UK
Inside the Google search machine
By Mark Ward
BBC News Online technology correspondent

If anyone knows how to get their webpage to top Google's search results it is Matt Cutts.

Mr Cutts is one of a team at Google who help webmasters and website creators tweak their pages to ensure they are properly indexed by the search engine.

But ironically, says Mr Cutts, he does not have an extensive personal web presence that can take advantage of this insider knowledge.

All he has is a few pages dating from his college days that he says he does not regularly update.

Though, it must be said, they do appear top of any search for the name "Matt Cutts" on Google.

Search here

Mr Cutts says that Google works hard to ensure that most of the problems that webmasters encounter can be solved automatically via its help pages or using the tools it provides.

Given the huge number of webpages out there in cyberspace (Google indexes more than 4.2 billion), it is the only approach that will work.

"We have a philosophy of trying to develop things scalably," Mr Cutts told BBC News Online.

Google can do this because of the huge technical resources it has built up since it started.

In 2003 Google spent $173m on its data centres and is expecting to spend about $250m in 2004.

Although Google's senior technology folk have filed papers about how it does what it does, it has been reluctant to say just how many servers it owns and operates.

Estimates of how many machines it has in its data centres range from 10,000 to 80,000.

This concentration of computer power could be addressing more than 6,000 terabytes of data.

In contrast to most other net firms, Google does not rely on these machines being reliable, and all are based around cheap, easy-to-replace PC chips.

"The model of having a lot of machines and have them fail is a very powerful one," says Mr Cutts. "You have a small team replacing hard drives and it never affects the index."

Instead, he says, Google uses software to keep its search system reliable.
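The idea of tolerating machine failures in software can be illustrated with a short sketch. It is not Google's code, and the article gives no detail of how the real system works. The `Replica` class and the `search_with_failover` function are hypothetical names invented here, and the sketch assumes each machine holds a full copy of a tiny index.

```python
class Replica:
    """A hypothetical index server holding a full copy of a tiny index."""

    def __init__(self, name, index, healthy=True):
        self.name = name
        self.index = index
        self.healthy = healthy

    def search(self, term):
        # A failed machine is modelled as one that raises on every request.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return self.index.get(term, [])


def search_with_failover(replicas, term):
    """Try each replica in turn and return the first successful answer."""
    for replica in replicas:
        try:
            return replica.search(term)
        except ConnectionError:
            continue  # this machine failed; fall through to the next copy
    raise RuntimeError("all replicas are unavailable")
```

In this sketch a dead machine costs only a retry against another copy, which is the sense in which a hardware failure "never affects the index".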

Google used to update its web index every month, a process dubbed the Google Dance because it caused results to jump around a little.

But not anymore, says Mr Cutts.

"Within the last year we have improved our way of processing and indexing the web," he says. "You are not going to see Google dances."

"Now we crawl a percentage of the web every day," he says, "so after a relatively small time frame we hit every page."
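One way to picture a rolling crawl of this kind is to give every page a fixed slot in a repeating cycle. The sketch below is an assumption made for illustration, not a description of Google's crawler: the hashing scheme, the function names and the seven-day cycle used in the usage note are all invented here.

```python
import hashlib


def crawl_bucket(url, cycle_days):
    """Map a URL to a stable day slot in the crawl cycle (hypothetical scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cycle_days


def urls_for_day(urls, day, cycle_days):
    """Return the slice of URLs that is due on the given day."""
    slot = day % cycle_days
    return [u for u in urls if crawl_bucket(u, cycle_days) == slot]


def simulate_cycle(urls, cycle_days):
    """Crawl one slice per day and collect everything visited in a full cycle."""
    visited = set()
    for day in range(cycle_days):
        visited.update(urls_for_day(urls, day, cycle_days))
    return visited
```

With a seven-day cycle, each day's slice covers only a fraction of the pages, yet `simulate_cycle(urls, 7)` returns every URL, so no single monthly update is needed.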

Bombs away

Google does not just have one copy of the entire web; it has several, to help with reliability and ensure results are returned quickly.

Also, says Mr Cutts, there are quite a few Googlers, as staff are called, who keep an eye on its web index and make sure it is accurate.

Even the software at the heart of the search engine is regularly tweaked to ensure that results are relevant.

"We work on algorithmic solutions to scalably handle problems," he says. "We look not just at solutions for particular incidents but at entire classes of problems."

"You do not have to worry about the bias of the computer. It's a fair and equitable way to tackle it."

Attempts to catch out the indexing system and force pages to the top of the returned results, called Google bombs, only work on a very small scale, says Mr Cutts.

Even blogs, which tend to refer to each other a lot, do not trouble the indexing system.

"Blogs are not so much of a problem," says Mr Cutts. "They show up less often than you expect."

In some respects, running the search system is just a preparation for everything else Google wants to do.

"Once you have thousands of machines with all these capabilities it's a lot of fun to see what else you can do with them," he says.

