BBC News
Last Updated: Monday, 26 May 2003, 09:43 GMT 10:43 UK
How to spot and stop spam
By Mark Ward
BBC News Online technology correspondent

Screengrab of spam in mail box, BBC

Unsolicited e-mails now infuriatingly clutter many inboxes, just as paper junk mail buried many a front door mat. But is smart technology set to save us from spam?

To us humans, spam is very easy to spot.

Unfortunately, to your computer one e-mail message looks very like another.

Without help it will see nothing special about the formatting in junk mail to distinguish it from the stuff you want to read.

Many anti-spam programs work by scanning e-mail messages for the keywords that spammers use but your genuine friends tend to avoid.

Word list

But the spammers know this and use lots of tricks - some clever, some obvious - to fool the keyword spotters.

This explains the strangled spelling, strange spacing and replacement of some letters with numbers in words that the anti-spam programs are looking for.

"If you look at spam people hardly ever write the word Viagra anymore," says Paul Graham, a US software guru who has spent a lot of time studying junk e-mail.

Viagra tablets, PA
Viagra often spelled V-l-a-g-r-a online

The tricks spammers use mean that keyword filters will only ever be able to stop a small proportion of spam.

They will always catch the obvious ones but, if the list of keywords is too large, they start stopping real mail too.
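The weakness of keyword spotting can be shown with a minimal sketch. The word list and the sample messages below are invented for illustration and are not taken from any real anti-spam product.

```python
import re

# Hypothetical keyword list; real filters use far larger, curated lists.
SPAM_KEYWORDS = {"viagra", "mortgage", "teens"}

def keyword_filter(message, keywords=SPAM_KEYWORDS):
    """Return True if any whole word in the message is on the keyword list."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    return any(word in keywords for word in words)

plain = "Cheap Viagra available now"
disguised = "Cheap V1agra and V-i-a-g-r-a available now"
legitimate = "The mortgage paperwork is attached, see you Friday"

print(keyword_filter(plain))       # the obvious spam is caught
print(keyword_filter(disguised))   # the respelled versions slip through
print(keyword_filter(legitimate))  # a real message is stopped by mistake
```

The disguised message gets past because none of its fragments matches a listed word exactly, while the genuine message is blocked simply for mentioning a mortgage.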

Mr Graham thinks that, for many users, an anti-spam system that stops legitimate mail is far worse than one that lets all the proper mail through plus a bit of junk.

"You definitely want to err on the side of conservatism," he says.

To do a better job of spotting spam, Mr Graham came up with a different technique that means he hardly ever sees junk mail anymore. "For me and all my friends spam is a solved problem."

Gone for good

The technique goes by the formidable name of Bayesian Filtering and uses probability to work out if a mail is junk or real.

Current versions are 99.7% accurate at spotting spam. Other Bayesian filters, such as CRM114, do an even better job.

Paul Graham, Sarah Harlin
Paul Graham: spam scourge

This means that Mr Graham sees a couple of spams per week, instead of up to 150 every day without the filter.
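The two figures are roughly consistent, as a quick calculation suggests. This sketch assumes that "99.7% accurate" means 99.7% of incoming spam is caught, and it uses the upper figure of 150 junk messages a day.

```python
spam_per_day = 150   # upper figure quoted for unfiltered mail
accuracy = 0.997     # assumed to mean the share of spam that is caught

spam_per_week = spam_per_day * 7
missed_per_week = spam_per_week * (1 - accuracy)

print(f"{spam_per_week} spams arrive each week")
print(f"about {missed_per_week:.1f} get through the filter")
```

On those assumptions about three junk messages a week would reach the inbox.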

The system is based around a huge corpus of junk and legitimate mails that Mr Graham gathered over a few months.

These thousands of messages have been statistically analysed to extract the top 15 features that mark a message as spam.

Any incoming mail is scanned to see how many of these defining characteristics it possesses.

The list of defining features includes some words, such as "teens", but others were less obvious and include formatting codes and routing information found in e-mail headers.
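The general idea can be sketched in a few lines. This is a toy illustration and not Mr Graham's own code: the tiny training corpus, the clamping of probabilities to between 0.01 and 0.99, and the neutral score of 0.5 for unseen words are all assumptions. Only the limit of 15 features comes from the description above.

```python
import re
from collections import Counter
from math import prod

def tokenize(text):
    """Split a message into lower-case word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def train(spam_msgs, ham_msgs):
    """Estimate, per token, the probability that a message containing it is spam."""
    spam_counts = Counter(t for m in spam_msgs for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_msgs for t in set(tokenize(m)))
    probs = {}
    for token in set(spam_counts) | set(ham_counts):
        spam_rate = spam_counts[token] / len(spam_msgs)
        ham_rate = ham_counts[token] / len(ham_msgs)
        p = spam_rate / (spam_rate + ham_rate)
        probs[token] = min(0.99, max(0.01, p))  # assumed clamp, avoids 0 and 1
    return probs

def spam_probability(message, probs, top_n=15, unknown=0.5):
    """Combine the most telling token probabilities into one spam score."""
    token_probs = [probs.get(t, unknown) for t in set(tokenize(message))]
    # Keep the tokens whose probabilities lie farthest from the neutral 0.5.
    telling = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam_score = prod(telling)
    ham_score = prod(1 - p for p in telling)
    return spam_score / (spam_score + ham_score)

# Invented training mail, for illustration only.
spam = ["cheap pills for teens click here",
        "click here for cheap mortgage",
        "teens want cheap pills"]
ham = ["lunch on friday with the team",
       "the mortgage paperwork is attached",
       "see you at the meeting on friday"]

probs = train(spam, ham)
print(spam_probability("click here for cheap pills", probs))
print(spam_probability("see you at lunch on friday", probs))
```

Because the scores are learned from both junk and genuine mail, a word such as "mortgage" that appears in both ends up neutral rather than condemning a message outright.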

Money maker

Mr Graham believes widespread use of Bayesian filters could destroy the spammers' business model.

The sheer number of spam mails sent means that even tiny response rates, reportedly 0.0001%, let junk mailers turn a profit.

"I think filtering 90% will probably be enough to do it," says Mr Graham. "That would increase their costs by a factor of 10."

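The arithmetic behind the factor of 10 can be sketched as follows. It assumes a spammer's costs rise in line with the number of messages sent and that filters remove replies in proportion to the mail they block. The figure of 10 million messages is invented for illustration.

```python
def responses(mails_sent, response_rate_percent, filtered_fraction=0.0):
    """Expected replies when a share of the mail never reaches an inbox."""
    delivered = mails_sent * (1 - filtered_fraction)
    return delivered * response_rate_percent / 100

def cost_multiplier(filtered_fraction):
    """How many times more mail is needed to reach the same number of inboxes."""
    return 1 / (1 - filtered_fraction)

mails = 10_000_000   # hypothetical size of one mailing
rate = 0.0001        # reported response rate, in percent

print(responses(mails, rate))        # replies with no filtering
print(responses(mails, rate, 0.90))  # replies when 90% is filtered
print(cost_multiplier(0.90))         # extra mail needed to make up the loss
```

At a response rate of one in a million, 10 million messages bring about 10 replies. With 90% filtered that falls to about one, so the spammer would have to send roughly 10 times as much mail to get the same result.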
Monty Python cartoon, BBC/Python Pictures
A Monty Python sketch inspired the use of the word spam for junk mail

"Spammers are not really committed to being in the direct mail business."

Others are not so sure that the spammers will ever stop.

"It is like an arms race where the spammers come up with new tricks and people come up with a new way to detect them," says James Kay, technology head at anti-spam firm Blackspider Technologies.

Mr Kay believes a combination of technology and legislation to make spamming illegal will be needed to beat back the tide of junk.

Certainly spammers must feel under siege at the moment.

US states are passing laws that outlaw spam, net service firms are filing lawsuits and installing basic filters. Some are even adopting Bayesian filters to spot the most obvious spam.

Who knows, one day soon spam might only ever be associated with processed meat.



