By Mark Ward
BBC News Online technology correspondent
Good news for spammers: the smart filtering software used to catch spam can be beaten.
With a little ingenuity, it is possible to create messages that get past anti-spam filters every single time.
The discovery has been made by anti-spam researcher John Graham-Cumming, who studies the novel ways spammers try to defeat the technologies used to stop junk mail.
The bad news for spammers is that this flaw in filtering systems is not easy to exploit and can be combated.
Ham versus spam
If you have an e-mail address, you will know about spam, and the longer you have had that address, the more spam you will get. It is estimated that 60% of all e-mail messages now sent are spam.
To cut out the junk, many e-mail users have turned to a technology known as Bayesian filtering to spot and stop spam before it reaches their inbox.
When trained to tell spam from legitimate mail, these smart filters can, in many cases, catch more than 99% of junk messages.
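A Bayesian filter learns, from messages the user has already sorted, how often each word turns up in spam and in legitimate mail, then combines that evidence to score each new message. The Python sketch below is a toy multinomial naive Bayes scorer, assuming add-one smoothing and that words never seen in training are ignored; real filters differ in how they split words, smooth counts and set thresholds, and the class and method names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a message and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Toy multinomial naive Bayes spam filter with add-one smoothing (hypothetical)."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    def train(self, text, label):
        """Record one message the user has labelled 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | message); assumes both classes have been trained."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        totals = {c: sum(self.token_counts[c].values()) for c in ("spam", "ham")}
        # Start from the prior odds of spam versus ham.
        log_odds = math.log(self.message_counts["spam"] / self.message_counts["ham"])
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            p_spam = (self.token_counts["spam"][word] + 1) / (totals["spam"] + len(vocab))
            p_ham = (self.token_counts["ham"][word] + 1) / (totals["ham"] + len(vocab))
            log_odds += math.log(p_spam / p_ham)
        # Turn the log-odds back into a probability without overflowing math.exp.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1 + z)


f = NaiveBayesFilter()
f.train("cheap viagra buy now", "spam")
f.train("buy cheap pills now", "spam")
f.train("minutes of the project meeting", "ham")
f.train("lunch meeting on friday", "ham")
print(f.spam_probability("cheap pills"))             # prints a value above 0.5
print(f.spam_probability("project meeting friday"))  # prints a value below 0.5
```

With only four training messages the scores are crude; a real filter is trained on thousands of the user's own messages, which is what lets it reach the high catch rates described above.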
The smart filtering has been so successful that it has already forced a change in the way spam messages are written.
INTERNET MAIL SPAM FIGURES
Month            Spam as share of all e-mail
January 2004     60%
December 2003    58%
November 2003    56%
October 2003     52%
September 2003   54%
August 2003      50%
July 2003        50%
June 2003        49%
May 2003         48%
April 2003       46%
March 2003       45%
February 2003    42%
Random words are being added to some messages specifically to fool the filters.
"They are looking for things that are not spammy," said Mr Graham-Cumming, "words that outweigh the spamminess of the message."
This is why many spam messages feature rarely used words such as "formic", "brouhaha", "granitic" and "occlusive".
Thankfully, it does not work.
"It's a completely ineffective technique," he said.
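One way to see why random padding fails: a filter can only weigh words it has learned about, and obscure words like "formic" rarely appear in anyone's mail, so the filter has little or no evidence about them. The Python sketch below assumes a hypothetical table of per-word spam probabilities and a filter that skips unknown words (some real filters instead give them a near-neutral value); the words and numbers are illustrative, not taken from any real filter.

```python
from math import prod

# Hypothetical per-word spam probabilities learned by a trained filter.
# Words the filter has never seen are simply absent from the table.
WORD_SPAM_PROB = {
    "viagra": 0.99,
    "cheap": 0.90,
    "order": 0.80,
    "meeting": 0.05,
}


def spam_score(words, table=WORD_SPAM_PROB):
    """Combine per-word probabilities naive-Bayes style, skipping unknown words."""
    known = [table[w] for w in words if w in table]
    if not known:
        return 0.5  # no evidence either way
    p_spam = prod(known)
    p_ham = prod(1 - p for p in known)
    return p_spam / (p_spam + p_ham)


spam = ["cheap", "viagra", "order"]
padded = spam + ["formic", "brouhaha", "granitic", "occlusive"]
print(spam_score(spam) == spam_score(padded))  # True: the rare words add no evidence
```

The padded message scores exactly the same as the original, because none of the added words move the calculation at all.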
But Mr Graham-Cumming, who is a member of the Sophos Anti-Spam Task Force, has found a way to beat Bayesian filters that guarantees a message will get through every time.
He was prompted to investigate the weaknesses of Bayesian filters because, although he uses them himself, some messages still get through.
To find out how to beat the filters, Mr Graham-Cumming sent himself the same message 10,000 times, adding a fixed number of random words to each copy.
When a message got through, he trained an "evil" filter that helped him tune the perfect collection of additional words.
Soon he had generated a short list of words that, if added to a spam message, would guarantee its safe passage into his inbox.
"The actual words it found were a total surprise," said Mr Graham-Cumming.
The list included words such as "Berkshire", "Marriott", "wireless", "touch" and "comment".
Including just one of these words convinced Mr Graham-Cumming's real spam filter that a message was ham (legitimate mail) rather than spam.
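The article does not give the details of the "evil" filter, so the Python sketch below is only a toy simulation of the general idea under stated assumptions: a hypothetical victim filter with a hidden table of word probabilities, a hypothetical list of candidate padding words, and feedback read directly from the simulated filter rather than from messages reporting back. Each probe adds a few random candidate words to the same spam; a second, "evil" naive Bayes model then learns which words are over-represented in the copies that got through. The victim's two hidden "safe" words borrow names from the article purely for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical victim filter: per-word spam probabilities the attacker cannot see.
VICTIM_TABLE = {"cheap": 0.95, "viagra": 0.99, "berkshire": 0.0001, "marriott": 0.0001}

# Hypothetical pool of padding words the attacker tries.
CANDIDATES = (
    "berkshire marriott formic brouhaha granitic occlusive apple river window "
    "garden pencil cloud marble ladder violin harbor tunnel basket meadow "
    "lantern pepper saddle timber velvet walnut candle orbit puzzle quartz ribbon"
).split()


def victim_delivers(words):
    """Return True if the toy victim filter would let the message through."""
    known = [VICTIM_TABLE[w] for w in words if w in VICTIM_TABLE]
    p_spam = math.prod(known)
    p_ham = math.prod(1 - p for p in known)
    return p_spam / (p_spam + p_ham) < 0.5


def find_trigger_words(spam, candidates, trials=2000, extra=5, seed=0):
    """Probe with padded copies, then rank words with an 'evil' naive Bayes model."""
    rng = random.Random(seed)
    counts = {True: Counter(), False: Counter()}  # padding words in delivered / blocked copies
    for _ in range(trials):
        padding = rng.sample(candidates, extra)
        delivered = victim_delivers(spam + padding)  # simulated feedback
        counts[delivered].update(padding)
    totals = {k: sum(c.values()) for k, c in counts.items()}
    vocab = len(candidates)

    def weight(word):
        # Smoothed log-ratio: how much likelier the word is in delivered copies.
        p_through = (counts[True][word] + 1) / (totals[True] + vocab)
        p_blocked = (counts[False][word] + 1) / (totals[False] + vocab)
        return math.log(p_through / p_blocked)

    return sorted(candidates, key=weight, reverse=True)


ranked = find_trigger_words(["cheap", "viagra"], CANDIDATES)
print(ranked[:2])  # the two hidden "safe" words, in some order
```

In this toy setup the two hidden words stand out after a couple of thousand probes, which mirrors why the real technique needs so many messages and works only against the filter being probed.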
Mr Graham-Cumming said defending against spam that uses these words would be very difficult because the words are tied to a person's job and lifestyle.
But, he said, the good news is that the technique for discovering these trigger words is very time-consuming.
He had to send himself thousands of copies of the same message, each one holding an encoded chunk of HTML that reported back to him when it got past the filter.
These HTML bugs can be thwarted by turning off the preview pane in an e-mail program.
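Such bugs typically work by referring to a remote image: when the message is displayed, the mail program fetches the image, and that request tells the sender the copy arrived. Assuming the bugs take this common form, the Python sketch below lists the remote images an HTML message would fetch when rendered, which is what a cautious mail program can decline to load. The function name and example URL are hypothetical; it uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collect <img> sources that would be fetched over the network on rendering."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src") or ""  # attributes without a value give None
            if src.lower().startswith(("http://", "https://")):
                self.remote.append(src)


def remote_images(html):
    """Return the remote image URLs referenced by an HTML message body."""
    finder = RemoteImageFinder()
    finder.feed(html)
    finder.close()
    return finder.remote


html = '<p>Hello</p><img src="https://tracker.example/pixel.gif?id=0042" width="1" height="1">'
print(remote_images(html))  # ['https://tracker.example/pixel.gif?id=0042']
```

If the message is never rendered, as when the preview pane is off, none of these URLs is requested and the sender learns nothing.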
And, he said, this process would have to be repeated for every person a spammer wanted to reach, because each of them would have a different list of key words.
Sending thousands of messages to one person would be counter-productive, he said. But sending fewer messages to a larger pool of people, such as all the staff at a business, might reveal some key words for that business and help a spammer get their junk mail through.