  About Spamming Techniques  
 

Very few people want spam, fewer read it, fewer still respond by visiting a spamvertised website, and far fewer actually buy something. Because spam is so ineffective, spammers must make every effort to get their messages into your inbox. If the spam doesn't reach your inbox, there is no chance you'll open it.

More and more people use filters, such as PrismEmail, to block spam. Despite this, spammers seem to think that if they can get their spam past the filters, you are likely to buy from them. This is a dubious assumption, since people who deliberately filter spam are probably the least likely to purchase from a spammer. Nonetheless, spam filtering is essentially an arms race between spammers and spam filters such as PrismEmail.

This page will review some of the tactics that spammers use to try to get past filters and what the filters do in response.

  Word Substitution  
 

Years ago, spam filtering was pretty easy. Certain words appeared in spam often but virtually never appeared in legitimate email. For example, the phrase "limited time offer" might often appear in spam but virtually never in a valid email. As a result, spam filters would simply look for certain suspect phrases and discard the message if they found them. Spam filters grew more effective by adding more and more phrases and words to their databases.

Spammers responded by substituting silly variants for the words they knew would be in many spam filters. That's why you see spam with messages such as "V!agra" instead of "Viagra": they assume that the word Viagra may be in a spam filter, but V!agra might not be. The immediate problem for the spammer is that a message filled with misspelled, mangled words looks a lot less professional and trustworthy, so doing this drives down their response rate even further.

This approach is still used by spammers and can be somewhat effective, depending on the spam filter being used. There are so many ways to mangle the word "Viagra" that some variations will probably get by spam filters that do nothing more than scan the message for known suspicious words. This technique is not very useful against modern spam filters, however, especially Bayesian filters such as those used here at PrismEmail. After all, an email that contains the word "Viagra" may or may not be spam, but an email that contains the word "V!agra" is almost certainly spam. The spammer's attempt to mangle the words in the email can actually make it easier to recognize the message as spam.
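To make the idea concrete, here is a minimal sketch of the kind of simple phrase filter described above. The phrase list and the function name are hypothetical, chosen only for illustration; this is not the code of any real filter.

```python
# Hypothetical list of suspect phrases; a real filter would hold many more.
SUSPECT_PHRASES = {"limited time offer", "viagra"}


def is_spam_by_phrase(message: str) -> bool:
    """Flag a message if it contains any known suspect phrase."""
    text = message.lower()
    return any(phrase in text for phrase in SUSPECT_PHRASES)


print(is_spam_by_phrase("Limited time offer: buy Viagra now"))  # True
print(is_spam_by_phrase("Buy V!agra now"))                      # False
```

The second message slips through because "v!agra" is not in the list, which is exactly the weakness the spammers were exploiting.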

 
  HTML Encoding  
 

When word substitution became difficult for the spammers, they turned to technical means of using the words they wanted without being detected by spam filters. HTML and URLs both provide ways to encode characters as numbers so that the characters themselves don't appear in the message source, and spammers used this to their advantage. For example, "Viagra" can be written in an HTML email as &#86;&#105;&#97;&#103;&#114;&#97; (numeric character references), and inside a link address it can be written as %56%69%61%67%72%61 (percent-encoding). When you read the email with your email program the text is displayed as "Viagra," but a simple spam filter looking for the word "Viagra" won't find it, since it is encoded with numbers.

As it turns out, this was just spammers taking advantage of very simple spam filters. Since this hadn't been done before, spam filters hadn't been developed to handle number-encoded messages, so the spam got through. As soon as this type of spam was noticed, the filters were improved to decode the numbers first and then filter normally.
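The decoding step can be sketched in a few lines using only the Python standard library. The function name is hypothetical. The sketch assumes the filter wants to undo both percent-encoding (like the example above) and HTML numeric character references, which is another common number-based encoding.

```python
import html
from urllib.parse import unquote


def decode_obfuscation(text: str) -> str:
    """Undo number-based encodings before the normal filter runs.

    html.unescape turns character references such as &#86; into characters;
    unquote turns percent-escapes such as %56 into characters.
    """
    return unquote(html.unescape(text))


print(decode_obfuscation("%56%69%61%67%72%61"))               # Viagra
print(decode_obfuscation("&#86;&#105;&#97;&#103;&#114;&#97;"))  # Viagra
```

Once the text is decoded, the filter sees the plain word and can treat the message like any other.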

Many spammers still use this approach to try to get by spam filters, but it's not clear why they bother. This technique is completely useless against any modern spam filter since they all are capable of decoding these types of messages and filtering normally.
 
  HTML Comments  
 

In the spirit of the HTML Encoding technique just explained, spammers found yet another way to use HTML to their advantage: HTML comments. HTML comments allow notes to be embedded in an HTML page or email without being displayed to the user. Legitimate developers often use them to document a webpage so that other developers can understand what the original developer did. For example, the comment <!-- This is a comment --> can be embedded anywhere in an HTML page and the user will not normally see it.

Spammers use this to break up suspicious words with HTML comments. For example, instead of writing the word "Viagra" they may write "Vi<!-- useless comment -->agra". As with HTML encoding, the user will see this as "Viagra," but a simple spam filter that cannot deal with HTML comments will not catch it, since it will not see "Viagra" as a continuous string of characters.

Like HTML encoding, it's not clear why spammers still bother with this. All modern spam filters completely ignore HTML comments, so there is no benefit to spammers in using them. Unfortunately, spammers often use an insane number of comments, such that the size of a spam message can be doubled just because they insert a comment in the middle of every word. And, like HTML encoding, the very presence of HTML comments embedded within words is often a very good indication that the message is spam. So, once again, the spammers' efforts to get past the filter actually make it easier to detect them.
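Both responses described here, ignoring comments and treating comments inside words as a warning sign, can be sketched with regular expressions. This is a rough illustration that assumes standard `<!-- ... -->` comment syntax; the names are hypothetical, and a production filter would more likely use a real HTML parser.

```python
import re

# Any HTML comment, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# A comment with a word character directly on both sides, e.g. Vi<!-- x -->agra.
INTRAWORD_COMMENT_RE = re.compile(r"(?<=\w)<!--.*?-->(?=\w)", re.DOTALL)


def strip_comments(html_text: str) -> str:
    """Remove HTML comments so split-up words are rejoined."""
    return COMMENT_RE.sub("", html_text)


def count_intraword_comments(html_text: str) -> int:
    """Count comments embedded inside words, a likely sign of spam."""
    return len(INTRAWORD_COMMENT_RE.findall(html_text))


sample = "Buy Vi<!-- useless comment -->agra today"
print(strip_comments(sample))            # Buy Viagra today
print(count_intraword_comments(sample))  # 1
```

A comment that sits between whole words or tags, as a legitimate developer would write it, is not counted by the second function.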

 
  Random Variations  
 

Years ago, some spam filters tried to catalog each and every spam that was received. When someone reported a spam, that message was saved in a database so that if anyone else received the same message it would automatically be discarded as spam. This worked on the assumption (which used to be valid) that each of the millions of copies of a spam contained almost exactly the same body. That being the case, it was not terribly difficult to check whether a new message was very similar to a spam that someone else had already reported.

When these spam filters became popular, spam software evolved to produce unique messages. The spam software would insert random garbage words in various places throughout the message and in the subject. Thus it is very common to see "Buy Viagra here 3d3fdsas," with random-looking garbage in the subject and scattered throughout the body. The spammers do this so that software comparing each message against reported spam will not recognize it as the same message, since there are enough random words to make the system think it is an entirely different message.

This approach to spam filtering isn't very common anymore. Since spammers make each message different, it is difficult to detect spam this way, and few systems still use this kind of filtering. That makes it strange that spammers still insert random garbage words in the body or subject. Most spam filters ignore such garbage completely. Others are smart enough to realize that these garbage words are a pretty good indicator of spam, so some filters can actually detect that a message is probably spam based on the presence of these words.
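A rough sketch shows why random garbage defeats a catalog of reported messages. It assumes the simplest possible catalog, an exact-match fingerprint of the subject and body; real systems of this kind likely used fuzzier comparisons, and all names here are hypothetical.

```python
import hashlib

# Fingerprints of messages that users have reported as spam.
reported = set()


def fingerprint(subject: str, body: str) -> str:
    """Hash the message after lowercasing and collapsing whitespace."""
    normalized = " ".join(f"{subject}\n{body}".lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def report_spam(subject: str, body: str) -> None:
    reported.add(fingerprint(subject, body))


def is_known_spam(subject: str, body: str) -> bool:
    return fingerprint(subject, body) in reported


report_spam("Buy Viagra here", "Cheap pills, order today")
print(is_known_spam("Buy Viagra here", "Cheap pills,  order today"))         # True
print(is_known_spam("Buy Viagra here 3d3fdsas", "Cheap pills, order today"))  # False
```

A single garbage word in the subject is enough to change the fingerprint, so the second copy no longer matches the reported one.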

 
  Dictionary Word Inclusion  
 

The newest approach to filtering spam is the Bayesian filter, which is described more fully in a separate discussion. It takes a statistical approach to spam filtering: each word in an email is counted to determine how often it appears in good email and how often it appears in spam. The word "Viagra" might appear in 50 spams but only 1 good email, so the presence of the word "Viagra" is a pretty good indication that the message is spam. When this information is combined with the probabilities of other words being "spammy" or not, it is possible to calculate the probability that a given message is spam.
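The arithmetic behind this can be sketched as follows. This is a simplified naive Bayes illustration, not PrismEmail's actual implementation. The totals of 100 spam and 100 good messages, the equal weighting of spam and good mail, and the clamping to the range 0.01 to 0.99 are all assumptions made for the example.

```python
from math import prod


def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Estimate how 'spammy' one word is from per-message counts."""
    spam_rate = spam_count / n_spam
    ham_rate = ham_count / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # word never seen before: no evidence either way
    p = spam_rate / (spam_rate + ham_rate)
    return min(0.99, max(0.01, p))  # assumed clamp to avoid certainty


def combine(probabilities):
    """Combine per-word probabilities into one message-level probability."""
    spam_like = prod(probabilities)
    ham_like = prod(1 - p for p in probabilities)
    return spam_like / (spam_like + ham_like)


# "Viagra" seen in 50 of 100 spams and 1 of 100 good emails (totals assumed).
p_viagra = token_spam_probability(50, 1, 100, 100)
print(round(p_viagra, 3))                   # 0.98
print(round(combine([p_viagra, 0.9]), 3))   # 0.998
```

Two spammy words together push the message probability higher than either word alone, which is the combining effect the paragraph describes.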

This approach is one of the most effective that has ever been used to fight spam, is available here at PrismEmail, and is the approach to spam filtering we advocate.

Spammers are just starting to try to get around Bayesian filters. Unfortunately for them, it is unlikely they will. The separate discussion of Bayesian filtering gives a full explanation, but the key point is that the statistics are different for each user. That means that for a message to get through, the spammer needs to use words that are commonly used in your good email and not used in spam. This is very, very difficult if the spammer doesn't have a large sample of your good email and spam.

Some spammers apparently believe that inserting random dictionary words that are not usually used in spam, such as "political," "democracy," and "nation," will give them a better chance of getting past a Bayesian filter. Fortunately for us, this doesn't usually work. The spammer would have to use distinctive words that are very specific to you. For example, if you have a friend named Thomas, then that word would help a spammer get past a Bayesian filter, but only your Bayesian filter. The word Thomas wouldn't help them get past someone else's Bayesian filter unless that person also talked a lot about someone named Thomas.

Since spammers don't know which words are truly innocent for you (such as "Thomas"), it is unlikely that a brute-force dictionary attack will work. We've received spam that had entire sections of the U.S. Constitution embedded in it to try to get past the Bayesian filter, and even so the Bayesian filter was able to recognize it as spam.
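The per-user effect can be illustrated with a small sketch. Everything here is hypothetical: the two users, the word counts, the totals of 100 spam and 100 good messages each, and the simplified naive Bayes combination. It is meant only to show why a generic word does little while a word specific to one user's good mail matters for that user alone.

```python
from math import prod


def token_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Spam probability of one word under one user's own statistics."""
    s = spam_counts.get(token, 0) / n_spam
    h = ham_counts.get(token, 0) / n_ham
    if s + h == 0:
        return 0.5  # unseen word: no evidence either way
    return min(0.99, max(0.01, s / (s + h)))


def spam_probability(tokens, spam_counts, ham_counts, n_spam=100, n_ham=100):
    probs = [token_probability(t, spam_counts, ham_counts, n_spam, n_ham)
             for t in tokens]
    spam_like = prod(probs)
    ham_like = prod(1 - p for p in probs)
    return spam_like / (spam_like + ham_like)


# Hypothetical counts: number of messages (out of 100) containing each word.
SPAM = {"viagra": 50, "democracy": 2}
ALICE_HAM = {"viagra": 1, "democracy": 2, "thomas": 30}  # Alice has a friend Thomas
BOB_HAM = {"viagra": 1, "democracy": 2}                  # Bob does not

padded = ["viagra", "democracy", "nation"]  # dictionary-word padding
targeted = ["viagra", "thomas"]             # word specific to Alice

print(round(spam_probability(["viagra"], SPAM, ALICE_HAM), 3))  # 0.98
print(round(spam_probability(padded, SPAM, ALICE_HAM), 3))      # 0.98
print(round(spam_probability(targeted, SPAM, ALICE_HAM), 3))    # 0.336
print(round(spam_probability(targeted, SPAM, BOB_HAM), 3))      # 0.98
```

In this toy data the dictionary words are neutral, so padding changes nothing, while "thomas" lowers the score only under Alice's statistics.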

 
     
© Copyright 2002 - 2005 by Vault Information Services LLC. All Rights Reserved.