  Bayesian Filtering  
 

PrismEmail offers Bayesian spam filtering, a method of filtering spam using statistics. This approach is up to 99.5% effective. This means that if you currently receive 20 spams per day, Bayesian filtering can reduce that to about one spam every 10 days. A Bayesian filter "learns" from the good email and spam you receive, so over time your filter will become more effective and you'll receive less and less spam.

To help your Bayesian filter learn correctly, the user has two responsibilities:

  1. Report spam. If any spam gets through, the user should report it. This lets the Bayesian filter learn from its mistake and makes it better able to detect similar spam in the future.
  2. Report false positives. If any valid email is incorrectly flagged as spam, the user needs to report it to the system so that the Bayesian filter learns how to identify good email.

PrismEmail makes it very easy to report either type of incident. As time goes on, the Bayesian filter will become more accurate and the user will need to make these reports less and less often.

As a user, that's all you need to know. However, if you'd like to know more about how Bayesian filtering works, feel free to read the rest of this page, which explains this approach to spam filtering in plain, easy-to-read terms. Those who are interested in a thorough technical explanation are invited to visit Paul Graham's site. Mr. Graham has an excellent site with a large amount of space dedicated to explaining the hows and whys of this approach to spam filtering.

THE BAYESIAN METHOD

The Bayesian approach is actually quite simple: it calculates the probability that a given message is spam based on the contents of that message and on the contents of the good email and spam you have received in the past. Unlike traditional approaches, it is not a reactive filter that has to be updated after each new kind of spam appears. It uses your past good email and past spam as a predictor to determine whether a new message is probably spam or not.

There are a few numbers and percentages in the next few paragraphs. Don't let them scare you. It doesn't get mathematical; it just uses some numbers to illustrate a point.

Let's say you have received 1000 good emails and 1000 spams. The word "click" appeared in 35 of the good emails and in 750 of the spam emails. While we won't go into the mathematics behind it, Bayesian statistics tells us that if the word "click" appears in 35 of 1000 good emails and in 750 of 1000 spam emails, then a message containing the word "click" has a 95.54% chance of being spam.
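For readers who do want to see the arithmetic, the 95.54% figure can be reproduced in a few lines of Python. This is only a minimal sketch: it assumes the simplest form of the calculation, in which good email and spam are given equal weight, and the function name `word_spam_probability` is made up for this illustration. It is not necessarily the exact formula PrismEmail uses.

```python
def word_spam_probability(good_hits, good_total, spam_hits, spam_total):
    """Chance that a message containing a word is spam.

    Assumes good email and spam are equally likely beforehand
    (an assumption of this sketch, not a statement about PrismEmail).
    """
    spam_rate = spam_hits / spam_total   # share of spam containing the word
    good_rate = good_hits / good_total   # share of good email containing the word
    return spam_rate / (spam_rate + good_rate)


# "click": seen in 35 of 1000 good emails and 750 of 1000 spams
click = word_spam_probability(35, 1000, 750, 1000)
print(f"{click:.2%}")  # 95.54%
```

The same function gives a low number for a word that shows up mostly in good email, which is why the filter can also recognize "good" words.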

Further, let's say that a message containing the word "sex" has a 98.62% chance of being spam. Given no other information about the email, if a future message contains both the word "click" and the word "sex", Bayesian statistics tells us that there is a 99.93% chance that the message is spam.
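The way two word probabilities combine into 99.93% can be sketched as follows. This assumes the common "naive" combining rule, which treats each word as independent evidence; the function name `combine` is hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from math import prod


def combine(probabilities):
    """Combine per-word spam probabilities into one message probability,
    treating every word as independent evidence (a simplifying assumption)."""
    spam = prod(probabilities)                  # evidence that it is spam
    good = prod(1 - p for p in probabilities)   # evidence that it is good
    return spam / (spam + good)


# "click" at 95.54% and "sex" at 98.62%
both = combine([0.9554, 0.9862])
print(f"{both:.2%}")  # 99.93%
```

Two strong spam indicators reinforce each other, so the combined figure is higher than either word on its own.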

But what if these words were in a message between you and a friend named Tom and were part of the phrase: "The cat's sex is male. Getting medicine for him is one click away on some pet website. Do you want to buy it, Tom?" This is an innocent message, not spam. But according to what we just said there's a 99.93% chance that it's spam. Not quite.

Bayesian spam filtering doesn't just consider the "bad" words to determine spam; it also considers the good words. For example, maybe 40 of 1000 of your good emails contain the word "Tom" (since you often receive email from him) but only 3 spams of 1000 contained Tom's name. Given that information, Bayesian statistics tells us that if a message contains the word "Tom", there's only a 6.98% chance that the message is spam. Further, let's say that no spam ever contained the word "cat" and a few of your good emails did. So we consider that a message containing "cat" has only a 1% probability of being spam.

Bayesian statistics tells us that if the message contains the words "click", "sex", "Tom", and "cat", the spam probability is only about 53.7%. Of course, in reality we don't just consider the two best and two worst words; we consider all the unusually good and bad words in the message. Under the Bayesian approach, a real spam message usually scores very high: most spams rank between 90% and 99%, and very few rank lower. So with Bayesian filtering, we can say that "Any message with a probability of over 90% is spam, anything else is good." With such an approach a message like the one above would get through. Almost all spam, however, would not make it through, since it would probably not contain words like "Tom" and "cat", which reduced the spam probability for that message.
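The effect of the good words, and the 90% cutoff, can be sketched in Python. As before, this assumes the simple "naive" combining rule; `combine`, `is_spam` and `SPAM_THRESHOLD` are names invented for this example. With the four word probabilities quoted above, the message comes out at roughly 54%, well under the cutoff.

```python
from math import prod

SPAM_THRESHOLD = 0.90  # "over 90% is spam, anything else is good"


def combine(probabilities):
    """Combine per-word spam probabilities, treating words as independent."""
    spam = prod(probabilities)
    good = prod(1 - p for p in probabilities)
    return spam / (spam + good)


def is_spam(probabilities, threshold=SPAM_THRESHOLD):
    """Apply the cutoff rule to the combined probability."""
    return combine(probabilities) > threshold


# Per-word probabilities from the text: click, sex, Tom, cat
toms_message = [0.9554, 0.9862, 0.0698, 0.01]
print(is_spam(toms_message))        # False: the message gets through
print(is_spam([0.9554, 0.9862]))    # True: only the two "bad" words
```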

BAYESIAN "LEARNS" AUTOMATICALLY FOR EACH USER

The beauty of the Bayesian approach is that it "learns" automatically, and does so for each user independently. The spam probability for the word "Tom" might be 6.98% if you know someone named Tom, but if you don't know anyone named Tom the probability might be 98%. It learns based on your email.

This also means you don't have to tell the system the name of everyone you know. The system, over time, will automatically detect those words that are normally part of good email and will also detect those words and features that are normally an indication of spam.
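To make the per-user idea concrete, here is a small sketch of a filter that keeps separate word statistics for each user. Everything in it is an assumption made for illustration: the class name `UserFilter`, the whitespace tokenizer, the neutral 0.5 for never-seen words, and the upper cap of 99% (the 1% floor mirrors the "cat" example above). It is not a description of PrismEmail's implementation.

```python
from collections import Counter


class UserFilter:
    """Word statistics for a single user (hypothetical structure)."""

    def __init__(self):
        self.totals = {"good": 0, "spam": 0}
        self.words = {"good": Counter(), "spam": Counter()}

    def learn(self, text, label):
        """Record one message; each word is counted once per message."""
        self.totals[label] += 1
        self.words[label].update(set(text.lower().split()))

    def word_probability(self, word):
        """Spam probability for a word, based on this user's mail only."""
        word = word.lower()
        rates = {}
        for label in ("good", "spam"):
            total = self.totals[label]
            rates[label] = self.words[label][word] / total if total else 0.0
        if rates["good"] + rates["spam"] == 0:
            return 0.5  # never-seen word: no evidence either way (assumption)
        p = rates["spam"] / (rates["spam"] + rates["good"])
        return min(0.99, max(0.01, p))  # keep clear of 0% and 100%


you = UserFilter()
you.learn("Tom says the cat is fine", "good")
you.learn("Lunch with Tom on Friday", "good")
you.learn("Click here now", "spam")

stranger = UserFilter()
stranger.learn("Meeting moved to noon", "good")
stranger.learn("Tom click here now", "spam")

print(you.word_probability("Tom"), stranger.word_probability("Tom"))  # 0.01 0.99
```

The two filters disagree about the same word because each one has only seen its own user's mail.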

HOW IT WORKS AT PRISMEMAIL

PrismEmail handles all of the above automatically, if you wish. The Bayesian filter is optional, although for best results we highly recommend you use it.

Using the Bayesian filter does require a small amount of responsibility on your part as the user. Since the Bayesian filter learns, it must be told when it makes a mistake so that the same mistake isn't made in the future. That means that if a spam gets through to your inbox, you must report that message as spam so that the Bayesian filter can update its statistics. Likewise, if a good message is wrongly caught as spam, you must tell PrismEmail that it wasn't spam so that the Bayesian filter can learn from that. It's very important to report both missed spam and email that was incorrectly deemed to be spam. Failing to do this will cause the Bayesian filter to "learn" incorrectly, leading to poor performance.

Every message you download from PrismEmail has a link in the message headers to report that message as spam. If you receive a message that is spam and has made it through to you, just click that link in the message header. That's all it takes to report the message as spam and have the Bayesian filter learn accordingly.

Likewise, if you notice that a good message was wrongly caught as spam, you have two options. You can log in to your PrismEmail account at this website and mark the mail for downloading, or you can wait for the daily spam summary message, which lists all the spam captured in the last 24 hours. If you click the link to download one of those captured messages, the Bayesian filter will assume it got it wrong and adjust accordingly.
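What "adjust accordingly" might look like can be sketched as follows. This assumes the filter counts each message under the verdict it gave at the time, so a correction has to undo that count and record the message under the right label. The names `learn` and `correct`, and the data layout, are hypothetical and are not taken from PrismEmail.

```python
from collections import Counter

# Running statistics for one user (hypothetical layout)
counts = {"good": Counter(), "spam": Counter()}
totals = {"good": 0, "spam": 0}


def learn(words, label):
    """Count a message under the given label ("good" or "spam")."""
    totals[label] += 1
    counts[label].update(set(words))


def correct(words, wrong_label, right_label):
    """Undo a message counted under the wrong label, then relearn it."""
    totals[wrong_label] -= 1
    counts[wrong_label].subtract(set(words))
    learn(words, right_label)


# A spam slipped through, so it was first counted as good mail...
message = "cheap pills click now".split()
learn(message, "good")

# ...until the user clicks the report link in the message header.
correct(message, "good", "spam")
print(totals)  # {'good': 0, 'spam': 1}
```

Without the correction, the words of the missed spam would keep counting as evidence of good mail, which is why unreported mistakes make the filter learn the wrong thing.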

The good news is that the more you use the filter, the less often you will have to make these corrections. Once your Bayesian filter starts to get tuned, it will automatically be able to detect almost all spam, and the new tricks and new words that spammers start using will also be noticed by the filter and included in the statistics. So even if spammers start using new techniques to try to avoid spam filters, your Bayesian filter will probably adapt to that quickly.

In effect, you just need to give the Bayesian filter a little help in the beginning. Once you give it a few pushes in the right direction, you will find that the filter starts teaching itself about the characteristics of your good email and of spam without you having to report messages manually.

PERFORMANCE IMPROVES OVER TIME

Your Bayesian filter will improve over time. In fact, when you first start, PrismEmail won't use the Bayesian approach to filter your email at all. That's because some history of good email and spam must be accumulated on which to base the decisions, and at the beginning there isn't any. During this time we suggest you use the traditional filters offered by PrismEmail, which will probably catch 90% of your spam. You should report those spams that get through, which will cause your Bayesian filter to be tuned appropriately. When sufficient statistical information has been collected, PrismEmail will start using the Bayesian filter to filter out spam.

Over time, a history of both good email and spam mail will be built based on the email you receive. As you report errors to PrismEmail and as more history is generated, the performance of the Bayesian filter will improve. According to Paul Graham, a finely-tuned Bayesian filter can filter up to 99.5% of all spam with no false positives. That means if you are receiving 20 spams per day right now, with a finely tuned Bayesian filter your spam should drop to about one spam every 10 days or so.
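The "one spam every 10 days" figure follows directly from the 99.5% catch rate, as this short calculation shows. The variable names are ours, chosen for illustration.

```python
spam_per_day = 20     # spam received today, from the example above
catch_rate = 0.995    # "up to 99.5%" of spam filtered

missed_per_day = spam_per_day * (1 - catch_rate)   # spam that still gets through
days_between_spam = 1 / missed_per_day

print(round(missed_per_day, 3), round(days_between_spam, 1))  # 0.1 10.0
```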

BENEFITS OF BAYESIAN FILTERING

To you, the email user, the biggest benefit you'll see is a drastic reduction in the amount of spam you receive. As just mentioned, instead of receiving 20 spams per day you might receive one spam every 10 days or so. What a relief!

Another benefit is that every user's Bayesian filter is "tuned" to that user's email. With a single set of traditional filters shared by everyone, a spammer who finds a new way to word his email can sneak it past all users at once. Since everyone has a differently tuned Bayesian filter, it's almost impossible for a spammer to prepare a spam that will get through a significant number of them.

Paul Graham also talks about the possible results of the widespread use of Bayesian filters. Less spam will reach users, which means a lower response rate for spammers. This means lower profits for spammers and, in turn, less motivation to spam in the first place. Even if there isn't widespread use of Bayesian filters, the immediate benefit to those who use them is clear: less spam.

 
     
© Copyright 2002 - 2005 by Vault Information Services LLC. All Rights Reserved.
Service and information provided "as-is" without warranty. Please see terms of service.
Liability in no case will exceed amount paid for service.