- How does Bayesian filtering work?
Bayesian filtering is a statistical approach to spam filtering.
It lets the system calculate a spam probability for
every email you receive. This probability is based
on the legitimate email and the spam you have received in the past. It
is calculated on a user-by-user basis, so only your own
past email is used to predict whether an incoming message
is spam. After all, what someone else considers legitimate
you might consider spam. You may click
here for a more complete explanation of Bayesian filtering.
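As a rough illustration of how a spam probability can be derived from past mail, here is a minimal naive Bayes sketch. It is not our actual implementation: the function names, the add-one smoothing, the equal weighting of spam and good mail, and the sample counts are all assumptions made for the example.

```python
import math
from collections import Counter


def word_spam_probability(word, spam_counts, good_counts, n_spam, n_good):
    """Estimate P(spam | word) from one user's message counts."""
    # Add-one smoothing keeps the estimate strictly between 0 and 1,
    # even for words seen in only one class or never seen at all.
    p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)
    p_word_given_good = (good_counts[word] + 1) / (n_good + 2)
    # Assumption: spam and good mail are equally likely a priori.
    return p_word_given_spam / (p_word_given_spam + p_word_given_good)


def message_spam_probability(words, spam_counts, good_counts, n_spam, n_good):
    """Combine per-word estimates into one spam probability for a message."""
    log_odds = 0.0
    for word in set(words):
        p = word_spam_probability(word, spam_counts, good_counts, n_spam, n_good)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Clamp so math.exp cannot overflow on very long messages.
    log_odds = max(-700.0, min(700.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical history for one user: 50 spam and 50 good messages.
spam_counts = Counter({"lottery": 40, "meeting": 1})
good_counts = Counter({"lottery": 0, "meeting": 30})

print(message_spam_probability(["lottery", "winner"], spam_counts, good_counts, 50, 50))
print(message_spam_probability(["meeting", "agenda"], spam_counts, good_counts, 50, 50))
```

With these made-up counts, the first message scores close to 1 and the second close to 0. Another user with a different mail history would get different scores for the same words, which is the point of keeping statistics per user.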
- Does Bayesian keep my emails forever for statistical purposes?
No. Your email is purged from our system within 1-2 days after you
download your mail from us. When Bayesian filtering is enabled, the system
maintains a statistics file that counts how many times each word has appeared
in your real email and in spam. That is all the Bayesian filter needs in order to work. There
is no way to determine the contents of email messages once they have been purged.
The only thing the Bayesian system keeps is statistical information that indicates how
many good and bad messages contained each word.
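To make the idea of a statistics file concrete, here is a small sketch of the kind of data it might hold. The class name, the tokenizing rule, and the choice to count each word once per message are assumptions for illustration, not a description of our file format.

```python
import re
from collections import Counter


class WordStats:
    """Per-user word statistics: counts only, no message text is stored."""

    def __init__(self):
        self.good_counts = Counter()
        self.spam_counts = Counter()
        self.n_good = 0
        self.n_spam = 0

    def record(self, text, is_spam):
        # Assumption: each distinct word is counted once per message.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        if is_spam:
            self.spam_counts.update(words)
            self.n_spam += 1
        else:
            self.good_counts.update(words)
            self.n_good += 1


stats = WordStats()
stats.record("Win the lottery now", is_spam=True)
stats.record("Lunch meeting at noon", is_spam=False)
stats.record("Meeting moved to Friday", is_spam=False)
print(stats.good_counts["meeting"], stats.spam_counts["lottery"])
```

After the three calls, the object holds only word tallies and two message totals. The word order and the messages themselves are gone.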
- Is Bayesian filtering complicated for the user?
No. Bayesian filtering is very easy to use. The only thing you have to do is report any spam that gets through
the filter and any legitimate email that is incorrectly marked as spam. As long as you report these incidents,
your Bayesian filter will improve in accuracy and you will have fewer reports to make.
- How effective is Bayesian filtering?
According to Paul Graham, Bayesian filtering can achieve a 99.5% success
rate. This means only 5 of every 1000 spam messages get through.
If you receive 20 spam messages per day, Bayesian filtering should
be able to reduce this to one spam message every 10 days or so.
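The arithmetic behind these figures can be checked with a few lines of Python. The variable names are ours, and exact fractions are used only to avoid floating-point rounding noise.

```python
from fractions import Fraction

catch_rate = Fraction("0.995")  # success rate quoted above
spam_per_day = 20               # example volume quoted above

miss_rate = 1 - catch_rate                 # share of spam that gets through
missed_per_1000 = miss_rate * 1000         # missed spam per 1000 received
missed_per_day = miss_rate * spam_per_day  # missed spam per day
days_per_missed_spam = 1 / missed_per_day  # days between missed spam

print(missed_per_1000, days_per_missed_spam)
```

This works out to 5 missed messages per 1000 and one missed message every 10 days.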
- Should I use other filters with Bayesian filtering?
We suggest that you enable as many filters as possible when you first start using Bayesian filtering. Because
your Bayesian filter must first accumulate a history of good email and spam, Bayesian filtering will not activate
the moment you select it. Rather, the filter will start accumulating statistics immediately and will implement
Bayesian filtering when it has sufficient statistics to do so. You will receive an email to notify you when
Bayesian filtering starts operating. Until then, enabling the other filters will allow the system to catch
most of your spam, which in turn will help train your Bayesian filter. Once the Bayesian filter is operating, you
may still want to leave the other filters enabled, since they may help filter out known spam websites
that are not yet part of your Bayesian statistics.
- Will Bayesian filters block legitimate email?
While no filtering method is 100% perfect, Bayesian filtering should be the most accurate method for catching spam while
letting real email through. When you first start using Bayesian filtering, it is possible that some spam will
get through and that some valid email will be blocked. This is why you must report these cases to the automated
system when they happen, so that the Bayesian filter can adjust accordingly. As you continue to use the Bayesian filter,
however, more and more spam will be caught and fewer and fewer real emails will be blocked. Of course, you may
wish to establish whitelists in the event you see certain messages from certain people being blocked. This will
also help the Bayesian filter learn that emails from these sources are not spam.
- I'm using the Bayesian filter, but it doesn't seem very accurate. Why?
Remember that when you first enable the Bayesian filter all that will happen is that
the filter will start to collect statistics. The Bayesian filter can't operate until it has
sufficient statistical information on which to base its filtering decisions. By default, the
Bayesian filter will start filtering when it has at least 100 good messages and 100 spam
messages in its statistics file. Thereafter, the Bayesian filter will become more and more
accurate as more and more statistics are generated. The Bayesian filter will start out
moderately accurate and improve from there. With statistics for 1000 good messages and 4000
spam messages we have seen the Bayesian filter catch 99.8% of the spam received. So give the
Bayesian filter time: if you report missed spam as well as false positives, you will
see accuracy improve dramatically.
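The default activation rule described above can be sketched as a simple check. The constant and function names are hypothetical. Only the two thresholds of 100 messages come from the answer above.

```python
MIN_GOOD_MESSAGES = 100  # default threshold stated above
MIN_SPAM_MESSAGES = 100  # default threshold stated above


def filter_is_active(n_good, n_spam,
                     min_good=MIN_GOOD_MESSAGES, min_spam=MIN_SPAM_MESSAGES):
    """Return True once enough good and spam messages have been counted."""
    # Both thresholds must be met; a large spam count alone is not enough.
    return n_good >= min_good and n_spam >= min_spam


print(filter_is_active(99, 500))     # still collecting statistics
print(filter_is_active(100, 100))    # filtering starts
print(filter_is_active(1000, 4000))  # well past the minimum
```

Until both counts reach the threshold, the filter only gathers statistics, which is why early results can look as if nothing is being filtered.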