Spam Identification
Spam identification programs use a combination of heuristics and algorithms to identify and classify spam. Once a message is identified as spam, these programs tag it using a variety of methods, leaving filtering, rejection, or deletion up to the end user. In some instances, the service provider takes care of the initial filtering and leaves the rest up to the end user.
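The tag-then-leave-it-to-the-user step can be sketched as follows. This is a minimal illustration using Python's standard email package, not the behavior of any particular tool. The function name tag_message, the header names, the "[SPAM]" subject prefix, and the threshold of 5.0 are all assumptions made for the example.

```python
from email import message_from_string
from email.message import Message


def tag_message(raw: str, score: float, threshold: float = 5.0) -> Message:
    """Attach a spam verdict to a message without deleting or rejecting it."""
    msg = message_from_string(raw)
    is_spam = score >= threshold

    # Hypothetical header names; real tools define their own conventions.
    msg["X-Spam-Flag"] = "YES" if is_spam else "NO"
    msg["X-Spam-Score"] = f"{score:.1f}"

    # Optionally mark the subject so a mail client can filter on it.
    if is_spam and msg["Subject"] is not None:
        subject = msg["Subject"]
        del msg["Subject"]
        msg["Subject"] = "[SPAM] " + subject
    return msg
```

The message itself is delivered unchanged apart from the added markers, so the end user (or the mail client's own filter) decides what to do with it.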
Heuristic Identification
Heuristic identification is generally accomplished by matching a set of rules against the suspected spam message. Rule-based identification can be CPU intensive, since the message must be parsed multiple times to find matches. In addition, the rules need to be updated on a regular basis as spammers evolve their messages to slip through.
SpamAssassin is the most widely known spam identification program that uses heuristics.
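Rule matching of this kind can be sketched in a few lines. The rules, scores, and threshold below are hypothetical and are not taken from any real rule set; the sketch only shows the general shape of the technique, in which each rule that matches adds to a running score.

```python
import re

# Hypothetical rules: (name, pattern, score). Invented for illustration only.
RULES = [
    ("FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.5),
    ("SHOUTING_SUBJECT", re.compile(r"^Subject: [A-Z !]{10,}$", re.MULTILINE), 1.5),
    ("CLICK_HERE", re.compile(r"click here", re.IGNORECASE), 1.0),
]
THRESHOLD = 3.0  # assumed cutoff for calling a message spam


def score_message(text):
    """Return the total score and the names of the rules that matched."""
    # Every rule scans the message separately, which is why large
    # rule sets become CPU intensive.
    hits = [(name, points) for name, pattern, points in RULES if pattern.search(text)]
    return sum(points for _, points in hits), [name for name, _ in hits]


def is_spam(text):
    total, _ = score_message(text)
    return total >= THRESHOLD
```

Because every rule is a separate pass over the message, the cost grows with the size of the rule set, and each new spam variation may require a new or adjusted rule.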
Algorithmic Identification
Algorithmic identification analyzes the message with algorithms rather than a fixed set of rules. These algorithms attempt to identify word constructs, sentence structure, and other patterns throughout the message. Algorithmic identification works well when the algorithms are sound and have been trained properly. Spammers often respond with poisoning techniques intended to make the identification less effective.
Bayesian filtering is probably the most widely recognized algorithmic identification technique for spam. Other techniques include neural networks and genetic algorithms, among others.
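As a rough illustration of Bayesian filtering, the sketch below trains a naive Bayes classifier on word counts from known spam and known legitimate mail, then estimates the probability that a new message is spam. The class name, the tokenizer, and the add-one smoothing are assumptions chosen for brevity; production filters use more elaborate tokenization and probability combining.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the tokens of a message known to be 'spam' or 'ham'."""
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) under the naive independence assumption."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_messages = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            total_tokens = sum(self.counts[label].values())
            # Smoothed class prior, then smoothed per-token likelihoods.
            score = math.log((self.messages[label] + 1) / (total_messages + 2))
            for token in tokenize(text):
                score += math.log(
                    (self.counts[label][token] + 1) / (total_tokens + len(vocab) + 1)
                )
            log_score[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

The result depends entirely on the training data: words seen mostly in spam push the probability up, and words seen mostly in legitimate mail pull it down, which is why the quality of training matters so much.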
Open Source Solutions
- SpamAssassin
- One of the best-known open-source spam identification tools. SpamAssassin uses a set of rules written to identify spam, as well as a Bayesian filtering system that learns from known spam and ham (legitimate mail).
- DSPAM
- DSPAM takes a different approach to spam identification: it relies entirely on algorithm-based identification. It is not as well known as SpamAssassin, but has a similar track record for identifying spam.
- SpamPal
- SpamPal is a spam identification tool similar to SpamAssassin.
Commercial Solutions
- SpamArrest
- SpamArrest is a commercial spam prevention tool.