Fighting Spam

by Thomas Fowler

Wikipedia defines spam as “the abuse of electronic messaging systems to send unsolicited, bulk messages”. Most spam is propagated via e-mail, and I’m sure most of you are all too familiar with this type of spam. However, spam can also be sent via other electronic media, such as instant messenger services, Usenet newsgroups, weblogs (“blogs”), and even mobile phones.

Those of you who are running guestbooks on your websites have also probably noticed junk messages appearing amidst your other (legitimate) comments. NoteWay’s guestbook system has a spam filter in place that catches about 90% of guestbook spam. The filter looks for certain “patterns” common to how most spam messages are delivered using automated software tools. When a spam message does slip through our filters, we add the web address advertised in the spam to a blacklist of web addresses. This helps prevent the same message from appearing more than once on your guestbook. We are also working on other techniques to reduce comment spam that we’ll be rolling out in the near future.
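To make the blacklist idea concrete, here is a minimal sketch in Python. It is not NoteWay’s actual filter, whose internals are not described here; the host names, the URL pattern, and the function names are all assumptions made for illustration. The sketch pulls every web address out of a message and checks its host against a set of previously reported hosts.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist entries; a real list would be built from
# addresses advertised in spam that slipped through the filter.
BLACKLISTED_HOSTS = {"cheap-pills.example", "casino-win.example"}

# Deliberately simple pattern: anything starting with http:// or https://
# up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(message):
    """Return the set of host names advertised in a message."""
    hosts = set()
    for url in URL_PATTERN.findall(message):
        try:
            host = urlparse(url).hostname
        except ValueError:
            # Malformed URL (for example, unbalanced brackets); skip it.
            continue
        if host:
            # Treat www.example.com and example.com as the same host.
            if host.startswith("www."):
                host = host[4:]
            hosts.add(host)
    return hosts


def is_blacklisted(message):
    """True if the message advertises at least one blacklisted host."""
    return not extract_hosts(message).isdisjoint(BLACKLISTED_HOSTS)
```

Matching on the host rather than the full address means a spammer cannot evade the check simply by changing the page path, although registering a new domain would still get through until it is reported.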

Below I will explain why spammers target guestbooks and how they spread their spam messages, and then discuss some techniques for blocking this type of spam.

Unfortunately, spamming is economically viable for spammers, because it costs them virtually nothing to spread their junk all over the internet. Using automated software programs that do their dirty work for them, they can send out thousands of unsolicited e-mail messages with the click of a button. Similar software programs allow them to leave spam comments on online forums and guestbooks with the same ease. Although there are some spammers who manually leave “spammy” comments on websites, most of them use automated software of some sort to carry out their campaigns.

Over the last couple of years, blogs, forums and guestbooks have become favored targets for spam, especially if they are popular with the public. Because of the way that some search engines rank search results, links from popular sites can help the search engine rankings of the sites being linked to. Thus, spammers target highly ranked sites with the hope that they will improve their own search engine rankings.

Several techniques for stopping this type of spam have been invented and implemented on various websites. Unfortunately, combating spam is similar to a spy vs. spy conflict. Every time someone finds a new way to block spam, the spammers figure out a way to get around it. In other words, no solution is perfect. Below is a summary of some of the more popular anti-spam techniques, including the pros and cons of each.

Word filters

This technique involves scanning for certain words in a “blacklist” and blocking any messages that contain these words. This type of filter is fairly easy to implement, but could have the unintended consequence of blocking legitimate comments (e.g. someone may innocently use a blacklisted word in their comment). Also, since spammers often get creative with their spelling of these words, the blacklist has to be updated regularly to include all the different spelling variations.
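A word filter of this kind might look roughly like the following Python sketch. The word list and the character substitutions are hypothetical examples, and the normalization step only handles a few common digit and symbol swaps, so it illustrates the idea rather than providing a complete defense.

```python
import re

# Hypothetical blacklist; a real one would be much longer.
BLACKLISTED_WORDS = {"viagra", "casino"}

# A few common character swaps spammers use to disguise words.
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize(word):
    """Lowercase a word and undo the character swaps listed above."""
    return word.lower().translate(SUBSTITUTIONS)


def contains_blacklisted_word(message):
    """True if any word in the message normalizes to a blacklisted word."""
    words = re.findall(r"[\w@$]+", message)
    return any(normalize(word) in BLACKLISTED_WORDS for word in words)
```

With this sketch, “V1AGRA” and “c@sin0” are both caught, but so is an innocent comment such as “I loved Casino Royale”, which is exactly the false-positive problem described above.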

Not allowing links in posts

Since a spam message doesn’t do a spammer much good unless it contains a link to their website, one technique for combating spam is to simply disallow links in comment posts. However, since links are the “currency” of the web, this will also hurt legitimate commenters, who may wish to tell you about their website or link to a website that may be of interest to you.
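As a rough illustration, a link check could be as simple as the Python sketch below. The pattern is an assumption about what counts as a link (an http or https address, a bare “www.” address, or an HTML anchor tag), and a determined spammer could write an address in a form it does not recognize.

```python
import re

# Matches http:// or https:// addresses, bare "www." addresses,
# and HTML anchor tags that carry an href attribute.
LINK_PATTERN = re.compile(r"(https?://|www\.|<a\s[^>]*href)", re.IGNORECASE)


def contains_link(comment):
    """True if the comment appears to contain a link."""
    return LINK_PATTERN.search(comment) is not None
```

A guestbook using this check would reject, or hold for review, any comment for which `contains_link` returns true.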

Comment moderation

Some guestbook systems implement what is called “comment moderation”. This means that no comment is posted on your guestbook or blog until you have manually approved it. This is a very effective way to prevent unwanted posts (not just from spammers), but places an extra burden on you as the site owner to periodically review comments in your comment queue. This can be especially tedious if you get a lot of comments on your site.
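The moderation workflow can be sketched as a queue of pending comments that only the site owner can move into the published list. The class and method names below are hypothetical and do not correspond to any particular guestbook system; storage is kept in memory to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class ModeratedGuestbook:
    # Comments waiting for the site owner's decision.
    pending: list = field(default_factory=list)
    # Comments visible to visitors.
    published: list = field(default_factory=list)

    def submit(self, comment):
        """A visitor posts a comment; it is held, not shown."""
        self.pending.append(comment)

    def approve(self, comment):
        """The owner approves a pending comment, making it visible."""
        self.pending.remove(comment)
        self.published.append(comment)

    def reject(self, comment):
        """The owner discards a pending comment."""
        self.pending.remove(comment)
```

Nothing reaches `published` without an explicit `approve` call, which is what makes the approach effective and also what creates the review burden for the owner.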

Turing tests

You may have come across websites where you’re asked to re-type the letters presented in a graphical image before you’re allowed to submit a form. This is a type of “Turing test” designed to distinguish automated software programs from human beings. Unfortunately, software programs are becoming more and more sophisticated and can now “pass” a lot of these tests. Also, these types of tests discriminate against users who for various reasons cannot view images.
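The server side of such a test is straightforward; the hard part is rendering the letters so that software cannot read them. The Python sketch below covers only the first part: generating a random code and checking what the visitor typed. Rendering the code as a distorted image is left out, and the function names, code length, and alphabet are assumptions for illustration.

```python
import hmac
import secrets
import string

# Leave out characters that are easy to confuse when read from an image.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def new_challenge(length=6):
    """Generate the random code to be shown to the visitor as an image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected, typed):
    """Compare the visitor's input with the code, ignoring case and
    surrounding spaces. The expected code must be kept on the server,
    never sent to the browser as plain text."""
    return hmac.compare_digest(
        expected.upper().encode("utf-8"),
        typed.strip().upper().encode("utf-8"),
    )
```

The form would be accepted only if `check_answer` returns true for the code issued with that form.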

Questions or comments about this article? Are there other topics you would like us to discuss in the future? Please contact us with your suggestions.

Author biography: Thomas Fowler is Vice President of Technology and Development at NoteWay Media. He has been involved with music for most of his life and now earns a living building websites.