Blog comment spam keeps on coming.

Posted on Thursday, June 23, 2005 12:02 PM

Even with Captcha enabled, comment spam keeps appearing on blogs. One explanation is that spammers can run OCR on the captcha text, but I doubt it: I found only a few academic papers on how to do this, and the skill set and amount of per-site tweaking needed to defeat each site's captcha images would be well beyond most spammers.

Looking at the IP logs, it is clear that most comment spam comes from overseas, where human labour costs are low and it may be worthwhile to pay someone to defeat captcha measures by hand. What I think is needed is a classification system, such as Bayesian filtering, similar to what DSPAM, BogoFilter and SpamAssassin use.

I have most of the code developed, based on Paul Graham's 'A Plan for Spam' essay (http://www.paulgraham.com/spam.html). I think this problem will, or already does, affect any public online system, such as wikis and forums.
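For readers unfamiliar with the approach, here is a minimal Python sketch of Graham-style scoring. It is not my actual code; the class and function names are hypothetical. It follows the rules described in the essay: good-token counts are doubled to bias against false positives, per-token probabilities are clamped to [0.01, 0.99], tokens seen fewer than 5 times (weighted) get a neutral 0.4, and only the 15 tokens farthest from 0.5 are combined. Lower-casing the text and the exact token regex are assumptions.

```python
import re
from collections import Counter

# Assumption: tokens are runs of letters, digits, '$', apostrophes and dashes,
# lower-cased before matching.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class GrahamFilter:
    """Hypothetical Bayesian comment filter in the style of 'A Plan for Spam'."""

    def __init__(self):
        self.good = Counter()  # token -> occurrences in legitimate comments
        self.bad = Counter()   # token -> occurrences in spam comments
        self.ngood = 0         # number of legitimate comments trained on
        self.nbad = 0          # number of spam comments trained on

    def train(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.bad.update(tokens)
            self.nbad += 1
        else:
            self.good.update(tokens)
            self.ngood += 1

    def token_prob(self, token):
        g = 2 * self.good[token]  # doubled to reduce false positives
        b = self.bad[token]
        if g + b < 5:
            return 0.4            # rarely seen token: slightly innocent
        good_ratio = min(1.0, g / max(self.ngood, 1))
        bad_ratio = min(1.0, b / max(self.nbad, 1))
        p = bad_ratio / (good_ratio + bad_ratio)
        return max(0.01, min(0.99, p))

    def spam_probability(self, text, n=15):
        probs = [self.token_prob(t) for t in set(tokenize(text))]
        # Keep the n "most interesting" tokens: those farthest from neutral 0.5.
        interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
        prod, inv = 1.0, 1.0
        for p in interesting:
            prod *= p
            inv *= 1.0 - p
        # Naive Bayes combination: P = prod(p) / (prod(p) + prod(1 - p))
        return prod / (prod + inv)
```

After training on a few hundred comments of each kind, `spam_probability(comment)` returns a value near 1.0 for likely spam and near 0.0 for legitimate comments; a comment with no tokens scores a neutral 0.5.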

Ultimately, only an automated classification and moderation system will do, backed by some human supervision to correct false positives.
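One way to combine automatic classification with human supervision is sketched below in Python, under assumptions of my own: the `Moderator` class, its thresholds (reject at 0.99 or above, publish below 0.5, hold everything in between) and the `score`/`train` callables are all hypothetical. Confident spam is rejected, confident ham is published, borderline comments wait for a human, and every human verdict is fed back to the classifier so that false positives become training data.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Moderator:
    score: Callable[[str], float]       # assumed: returns P(spam) for a comment
    train: Callable[[str, bool], None]  # assumed: retrains on (comment, is_spam)
    reject_above: float = 0.99          # hypothetical threshold
    publish_below: float = 0.5          # hypothetical threshold
    published: List[str] = field(default_factory=list)
    review_queue: List[str] = field(default_factory=list)
    rejected: List[str] = field(default_factory=list)

    def submit(self, comment: str) -> str:
        p = self.score(comment)
        if p >= self.reject_above:
            self.rejected.append(comment)
            return "rejected"
        if p < self.publish_below:
            self.published.append(comment)
            return "published"
        self.review_queue.append(comment)  # borderline: wait for a human
        return "held"

    def human_verdict(self, comment: str, is_spam: bool) -> None:
        # A moderator rules on a comment; drop it from whichever bucket holds it.
        for bucket in (self.review_queue, self.rejected, self.published):
            if comment in bucket:
                bucket.remove(comment)
        if not is_spam:
            self.published.append(comment)
        # Feed the correction back so the classifier learns from its mistakes.
        self.train(comment, is_spam)
```

With the Bayesian filter sketched earlier, `score` and `train` could be `f.spam_probability` and `f.train`. Keeping the classifier behind two callables means the moderation policy can be tuned without touching the filter itself.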