January 20, 2006

Blogging Hell - Trackback Spammers Have a New Trick

by Ferdinand T Cat

The trackback spammers have a new trick. Since most anti-spam technology is based on word matching, they're disguising the words using escape codes. Consider, for example, the following string.

&#103;&#101; &#99;&#97;&#112;&#105;&#116;&#97;&#108;

In raw form, it looks like gibberish, but if it's put into a web page it displays as ge capital.

A group of characters beginning with an ampersand (&) and ending with a semicolon (;) is called an HTML escape sequence. For example, the escape sequence &copy; displays a copyright symbol (©). The &# prefix tells the browser to expect a character code number rather than a name. 103 is the character code number for a lowercase g, so that's what the browser displays for &#103;.
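To see the decoding in action, here is a minimal sketch in Python (my choice for illustration; the post itself only uses a Perl pattern). It relies on the standard library's html.unescape, which resolves escape sequences much as a browser does when rendering. The helper name decode_entities and the exact encoding of the sample string are assumptions.

```python
import html


def decode_entities(raw: str) -> str:
    """Resolve HTML escape sequences, roughly as a browser does when rendering."""
    return html.unescape(raw)


# Assumed raw form of the spam string: each letter written as a decimal code.
disguised = "&#103;&#101; &#99;&#97;&#112;&#105;&#116;&#97;&#108;"

print(decode_entities(disguised))  # ge capital
print(decode_entities("&copy;"))   # the copyright symbol
print(chr(103))                    # g
```

A keyword filter that only looks at the raw text never sees the words "ge capital", which is the whole point of the trick.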

The character code number feature was designed to make it possible to display characters that cannot be typed on a keyboard or that fall outside the 7-bit ASCII range. Nothing, however, prevents the feature from being used for an ordinary character. In the early days of address harvesting, when web crawlers searched pages for email addresses to add to mailing lists, writing the letters of your email address with &# codes would confuse the crawler and spare you loads of email spam.
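The email trick described above can be sketched in a few lines of Python. This is only an illustration: the function name obfuscate and the address user@example.com are hypothetical, and the round trip through html.unescape stands in for what a browser would display.

```python
import html


def obfuscate(text: str) -> str:
    """Write every character as a decimal &#...; escape sequence."""
    return "".join(f"&#{ord(ch)};" for ch in text)


address = "user@example.com"  # hypothetical address, for illustration only
encoded = obfuscate(address)

print(encoded)                 # a run of &#...; codes with no "@" in sight
print(html.unescape(encoded))  # what the reader sees: user@example.com
```

A crawler that searches the raw page for something shaped like name@host finds nothing, while a visitor still sees the address normally.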

Thankfully, there's an easy way to stop this trick: add the escape sequences themselves to your spam keyword filter. You have to do this with a Perl regular expression pattern, since the escape sequences are not words. Here's the pattern to use:

/&#\d+;/

This particular pattern matches the numeric &# escape codes, but leaves legitimate named escape codes such as &copy; and &lt; alone.
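As a rough check of what the pattern does and does not catch, here is the same expression in Python's re module (the syntax for this pattern is the same as in Perl). The function name looks_disguised and the sample strings are made up for the sketch, and the broader hexadecimal variant at the end is my own assumption, not part of the original filter.

```python
import re

# The pattern from the post: "&#", one or more digits, then ";".
NUMERIC_ESCAPE = re.compile(r"&#\d+;")

# Assumed extension (not from the post): also catch hexadecimal codes like &#x67;.
NUMERIC_OR_HEX_ESCAPE = re.compile(r"&#(?:\d+|[xX][0-9a-fA-F]+);")


def looks_disguised(text: str, pattern: re.Pattern = NUMERIC_ESCAPE) -> bool:
    """Return True if the text contains at least one numeric escape sequence."""
    return pattern.search(text) is not None


print(looks_disguised("&#103;&#101; capital"))            # True
print(looks_disguised("&copy; 2006 &lt;b&gt;"))           # False
print(looks_disguised("&#x67;e capital"))                 # False
print(looks_disguised("&#x67;e capital", NUMERIC_OR_HEX_ESCAPE))  # True
```

Note that the decimal-only pattern lets hexadecimal codes through, and that either pattern will also flag a legitimate ping that happens to use a numeric code, so it may be worth reviewing what gets caught before deleting it.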

Respectfully submitted,

Ferdinand T. Cat


Posted Fri 2:19 PM

Comments

Ferdy, Bruce, I've recently been plagued with spam that got through my filter and couldn't figure out how. What a life saver this is.

Thanks!!


Posted by: GM Roper at January 22, 2006 3:27 PM
