Spammers: Getting to know you

Probably my last entry in this recent series of Weblog/Email Spammers: Evil thereof. I only write this entry because some of you have implemented my little comment hack, and have most likely found it’s not working with our friend, vig-rx.

My latest hit from this particular fiend was based on a Google search for the following:

blog August 2003 Name: URL: Comments:

In other words, if your page follows a consistent Movable Type template, showing the month and year of posting and containing the traditional form and comment field labels of Name, URL, and Comments, and has the word “blog” somewhere in the page (such as the title), you were visited. What the spam automation is doing – guesswork only, so take it with a grain of salt – is grabbing the page, finding the form, finding all of the form fields (including my own little hack), and recreating a form POST with the same fields.
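To make that guess concrete, here is a minimal Python sketch of the field-collection step only. It is not the spammer's actual code, which I haven't seen. The sample page and the field names (`author`, `url`, `text`, `hack_token`) are made up for illustration. The point is that any scraper reading the form this way picks up an extra hidden field along with everything else, which would explain why a hack based on one stops working.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collects the name and default value of every field inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "textarea", "select"):
            name = attrs.get("name")
            if name:
                # Hidden fields look no different from visible ones here.
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def collect_form_fields(html):
    """Return a dict of field name -> default value for the forms in html."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


# Hypothetical comment form; "hack_token" stands in for any extra hidden field.
SAMPLE_PAGE = """
<form method="post" action="/comments">
  Name: <input type="text" name="author">
  URL: <input type="text" name="url">
  <input type="hidden" name="hack_token" value="abc123">
  Comments: <textarea name="text"></textarea>
</form>
"""

fields = collect_form_fields(SAMPLE_PAGE)
```

Once the fields are in hand, replaying them in a POST is trivial, so an extra field in the form is no obstacle to a scraper that reads the form fresh each time.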

Targeting specific weblogging phrases makes sense because we all start with a basic set of templates for our tools and then modify them. Unfortunately, we focus on the appearance and not the content. So, for instance, we MT users leave the comment form in the same page as the individual posting, and we leave the labels the same – Name, URL, and Comments. To make things easier, we use the word ‘blog’ somewhere, such as our title, blogroll, and so on.

I don’t use ‘blog’ in my basic template at Burningbird, but do mention the word when I’m talking about blogging – which just goes to show that perhaps I need to talk about blogging less. My Practical RDF weblog isn’t getting hit by the comment spammer because I don’t maintain the comment form in the individual entry pages; the labels of Name and URL are missing (not to mention the form for scraping). Most of my other weblogs aren’t getting hit because I rarely mention the word ‘blog’ in them – other than in Weblogging for Poets – and even if I did, I’m not using the traditional date annotation with these essays. No August. No July.

Not only does this person have a decent understanding of how to use technology – using different IP addresses, timed delays between accessing the page and posting the comment, page scraping (grabbing the form fields), and most likely changing the requesting Agent so that it looks like the request is coming from a browser (IE, of course) – they have a fairly good understanding of people, and our habits. Clay Shirky, this is the type of person you should invite to your software summits.

This comment spammer is a good social software engineer, lying in wait observing us and our patterns and then crafting software that fits how we do business. Rather than get angry at this person, we should marvel at their ability to write software that is so adaptive to how we use software. Rather than tear our hair out and gnash our teeth, and block every IP from ChinaNet, China’s primary Internet provider, we should be smiling wryly at how our own habits have been used against us.

After all, the solution to this spammer, this time, is to change one label in the template – for instance, changing the label of URL to Link. All of our clever technical hacks fail but a simple human hack succeeds. Of course, as we adapt so will the hackers. There is no ultimate solution to this problem, other than eliminating comments.
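A rough Python sketch of why the label change works, assuming the spammer finds targets by matching the phrases from that Google search. The phrase list comes from the search string above. The matching function and the sample page text are my own illustration, not anything recovered from the spammer.

```python
# Phrases taken from the search string quoted earlier in this entry.
FINGERPRINT = ("blog", "August 2003", "Name:", "URL:", "Comments:")


def matches_fingerprint(page_text, phrases=FINGERPRINT):
    """Return True if every phrase appears in the page (case-insensitive)."""
    lowered = page_text.lower()
    return all(phrase.lower() in lowered for phrase in phrases)


# Hypothetical page text from an unmodified template.
default_page = "My blog | August 2003 | Name: URL: Comments:"

# The same page after renaming a single label in the template.
renamed_page = default_page.replace("URL:", "Link:")

is_target_before = matches_fingerprint(default_page)
is_target_after = matches_fingerprint(renamed_page)
```

With the default labels the page matches, and after the rename it does not. The same sketch shows how cheap the counter-move is: one more entry in the phrase list and the renamed page is a target again.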

When I was heavily involved with P2P technologies (Peer to Peer, such as the music sharing software), we knew that the key to making our software work would be to fit the technology to people’s behavior, not make people change to fit the technology. We need to look no further for our teachers of this type of software development and distribution than the virus writers and spam generators.

Take our recent email spam buddy that’s caused us all so much heartache. You would think that we would have learned not to open email attachments by now, but we’re still getting hit because people are still opening the emails, launching the virus contained in them, and generating yet more emails. Why do we open them? Simple – the spammers use our behaviors against us.

They pull people from contact lists and use these as senders so the names are familiar. They vary the subject line. They take advantage of open hooks within the software and operating systems installed by default on most PCs. Most of all, though, they use subject lines that trigger trusting responses within us – “Thanks!”, “Wicked Screensaver”, “Details”, and so on. I wouldn’t be surprised if the spammers were collecting data from the machines of people who opened the attachments, seeing just which subject line got the most results.

And we make these things so easy for the Bad Guys. We use Outlook for our email on Windows because that’s what’s installed by default. We trust the identity of the senders without examining the headers. We trust our software to protect us, though the same software blocks friend as well as foe. Most of all, we fall into patterns that can be automated – such as all of us Movable Type people using a comment form that has the same labels of Name, URL, and Comments.

Recently, there’s been discussion that email is ‘broke’, though I have no idea what people mean by email in this context. Do they mean the protocols? The email applications? Or do they mean people using the software? There’s a world of difference between looking at email from a technology perspective and looking at how we use it. Yet, rather than focus on our behavior when using software, we focus on the technology: we talk about using RSS as a replacement for email, just as we talk about using htaccess and MT to block the IPs of spammers from our sites. Or using my own comment hack, so easily set aside.

And all the while, the virus writers and spammers are watching us, seeing how we react, observing what we’re doing, listening to our debates – and are already hard at work writing the next generation of virus and spam generators.