Categories
Technology Weblogging

NoFollow

Six Apart has announced what Dave Winer only hinted at and what we've been expecting: Google and the other search companies have partnered with the weblogging companies to introduce rel="nofollow" as a way of dealing with comment spam.

When added to links in a weblog template, the attribute instructs search bots not to count those links toward page ranking. The point being that once the spammers realize their effort is futile, they'll go away, like the professional business people we all know they are.
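For anyone adding this to a template by hand, the markup itself is trivial. A sketch, with a made-up author link, showing a comment author's link before and after:

    <!-- Before: search bots follow the link and credit it in page ranking -->
    <a href="http://example.com/">Comment Author</a>

    <!-- After: rel="nofollow" tells the bots not to count the link -->
    <a href="http://example.com/" rel="nofollow">Comment Author</a>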

This might have worked…three years ago, when we webloggers first called out for Google to help. When comment spam started to become a problem, one of the suggestions was for Google to get involved and come up with a way to mark links so that they would carry no value for the Google webbot.

Now, all these years later, we read the following at Six Apart:

Recently, we’ve reached out to other blog tool vendors to try to coordinate information about comment spam techniques and behaviors. As part of these efforts, we’ve also begun to talk to search companies about enriching linking semantics to better indicate visitor-submitted content (like comments or TrackBacks).

Others are jumping up and down about this now, such as Scoble. I’m not quite jumping up and down. But I’ll add it to my template, and hope for the best.

If you do implement this, you need to implement it not only in your comment listing but also in the sidebar ‘recent comments’ plugin or code that you’re using. Your legitimate commenters or trackbacks won’t get any link rank for their entries, but I imagine people are so desperate they don’t care anymore.

Remembered another

WordPress is going to have to change how it automatically creates hypertext links for URLs inside the comment text, so that those links get the same treatment. Otherwise spammers will just include their links within the comment itself.
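A rough sketch of the sort of thing I mean, in PHP: a filter that tacks rel="nofollow" onto anchors in the comment body before it's printed. The 'comment_text' hook name is an assumption about your tool's plugin API; check your own version for the right hook, and note the regex is naive (it will double up the attribute on anchors that already carry a rel):

    <?php
    // Sketch: add rel="nofollow" to every anchor in the comment body.
    // Naive on purpose: assumes no existing rel attribute on the tag.
    function nofollow_comment_links($text) {
        return preg_replace('/<a\s/i', '<a rel="nofollow" ', $text);
    }
    // Hook name is an assumption; WordPress runs comment output
    // through named filters like this one.
    add_filter('comment_text', 'nofollow_comment_links');
    ?>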

Categories
Technology Weblogging

Take your hands off the tech and back away slowly

Recovered from the Wayback Machine.

Several people have linked to Martin Schwimmer and his indignation over the fact that Bloglines re-prints the content of his posts, without attribution and with the possibility of future advertisements (…or guilty until proven innocent). This violates the cc license, he says, because he can only be republished if proper attribution is given, and in a non-commercial setting.

This sounded familiar, and sure enough, digging around in my archives turned up this, where another person reacted in outrage on finding that their feed was being re-published:

What was a surprise is that Mitch reversed himself and now offers a Creative Commons license on his material, though the license information isn’t duplicated in Mitch’s RSS feed directly. Mitch also brings up the ‘commercial’ aspect of re-publishing the material at LiveJournal, and what’s to stop someone from grabbing the content and putting it behind password protected sites that charge money for access.

Easy: don't publish your entire post in your RSS feed; keep the feeds to excerpts only. Remove the content:encoded field and just leave the description, and adjust your blogging tool to publish excerpts only. If your weblogging tool doesn't allow this adjustment, ask the tool builder to provide the capability. The RSS feeds are there to help promote your ideas, not promote their theft. But you have to control the technology, not let the technology control you.
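In feed terms, the difference looks roughly like this (element names from RSS 2.0 and the content module; the text and URLs are made up):

    <!-- Full content: everything is there for the taking.
         (content:encoded comes from the content module namespace,
         declared on the feed's root element.) -->
    <item>
      <title>My Post</title>
      <link>http://example.com/2005/01/my-post</link>
      <description>A short excerpt of the post.</description>
      <content:encoded><![CDATA[The entire post, markup and all.]]></content:encoded>
    </item>

    <!-- Excerpt only: drop content:encoded, keep the description -->
    <item>
      <title>My Post</title>
      <link>http://example.com/2005/01/my-post</link>
      <description>A short excerpt of the post.</description>
    </item>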

Wait until he discovers the other online services, such as 2rss.com, that do add ads into the feed if you use them to subscribe within any aggregator, Bloglines or not.

update

Also, see this about creative commons licenses and RSS feeds back in 2002.

Question to Mr. Schwimmer — is your cc license attached to your feed?
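If memory serves, the Creative Commons RSS module that dates from that 2002 discussion attaches the license to the feed itself with a single element at the channel level; a sketch, with the feed details made up:

    <rss version="2.0"
         xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
      <channel>
        <title>Example Weblog</title>
        <link>http://example.com/</link>
        <description>A weblog with a license attached to its feed</description>
        <creativeCommons:license>http://creativecommons.org/licenses/by-nc/1.0/</creativeCommons:license>
        <!-- items follow -->
      </channel>
    </rss>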

Categories
Technology

Be Stingy

Regarding Dave Winer's idea for some form of centralized syndication feed system, I got a chuckle out of the comment, "What problem am I having and how is a centralized service going to help?" in Phil Ringnalda's post Centralized Subscription? Not that way thanks. You see now the great benefit of being exposed to us techs through weblogging: you get to experience, with us, the joy of uncertainty that comes from knowing that you're always on the edge of failure.

Dave does have a point in that if you provide one-click subscription for one aggregator, such as a Subscribe via Bloglines button, it won't work for other aggregators; you either have to blow off the others, or you end up with a trail of buttons down your page, like stepping stones across a vast sea of syndication.

You could be like me, and provide the bare minimum to aid in subscription: auto-discovery enabled via my weblog tool, and a couple of links to feeds in my sidebar. However, I will be the first to admit that clicking a link to open an XML file isn’t the friendliest way to get people to subscribe to your site’s syndication feeds.
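Auto-discovery, for anyone unfamiliar with it, is nothing more than link elements in the page head that aggregators know to scan for; something like this, with made-up feed addresses:

    <head>
      <!-- Aggregators look for rel="alternate" links to find the feeds -->
      <link rel="alternate" type="application/atom+xml"
            title="Atom feed" href="http://example.com/atom.xml" />
      <link rel="alternate" type="application/rdf+xml"
            title="RSS 1.0 feed" href="http://example.com/index.rdf" />
    </head>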

I am open to alternatives to this arrangement, but not necessarily Dave’s approach. Though he hastens to say that his approach isn’t a centralized directory, it is a centralized source of data, one with consequences beyond the intended purpose.

Dave’s solution would require that you pass to the service a link to an OPML file, which contains a listing of sites to which you subscribe, and then click a link to add a new subscription. In return the service would provide the list in a format specific to whatever aggregator you use. Your subscription list would then be merged with other subscription lists, and made public; the data contained being accessible for other purposes.
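For those who haven't peeked inside one, an OPML subscription list is just an outline of feed URLs; a minimal, made-up example:

    <?xml version="1.0" encoding="UTF-8"?>
    <opml version="1.1">
      <head>
        <title>My Subscriptions</title>
      </head>
      <body>
        <!-- one outline element per subscribed feed -->
        <outline title="Example Weblog" type="rss"
                 xmlUrl="http://example.com/index.rdf"
                 htmlUrl="http://example.com/" />
        <outline title="Another Weblog" type="rss"
                 xmlUrl="http://another.example.com/atom.xml"
                 htmlUrl="http://another.example.com/" />
      </body>
    </opml>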

With this approach, not only would I be able to more easily subscribe to your writing, I could also take a look at who you read, and don’t read. Would your subscription list be the same as your blogroll? If not, are you prepared to answer questions from those who you link to, but don’t read? How about those who you read, but don’t link? I could even use your subscription list as my own, so that I can read the exact same sites you read every day; more, I could follow you around in comments, adding my own following yours, just to let you know I’m near and thinking of you.

Phil wrote his own scenario, about subscribing to a site that provides information about spastic colons, which can then get Googled by the hot new love of your life. We say we’re an open book, but do we really want to be that open?

As Phil demonstrates so effectively, the service that works best is the one that requires the minimum of information. This follows from a known paradigm in designing relational databases or class systems in languages such as PHP: more data means more overhead and increased complexity, so you keep the data requirements as simple, and as specific to the problem being solved, as possible.

In fact, though the needs of aggregation aren't the same as those of identity, we could apply Kim Cameron's second law of identity, the Law of Minimal Disclosure, to this problem: the solution that discloses the least identifying information is the most stable, long-term solution.

In the case of too many subscription buttons, Phil recommends the Syndication Subscription Service as a solution. The service doesn't require anything more than a link to your syndication feed, and when accessed, returns a set of buttons for many different aggregators. In fact, I liked this service so much that I've pulled the links to my two feeds, Atom and RDF, from the sidebar and replaced them with a link to it instead.
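The handoff, for the curious, is a simple one: you hand the service your feed URL and it hands back the buttons. Something along these lines (the service address below is a stand-in, not the real one):

    <!-- Stand-in service address; the real service works on the same pattern -->
    <a href="http://subscribe.example.org/?feed=http://example.com/atom.xml">
      Subscribe to this weblog
    </a>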

Though it is also a centralized service, it's one that requires a minimum of data and effort, and since the code to support it is open source, it could be duplicated if need be. Best of all, it's something I can use now for this newly discovered problem I didn't know I had, but which has now been solved, and so no longer exists.

Much of the discussion is about handling feeds like audio files, and the so-called feed protocol. I like what Seth Dillingham wrote on this long ago:

The feed protocol was originally designed for farms. Cattle, for example, just have to click a button to access a feed: url on the farmer’s server, which causes grain to be dropped in the trough.

In a bizarre misuse of this important technology, the feed protocol can also be used to request an RSS or Atom file, to “feed your brain.”

I’m with the cows on this one — if I can’t poke a button with my nose and have it give me food right now, I’m moving to a different barn.
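For the record, the scheme really is that thin: a feed: link is just the feed's address with the scheme swapped or prefixed, and support varies by aggregator. Roughly:

    <!-- Both forms were in circulation; neither delivers grain -->
    <a href="feed://example.com/index.xml">Subscribe</a>
    <a href="feed:http://example.com/index.xml">Subscribe</a>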

Categories
Technology

Mega Meta Mommy Backplane

Kim Cameron responded to my I, URL posting with a very gracious reply. Gracious, in particular, considering that I was a 'bit rough' on him, as well as on the Liberty Alliance.

Years working with multi-mega-corporate data model efforts at various organizations, plus my own studies and writings on RDF, have left me impatient, I will have to admit. We are at a stage now where we are ripe, and beyond ripe, for some form of digital identity system that we can all be comfortable using. Whatever it is, it has to be something that focuses on the people, not the corporations. It must be our champion, not the champion of kings and queens, or of major technology companies.

I acknowledge the work that Liberty has done in the field, but there's never been any doubt about who the 'customer' of this group's efforts is. However, just because the focus of the Alliance is on the companies doesn't mean that's the focus of all the people in the Alliance, or of the crafters of the work. Still, the specification from Liberty that I quoted in my original post is at version 1.2; long past time for such 'placeholders' to exist.

Ah well me, I’m just a coder looking for a solution, though I am glad of this discussion. Otherwise I would never have heard that priceless phrase, about the …emerging “mega meta momma backplane”. Not even the RDF folks could come up with that one–it’s lovely. Or have heard that there is now a sixth law of identity on its way.

Thou shalt not centralize…thou shalt not centralize…thou shalt not centralize…

Categories
Technology

Self-documenting technology

Danny Ayers points to a Jon Udell article about dynamic documentation managed by the folks who use a product, rather than relying on stuffy old material provided by the organization making the software. In it, Jon writes:

Collectively, we users know a lot more about products than vendors do. We eventually stumble across every undocumented feature or quirk. We like to maintain the health of the products we’ve bought and we’re happy to discuss how to do that with other users.

The problem is that vendors, for the most part, do a lousy job of encouraging and organizing those discussions. Here’s an experiment I’d like to see someone try: Start a Wikipedia page for your product. Populate it with basic factual information, point users there, then step back and let the garden grow. Intervene only to repair vandalism, make corrections, and contribute useful new facts.

I had to pause when I read the words …we users know a lot more about products than vendors do. I was reminded of finding information on how to convert my Nikon 995 camera to RAW format, based on the helpful advice of just such a user, only to find warnings at the Nikon site that doing so voids the warranty, because this could have Serious Consequences in the continued Usability of the Product.

I remembered reading a weblogger, brand spanking new to SQL, telling everyone how they could fix a problem in their weblog just by running a certain SQL command, and then frantically sending the person an email saying that if their readers ran it, there was a good chance they'd lose half their data. Then I was reminded that there is nothing more dangerous than a user who just knows they have the answer, and who also has absolutely no stake in whether you break your copy of the product or not.

Still, I have been helped numerous times by other users when I get into situations not documented in the product manual, and I can agree with Jon, and with Tim Bray that it’s much easier to interactively look for help than to read through static documentation when you run into problems.

Jon Udell suggests a new documentation strategy for technology vendors; rather than going on publishing incomplete, out-of-date, poorly written manuals, they could just set up a per-product Wiki and let the customer base fill it up with problems, fixes, workarounds, tips & tricks.

Which is probably why most company-provided formal documentation isn't focused on problem resolution as much as on problem prevention. For instance, all the interactive help in the world isn't going to help a new user set up a Movable Type weblog if they have to go query for each stage of the installation process. That's why a company like Six Apart provides a quite nice installation guide that covers 99% of the situations most people will run into. By providing step-by-step instructions, the majority of people are able to get their weblogs up and running without too many problems.

On the other hand, osCommerce, a heavily used free open source product for managing ecommerce sites, has little or no formal documentation other than that provided by users in its forums. However, other sites have sprung up providing documentation, including a wiki, and multiple sites that provide, ta dah!, formally written, structured documentation for how to use the product.

Why the need for the latter? Because for the most part, osCommerce is used by people who don't have much technical background or experience, and they are, for the most part, very uncomfortable without structured documentation they can follow, step by step, on how to actually use the product. Not troubleshoot, but actually use the app.

Still, Jon and Tim aren't recommending that companies not provide this information; they're saying provide it (for all those who can't connect the dots through Google, I imagine), but then also offer areas where users can contribute additional documentation and help each other.

Jon mentioned a wiki, and Danny pointed to the WordPress wiki as an example of this type of documentation project. Once upon a time, I also thought that a wiki would be a good tool for open documentation efforts. However, that was before I tried to get several non-technical people interested in providing information at a wiki I set up. They were willing, but many were also intimidated by the environment. It was then that I realized that a wiki requires not only a good deal of knowledge about how to edit the pages, but also familiarity with the culture. In other words, wiki is for those who are wiki-primed.

In fact, this has become a problem out at Wikipedia — most of the editors are technical people, and therefore the skew of information tends to be towards technical subjects. The organization has taken to actively promoting non-tech topics in hopes of attracting enough contributors to these other topics to start achieving some balance in coverage.

But then, Wikipedia has the necessary, critical element to make this work — it has achieved enough momentum to be able to direct attention to obscure topics and know that there should be enough members of the audience with knowledge of this topic, and willingness to dive into what is a fairly structured culture, to provide at least a bare minimum coverage of the topic. Most wikis will not.

That’s why when Jon proposes that vendors provide a hands-off wiki for users, and then uses Wikipedia as an example of how well a wiki can work, I winced. Too many people point to the Wikipedia as a demonstration of how a wiki can work, without realizing that the Wikipedia is unique in its use, purpose, and community. In other words, if the only wiki we can point to as a demonstration of how wikis work is Wikipedia, then perhaps what we’re finding is that wikis don’t work. Or don’t work without a great deal of organization on the part of the wiki administrators, and an already existing community of willing contributors.

Of course, technology users can be heavily motivated to support the products they use, as we’ve seen with weblogging technology. There is nothing more loyal than a weblog tool user–unless it’s a Mac user. You take your life in your hands when you take a critical bite out of Apple.

Based on the assumption of interest on the part of tech users, let's return to the WordPress wiki that Danny pointed out. Checking out the recent changes, we find that on January 6th a user who calls himself GooSa added a bunch of spam pages. I imagine the pages will be removed by the time you look at this, but I copied this person's 'user' page entry:

Goo Sa is an evil, evil spammer. Plus he smells like eggs.

Other than that, there isn't much activity on this wiki, because it's no longer the WordPress wiki. No, that's now the Codex wiki. As you can see in the recent changes at this site, there is a great deal of work being done, as well as less spam content. Of course, I don't believe this is linked anywhere from the main WordPress site, so it could be that the spammers haven't found it yet.

If they do, there seems to be enough organization to keep the site clean of the obvious spam, but what about the not-so-obvious destructive actions? For instance, the malicious editor who adds a helpful-sounding tidbit that could actually harm users, harm that only a very experienced technical person would be able to recognize? (Or the user who has just enough knowledge about a topic to make them scary as hell.)

If this tip were out on a forum or email list, the user might be (should be) wary enough to get it vetted first; but this is the 'approved' wiki for the product, which implies trust in the material it contains. Does this mean, then, that the WordPress developers vet every bit of information in the wiki? If the developers are busy providing code for the product, this is unlikely.

Of course, the thing about wikis is that they are self-correcting. However, it can take time for a correction to take place, and in the meantime, I’m fielding emails from half a dozen WordPress users about why their comments have stopped working, or what happened to their data, all because they followed information at the ‘official’ WordPress wiki.

Now, the Wikipedia doesn't have many of these problems, because there is a good, formal procedure in place, with enough people monitoring the site, to route around damage. However, as the administrators of the site warn on a fairly regular basis: believe what you read there at your own risk. A wiki that's 'authorized' by a vendor to provide documentation for a product can't afford to be this loose about what information gets released under its corporate umbrella.

Security and validity of the data aside, wikis foster a certain form of organization that may not be comfortable for all people. The information contained tends to float about in pieces, rather than flow smoothly, as more formal documentation does. Now, this might suit many of the more technical folks who want to know what’s going on with WordPress; I have a feeling, though, that those less technical folks who read it are going to feel cut adrift at times, as they read a discrete bit of information here, and one there, but without the experience to understand how the two pieces of information are related.

Of course, again, that's where the organizers come in, by helping to move things about and pointing out gaps in the flow. But at what point does it become obvious that if the organizers had just spent their time writing the documentation themselves, it would have taken less time than continuously preventing harm to the material?

Issues of wiki aside, let's return to Jon Udell's request for a new way of managing community documentation. He talks about not being able to find information at the vendor site to solve his problem, so he searched Google and the user forums, and found the answer he needed. He then says something new needs to be done to enable user access to community information.

Now, go back and re-read this last paragraph a couple of times.

As Danny points out, much of this user information is already available, but what might be missing is a way of accessing it:

But wait, those discussions with considerable information is already available – the end-user’s don’t really need any extra encouragement, they’re motivated as it is. But what’s lacking is the organisation of that information. Google is a very blunt instrument. Yes, a company Wiki could act as a focus for people, but they’re still be plenty of info on mailing lists and blogs that could be far more accessible.

But as Jon points out, whether intentionally or not, Google does work, and worked for him in this context. More, what he and Tim Bray, and to a lesser extent Danny, are looking for is the type of documentation that favors the geek, when it's the geeks who don't need the help: they know how to dig this information out, and how to weigh its usefulness against its potential for harm.

What about the non-tech? The non-geek? The very documentation that Udell and Bray scathingly reject is exactly the documentation most non-techs need: well constructed, clearly written, stable, vetted user documentation about how to use the tools. Throw in searchable user support forums for troubleshooting, plus Google and blogs and online interest sites, and Babs is your aunt. So when Tim writes:

Like many great ideas, it’s obvious once you think of it. I’m quite sure it’ll happen.

I have to scratch my head in confusion, because it seems to me that the mechanisms behind this 'new idea' are already in place, and have been for some time. Now if we could just convince the open source community, and supporters like Jon and Tim and Danny, that rather than spending time creating a new style of hammer, they need to provide nails and just start hammering away, we might be set.

Still, I don’t want to completely discount Jon’s wiki suggestion: it could be humorous to see what happens out at, say, a security wiki for Windows software. Might be better than a raree show, indeed.