Right tool for the right job: XML formats redux

In the last post, I said I was a pusher of code, not a designer. As a pusher of code, then, I do feel comfortable commenting on the use of Atom or RSS for an import/export format.

Danny Ayers recently pointed out that there’s a new Atom format spec. Good, clean writeup with an interesting twist in the Introduction:

Atom is an XML-based document format intended to allow lists of related information, known as “feeds”, to be synchronised between publishers and consumers. Feeds are composed of a number of items, known as “entries”, each with an extensible set of attached metadata. For example, each entry has a title.

The primary use case that Atom addresses is the syndication of Web content such as Weblogs and news headlines to Web sites as well as directly to user agents. However, nothing precludes it from being used for other purposes and kinds of content.

That’s a bit like saying, “Here now, we have a specification for the banking industry wot would be a good spec for those of you who run petrol stations, what say?”

In my opinion, Atom, like RSS, makes a great syndication format, but too much of each format is tied to that underlying purpose for either to be exceptionally good for universal weblog transforms, including pushing weblog data from one tool to another. For instance:

Each item in Atom, or RSS for that matter, has a link associated with it. I suppose one could use this to hold a slug, or filename, but the two are not the same information.
Atom has the concept of an identifier, atom:id, which doesn’t translate well in weblogging terms. Each tool would have its own unique identification system.
Too many of the fields are associated with the mechanics of the feed, such as atom:generator. While this is essential for syndication feeds, there’s no need for this in weblog migration. Though you can say it’s optional, if you find there is no fit in the business for most (all?) of the optional bits, then you may be looking at a poor fit, overall, between the spec and the use.
There is a lot of data missing in Atom. Keyword-value pairs are something I think a format has to support. There isn’t anything in the specification to do with categories, or how hierarchical categories would be managed. One could say the same for comments – right now they have to be artificially transformed into little feed items, when what they are is comments on a post, not individual feeds.
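To make the mismatch concrete, here is a rough Python sketch of what happens when a single Atom entry is read into the kind of record a weblog tool might want on import. Everything about the record is my own assumption for illustration: the MigrationPost class, its field names, and the helper functions are hypothetical, and the namespace URI is the one used by later Atom documents, which may not match the draft discussed here. Only the title, link, and id elements come from the points above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional

# Assumption: namespace URI of later Atom documents; the draft may differ.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample entry using only the elements mentioned in the post.
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Right tool for the right job</title>
  <link href="http://example.org/archives/000123.html"/>
  <id>tag:example.org,2004:entry-123</id>
</entry>
"""


@dataclass
class MigrationPost:
    """Hypothetical record a weblog tool might need when importing a post."""
    title: str
    slug: Optional[str] = None                     # filename, not the permalink
    categories: list = field(default_factory=list)
    keywords: dict = field(default_factory=dict)   # keyword-value pairs
    comments: list = field(default_factory=list)


def post_from_atom_entry(xml_text):
    """Build a MigrationPost from whatever the Atom entry can supply."""
    entry = ET.fromstring(xml_text)
    title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
    # The link is a permalink and the id is feed-specific, so neither is
    # copied into slug; the remaining fields have no source element at all.
    return MigrationPost(title=title)


def unfilled_fields(post):
    """Names of the fields the entry could not populate."""
    return [name for name, value in vars(post).items() if not value]


post = post_from_atom_entry(SAMPLE_ENTRY)
print(post.title)
print(unfilled_fields(post))
```

In this sketch only the title survives the trip. The slug, categories, keyword-value pairs, and comments all come back empty, and each of them would need an extension or a convention agreed on between tools.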

The last item is the kicker for me. If you say that one can extend the model to include this extra data since Atom supports namespaces, why not take this a step further and go with a new model focused specifically on migrating data between tools? Syndicating a feed is not the same thing as porting an entire weblog between tools.

Of course, saying something like “Atom is not a good fit for this purpose” is similar to invoking the Lazy Web to have it done, and I’m sure a dozen feeds will be created that use Atom, or RSS, to produce and consume migration data. However, I’m not saying it can’t be done; I’m saying that forcing a specification designed for one purpose into another will, in the long run, be more trouble than it’s worth. Especially when you consider the political ramifications of using a syndication feed.

One could write a tool that both exports and imports data directly into the database, rather than interfacing through the tool, but this is not a comfortable option for many non-geeks. They could be concerned, and rightfully, that the underlying data model could change for the tools, and what worked one time may not work the next. The best approach is to use something that tools support, so that users have a degree of comfort with the post.

What we don’t need is one tool using an RSS-formatted import mechanism, while another uses an Atom-formatted export. Asking all tools to support all syndication formats for weblog imports and exports is a bit much; generating multiple syndication feeds is a matter of a new arrangement of tags, but consuming the different feeds is a whole different game.

At the same time, telling people who are already apprehensive about learning a new set of template tags that they need to transform the output generated by one tool before it can be used by another (oh, and there will probably be loss of data between the two) is a Geek’s Choice response.