Walking in Simon’s Shoes

The editor for my book, Practical RDF, was Simon St. Laurent, well known and respected in XML circles. Some might think it strange that a person who isn’t necessarily fond of RDF, and especially RDF/XML, would edit a book devoted to both, but this is the way of the book publishing world.

Simon was the best person I’ve worked with on a book, and I’ve worked with some good people. More important, though, Simon wasn’t an RDF fanatic pushing me to make less of the challenges associated with RDF, or more of its strengths. Neither of us wanted a rah-rah book, and Practical RDF is anything but.

I’ve thought back on many of the discussions about RDF/XML that happened here and there this last year. Simon has usually been on the side less than enthusiastic about RDF/XML, along with a few other people whom I respect, and a few whom I don’t. My blanket response, and that of others, has usually been along the lines of, “RDF/XML is generated and consumed by automated processes and therefore people don’t have to look at the Big Ugly.” This is usually accompanied by a great deal of frustration on our part, because if people would just move beyond the ‘ugliness’ of RDF/XML, we could move on to creating good stuff.

(I say ‘good stuff’ rather than Semantic Web because the reactions to this term are best addressed elsewhere.)

However, the situation isn’t really that simple, or that easily dismissed. If pro-RDF and RDF/XML folks like myself are ever going to see this specification gain some traction, we need to walk a mile in the opponent’s shoes and acknowledge and address their concerns specifically. Since I know Simon the best, I’ve borrowed his shoes to take a closer look at RDF/XML from his perspective.

Simon has, as far as I know, three areas of pushback against RDF: he doesn’t care for the current namespace implementation; he’s not overly fond of the confusion about URIs; and he doesn’t like the syntax for RDF/XML, and believes other approaches, such as N3, are more appropriate. I’ll leave URIs for another essay I’m working on, and leave namespaces for other people to defend. I want to focus on concerns associated directly with RDF/XML, at least from what I think is Simon’s perspective (because, after all, I’m only borrowing his shoes, not his mind).

The biggest concern I see with RDF/XML from an XML perspective is its flexibility. One can use two different XML syntaxes and still arrive at the same RDF model, and this must just play havoc with the souls of XML folks.

As an example of this flexibility, most implementations of RDF/XML today are based on RSS 1.0, the RDF/XML version of the popular syndication format. You can see an example of this with the RSS 1.0 file for this weblog.

Now, the XML for RSS 1.0 isn’t all that different from the XML for that other popular RSS format, RSS 2.0 from Userland, seen here. Both are valid XML; both have elements called channel, item, title, description and so on; and both assume there is one channel, with many items contained in that channel. From an RSS perspective, it’s hard to see why anyone would object so strongly to using RDF/XML, because it really doesn’t add much overhead to the syndication feed. In fact, I wrote in the past about using the same XML processing for RSS 1.0 as you would for RSS 2.0.

However, compatibility between the RDF/XML and XML versions of RSS is much thinner than my previous essay might lead one to believe. In fact, looking at RSS as a demonstration of the “XMLness” of RDF/XML causes you to miss the bigger picture, which is that RSS is basically a very simple, hierarchical syndication format that’s quite natural for XML; its very nature tends to bring out the inherent XML behavior within RDF/XML, creating a great deal of compatibility between the two formats. Compatibility that can be busted in a blink of an eye.

To demonstrate, I’ve simplified the index.rdf file down to one element, and defined an explicit namespace qualifier for the RSS items rather than use the default namespace. Doing this, the XML for item would look as follows:

<rss:item rdf:about="http://rdf.burningbird.net/archives/001856.htm">
<rss:description></rss:description>
<rss:link>http://rdf.burningbird.net/archives/001856.htm</rss:link>
<dc:subject>From the Book</dc:subject>
<dc:creator>shelleyp</dc:creator>
<dc:date>2003-09-25T16:28:55-05:00</dc:date>
</rss:item>

Annotating all of the elements with the rss namespace qualifier does add to the challenge for RSS parsers that use simple pattern matching, because ‘title’ must now be accessed as ‘rss:title’. Still, the changed feed validates as RSS using the popular RSS Validator, as you can see with an example.
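To make the pattern-matching problem concrete, here is a minimal Python sketch. The two tiny documents and the helper names (title_by_pattern, title_by_namespace) are made up for illustration and aren't taken from any real RSS parser; the only thing assumed from the outside world is the RSS 1.0 namespace URI.

```python
import re
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"  # RSS 1.0 namespace URI

# The same item written with a default namespace and with an explicit prefix.
default_ns = f'<item xmlns="{RSS}"><title>PostCon</title></item>'
prefixed = f'<rss:item xmlns:rss="{RSS}"><rss:title>PostCon</rss:title></rss:item>'


def title_by_pattern(xml_text):
    """Naive scraping: look for a literal <title> tag in the raw text."""
    match = re.search(r"<title>(.*?)</title>", xml_text)
    return match.group(1) if match else None


def title_by_namespace(xml_text):
    """Namespace-aware lookup: match on namespace URI plus local name."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{RSS}}}title")
    return node.text if node is not None else None


for label, doc in (("default namespace", default_ns), ("rss: prefix", prefixed)):
    print(label, "| pattern:", title_by_pattern(doc),
          "| namespace-aware:", title_by_namespace(doc))
```

The pattern-based lookup finds the title only in the default-namespace version, while the namespace-aware lookup finds it in both, since the prefix is just a local shorthand for the same namespace.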

Next, we’re going to simplify the RDF/XML for the item element by using a valid RDF/XML shortcut technique that allows us to collapse simple, non-repeating predicate elements, such as title and link, into attributes of the resource they’re describing. This change is reflected in the following excerpt:

<rss:item rdf:about="http://rdf.burningbird.net/archives/001856.htm"
rss:title="PostCon"
rss:link="http://rdf.burningbird.net/archives/001856.htm"
dc:subject="From the Book"
dc:creator="shelleyp"
dc:date="2003-09-25T16:28:55-05:00" />

Regardless of the format used, whether the longer approach that is more widely used now or the shortcut, the resulting N-Triples generated are the same, and so is the RDF model. However, from an XML perspective, we’re looking at a major disconnect between the two versions of the syntax. In fact, if I were to modify my index.rdf feed to use the more abbreviated format, it wouldn’t validate with the same RSS Validator I used earlier. It would validate as proper RSS 1.0, proper RDF/XML, and valid XML, but it sings a discordant note with the existing understanding of RSS, whether RSS 1.0 or RSS 2.0.
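The claim that both spellings produce the same triples can be sketched in a few lines of Python. This is emphatically not an RDF/XML parser: the triples function is a hypothetical helper that handles only a single typed node with literal-valued properties, it doesn't distinguish URI objects from literals, and the sample item is cut down to three properties for brevity.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"
NS = f'xmlns:rdf="{RDF}" xmlns:rss="{RSS}" xmlns:dc="{DC}"'
URL = "http://rdf.burningbird.net/archives/001856.htm"

# Properties written as child elements.
long_form = f"""<rss:item {NS} rdf:about="{URL}">
  <rss:title>PostCon</rss:title>
  <rss:link>{URL}</rss:link>
  <dc:creator>shelleyp</dc:creator>
</rss:item>"""

# The same properties collapsed into attributes.
short_form = f"""<rss:item {NS} rdf:about="{URL}"
  rss:title="PostCon" rss:link="{URL}" dc:creator="shelleyp" />"""


def expand_name(clark):
    """Turn ElementTree's '{namespace}local' form into 'namespacelocal'."""
    return clark.replace("{", "", 1).replace("}", "", 1)


def triples(xml_text):
    """Collect (subject, predicate, object) tuples from one typed node."""
    node = ET.fromstring(xml_text)
    subject = node.get(f"{{{RDF}}}about")
    # The element name of a typed node supplies an rdf:type triple.
    found = {(subject, f"{RDF}type", expand_name(node.tag))}
    # Property attributes: every attribute outside the rdf: namespace.
    for name, value in node.attrib.items():
        if not name.startswith(f"{{{RDF}}}"):
            found.add((subject, expand_name(name), value))
    # Property elements: every child element with literal content.
    for child in node:
        found.add((subject, expand_name(child.tag), child.text or ""))
    return found


print(triples(long_form) == triples(short_form))
# An XML-level consumer looking for a title element sees two different documents.
print(ET.fromstring(long_form).find(f"{{{RSS}}}title") is not None)
print(ET.fromstring(short_form).find(f"{{{RSS}}}title") is not None)
```

The two forms compare equal as sets of triples, yet a lookup for a title child element succeeds on the first and comes back empty on the second, which is the disconnect in miniature.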

More complex RDF/XML vocabularies that are less hierarchical in nature stray further and further away from more ‘traditional’ XML even though technically, they’re all valid XML. In addition, since there are variations of shortcuts that are proper RDF/XML syntax, one can’t even depend on the same XML syntax being used to generate the same set of triples from RDF/XML document to RDF/XML document. And this ‘flexibility’ must burn, veritably burn, within the stomach of XML adherents, conjuring up memories of the same looseness of syntax that existed with HTML, leading to XML in the first place.

It is primarily this that leads many RDF proponents as well as RDF/XML opponents into preferring N3 notation. There is one and only one set of N3 triples for a specific model, and one, and only one RDF model generating the same set of N3 triples.

Aye, I’ve walked a mile in Simon’s shoes and I’ve found that they’ve pinched, sadly pinched indeed. However, I’ve also gained a much better understanding of why the earnest and blithe referral to automated generation and consumption of RDF/XML, when faced with criticism of the syntax, isn’t necessarily going to appease XML developers, now or in the future. The very flexibility of the syntax must be anathema to XML purists.

Of course, there are arguments in favor of RDF/XML that arise from the very nature of the flexibility of the syntax. As Edd Dumbill wrote relatively recently, RDF is failure friendly, in addition to being extremely easy to extend with its built-in understanding of namespace interoperability. And, as a data person, not a syntax person, I also find the constructs of RDF/XML to be far more elegant and modular, more cleanly differentiated, than the ‘forever and a limb’ tree structure of XML.

But I’m not doing the cause of RDF and RDF/XML any good by not acknowledging how easy it is to manipulate the XML in an RDF/XML document, legitimately, and leave it virtually incompatible with XML processes working with the same data.

I still prefer RDF/XML over N3, and will still use it for all my applications, but it’s time for different arguments in this particular debate, methinks.