Simpler is better

Manu Sporny and Ian Hickson have had an interesting, and telling, exchange about RDFa and microdata in the HTML WG list (see the opening email for the thread). In one of the emails, Hickson writes about why he created a whole new microdata section, rather than incorporate RDFa:

By “technical problems” I mean problems with the design, as opposed to
editorial problems. They’re primarily usability issues, which are to some
extent subjective. I make no apology for having an opinion on what makes a
usable language; it’s my job to have such an opinion.

Generally speaking, my position on this topic is a straightforward one:
simpler is better.

One asks: what is simple about creating an entirely new metadata solution, when there are two viable ones (microformats and RDFa) with both history and use? A new microdata section with predefined vocabularies that will be out of sync with their outer specification counterparts before the ink on HTML5 is even dry?

I haven’t touched on the microdata section of the HTML 5 specification in my little story on HTML 5 yet, because that one, in particular, is really key to everything that is wrong with HTML 5: both the specification and the process. It all really boils down to Ian and a few of his friends having opinions, and the power to enforce those opinions on the next version of the web. There are no checks. There are no balances. There is nothing but an illusion of equality and fairness.

What’s obscene is that no one really likes the microdata section, not even Ian himself. Oh, a few of his buddies manfully came out with the appropriate murmurs of delight, but none of these folks are interested in metadata. More importantly, where there is universal application of microformats and RDFa, there is no implementation that supports microdata (though I imagine one will be tossed into Opera quickly, just for spite).

At the end of the thread, when last I looked, Sam Ruby had vetted Manu Sporny’s RDFa alternative HTML 5 specification. So what does that mean? Your guess is as good as mine, but in my opinion, it does not mean that RDFa has a fair chance, any more than creating specification text for a new description of the summary or alt attributes means these will be given a fair chance. If any of this comes down to a vote, I have no doubt that the WhatWG folks will be able to swing a majority. After all, client side data storage is sexy; summary attributes for screenreaders are not.

Having a majority does not mean that the best decision wins.



I don’t have time at the moment to write anything in-depth on the recent decision of the W3C to let the charter for the XHTML2 working group expire. Instead, I’m going to list several interesting and/or relevant writings others have done, as both bookmarking for a future story, and for your edification.

I’ll probably add to this list over time.

Update: I have just filed my first formal objection with the W3C about the philosophy of one vote/one veto for the major browser vendors over any aspect of the HTML 5 specification.

What the one vote/one veto decision principle means is that if a company, such as Microsoft, states it will not implement, say, SVG in HTML, the Canvas element, or any other aspect of HTML 5, up to and including the entire HTML 5 specification, that feature will be pulled from the HTML 5 specification. No discussion among the members of the HTML WG would be allowed to override this decision.

This is what replaces work on XHTML2. This is the future of your web.


A Loose Set of Notes on RDFa, XHTML, and HTML5

There’s been a great deal of discussion about RDFa, HTML5, and microdata the last few days, on email lists and elsewhere. I wanted to write down notes of the discussions here, for future reference. Those working issues with RDFa in Drupal 7 should pay particular attention, but the material is relevant to anyone incorporating RDFa.

Shane McCarron released a proposal for RDFa in HTML4, which is based on creating a DTD that extends support for RDFa in HTML4. He does address some issues related to the differences in how certain data is handled in HTML4 and XHTML, but for the most part, his document refers processing issues to the original RDFaSyntax document.

Philip Taylor responded with some questions, specifically about how xml:lang is handled by HTML5 parsers, as compared to XML parsers. His second concern was how to handle XMLLiteral in HTML5, because the assumption is that RDFa extractors in JavaScript would be getting their data from the DOM, not processing the characters in the page.

“If the object of a triple would be an XMLLiteral, and the input to the processor is not well-formed [XML]” – I don’t understand what that means in an HTML context. Is it meant to mean something like “the bytes in the HTML file that correspond to the contents of the relevant element could be parsed as well-formed XML (modulo various namespace declaration issues)”? If so, that seems impossible to implement. The input to the RDFa processor will most likely be a DOM, possibly manipulated by the DOM APIs rather than coming straight from an HTML parser, so it may never have had a byte representation at all.

There’s a lively little sub-thread related to this one issue, but the one response I’ll focus on is Shane’s: he replied that RDFa does not pre-suppose a processing model in which there is a DOM. The issue of xml:lang is also still under discussion, but I want to move on to new issues.

While the discussion related to Shane’s document was ongoing, Philip released his own first look at RDFa in HTML5. Concern was immediately expressed about Philip’s copying of some of Shane’s material in order to create a new processing rule section. The concern wasn’t about copyright, but about the problems that can occur when you have two sets of processing rules for the same data and the same underlying data model. No matter how careful you are, at some point the two are likely to diverge, and the underlying data model becomes corrupted.

Rather than spend time on Philip’s specification directly at this time, I want to focus, instead, on a note he attached to the email entry providing the link to the spec proposal. In it he wrote:

There are several unresolved design issues (e.g. handling of case-sensitivity, use of xmlns:* vs other mechanisms that cause fewer problems, etc) – I haven’t intended to make any decisions on such issues, I’ve just attempted to define the behaviour with sufficient detail that it should make those issues visible.

More on case sensitivity in a moment.

Discussion started a little more slowly for Philip’s document, but is ongoing. In addition, both Philip and Manu Sporny released test suites. Philip’s is focused on highlighting problems when parsing RDFa in HTML as compared to XHTML; the one that Manu posted, created by Shane, focused on a basic set of test cases for RDFa, generally, but migrated into the RDFa in HTML4 document space.

Returning to Philip’s issue with case sensitivity, I took one of Shane’s RDFa in HTML test cases, and the rdfquery JavaScript from Philip’s test suite, and created pages demonstrating the case sensitivity issue. One such is the following:

<title>Test 0011</title>
<div about="">
Author: <span property="dc:creator t:apple T:banana">Albert Einstein</span>
<h2 property="dc:title">E = mc<sup>2</sup>: The Most Urgent Problem of Our Time</h2>

Notice the two namespace declarations, one for “t” and one for “T”. Both are used to provide properties for the object being described in the document: t:apple and T:banana. Parsing the document with a RDFa application that applies XML rules, treats the namespaces, “t” and “T” as two different namespaces. It has no problem with the RDFa annotation.

However, using the rdfquery JavaScript library, which treats “t” and “T” the same because of HTML case insensitivity, an exception results: Malformed CURIE: No namespace binding for T in CURIE T:banana. Stripping away the RDFa aspects, and focusing on the namespaces, you can see how browsers handle namespace case in an HTML document and in a document served up as XHTML. To make matters more interesting, check out the two pages using Opera 10, Firefox 3.5, and the latest Safari. Opera preserves the case, while both Safari and Firefox lowercase the prefix. Even within the HTML world, the browsers handle namespace case in HTML differently. However, all of them handle the prefixes the same way, and correctly, in XHTML. So does the rdfquery JavaScript library, as this test page demonstrates.
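For anyone who wants to poke at the difference without a browser, here is a minimal sketch in Python. It is not rdfquery and it is not any browser’s parser: it uses Python’s standard html.parser, which lowercases attribute names, as a stand-in for an HTML parser that folds case, and ElementTree as the XML parser. The markup, the example.org namespace URIs, and the helper names (html_prefixes, xml_prefixes, resolve_curie) are all made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical markup: two prefixes that differ only by case.
MARKUP = (
    '<div xmlns:t="http://example.org/lower#" '
    'xmlns:T="http://example.org/upper#">'
    '<span property="t:apple T:banana">Albert Einstein</span>'
    '</div>'
)


class PrefixCollector(HTMLParser):
    """Collect xmlns:* declarations the way a case-folding HTML parser sees them."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}

    def handle_starttag(self, tag, attrs):
        # html.parser has already lowercased every attribute name here.
        for name, value in attrs:
            if name.startswith("xmlns:"):
                # Keep the first duplicate, as an HTML5 parser would.
                self.prefixes.setdefault(name[len("xmlns:"):], value)


def html_prefixes(markup):
    """Prefix map as reported by the HTML parser (case is lost)."""
    collector = PrefixCollector()
    collector.feed(markup)
    collector.close()
    return collector.prefixes


def xml_prefixes(markup):
    """Prefix map as reported by an XML parser (case is preserved)."""
    source = io.BytesIO(markup.encode("utf-8"))
    events = ET.iterparse(source, events=("start-ns",))
    return {prefix: uri for _event, (prefix, uri) in events}


def resolve_curie(curie, prefixes):
    """Expand prefix:local against a prefix map, or fail loudly."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"No namespace binding for {prefix} in CURIE {curie}")
    return prefixes[prefix] + local


print(xml_prefixes(MARKUP))   # both "t" and "T" are present
print(html_prefixes(MARKUP))  # only "t" survives
```

With the XML prefix map, resolve_curie("T:banana", ...) expands normally. With the HTML prefix map, the same call raises an error much like the one rdfquery reports, because the only binding left is the lowercase “t”.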

Returning to the discussion, there is some back and forth on how to handle case sensitivity issues related to HTML, with suggestions varying as widely as: tossing the RDFa in XHTML spec out and creating a new one; tossing RDFa out in favor of Microdata; creating a best practices document that details the problem and provides appropriate warnings; creating a new RDFa in HTML document (or modifying existing profile document) specifying that all conforming applications must treat prefix names as case insensitive in HTML (possibly cross-referencing the RDFa in XHTML document, which allows case sensitive prefixes).

I am not in favor of the first two options. I do favor the latter two options, though I think the best practices document should strongly recommend using lowercase prefix names, and definitely not using two prefixes that differ only by case. During the discussion, a new conforming RDFa test case was proposed that tests based on case. This has now started its own discussion.

I think the problem of case and namespace prefixes (not to mention xmlns as compared to XMLNS) is very much an edge issue, not a show stopper. However, until a solution is formalized, be aware that xmlns prefix case is handled differently in XHTML and HTML. All else being equal, consider using only lowercase prefixes when embedding RDFa (or any other namespace-based functionality). In addition, do not use XMLNS. Ever. If not for yourself, do it for the kittens.
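If you want to hold yourself to that advice mechanically, a check along the following lines would do. This is only a sketch in Python of the two recommendations above (lowercase prefixes only, and no two prefixes differing only by case); check_prefixes is a hypothetical helper, not part of any RDFa tool, and it assumes you have already pulled the list of declared prefix names out of your markup.

```python
from collections import defaultdict


def check_prefixes(prefixes):
    """Return warnings for a list of declared namespace prefix names."""
    warnings = []
    by_folded = defaultdict(set)

    for prefix in prefixes:
        by_folded[prefix.lower()].add(prefix)
        if prefix != prefix.lower():
            warnings.append(f"prefix '{prefix}' is not lowercase")

    # Prefixes that collapse to the same name once case is folded
    # are the ones an HTML parser may not be able to tell apart.
    for group in by_folded.values():
        if len(group) > 1:
            warnings.append(f"prefixes {sorted(group)} differ only by case")

    return warnings


for warning in check_prefixes(["dc", "t", "T"]):
    print(warning)
```

A document that declares only dc and t passes cleanly; add T and you get both warnings.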

Speaking of RDFa in HTML issues, there is now a new RDFa in HTML issues wiki page. Knock yourselves out.

Update: a new version of the RDFa in HTML4 profile has been released. It addresses some of the concerns expressed earlier, including the issue of case and XMLLiteral. Though HTML5 doesn’t support DTDs, as HTML4 does, the conformance rules should still be good for HTML5.