There’s been a great deal of discussion about RDFa, HTML5, and microdata the last few days, on email lists and elsewhere. I wanted to write down notes of the discussions here, for future reference. Those working issues with RDFa in Drupal 7 should pay particular attention, but the material is relevant to anyone incorporating RDFa.
Shane McCarron released a proposal for RDFa in HTML4, which is based on creating a DTD that extends support for RDFa in HTML4. He does address some issues related to the differences in how certain data is handled in HTML4 and XHTML, but for the most part, his document refers processing issues to the original RDFaSyntax document.
“If the object of a triple would be an XMLLiteral, and the input to the processor is not well-formed [XML]” – I don’t understand what that means in an HTML context. Is it meant to mean something like “the bytes in the HTML file that correspond to the contents of the relevant element could be parsed as well-formed XML (modulo various namespace declaration issues)”? If so, that seems impossible to implement. The input to the RDFa processor will most likely be a DOM, possibly manipulated by the DOM APIs rather than coming straight from an HTML parser, so it may never have had a byte representation at all.
There’s a lively little sub-thread related to this one issue, but the one response I’ll focus on is Shane, who replied, RDFa does not pre-suppose a processing model in which there is a DOM. The issue of xml:lang is also still under discussion, but I want to move on to new issues.
While the discussion related to Shane’s document was ongoing, Philip released his own first look at RDFa in HTML5. Concern was immediately expressed about Philip’s copying of some of Shane’s material, in order to create a new processing rule section. The concern wasn’t because of any issue to do with copyright, but the problems that can occur when you have two sets of processing rules for the same data and the same underlying data model. No matter how careful you are, at some point the two are likely to diverge, and the underlying data model corrupted.
Rather than spend time on Philip’s specification directly at this time, I want to focus, instead, on a note he attached to the email entry providing the link to the spec proposal. In it he wrote:
There are several unresolved design issues (e.g. handling of case-sensitivity, use of xmlns:* vs other mechanisms that cause fewer problems, etc) – I haven’t intended to make any decisions on such issues, I’ve just attempted to define the behaviour with sufficient detail that it should make those issues visible.
More on case sensitivity in a moment.
Discussion started a little more slowly for Philip’s document, but is ongoing. In addition, both Philip and Manu Sporney released test suites. Philip’s is focused on highlighting problems when parsing RDFa in HTML as compared to XHTML; The one that Manu posted, created by Shane, focused on a basic set of test cases for RDFa, generally, but migrated into the RDFa in HTML4 document space.
<!DOCTYPE HTML PUBLIC "-//ApTest//DTD HTML4+RDFa 1.0//EN" "http://www3.aptest.com/standards/DTD/html4-rdfa-1.dtd">
Author: <span property="dc:creator t:apple T:banana">Albert Einstein</span>
<h2 property="dc:title">E = mc<sup>2</sup>: The Most Urgent Problem of Our Time</h2>
Notice the two namespace declarations, one for “t” and one for “T”. Both are used to provide properties for the object being described in the document: t:apple and T:banana. Parsing the document with a RDFa application that applies XML rules, treats the namespaces, “t” and “T” as two different namespaces. It has no problem with the RDFa annotation.
Returning to the discussion, there is some back and forth on how to handle case sensitivity issues related to HTML, with suggestions varying as widely as: tossing the RDFa in XHTML spec out and creating a new one; tossing RDFa out in favor of Microdata; creating a best practices document that details the problem and provides appropriate warnings; creating a new RDFa in HTML document (or modifying existing profile document) specifying that all conforming applications must treat prefix names as case insensitive in HTML, (possibly cross-referencing the RDFa in XHTML document, which allows case sensitive prefixes). I am not in favor of the first two options. I do favor the latter two options, though I think the best practices document should strongly recommend using lowercase prefix names, and definitely not using two prefixes that differ only by case. During the discussion, a new conforming RDFa test case was proposed that tests based on case. This has now started its own discussion.
I think the problem of case and namespace prefixes (not to mention xmlns as compared to XMLNS) is very much an edge issue, not a show stopper. However, until a solution is formalized, be aware that xmlns prefix case is handled differently in XHTML and HTML. Since all things are equal, consider using lowercase prefixes, only, when embedding RDFa (or any other namespace-based functionality). In addition, do not use XMLNS. Ever. If not for yourself, do it for the kittens.
Speaking of RDFa in HTML issues, there is now a new RDFa in HTML issues wiki page. Knock yourselves out.
updateA new version of the RDFa in HTML4 profile has been released. It addresses a some of the concerns expressed earlier, including the issue of case and XMLLiteral. Though HTML5 doesn’t support DTDs, as HTML4 does, the conformance rules should still be good for HTML5.