There’s been a great deal of discussion about RDFa, HTML5, and microdata the last few days, on email lists and elsewhere. I wanted to write down notes of the discussions here, for future reference. Those working issues with RDFa in Drupal 7 should pay particular attention, but the material is relevant to anyone incorporating RDFa.
Shane McCarron released a proposal for RDFa in HTML4, which is based on creating a DTD that extends support for RDFa in HTML4. He does address some issues related to the differences in how certain data is handled in HTML4 and XHTML, but for the most part, his document refers processing issues to the original RDFaSyntax document.
Philip Taylor responded with some questions, specifically about how xml:lang is handled by HTML5 parsers, as compared to XML parsers. His second concern was how to handle XMLLiteral in HTML5, because the assumption is that RDFa extractors in JavaScript would be getting their data from the DOM, not processing the characters in the page.
“If the object of a triple would be an XMLLiteral, and the input to the processor is not well-formed [XML]” – I don’t understand what that means in an HTML context. Is it meant to mean something like “the bytes in the HTML file that correspond to the contents of the relevant element could be parsed as well-formed XML (modulo various namespace declaration issues)”? If so, that seems impossible to implement. The input to the RDFa processor will most likely be a DOM, possibly manipulated by the DOM APIs rather than coming straight from an HTML parser, so it may never have had a byte representation at all.
There’s a lively little sub-thread related to this one issue, but the one response I’ll focus on is Shane, who replied, RDFa does not pre-suppose a processing model in which there is a DOM. The issue of xml:lang is also still under discussion, but I want to move on to new issues.
While the discussion related to Shane’s document was ongoing, Philip released his own first look at RDFa in HTML5. Concern was immediately expressed about Philip’s copying of some of Shane’s material, in order to create a new processing rule section. The concern wasn’t because of any issue to do with copyright, but the problems that can occur when you have two sets of processing rules for the same data and the same underlying data model. No matter how careful you are, at some point the two are likely to diverge, and the underlying data model corrupted.
Rather than spend time on Philip’s specification directly at this time, I want to focus, instead, on a note he attached to the email entry providing the link to the spec proposal. In it he wrote:
There are several unresolved design issues (e.g. handling of case-sensitivity, use of xmlns:* vs other mechanisms that cause fewer problems, etc) – I haven’t intended to make any decisions on such issues, I’ve just attempted to define the behaviour with sufficient detail that it should make those issues visible.
More on case sensitivity in a moment.
Discussion started a little more slowly for Philip’s document, but is ongoing. In addition, both Philip and Manu Sporney released test suites. Philip’s is focused on highlighting problems when parsing RDFa in HTML as compared to XHTML; The one that Manu posted, created by Shane, focused on a basic set of test cases for RDFa, generally, but migrated into the RDFa in HTML4 document space.
Returning to Philip’s issue with case sensitivity, I took one of Shane’s RDFa in HTML test cases, and the rdfquery JavaScript from Philip’s test suit, and created pages demonstrating the case sensitivity issue. One such is the following:
<!DOCTYPE HTML PUBLIC "-//ApTest//DTD HTML4+RDFa 1.0//EN" "http://www3.aptest.com/standards/DTD/html4-rdfa-1.dtd">
<html
xmlns:t="http://test1.org/something/"
xmlns:T="http://test2.org/something/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
<title>Test 0011</title>
</head>
<body>
<div about="">
Author: <span property="dc:creator t:apple T:banana">Albert Einstein</span>
<h2 property="dc:title">E = mc<sup>2</sup>: The Most Urgent Problem of Our Time</h2>
</div>
</body>
</html>
Notice the two namespace declarations, one for “t” and one for “T”. Both are used to provide properties for the object being described in the document: t:apple and T:banana. Parsing the document with a RDFa application that applies XML rules, treats the namespaces, “t” and “T” as two different namespaces. It has no problem with the RDFa annotation.
However, using the rdfquery JavaScript library, which treats “t” and “T” the same because of HTML case insensitivity, an exception results: Malformed CURIE: No namespace binding for T in CURIE T:banana. Stripping away the RDFa aspects, and focusing on the namespaces, you can see how browsers handle namespace case in an HTML document and in a document served up as XHTML. To make matter more interesting, check out the two pages using Opera 10, Firefox 3.5, and the latest Safari. Opera preserves the case, while both Safari and Firefox lowercase the prefix. Even within the HTML world, the browsers handle namespace case in HTML differently. However, all handle the prefixes the same, and correctly in XHTML. So does the rdfquery JavaScript library, as this test page demonstrates.
Returning to the discussion, there is some back and forth on how to handle case sensitivity issues related to HTML, with suggestions varying as widely as: tossing the RDFa in XHTML spec out and creating a new one; tossing RDFa out in favor of Microdata; creating a best practices document that details the problem and provides appropriate warnings; creating a new RDFa in HTML document (or modifying existing profile document) specifying that all conforming applications must treat prefix names as case insensitive in HTML, (possibly cross-referencing the RDFa in XHTML document, which allows case sensitive prefixes). I am not in favor of the first two options. I do favor the latter two options, though I think the best practices document should strongly recommend using lowercase prefix names, and definitely not using two prefixes that differ only by case. During the discussion, a new conforming RDFa test case was proposed that tests based on case. This has now started its own discussion.
I think the problem of case and namespace prefixes (not to mention xmlns as compared to XMLNS) is very much an edge issue, not a show stopper. However, until a solution is formalized, be aware that xmlns prefix case is handled differently in XHTML and HTML. Since all things are equal, consider using lowercase prefixes, only, when embedding RDFa (or any other namespace-based functionality). In addition, do not use XMLNS. Ever. If not for yourself, do it for the kittens.
Speaking of RDFa in HTML issues, there is now a new RDFa in HTML issues wiki page. Knock yourselves out.
updateA new version of the RDFa in HTML4 profile has been released. It addresses a some of the concerns expressed earlier, including the issue of case and XMLLiteral. Though HTML5 doesn’t support DTDs, as HTML4 does, the conformance rules should still be good for HTML5.
Under construction
I couldn’t resist the title. Just be glad I refrained from using one of the old animated “Under Construction” GIFs.
Since I’m no longer on the hook for anything related to HTML5 and RDFa, I can return to my books. Books, plural, as I hope to be starting a new book within the “traditional” publishing track, soon.
I doubt I’ll have much to say over the next few months. Just a heads up that the site may look odd or not work at times, as I try out some new stuff. No worries, it hasn’t been taken over by aliens.
Kindle clipping limits
Recovered from the Wayback Machine.
I love books on history, and have read several on my Kindle. I hope to someday write book reviews, or perhaps use quotes from the books in my future writings. Kindle facilitated this capability by providing functionality to highlight passages, add book notes, and especially, save a Kindle “page” to a clipping file.
By saving passages from the book to a text file, I can copy and paste quotes, without worry about mistyping the text. In addition, if my Kindle died, though I may not have the books, I’d at least have my notes.
My routine would be to read a book, such as A Dark Valley: A Panorama of the 1930s or Freedom from Fear: The American People in Depression and War, 1929-1945, and once finished would copy the clipping file to my computer, delete the one on the Kindle, and start fresh. However, while reading Banana: The Fate of the Fruit That Changed the World, about a third of a way through, when I went to save a page with a passage of interest to my clipping file, I received an error:
Unable to save clipping. You have
reached the clipping limit for this item.
Clipping limit? This was the first I’d heard of clipping limits.
I deleted the clipping file, but it made no difference. Per suggestions on an Amazon thread, I also deleted a metadata file associated with the book, but again, had no luck.
I tried to find information about the clipping limit in the Kindle TOS or User Guide, but nothing was covered. I also tried to find out if one can “delete” items from the existing clipping file, in order to replace with other clippings at a later time, but once the limit is reached, nothing associated with the book can be added to the clipping file, not even a highlighted sentence.
Not all books have a clipping limit, and the limit is not the same for all books. However, there is no way to find out if a book has a clipping limit, or how big it is, unless using software to ‘crack’ the DRM (Digital Rights Management) for the book.
That I’m peeved is to put it mildly, as that was one of the Kindle features I found most valuable. It was also one of the features I’ve used to sell the reading device to others. And now I’m afraid to make notes or save clippings without wondering if I won’t hit the limit. Contrary to what Amazon or the Publishers must assume, I’m not going to use the “Save as Clipping” feature to copy the entire book—I’d rather get the book from the library and photocopy each page, because it would be easier. And I can’t wait to find out what happens when several college students hit this limit with their fancy, and expensive, new large form Kindle DXes.
More importantly, Amazon does not mention this limitation with the sales material for the device, though the company does tout the “Save as clipping” capability.
Bookmarks and Annotations
By using the QWERTY keyboard, you can add annotations to text, just like you might write in the margins of a book. And because it is digital, you can edit, delete, and export your notes. Using the new 5-way controller, you can highlight and clip key passages and bookmark pages for future use.
Yet there’s nothing about clipping limits: in the documentation, or the web site. This, to me, is a deceptive business practice. Making an assumption that people will somehow “know” about the limits because of copyright laws is especially weak, because the amount you can copy seems to be arbitrary, and we readers have no way of knowing what these limits are.
Even more disappointing, the clipping limit also applies to DRM free books from Amazon, according to a MobileRead forum entry.
update I counted the clippings from “Banana…”, and discovered that the clipping limit for this book has been set to 40. That’s Kindle clippings, not book pages. Following is a typical clipping:
busy, modern family would consist of bananas sliced into corn flakes with milk. It wasn’t just the recipe that broke new ground. It was also the coupons, pioneered by the company, packed inside cereal boxes (redeemable for free bananas that the cereal companies, not the fruit importer, paid for). The company made sure that children knew about bananas, too. It set up an official “education department,” devoted to publishing textbooks and curriculum materials that subtly provided information about the fruit. United Fruit also added a new element to its political strategy. If military action was impractical (U.S. troops might be unavailable or force precluded by situations on the ground), Central America’s geography became an ally. The region’s countries were small and easy to move between. There were plenty of natural ports on both the eastern and western coasts, and bananas could be grown just about anywhere land could be cleared and a railroad could be laid. If a government became particularly balky, the company would simply threaten to go next door. But one thing United Fruit couldn’t control was nature. Not long after bananas added themselves as a third party in cereal and milk, the troubles growers were beginning to have with an aggressive malady became public. One headline in The New York Times read: “Banana Disease Ruins Plantations—No Remedy is Available—Whole Regions Have Been Laid Waste and Improvements Abandoned by
update I’ve tried the Perl tool mobi2mobi on several of the books I have, including those with an expired copyright downloaded from Amazon, one that is copyrighted and with DRM, and one that is copyrighted, without DRM.
The values I’m getting would seem to be percentages, not absolute clipping instances. So a value of 0xa, which is hex for 10, would be 10 percent, not 10 instances. Non-DRM books return a clipping limit of 0x64, which is hex for 100, which would be, if my guess is accurate, 100%. This matches our expectation for a non-DRM enabled book: that we can highlight, or clip pages up to 100% of the content.
That the value is a percentage may have been obvious to some of you, but the idea of that Amazon would enforce such an arbitrary limit, and without notice to the customers, is still new to me.
Note, also, that Amazon is attaching what seems to be a default value of 10% to books that are no longer covered by copyright, but which you can download for free from Amazon. Looks like Amazon is also attaching DRM to these books, too. My suggestion would be to get these books elsewhere, like feedbooks.com, and hope they aren’t so limited.
I asked Wolfram Alpha: what is RDF?
I would have been more impressed by Wolfram Alpha if at the end of its interpretation of my request, it asked me, “Was this answer correct? Was this answer complete? If not correct or complete, what do you consider RDF to be?”
I then asked the same question of Google.