Categories
RDF Standards XHTML/HTML

A Loose Set of Notes on RDFa, XHTML, and HTML5

There’s been a great deal of discussion about RDFa, HTML5, and microdata over the last few days, on email lists and elsewhere. I wanted to write down notes on the discussions here, for future reference. Those working on issues with RDFa in Drupal 7 should pay particular attention, but the material is relevant to anyone incorporating RDFa.

Shane McCarron released a proposal for RDFa in HTML4, which is based on creating a DTD that extends support for RDFa in HTML4. He does address some issues related to the differences in how certain data is handled in HTML4 and XHTML, but for the most part, his document refers processing issues to the original RDFaSyntax document.

Philip Taylor responded with some questions, specifically about how xml:lang is handled by HTML5 parsers, as compared to XML parsers. His second concern was how to handle XMLLiteral in HTML5, because the assumption is that RDFa extractors in JavaScript would be getting their data from the DOM, not processing the characters in the page.

“If the object of a triple would be an XMLLiteral, and the input to the processor is not well-formed [XML]” – I don’t understand what that means in an HTML context. Is it meant to mean something like “the bytes in the HTML file that correspond to the contents of the relevant element could be parsed as well-formed XML (modulo various namespace declaration issues)”? If so, that seems impossible to implement. The input to the RDFa processor will most likely be a DOM, possibly manipulated by the DOM APIs rather than coming straight from an HTML parser, so it may never have had a byte representation at all.

There’s a lively little sub-thread related to this one issue, but the one response I’ll focus on is from Shane, who replied that RDFa does not pre-suppose a processing model in which there is a DOM. The issue of xml:lang is also still under discussion, but I want to move on to new issues.

While the discussion related to Shane’s document was ongoing, Philip released his own first look at RDFa in HTML5. Concern was immediately expressed about Philip’s copying of some of Shane’s material in order to create a new processing rule section. The concern had nothing to do with copyright; it was about the problems that can occur when you have two sets of processing rules for the same data and the same underlying data model. No matter how careful you are, at some point the two are likely to diverge, and the underlying data model becomes corrupted.

Rather than spend time on Philip’s specification directly at this time, I want to focus, instead, on a note he attached to the email entry providing the link to the spec proposal. In it he wrote:

There are several unresolved design issues (e.g. handling of case-sensitivity, use of xmlns:* vs other mechanisms that cause fewer problems, etc) – I haven’t intended to make any decisions on such issues, I’ve just attempted to define the behaviour with sufficient detail that it should make those issues visible.

More on case sensitivity in a moment.

Discussion started a little more slowly for Philip’s document, but is ongoing. In addition, both Philip and Manu Sporny released test suites. Philip’s is focused on highlighting problems when parsing RDFa in HTML as compared to XHTML; the one that Manu posted, created by Shane, focused on a basic set of test cases for RDFa generally, but migrated into the RDFa in HTML4 document space.

Returning to Philip’s issue with case sensitivity, I took one of Shane’s RDFa in HTML test cases, and the rdfquery JavaScript from Philip’s test suite, and created pages demonstrating the case sensitivity issue. One such is the following:

<!DOCTYPE HTML PUBLIC "-//ApTest//DTD HTML4+RDFa 1.0//EN" "http://www3.aptest.com/standards/DTD/html4-rdfa-1.dtd">
<html
xmlns:t="http://test1.org/something/"
xmlns:T="http://test2.org/something/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
<title>Test 0011</title>
</head>
<body>
<div about="">
Author: <span property="dc:creator t:apple T:banana">Albert Einstein</span>
<h2 property="dc:title">E = mc<sup>2</sup>: The Most Urgent Problem of Our Time</h2>
</div>
</body>
</html>

Notice the two namespace declarations, one for “t” and one for “T”. Both are used to provide properties for the object being described in the document: t:apple and T:banana. An RDFa application that applies XML rules treats the prefixes “t” and “T” as two different namespaces, and has no problem with the RDFa annotation.

However, using the rdfquery JavaScript library, which treats “t” and “T” the same because of HTML case insensitivity, an exception results: Malformed CURIE: No namespace binding for T in CURIE T:banana. Stripping away the RDFa aspects, and focusing on the namespaces, you can see how browsers handle namespace case in an HTML document and in a document served up as XHTML. To make matters more interesting, check out the two pages using Opera 10, Firefox 3.5, and the latest Safari. Opera preserves the case, while both Safari and Firefox lowercase the prefix. Even within the HTML world, the browsers handle namespace case differently. In XHTML, however, all of them handle the prefixes the same way, and correctly. So does the rdfquery JavaScript library, as this test page demonstrates.
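The mismatch can be modeled in a few lines of Python. This is only a sketch under stated assumptions, not rdfquery's actual code: the function and class names are hypothetical, and the "first declaration wins" behavior when two attribute names collapse to the same lowercase name is an assumption about the parser. It shows one plausible way the error above can arise: the DOM hands over lowercased prefix declarations, while the CURIEs in the property attribute value keep their original case.

```python
# Hypothetical sketch: names and structure are illustrative, not rdfquery's code.

class MalformedCurie(ValueError):
    """Raised when a CURIE prefix has no namespace binding."""


# Prefix declarations exactly as written on the test page's <html> element.
DECLARED = [
    ("t", "http://test1.org/something/"),
    ("T", "http://test2.org/something/"),
    ("dc", "http://purl.org/dc/elements/1.1/"),
]


def xml_bindings(declared):
    """XML rules: prefixes are case-sensitive, so "t" and "T" stay distinct."""
    return dict(declared)


def lowercased_bindings(declared):
    """Model an HTML DOM that lowercases attribute names such as xmlns:T.

    xmlns:t and xmlns:T collapse to the single key "t". Keeping the first
    declaration is an assumption; a given parser may behave differently.
    """
    bindings = {}
    for prefix, uri in declared:
        bindings.setdefault(prefix.lower(), uri)
    return bindings


def resolve(curie, bindings):
    """Expand prefix:reference to a full URI using a case-sensitive lookup."""
    prefix, _, reference = curie.partition(":")
    if prefix not in bindings:
        raise MalformedCurie(
            f"No namespace binding for {prefix} in CURIE {curie}"
        )
    return bindings[prefix] + reference


# The property attribute value from the test page keeps its original case.
properties = "dc:creator t:apple T:banana".split()

# Case-sensitive (XML) bindings: every CURIE resolves.
xml_result = [resolve(p, xml_bindings(DECLARED)) for p in properties]

# Lowercased (HTML DOM) bindings: "T" no longer has a binding.
try:
    [resolve(p, lowercased_bindings(DECLARED)) for p in properties]
    html_error = None
except MalformedCurie as err:
    html_error = str(err)
```

With the XML-style bindings, T:banana expands against the test2.org namespace. With the lowercased bindings, the lookup for "T" fails, which matches the shape of the exception quoted above.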

Returning to the discussion, there is some back and forth on how to handle case sensitivity issues related to HTML, with suggestions varying as widely as: tossing the RDFa in XHTML spec out and creating a new one; tossing RDFa out in favor of Microdata; creating a best practices document that details the problem and provides appropriate warnings; and creating a new RDFa in HTML document (or modifying the existing profile document) specifying that all conforming applications must treat prefix names as case insensitive in HTML (possibly cross-referencing the RDFa in XHTML document, which allows case sensitive prefixes). I am not in favor of the first two options. I do favor the latter two options, though I think the best practices document should strongly recommend using lowercase prefix names, and definitely not using two prefixes that differ only by case. During the discussion, a new conforming RDFa test case was proposed that tests based on case. This has now started its own discussion.

I think the problem of case and namespace prefixes (not to mention xmlns as compared to XMLNS) is very much an edge issue, not a show stopper. However, until a solution is formalized, be aware that xmlns prefix case is handled differently in XHTML and HTML. All things being equal, consider using only lowercase prefixes when embedding RDFa (or any other namespace-based functionality). In addition, do not use XMLNS. Ever. If not for yourself, do it for the kittens.
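The lowercase-only advice is easy to check mechanically. The following is a rough sketch, not a real HTML parser: the function name lint_prefixes and the warning wording are made up for illustration, and the regular expression only looks for xmlns:prefix= patterns in raw markup.

```python
import re
from collections import defaultdict

# Matches xmlns:prefix= in any letter case. Deliberately simple: a regular
# expression is not an HTML parser, so treat the results as hints only.
XMLNS_DECL = re.compile(r"\b(xmlns):([A-Za-z_][\w.-]*)\s*=", re.IGNORECASE)


def lint_prefixes(markup):
    """Return warnings for risky namespace prefix declarations (hypothetical helper)."""
    warnings = []
    seen = defaultdict(set)  # lowercased prefix -> spellings actually used
    for match in XMLNS_DECL.finditer(markup):
        keyword, prefix = match.group(1), match.group(2)
        if keyword != "xmlns":
            warnings.append(f'use lowercase "xmlns", not "{keyword}"')
        if prefix != prefix.lower():
            warnings.append(f'prefix "{prefix}" is not lowercase')
        seen[prefix.lower()].add(prefix)
    for variants in seen.values():
        if len(variants) > 1:
            warnings.append(
                "prefixes differ only by case: " + ", ".join(sorted(variants))
            )
    return warnings


# The opening tag from the test case shown earlier in this post.
sample = (
    '<html xmlns:t="http://test1.org/something/" '
    'xmlns:T="http://test2.org/something/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
)
sample_warnings = lint_prefixes(sample)
```

Run against the test case's opening tag, the sketch flags the uppercase "T" prefix and the t/T pair; markup that sticks to lowercase prefixes and lowercase xmlns produces no warnings.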

Speaking of RDFa in HTML issues, there is now a new RDFa in HTML issues wiki page. Knock yourselves out.

update A new version of the RDFa in HTML4 profile has been released. It addresses some of the concerns expressed earlier, including the issues of case and XMLLiteral. Though HTML5 doesn’t support DTDs, as HTML4 does, the conformance rules should still be good for HTML5.

Categories
Burningbird

Under construction

I couldn’t resist the title. Just be glad I refrained from using one of the old animated “Under Construction” GIFs.

Since I’m no longer on the hook for anything related to HTML5 and RDFa, I can return to my books. Books, plural, as I hope to be starting a new book within the “traditional” publishing track, soon.

I doubt I’ll have much to say over the next few months. Just a heads up that the site may look odd or not work at times, as I try out some new stuff. No worries, it hasn’t been taken over by aliens.

Categories
Stuff

Kindle clipping limits

Recovered from the Wayback Machine.

I love books on history, and have read several on my Kindle. I hope to someday write book reviews, or perhaps use quotes from the books in my future writings. Kindle facilitated this capability by providing functionality to highlight passages, add book notes, and especially, save a Kindle “page” to a clipping file.

By saving passages from the book to a text file, I can copy and paste quotes, without worry about mistyping the text. In addition, if my Kindle died, though I may not have the books, I’d at least have my notes.

My routine would be to read a book, such as A Dark Valley: A Panorama of the 1930s or Freedom from Fear: The American People in Depression and War, 1929-1945, and once finished, copy the clipping file to my computer, delete the one on the Kindle, and start fresh. However, while reading Banana: The Fate of the Fruit That Changed the World, about a third of the way through, when I went to save a page with a passage of interest to my clipping file, I received an error:

Unable to save clipping. You have
reached the clipping limit for this item.

Clipping limit? This was the first I’d heard of clipping limits.

I deleted the clipping file, but it made no difference. Per suggestions on an Amazon thread, I also deleted a metadata file associated with the book, but again, had no luck.

I tried to find information about the clipping limit in the Kindle TOS or User Guide, but nothing was covered. I also tried to find out if one can “delete” items from the existing clipping file, in order to replace with other clippings at a later time, but once the limit is reached, nothing associated with the book can be added to the clipping file, not even a highlighted sentence.

Not all books have a clipping limit, and the limit is not the same for all books. However, there is no way to find out if a book has a clipping limit, or how big it is, unless using software to ‘crack’ the DRM (Digital Rights Management) for the book.

That I’m peeved is to put it mildly, as that was one of the Kindle features I found most valuable. It was also one of the features I’ve used to sell the reading device to others. And now I’m afraid to make notes or save clippings without wondering whether I’ll hit the limit. Contrary to what Amazon or the publishers must assume, I’m not going to use the “Save as Clipping” feature to copy the entire book; I’d rather get the book from the library and photocopy each page, because it would be easier. And I can’t wait to find out what happens when several college students hit this limit with their fancy, and expensive, new large form Kindle DXes.

More importantly, Amazon does not mention this limitation in the sales material for the device, though the company does tout the “Save as clipping” capability.

Bookmarks and Annotations

By using the QWERTY keyboard, you can add annotations to text, just like you might write in the margins of a book. And because it is digital, you can edit, delete, and export your notes. Using the new 5-way controller, you can highlight and clip key passages and bookmark pages for future use.

Yet there’s nothing about clipping limits in the documentation or on the web site. This, to me, is a deceptive business practice. Making an assumption that people will somehow “know” about the limits because of copyright laws is especially weak, because the amount you can copy seems to be arbitrary, and we readers have no way of knowing what these limits are.

Even more disappointing, the clipping limit also applies to DRM free books from Amazon, according to a MobileRead forum entry.

update I counted the clippings from “Banana…”, and discovered that the clipping limit for this book has been set to 40. That’s Kindle clippings, not book pages. Following is a typical clipping:

busy, modern family would consist of bananas sliced into corn flakes with milk. It wasn’t just the recipe that broke new ground. It was also the coupons, pioneered by the company, packed inside cereal boxes (redeemable for free bananas that the cereal companies, not the fruit importer, paid for). The company made sure that children knew about bananas, too. It set up an official “education department,” devoted to publishing textbooks and curriculum materials that subtly provided information about the fruit. United Fruit also added a new element to its political strategy. If military action was impractical (U.S. troops might be unavailable or force precluded by situations on the ground), Central America’s geography became an ally. The region’s countries were small and easy to move between. There were plenty of natural ports on both the eastern and western coasts, and bananas could be grown just about anywhere land could be cleared and a railroad could be laid. If a government became particularly balky, the company would simply threaten to go next door. But one thing United Fruit couldn’t control was nature. Not long after bananas added themselves as a third party in cereal and milk, the troubles growers were beginning to have with an aggressive malady became public. One headline in The New York Times read: “Banana Disease Ruins Plantations—No Remedy is Available—Whole Regions Have Been Laid Waste and Improvements Abandoned by

update I’ve tried the Perl tool mobi2mobi on several of the books I have, including those with an expired copyright downloaded from Amazon, one that is copyrighted and with DRM, and one that is copyrighted, without DRM.

The values I’m getting would seem to be percentages, not absolute clipping instances. So a value of 0xa, which is hex for 10, would be 10 percent, not 10 instances. Non-DRM books return a clipping limit of 0x64, which is hex for 100, which would be, if my guess is accurate, 100%. This matches our expectation for a non-DRM enabled book: that we can highlight, or clip pages up to 100% of the content.
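The arithmetic behind that guess is small enough to sketch. The code below assumes, as the paragraph above does, that the value is a percentage; that reading is a guess, not documented Amazon behavior. The function names are hypothetical, and "total_units" is a stand-in for however the device measures a book's content, which the source does not specify.

```python
def clipping_limit_percent(raw_value):
    """Read a hex string such as "0xa" as a percentage (an assumed interpretation)."""
    percent = int(raw_value, 16)  # base 16 accepts the optional "0x" prefix
    if not 0 <= percent <= 100:
        raise ValueError(f"{raw_value} is outside the 0-100 percent range")
    return percent


def max_clippable(total_units, raw_value):
    """Hypothetical: how much of a book could be clipped under that percentage.

    total_units is a made-up measure of the book's length, used only to
    illustrate the arithmetic.
    """
    return total_units * clipping_limit_percent(raw_value) // 100


drm_percent = clipping_limit_percent("0xa")        # 10
non_drm_percent = clipping_limit_percent("0x64")   # 100
example_allowance = max_clippable(4000, "0xa")     # 400 of 4000 made-up units
```

Under this reading, a book of 4000 units with a limit of 0xa would allow 400 units to be clipped, while a limit of 0x64 would allow all of it.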

That the value is a percentage may have been obvious to some of you, but the idea that Amazon would enforce such an arbitrary limit, and without notice to the customers, is still new to me.

Note, also, that Amazon is attaching what seems to be a default value of 10% to books that are no longer covered by copyright, but which you can download for free from Amazon. It looks like Amazon is attaching DRM to these books, too. My suggestion would be to get these books elsewhere, like feedbooks.com, and hope they aren’t so limited.

Categories
RDF Semantics

Wolfram Alpha: What is RDF?

I asked Wolfram Alpha: what is RDF?

[Image: results of asking Wolfram Alpha “what is RDF”]

I would have been more impressed by Wolfram Alpha if at the end of its interpretation of my request, it asked me, “Was this answer correct? Was this answer complete? If not correct or complete, what do you consider RDF to be?”

I then asked the same question of Google.

[Image: Google results for “what is RDF”]

Categories
Social Media

When social media closes the door

I have work to do, trying to pull a lot of pieces together into some semblance of a balanced and comprehensive document on HTML5, RDFa, Microdata, et al, but first, I need decompression time from an excess of social media this last week. I don’t know how all of you can manage the various weblog/mailing list/IRC/Twitter et al lives. I personally feel as if my brain has been ripped out through my eyeballs by sadistic chipmunks.

I have been waiting to see if other metadata use cases would be discussed in the WhatWG mailing list before writing any more reviews, points, or counter-points. Since the HTML5 editor, Ian Hickson, seems to have moved on to new things, I think we can assume that whatever use cases remain will either get folded into some other effort, or will just be forgotten.

In addition, I’ve been playing with the new HTML5 Microdata proposal, though the underlying processing rules for generating RDF triples have been changing. Again, though, since Ian has moved on to adding vCard, and vEvent, and various other “microdata formats” to the HTML5 spec, we can assume that the RDF aspect of the document is stable. For the moment.

In the meantime, Google has rolled out use of RDFa, and though this act does not make the earth quake, it does make things in the semantic metadata world more interesting. Yes, even if Google used its own vocabulary. The Google announcement was followed soon after by a new document by Shane McCarron of the RDFa-in-XHTML working group, that provides an approach to using RDFa and HTML4 together.

There was a flurry of noise about the Google announcement everywhere, which was to be expected. Shane’s proposal also came under review, though without the Google numbers. There was some discussion on the HTML WG mailing list, the RDFa Public mailing list, and the RDFa-in-XHTML mailing list on the new proposal, but none on the WhatWG mailing list. However, a new objection to RDFa, and RDF in general, arose on the WhatWG list: link rot and its impact on RDFa. This discussion also spread to the RDFa-in-XHTML list.

Now, I’ll be frank in that this one just didn’t hit me as a critical concern. Even after the discussion on the WhatWG mailing list, I still think that concerns about link rot are a weak objection to RDF/RDFa. After all, isn’t RDF older than some of the WhatWG members? Regardless, it’s been around long enough to know that if we were going to have problems with link rot, they would have surfaced and hit us in the face by now. But any weakness, perceived or otherwise, seems to generate a great deal of animated discussion in the WhatWG group mailing list.

There’s also a new twist on this discussion, for me at least, in that I also read the archives for the WhatWG IRC as the discussion was taking place. You can sometimes get a lot more insight into the collective mind of the WhatWG group from reading the IRC archives than you can from the mailing list. My concern was that this new objection to RDFa would be pounced on by WhatWG members and, sure enough, after both Manu Sporny and Dan Brickley provided extremely reasonable answers on how link rot, if it occurred, could be fixed, the following popped up on the IRC:

Philip: gets an impression from the “Link rot is not dangerous” topic that namespace URIs are quite a fragile foundation
Philip: so they suggest building other structures on top of that, like caching and redirecting and hardcoding override lists and reminding people not to accidentally let their domains expire and making local subclasses
hsivonen: Philip: it seems to be that believing in Follow your Nose and believing in Link Rot not being dangerous are contradictory beliefs but you can pick either one and argue coherently
Philip: and I suppose it makes me wonder instead whether it’d be a good reason to not use that foundation at all
Philip: (though I don’t know what other foundations would be better)

To me, the general drift of this thread leads back to my, yes, stubbornly held belief that “RDF/RDFa does not have to justify itself”. In other words, rather than question what is, or is not, in the HTML5 specification (a valid topic for the WhatWG), we get sidetracked into having to defend RDFa and, ultimately, RDF. I’m just not going to go there, because RDF is, and it ain’t going away, and this is true regardless of what happens with HTML5. So why are we talking about these things in the WhatWG mailing list?

I jumped back into the WhatWG email list thread after reading the IRC thread, hoping to cut the hombres off at the pass, but it was too late: the more we defended, the more weight was given to this “new” problem with RDF (which is humorous, if you think on it, because the HTML5 Microdata proposal makes use of the same RDF URIs).

Following the mailing list entries (which I received whether I wanted to continue or not as I was now cc’d directly in all responses) in addition to the IRC entries, is like experiencing double vision, except in the one email list thread, all is sweetness and light, and the other IRC list, anything but. The problem with IRC, and the reason I detest it so much, is that people write first and possibly think about it later. There is little “uh oh, this is public” filtering going on. There’s also a group-think mentality that can develop in IRC channels, especially those that attract people with very similar viewpoints. The WhatWG IRC entries demonstrate evidence of group think, in that there seems to be a shared, expressed disdain several of the WhatWG members have for many of us (generally and specifically)—which makes the later, polite chit chat particularly unwelcome.

Yes, following along with the WhatWG IRC is that much more pleasant when you suddenly find yourself the subject of current discussion, as our old friend Last Week in HTML5 has noted several times in the past, and about me yesterday. Of course, MLW’s story title was also unpleasant to read: no working group for middle aged women. There was something about that title, following on the IRC comments, that left me with a feeling I’d rather go for a root canal than deal directly with the WhatWG again.

This little saga wasn’t restricted to just IRC, mailing lists, and weblogs; it’s hit Twitter, too. Did you expect otherwise? But my adventures in social media this last week didn’t end there: I also attempted to attend an HTML WG meeting last Thursday using Skype and IRC, but didn’t know the procedure one follows for making a request via IRC in order to speak during the teleconference. The technology also ended up being wonky for me, and the only time I knew I was heard was when someone asked, “Who said, ‘Oh, this is ridiculous’?”

Didn’t matter anyway, because Ian Hickson, the sole and only HTML5 editor, does not attend the HTML WG teleconferences. I gather most of these meetings end up with the attendees playing a game of “What did Ian mean?” Evidently, from what others have said, Ian has stated that he finds these meetings to be a waste of his time. Of course, that’s only hearsay. Probably from Twitter.

The experiences this week just demonstrate that all of the whizzy technology doesn’t do a bit of good if you have groups of people interacting who don’t respect each other. To me, it is apparent that several WhatWG members don’t respect the RDFa folks, as they’ve continued on today, in IRC of course, dismissing Shane’s hard work with barely a glance. Not all of the folks. Both Henri and Philip are pretty good about saying whatever they say on the IRC directly to you, in comments, email or mailing list (though my impression from both is that they don’t have a high opinion of RDF/RDFa, either). Others, however, are neither that direct, nor that helpful in their commentary.

I’m not going to pretend that the feeling isn’t mutual. After all, I wrote the first “offending” Twitter message. And I’ve been critical of HTML5, and WhatWG process (and members) here and elsewhere. Frankly, I don’t regret any of it, and if that puts me into the category of “doesn’t play well with other children”, I’d rather be there than among those who are polite when communicating with you directly, and rip you a new one when your back is turned.

Luckily, I don’t officially represent the RDF or RDFa communities, and I can freely express my opinions, here and elsewhere. I know that Dan and Manu and others still want to work with the WhatWG folks, and more power to them. But I’ve since unsubscribed from the WhatWG email list, though I hesitate to stop reading the IRC, as this is about the only place where you can really see what’s happening with the HTML5 effort.

I’m also going to cut drastically back on all of this social media and do my thing in my space, because by the end of the week, all I had to show for all of the frantic activity, this networked communication with my fellow seekers of specification truth, this bright and shiny new way of togetherness, was bits of writing littered about all over the place—both by me, and about me—and a really bad mood.