Under construction

I couldn’t resist the title. Just be glad I refrained from using one of the old animated “Under Construction” GIFs.

Since I’m no longer on the hook for anything related to HTML5 and RDFa, I can return to my books. Books, plural, as I hope to start a new book within the “traditional” publishing track soon.

I doubt I’ll have much to say over the next few months. Just a heads up that the site may look odd or not work at times, as I try out some new stuff. No worries, it hasn’t been taken over by aliens.


Kindle clipping limits

Recovered from the Wayback Machine.

I love books on history, and have read several on my Kindle. I hope to someday write book reviews, or perhaps use quotes from the books in my future writings. The Kindle supports this by letting you highlight passages, add notes, and, especially, save a Kindle “page” to a clippings file.

By saving passages from the book to a text file, I can copy and paste quotes without worrying about mistyping the text. In addition, if my Kindle died, though I might lose the books, I’d at least have my notes.

My routine was to read a book, such as A Dark Valley: A Panorama of the 1930s or Freedom from Fear: The American People in Depression and War, 1929-1945, and, once finished, copy the clipping file to my computer, delete the one on the Kindle, and start fresh. However, about a third of the way through Banana: The Fate of the Fruit That Changed the World, when I went to save a page with a passage of interest to my clipping file, I received an error:

Unable to save clipping. You have
reached the clipping limit for this item.

Clipping limit? This was the first I’d heard of clipping limits.

I deleted the clipping file, but it made no difference. Per suggestions on an Amazon thread, I also deleted a metadata file associated with the book, but again, had no luck.

I tried to find information about the clipping limit in the Kindle TOS or User Guide, but found nothing. I also tried to find out whether one can “delete” items from the existing clipping file to make room for other clippings later, but once the limit is reached, nothing associated with the book can be added to the clipping file, not even a highlighted sentence.

Not all books have a clipping limit, and the limit is not the same for all books. However, there is no way to find out whether a book has a clipping limit, or how big it is, short of using software to ‘crack’ the DRM (Digital Rights Management) for the book.

To say I’m peeved is putting it mildly, as that was one of the Kindle features I found most valuable. It was also one of the features I’ve used to sell the reading device to others. And now I’m afraid to make notes or save clippings without wondering whether I’ll hit the limit. Contrary to what Amazon or the publishers must assume, I’m not going to use the “Save as Clipping” feature to copy the entire book. I’d rather get the book from the library and photocopy each page, because that would be easier. And I can’t wait to find out what happens when several college students hit this limit with their fancy, and expensive, new large-format Kindle DXes.

More importantly, Amazon does not mention this limitation in the sales material for the device, though the company does tout the “Save as clipping” capability.

Bookmarks and Annotations

By using the QWERTY keyboard, you can add annotations to text, just like you might write in the margins of a book. And because it is digital, you can edit, delete, and export your notes. Using the new 5-way controller, you can highlight and clip key passages and bookmark pages for future use.

Yet there’s nothing about clipping limits, either in the documentation or on the web site. This, to me, is a deceptive business practice. Assuming that people will somehow “know” about the limits because of copyright law is especially weak, because the amount you can copy seems to be arbitrary, and we readers have no way of knowing what these limits are.

Even more disappointing, the clipping limit also applies to DRM-free books from Amazon, according to a MobileRead forum entry.

Update: I counted the clippings from “Banana…” and discovered that the clipping limit for this book has been set to 40. That’s Kindle clippings, not book pages. Following is a typical clipping:

busy, modern family would consist of bananas sliced into corn flakes with milk. It wasn’t just the recipe that broke new ground. It was also the coupons, pioneered by the company, packed inside cereal boxes (redeemable for free bananas that the cereal companies, not the fruit importer, paid for). The company made sure that children knew about bananas, too. It set up an official “education department,” devoted to publishing textbooks and curriculum materials that subtly provided information about the fruit. United Fruit also added a new element to its political strategy. If military action was impractical (U.S. troops might be unavailable or force precluded by situations on the ground), Central America’s geography became an ally. The region’s countries were small and easy to move between. There were plenty of natural ports on both the eastern and western coasts, and bananas could be grown just about anywhere land could be cleared and a railroad could be laid. If a government became particularly balky, the company would simply threaten to go next door. But one thing United Fruit couldn’t control was nature. Not long after bananas added themselves as a third party in cereal and milk, the troubles growers were beginning to have with an aggressive malady became public. One headline in The New York Times read: “Banana Disease Ruins Plantations—No Remedy is Available—Whole Regions Have Been Laid Waste and Improvements Abandoned by

Update: I’ve tried the Perl tool mobi2mobi on several of the books I have, including ones with an expired copyright downloaded from Amazon, one that is copyrighted and has DRM, and one that is copyrighted but without DRM.

The values I’m getting would seem to be percentages, not absolute clipping counts. So a value of 0xa, which is hex for 10, would be 10 percent, not 10 clippings. Non-DRM books return a clipping limit of 0x64, which is hex for 100, which would be, if my guess is accurate, 100%. This matches what we’d expect for a non-DRM book: that we can highlight or clip pages up to 100% of the content.
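To make the arithmetic concrete, here is a small Python sketch that turns the raw hex values into percentages. The function name `clipping_limit_percent` is hypothetical, and the sketch assumes, as guessed above, that the stored number is a percentage of the book rather than a count of clippings. It does not read the book file; it assumes you have already copied a value such as `0xa` out of the mobi2mobi output as a string.

```python
def clipping_limit_percent(raw_value: str) -> int:
    """Interpret a raw clipping-limit value such as "0xa" as a percentage.

    Hypothetical helper: assumes the value is a percentage of the book's
    content (the guess made above), not a number of clippings.
    """
    # int() with base 16 accepts an optional "0x" prefix, so "0xa" and "a" both work.
    percent = int(raw_value, 16)
    # A percentage outside 0..100 would mean the guess above is wrong.
    if not 0 <= percent <= 100:
        raise ValueError(f"unexpected clipping limit: {raw_value}")
    return percent


for raw in ("0xa", "0x64"):
    print(f"{raw} -> {clipping_limit_percent(raw)}%")
```

Run as-is, it prints `0xa -> 10%` for the default value seen on the free and DRM books, and `0x64 -> 100%` for the non-DRM books.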

That the value is a percentage may have been obvious to some of you, but the idea that Amazon would enforce such an arbitrary limit, without notice to customers, is still new to me.

Note, also, that Amazon is attaching what seems to be a default value of 10% to books that are no longer covered by copyright, but which you can download for free from Amazon. Looks like Amazon is attaching DRM to these books, too. My suggestion would be to get these books from other sources and hope they aren’t so limited.

RDF Semantics

Wolfram Alpha: What is RDF?

I asked Wolfram Alpha: what is RDF?

[Screenshot: results of asking Wolfram Alpha “what is RDF?”]

I would have been more impressed by Wolfram Alpha if at the end of its interpretation of my request, it asked me, “Was this answer correct? Was this answer complete? If not correct or complete, what do you consider RDF to be?”

I then asked the same question of Google.

[Screenshot: Google results for “what is RDF”]

Social Media

When social media closes the door

I have work to do, trying to pull a lot of pieces together into some semblance of a balanced and comprehensive document on HTML5, RDFa, Microdata, et al., but first, I need decompression time from an excess of social media this last week. I don’t know how all of you manage your various weblog/mailing list/IRC/Twitter et al. lives. I personally feel as if my brain has been ripped out through my eyeballs by sadistic chipmunks.

I have been waiting to see if other metadata use cases would be discussed on the WhatWG mailing list before writing any more reviews, points, or counter-points. Since the HTML5 editor, Ian Hickson, seems to have moved on to new things, I think we can assume any remaining use cases will either be folded into some other effort or simply be forgotten.

In addition, I’ve been playing with the new HTML5 Microdata proposal, though the underlying processing rules for generating RDF triples have been changing. Again, though, since Ian has moved on to adding vCard, vEvent, and various other “microdata formats” to the HTML5 spec, we can assume that the RDF aspect of the document is stable. For the moment.

In the meantime, Google has rolled out support for RDFa, and though this act does not make the earth quake, it does make things in the semantic metadata world more interesting. Yes, even if Google used its own vocabulary. The Google announcement was followed soon after by a new document from Shane McCarron of the RDFa-in-XHTML working group, which provides an approach to using RDFa and HTML4 together.

There was a flurry of noise about the Google announcement everywhere, which was to be expected. Shane’s proposal also came under review, though without the Google numbers. There was some discussion of the new proposal on the HTML WG mailing list, the RDFa Public mailing list, and the RDFa-in-XHTML mailing list, but none on the WhatWG mailing list. However, a new objection to RDFa, and RDF in general, arose on the WhatWG list: link rot and its impact on RDFa. The objection also spread to the RDFa-in-XHTML list.

Now, I’ll be frank: this one just didn’t strike me as a critical concern. Even after the discussion on the WhatWG mailing list, I still think that concerns about link rot are a weak objection to RDF/RDFa. After all, isn’t RDF older than some of the WhatWG members? Regardless, it’s been around long enough that if we were going to have problems with link rot, they would have surfaced and hit us in the face by now. But any weakness, perceived or otherwise, seems to generate a great deal of animated discussion on the WhatWG mailing list.

There’s also a new twist on this discussion, for me at least: I also read the archives for the WhatWG IRC channel as the discussion was taking place. You can sometimes get a lot more insight into the collective mind of the WhatWG group from the IRC archives than from the mailing list. My concern was that this new objection to RDFa would be pounced on by WhatWG members, and sure enough, after both Manu Sporny and Dan Brickley provided extremely reasonable answers about how link rot, if it occurred, could be fixed, the following popped up on the IRC:

Philip: gets an impression from the “Link rot is not dangerous” topic that namespace URIs are quite a fragile foundation
Philip: so they suggest building other structures on top of that, like caching and redirecting and hardcoding override lists and reminding people not to accidentally let their domains expire and making local subclasses
hsivonen: Philip: it seems to be that believing in Follow your Nose and believing in Link Rot not being dangerous are contradictory beliefs but you can pick either one and argue coherently
Philip: and I suppose it makes me wonder instead whether it’d be a good reason to not use that foundation at all
Philip: (though I don’t know what other foundations would be better)

The general drift of this thread leads me back to my (yes, stubbornly held) belief that “RDF/RDFa does not have to justify itself”. In other words, rather than question what is, or is not, in the HTML5 specification (a valid topic for the WhatWG), we get sidetracked into having to defend RDFa and, ultimately, RDF. I’m just not going to go there, because RDF is, and it ain’t going away, and this is true regardless of what happens with HTML5. So why are we talking about these things on the WhatWG mailing list?

I jumped back into the WhatWG email list thread after reading the IRC thread, hoping to cut the hombres off at the pass, but it was too late: the more we defended, the more weight was given to this “new” problem with RDF (which is humorous, if you think about it, because the HTML5 Microdata proposal makes use of the same RDF URIs).

Following the mailing list entries (which I received whether I wanted to or not, as I was now cc’d directly on all responses) alongside the IRC entries is like experiencing double vision, except that in the email thread all is sweetness and light, while in IRC it is anything but. The problem with IRC, and the reason I detest it so much, is that people write first and possibly think about it later. There is little “uh oh, this is public” filtering going on. There’s also a group-think mentality that can develop in IRC channels, especially those that attract people with very similar viewpoints. The WhatWG IRC entries show evidence of group think, in that several of the WhatWG members seem to share, and openly express, disdain for many of us (generally and specifically), which makes the later, polite chit chat particularly unwelcome.

Yes, following along with the WhatWG IRC is that much more pleasant when you suddenly find yourself the subject of current discussion, as our old friend Last Week in HTML5 has noted several times in the past, including about me yesterday. Of course, MLW’s story title was also unpleasant to read: no working group for middle aged women. There was something about that title, following on the IRC comments, that left me feeling I’d rather go for a root canal than deal directly with the WhatWG again.

This little saga wasn’t restricted to IRC, mailing lists, and weblogs; it also hit Twitter. Did you expect otherwise? But my adventures in social media this last week didn’t end there: I also attempted to attend an HTML WG meeting last Thursday using Skype and IRC, but didn’t know the procedure for requesting, via IRC, a turn to speak during the teleconference. The technology also ended up being wonky for me, and the only time I knew I was heard was when someone asked, “Who said, ‘Oh, this is ridiculous’?”

It didn’t matter anyway, because Ian Hickson, the sole HTML5 editor, does not attend the HTML WG teleconferences. I gather most of these meetings end up with the attendees playing a game of “What did Ian mean?” Evidently, from what others have said, Ian has stated that he finds these meetings to be a waste of his time. Of course, that’s only hearsay. Probably from Twitter.

The experiences this week just demonstrate that all of the whizzy technology doesn’t do a bit of good if you have groups of people interacting who don’t respect each other. To me, it is apparent that several WhatWG members don’t respect the RDFa folks, as they’ve continued on today, in IRC of course, dismissing Shane’s hard work with barely a glance. Not all of the folks, though. Both Henri and Philip are pretty good about saying whatever they say on the IRC directly to you, in comments, email, or the mailing list (though my impression is that neither has a high opinion of RDF/RDFa, either). Others, however, are neither that direct nor that helpful in their commentary.

I’m not going to pretend that the feeling isn’t mutual. After all, I wrote the first “offending” Twitter message. And I’ve been critical of HTML5, and the WhatWG process (and members), here and elsewhere. Frankly, I don’t regret any of it, and if that puts me into the category of “doesn’t play well with other children”, I’d rather be there than among those who are polite when communicating with you directly, and rip you a new one when your back is turned.

Luckily, I don’t officially represent the RDF or RDFa communities, and I can freely express my opinions, here and elsewhere. I know that Dan and Manu and others still want to work with the WhatWG folks, and more power to them. But I’ve since unsubscribed from the WhatWG email list, though I hesitate to stop reading the IRC, as this is about the only place where you can really see what’s happening with the HTML5 effort.

I’m also going to cut back drastically on all of this social media and do my thing in my space, because by the end of the week, all I had to show for all of the frantic activity, this networked communication with my fellow seekers of specification truth, this bright and shiny new way of togetherness, was bits of writing littered all over the place, both by me and about me, and a really bad mood.


Google Searchology: Rich Snippets

Google is currently holding a live presentation on changes the company is making to search. The changes are quite significant, and very impressive.

The one that caught my attention, though, is rich snippets. Google will now read and incorporate two open standards, microformats and RDFa, in its search results.

So is annotating your page with microformats and RDFa worthwhile now? Hell, yes! Not only is Yahoo incorporating microformats and RDFa into SearchMonkey, but now, so is Google, as part of its general search functionality.

More from Danny Sullivan.