Categories
RDF

RDFa in Drupal core

While I’m in the process of looking more closely at the Microdata proposal, I wanted to note that today marked the end of the first day of the code sprint for incorporating RDFa into the core of Drupal 7.

Yes, when Drupal 7 hits the streets, 1.7 million Drupal web sites, and counting, will have built-in support for RDFa.

Categories
Web

Cite not link

I do have considerable sympathy for Thomas Crampton [1], who discovered that all of his stories at the International Herald Tribune have been pulled from the web because of a merger with the New York Times.

So, what did the NY Times do to merge these sites?

They killed the IHT and erased the archives.

1- Every one of the links ever made to IHT stories now points back to the generic NY Times global front page.

2- Even when I go to the NY Times global page, I cannot find my articles. In other words, my entire journalistic career at the IHT – from war zones to SARS wards – has been erased.

At the same time, though, I don’t have as much sympathy for Wikipedia losing its links to the same stories, as detailed by Crampton in a second posting [2].

The issue: Wikipedia – one of the highest traffic websites on the Internet – makes reference to a large number of IHT stories, but those links are now all dead. They need to delete them all and find new references or use another solution.

As I wrote in comments at Teleread:

I do have sympathy, I know I would be frustrated if my stories disappeared from the web, but at the same time, there is a certain level of karma about all of this.

How many times have webloggers chortled at the closure of another newspaper? How many times have webloggers gloated about how we will win over Big Media?

The thing is, when Big Media is gone, who will we quote? Who will we link? Where will the underlying credibility for our stories be found?

Isn’t this exactly what webloggers have wanted all along?

Isn’t this what webloggers have wanted, all along?

I have sympathy for a writer losing his work, though I assume he kept copies of his writings. If they can’t be found in hard copies of the newspaper, then I’m assuming the paper is releasing its copyright on the items, and that Mr. Crampton will be able to re-publish these on his own. That’s the agreement I have with O’Reilly: when it no longer actively publishes one of my works, the copyright is returned to me. In addition, with some of the books, we have a mutual agreement that when the book is no longer published, the work will be released to the public domain.

I don’t have sympathy for Wikipedia, though, because the way many citations are made at the site doesn’t follow Wikipedia’s citation policy. Links are a lazy form of citation. The relevant passage in the publication should be quoted in the Wikipedia article, matched with a citation listing the publication, author, title of the work, and publication date, not a quick link to an external site over which Wikipedia has no control.

I’m currently fixing one of my stories, “Tyson Valley, a Lone Elk, and the Bomb”, because the original material was moved, without redirection. But as I fix the article, what I’m doing is making copies of all of the material, for my own reference. Saving the web page is no different from making a photocopy of an article in the days before the web.

In addition, I will be adding a formal citation for the source, as well as the link, so if the article moves again, whoever reads my story will know how to search for the article’s new location. At a minimum, they’ll know where the article was originally found.

I’m also repackaging the public domain writing and images for serving at my site, again with a text citation expressing appreciation to the site that originally published the images.

By using this approach, the stories I consider “timeless”, whatever that word means in this ephemeral environment, would not require my constant intervention.

Authors posting to Wikipedia should be doing the same, and this policy should be enforced: provide a direct quote of relevant material (allowed under Fair Use), and provide a formal citation, in addition to the link. Or perhaps, instead of the link. Because when the newspapers disappear, they’ll have no reason to keep the old archives. No reason at all. And then, where will Wikipedia be?

[1] Crampton, Thomas, “Reporter to NY Times Publisher: You Erased My Career”, thomascrampton.com. May 8, 2009.
[2] Crampton, Thomas, “Wikipedia Grappling with Deletion of IHT.com”, thomascrampton.com. May 8, 2009.

Categories
HTML5 RDF

Holding on effort for HTML5

I have discontinued my efforts to re-examine Ian Hickson’s semantic microdata use cases, as Ian has just published another use case and added a microdata section to the HTML5 specification. (Update: see the end of this post.)

Announcement at WhatWG. New section in HTML5 draft.

First glance:

I am not an expert at RDFa, so read what I write accordingly. In my opinion, though, I do not find this customized microdata section in the HTML5 specification to be compatible with RDFa. Yes, one can extract RDF out of the text, but one can’t use an RDFa extractor to extract RDF out of the page. This means that people will have to use one syntax when incorporating RDFa into XHTML 1.1 and XHTML 2.0, and another for HTML5.

More importantly, we can currently use RDFa in HTML5, though not “validly”; with this change in the HTML5 spec, that will no longer be possible. One specific issue is with the “property” attribute.

The property attribute is defined as follows in the new HTML5 section:

The property attribute, if specified, must have a value that is an unordered set of
unique space-separated tokens representing the names of the name-value pairs that it adds. 
The attribute's value must have at least one token.

Each token must be either:

    * A valid URL that is an absolute URL, or
    * A valid reversed DNS identifier, or
    * If its corresponding item's item attribute has no tokens: a string containing neither a U+003A COLON character (:) nor a U+002E FULL STOP character (.), or

It does seem like the last item describing valid values was cut off, and a bit garbled, so I can’t make any interpretation based on it.

Now compare this with the description provided in the RDFa specification:

@property
    a whitespace separated list of CURIEs, used for expressing relationships between a subject and some literal text (also a 'predicate');

A CURIE is of the form:

curie       :=   [ [ prefix ] ':' ] reference

prefix      :=   NCName

reference   :=   irelative-ref (as defined in [IRI])

Which is really tech gobbledy for a value such as “dc:title”, where “dc” is an abbreviation for the vocabulary namespace, in this case the Dublin Core namespace of “http://purl.org/dc/elements/1.1/”.
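To make the abbreviation concrete, here is a minimal sketch of how a prefixed CURIE could be expanded into a full URI. The prefix table and the function name expand_curie are my own illustrative assumptions, not anything taken from the RDFa specification; in a real page the prefix mappings would come from the namespace declarations in the markup.

```python
# Hypothetical prefix table for illustration only; a real processor
# would build this from the namespace declarations found in the page.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand_curie(curie, prefixes=PREFIXES):
    """Return the full URI for a prefixed CURIE, or None if it cannot be expanded."""
    prefix, sep, reference = curie.partition(":")
    # No colon, or a prefix we have no mapping for: nothing to expand.
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + reference


print(expand_curie("dc:title"))  # http://purl.org/dc/elements/1.1/title
```

The sketch deliberately returns None rather than guessing when a token has no known prefix, which matches the idea that the prefix is only shorthand for a namespace that has to be declared somewhere.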

According to the HTML5 spec, the use of the CURIE is invalid in HTML5. Depending upon how parsers handle invalid attribute values, the use of the CURIE could actually generate an HTML error, because the HTML5 specification requires that the value be either a full URI or a reversed DNS identifier, such as com.java.somevalue.

In addition, we can’t use RDFa parsers on the HTML5 markup, because the RDFa specification specifically states that property attribute values must be in the form of CURIEs, and anything else is ignored, as Ian notes in the WhatWG announcement of his customized microdata format handling:

An alternative is to go back to the non-URI class names we had above. This doesn’t break compatibility with the RDFa processors, because when there is no colon in the property=”” or rel=”” attributes, the RDFa processors just ignore the values (this is the “no prefix” mapping of CURIEs).

According to the RDFa syntax specification, RDFa does not define a ‘no prefix’ mapping, meaning that this form of CURIE is not supported. If the value is ignored, then whether it would break RDFa parsers is moot, because what value is there to running such a parser against a page that would return no data?
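A rough sketch of that "returns no data" outcome follows. It assumes a very simplified processor that only keeps property tokens with a declared prefix and silently drops everything else; the function name extract_predicates, the prefix table, and the sample attribute values are all hypothetical, and a real RDFa parser does a great deal more than this.

```python
def extract_predicates(property_value, prefixes):
    """Expand the prefixed tokens of a property attribute; ignore all other tokens."""
    predicates = []
    for token in property_value.split():
        prefix, sep, reference = token.partition(":")
        # Simplified rule for this sketch: only tokens with a declared
        # prefix are kept. Tokens without a colon are dropped.
        if sep and prefix in prefixes:
            predicates.append(prefixes[prefix] + reference)
    return predicates


# Hypothetical prefix table and attribute values.
prefixes = {"dc": "http://purl.org/dc/elements/1.1/"}

# A CURIE-style value yields a predicate.
print(extract_predicates("dc:title", prefixes))

# A reversed DNS identifier and a plain name yield nothing at all.
print(extract_predicates("com.example.title title", prefixes))
```

Under these assumptions the second call produces an empty list: the parser doesn't break, but it also has nothing to say about the page.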

Ian’s philosophy on the use of CURIEs is that these are too hard for people to understand. However, I think that people would have an easier time with them than they would with a reversed DNS identifier, which is a pure code construct made popular in certain programming languages.

There are other differences between the HTML5 proposal and RDFa that I consider significant, but I’ll hold off on these for now. I’d like to see what the RDFa community has to say on the new specification addition before I go further in examining the HTML5 proposal.

Ian has provided some code which seemingly extracts RDF triples out of his own customized microdata format. Will the format hold up under rigorous testing with other test cases that have had successful results in the RDFa space? I don’t know. Regardless, I do know that all of the technology that has been adapted for use with RDFa will not work with the HTML5 microdata, and whatever works with HTML5, will not work with RDFa.

I’ve heard from someone in the RDF space who thinks the HTML5 specification is “close” to RDFa, and only needs a few tweaks. I lack the experience, or perhaps the foresight, to see the same degree of similarity. And this leads me to question whether I should continue with my own re-visiting of the use cases.

I had said once before that I am willing to put the work in, if I felt it would add value to the effort. I hope I will be forgiven for believing that my work won’t impact on the direction the HTML5 editor takes. This wouldn’t be an issue, so much, if I also felt my efforts were of value to the RDFa community. I have not received any indication from the RDFa community that they see my continuing efforts as beneficial. So, I’m not necessarily completely discontinuing my effort, but I am putting it on hold, until I ascertain whether my efforts are beneficial or not.

Update

I just sent the following to the HTML comment list, and the WhatWG list:

Sorry for the double emails today.

I will continue with revisiting the use cases for the microdata section. One additional component I’ll add to the use cases is applying my interpretation of how RDFa might handle the use case, as compared to how it could be handled with Ian’s new HTML5 microdata proposal. This will, of course, slow me down a bit.

Note, though, that I don’t claim to be an expert on either RDFa or Ian’s new microdata proposal. My hope is that if I make a mistake, or I’m not clear, folks will respond to my writing with corrections and/or additions. The purpose behind my effort is to open discussion. I will admit, though, that I do have a bias for RDFa, primarily because this is something that’s real, today, and that I can use, today.