Categories
Semantics

Google searchology-Rich Snippets

Google is currently having a live presentation on changes the company is making to search. The changes are quite significant, and very impressive.

The one that caught my attention, though, is rich snippets. Google will now read and incorporate two open standards, microformats and RDFa, in its search results.

So is annotating your page with microformats and RDFa worthwhile now? Hell, yes! Not only is Yahoo incorporating microformats and RDFa into SearchMonkey, but now, so is Google, and as part of the general Google search functionality.

More from Danny Sullivan.

Categories
HTML5 W3C

Joining the HTML5 Working Group

I should be working on my book, if I don’t want my pitiful little reserve to be sucked dry before I’m finished. At the same time, though, I feel engaged with the discussion about “microdata” et al in relation to the HTML5 working group. And I figure the writing I’m doing providing new use cases and examining the differences between the HTML5 editor’s proposal and RDFa, can be useful to my book. There’s some other stuff happening at the HTML WG related to accessibility I’m also interested in, and I’m keeping a watchful eye on SVG/HTML5.

Sam Ruby has suggested I join the HTML Working Group, as an Invited Expert. It doesn’t cost anything, though I am concerned about the time commitment. I’m not a joiner, per se, but I do have strong opinions about certain aspects of the specification. Now if only some big company that isn’t teetering on the edge of ruin would hire me to be their standards wonk.

What? No takers? Afraid of being singed by the Bird?

Wusses.

Anyway, I’ll put in my request to the HTML WG and we’ll see if I’m acceptable to the powers-that-be.

Categories
RDF Semantics

Use cases and comparison of RDFa/HTML5 Microdata

I’m now a member of the HTML Working Group at the W3C, as an invited expert. I was rather surprised at how fast the membership was accepted. Surprised and faintly alarmed. I imagined existing members sitting around in the dark, rubbing their hands together and murmuring, “Ahh. Fresh meat.”

I’ve been working with Philip in comments trying to compare RDF triples from RDFa and RDF triples from Ian’s Microdata proposal, but this type of effort deserves a more in-depth test. I still have use cases to deliver, but the ones I’ve uploaded to the HTML WG don’t seem to be generating any discussion. Instead, I’m going to do one more document, with select use cases, hopefully ones Ian’s already covered, so I can compare the RDFa approach (typically provided with the use cases, since all the use cases were provided by RDFa folks, from what I can see) and the Microdata proposal approach.

In the meantime, Ian has renamed @property to @itemprop because of the concerns we raised. This ensures that there is no overlap in terminology between RDFa and Ian’s Microdata proposal. There is still a requirement, though, that the Microdata proposal be capable of generating the same RDF as RDFa, and that will be the next set of tests.
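To make the attribute rename concrete, here is a rough Python sketch that pulls property/value pairs out of two tiny markup fragments, one using RDFa’s @property and one using the Microdata proposal’s @itemprop. The fragments, the property names, and the `collect` helper are all made up for illustration, and the code only looks at the one attribute it is told to look at. It is not a conforming RDFa or Microdata processor and it does not generate RDF.

```python
from html.parser import HTMLParser

# Hypothetical fragments (not taken from any real page or use case):
# the same two statements, once with @property and once with @itemprop.
RDFA = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="/posts/1">'
    '<span property="dc:title">Example Post</span>'
    '<span property="dc:creator">Example Author</span>'
    '</div>'
)
MICRODATA = (
    '<div>'
    '<span itemprop="title">Example Post</span>'
    '<span itemprop="creator">Example Author</span>'
    '</div>'
)


class PropertyCollector(HTMLParser):
    """Collects (property name, text) pairs for one chosen attribute."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.pairs = []
        self._current = None  # property name waiting for its text

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get(self.attr_name)
        if value is not None:
            self._current = value

    def handle_data(self, data):
        # Pair the first text node after a matching start tag with it.
        if self._current is not None:
            self.pairs.append((self._current, data.strip()))
            self._current = None


def collect(html, attr_name):
    """Illustrative helper: return the pairs found for attr_name."""
    parser = PropertyCollector(attr_name)
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    print(collect(RDFA, "property"))
    print(collect(MICRODATA, "itemprop"))
    # With distinct attribute names, a Microdata-style scan finds
    # nothing in the RDFa fragment, so the two vocabularies don't collide.
    print(collect(RDFA, "itemprop"))
```

The point of the last call is only that, with separate attribute names, a tool scanning for one syntax no longer trips over markup written in the other. Whether the two approaches yield the same triples is a separate question that this sketch does not address.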

I’m open for suggestions as to the use cases to single out for testing. And I promise to be fair in my effort. After all, I’m a member of the W3C HTML WG now—I have a responsibility to be both objective and fair, in the interest of producing the best specification.

Categories
RDF

RDFa in Drupal core

While I’m in the process of looking more closely at the Microdata proposal, I wanted to note that today marked the end of the first day of the code sprint for incorporating RDFa into the core of Drupal 7.

Yes, when Drupal 7 hits the streets, 1.7 million Drupal web sites, and counting, will have built-in support for RDFa.

Categories
Web

Cite not link

I do have considerable sympathy for Thomas Crampton [1], when he discovered that all of his stories at the International Herald Tribune had been pulled from the web because of a merger with the New York Times.

So, what did the NY Times do to merge these sites?

They killed the IHT and erased the archives.

1- Every one of the links ever made to IHT stories now points back to the generic NY Times global front page.

2- Even when I go to the NY Times global page, I cannot find my articles. In other words, my entire journalistic career at the IHT – from war zones to SARS wards – has been erased.

At the same time, though, I don’t have as much sympathy for Wikipedia losing its links to the same stories, as detailed by Crampton [2] in a second posting.

The issue: Wikipedia – one of the highest traffic websites on the Internet – makes reference to a large number of IHT stories, but those links are now all dead. They need to delete them all and find new references or use another solution.

As I wrote in comments at Teleread:

I do have sympathy, I know I would be frustrated if my stories disappeared from the web, but at the same time, there is a certain level of karma about all of this.

How many times have webloggers chortled at the closure of another newspaper? How many times have webloggers gloated about how we will win over Big Media?

The thing is, when Big Media is gone, who will we quote? Who will we link? Where will the underlying credibility for our stories be found?

Isn’t this exactly what webloggers have wanted all along?

Isn’t this what webloggers have wanted, all along?

I have sympathy for a writer losing his work, though I assume he kept copies of his writings. If they can’t be found in hard copies of the newspaper, then I’m assuming the paper is releasing its copyright on the items, and that Mr. Crampton will be able to re-publish these on his own. That’s the agreement I have with O’Reilly: when it no longer actively publishes one of my works, the copyright is returned to me. In addition, with some of the books, we have a mutual agreement that when the book is no longer published, the work will be released to the public domain.

I don’t have sympathy for Wikipedia, though, because the way many citations are made at the site doesn’t follow Wikipedia’s citation policy. Links are a lazy form of citation. The relevant passage in the publication should be quoted in the Wikipedia article, matched with a citation listing the publication, author, title of the work, and publication date, not a quick link to an external site over which Wikipedia has no control.

I’m currently fixing one of my stories, Tyson Valley, a Lone Elk, and the Bomb because the original material was moved, without redirection. But as I fix the article, what I’m doing is making copies of all of the material, for my own reference. Saving the web page is no different than making a photocopy of an article in the days before the web.

In addition, I will be adding a formal citation for the source, as well as the link, so if the article moves again, whoever reads my story will know how to search for the article’s new location. At a minimum, they’ll know where the article was originally found.

I’m also repackaging the public domain writing and images for serving at my site, again with a text citation expressing appreciation to the site that originally published the images.

By using this approach, the stories I consider “timeless”, in whatever context that word means in this ephemeral environment, would not require my constant intervention.

Authors posting to Wikipedia should be doing the same, and this policy should be enforced: provide a direct quote of relevant material (allowed under Fair Use), and provide a formal citation, in addition to the link. Or perhaps, instead of the link. Because when the newspapers disappear, they’ll have no reason to keep the old archives. No reason at all. And then, where will Wikipedia be?

[1] Crampton, Thomas, “Reporter to NY Times Publisher: You Erased My Career”, thomascrampton.com. May 8, 2009.
[2] Crampton, Thomas, “Wikipedia Grappling with Deletion of IHT.com”, thomascrampton. May 8, 2009.