Categories
HTML5 W3C

Annotation

(This document is part of an effort to flesh out use cases for microdata inclusion in HTML5. See the original use case document, and the background material document as well as the email correspondence that best describes this process.)

————–

USE CASE: Allow authors to annotate their documents to highlight the key
parts, e.g. as when a student highlights parts of a printed page, but in a
hypertext-aware fashion.

SCENARIOS:

* Fred writes a page about Napoleon. He can highlight the word Napoleon
in a way that indicates to the reader that that is a person. Fred can
also annotate the page to indicate that Napoleon and France are
related concepts.

—————

Ian has already provided his summary of this use case on the WhatWG mailing list. His summary:

This use case isn’t altogether clear, but if the target audience of the
annotations is human readers (as opposed to machines and readers using
automated processing tools), then it seems like this is already possible
in a number of ways in HTML5.

In conclusion, this use case doesn’t seem to need any new changes to the
language.

This use case was submitted by Kingsley Idehen, who said considerably more than was entered into the summary use case. Kingsley wrote:

When writing HTML (by hand or indirectly via a program) I want to
isolate and describe what the content is about in terms of people,
places, and other real-world things. I want to isolate “Napoleon” from a
paragraph or heading, and state that the aforementioned entity is
of type “Person” and he is associated with another entity “France”.

The use-case above is like taking a highlighter and making notes while
reading about “Napoleon”. This is what we all do when studying, but when
we were kids, we never actually shared that part of our endeavors since
it was typically the route to competitive advantage i.e., being top
student in the class.

What I state above is antithetical to the essence of the World Wide Web,
as vital infrastructure harnessing collective intelligence.

RDFa is about the ability to share what never used to be shared. It
provides a simple HTML friendly mechanism that enables Web Users or
Developers to describe things using the Entity-Attribute-Value approach
(or Subject, Predicate, Object) without the tedium associated with
RDF/XML (one of the other methods of making statements for the
underlying graph model that is RDF).
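The Subject, Predicate, Object approach Kingsley mentions can be sketched in a few lines of Python. This is only an illustration of the data model, not of RDFa itself: the rdf:type and foaf:Person URIs are the published ones, but the example.org URIs for Napoleon, France, and the relatedTo property are made up for this sketch.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate (attribute), and an object (value)."""
    subject: str
    predicate: str
    obj: str


# Published URIs for "is of type" and "Person".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Hypothetical URIs, invented for this example only.
NAPOLEON = "http://example.org/people#Napoleon"
FRANCE = "http://example.org/places#France"
RELATED_TO = "http://example.org/vocab#relatedTo"

# The two assertions from the use case.
annotations = [
    Triple(NAPOLEON, RDF_TYPE, FOAF_PERSON),  # Napoleon is a Person
    Triple(NAPOLEON, RELATED_TO, FRANCE),     # Napoleon is associated with France
]


def objects_of(triples, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]
```

Asking `objects_of(annotations, NAPOLEON, RELATED_TO)` returns the France URI, which is the kind of question the highlighter analogy is getting at.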

This use case could have used some more discussion between Ian and Kingsley, because, in my opinion, Ian’s interpretation doesn’t match what Kingsley wrote.

Kingsley wrote about annotating the information within the publication, as one would use a highlighter, but he didn’t mean that this information actually has to be highlighted and made visible to the person reading the text. I believe he meant that the annotation would be visible to processes, which could then make it available to the individual who made the annotation (most likely at a later time, as notes), or perhaps to others when aggregated (the latter is my own interpretation).

The question, then: is there a mechanism currently in HTML5 with which one can annotate the data within a writing, in a non-visible manner, and which can then be used to make an assertion, such as that Napoleon is the name of a person, and that the person Napoleon is related to another entity, this one named France (which is the name of a country, and so on)?

So, let me take another try at this use case:

Within a writing published on the web, I want to add annotation into the text to highlight specific facts, but I don't want such highlighting to distract from the text, so I don't want it to be visible. An example of the type of annotation I may make is to highlight the word "Napoleon" and annotate this word with an assertion that Napoleon is a person, and to add further information, that the person, Napoleon, is related to France (a country).

I write on many topics, and so I may make use of several different vocabularies in order to perform my annotation. In addition, I may have to create my own vocabulary if the annotation I want to make doesn't match any of the known and previously published vocabularies. If I do, I'll do so in such a way that there can't be a possible conflict with any other vocabulary.

Once my text is documented, I want to be able to access this annotation at a later time, separate from the document. To do this, I'll process each of my writings with an application that will pull out this specialized annotation, for aggregation and later query. In addition, by using a standard metadata annotation technique and model, the data can also be accessed by search engines, making the data also available to others.
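To make the use case above concrete, here is a rough sketch of the kind of application that could pull such annotation out of a page. It is an assumption-laden toy, not an RDFa processor: it only looks at the `about`, `typeof`, `rel`, and `resource` attributes on a single element, the page fragment is invented, and the `bb` prefix with its example.org namespace is a hypothetical stand-in for a vocabulary of my own. Mapping each prefix to its own namespace URI is what keeps a home-grown term from colliding with someone else's.

```python
from html.parser import HTMLParser

# Prefix-to-namespace map. The rdf and foaf namespaces are the published
# ones; "bb" and its example.org URI are hypothetical.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "bb": "http://example.org/burningbird/vocab#",
}

# Invented page fragment. The annotation lives in attributes, so nothing
# extra is shown to the person reading the text.
PAGE = """
<p>
  <span about="#napoleon" typeof="foaf:Person"
        rel="bb:relatedTo" resource="#france">Napoleon</span>
  is the subject of this page.
</p>
"""


def expand(name):
    """Turn a prefixed name such as foaf:Person into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return PREFIXES[prefix] + local


class AnnotationExtractor(HTMLParser):
    """Collect (subject, predicate, object) tuples from annotated elements."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        if subject is None:
            return  # this toy only handles elements that name their subject
        if attributes.get("typeof"):
            self.triples.append(
                (subject, expand("rdf:type"), expand(attributes["typeof"])))
        if attributes.get("rel") and attributes.get("resource"):
            self.triples.append(
                (subject, expand(attributes["rel"]), attributes["resource"]))


def extract(html):
    """Parse an HTML string and return the annotation triples found in it."""
    parser = AnnotationExtractor()
    parser.feed(html)
    parser.close()
    return parser.triples
```

Calling `extract(PAGE)` yields two statements about `#napoleon`, one giving its type and one relating it to `#france`, which could then be aggregated across writings and queried later.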

It would help to get concurrence from Kingsley as to the accuracy of my assessment, but I do feel comfortable that my use case is a closer approximation to what Kingsley meant. If this is so, Ian’s concluding statement about this use case, including the claim that it would require no change to HTML5, could be in error.

Categories
HTML5 W3C

Going non-standard

As you may have noticed and will continue to notice, my sites are changing. Sometimes not for the better, as I try something out and it doesn’t work. All of this effort is for my new book, which will include coverage of web page markup, in addition to other technologies, though the book will end up being more of a narrative than the tutorials and how-tos I’ve done in the past.

Currently, this page is set to XHTML5, and I’ve even added a nav element, though not the script that will “trigger” its inclusion in IE. I’m trying to work out how one can use X/HTML5 elements without being forced to incorporate JavaScript, which is not a progressive solution.

Without getting into too many details on the book, it does get into discussions on the future of the web, though I find that when it comes to page markup, I’m having an increasingly difficult time determining what to include. I find I have little faith in the ongoing effort at the W3C and WhatWG on a new HTML, especially after recent discussions in the IRC and mailing lists.

For instance, the web site Last Week in HTML5 snarkily points out IRC entries from Mark Pilgrim, where he *suggests that if Sam Ruby had posted a particular mailing list item, it would be intentionally divisive, but since Chris Wilson from Microsoft posted the item, it’s just plain stupid. I thought at one time Sam and Mark were friends, but obviously those times are in the past. Regardless, such petty bickering only undermines the credibility of the entire effort. Is this demonstrative of a new “hip” professionalism in the web world? If so, I can understand now why some Ruby/Rails folk thought it acceptable to incorporate soft porn in a technical presentation. Frankly, I’m finding this whole “rock star on the edge” thing is getting old.

Immaturity of actions aside, there is obviously a split between the leadership of the HTML effort in the W3C, and the editorial leadership in the WhatWG, and though perhaps this is the “norm” in these efforts, it fills me with dismay about the future of web markup. A Hatfield-McCoy type of feud makes for great Americana, but lousy standards.

What adds to my overall sense of discouragement about the HTML5 effort, is knowing that there is no longer even a pretense of openness in the effort. Today, I read in the whatwg IRC (and continued into the microformats IRC channel) a discussion between Tantek Celik and Ian Hickson, where Ian basically lets Tantek completely control what happens with the so-called “microdata requirements document”—a move which demonstrates Ian’s bias, and an absolute disdain for an entire group’s (RDFa) efforts. Not to mention a disdain for the Creative Commons effort, too, which Ian condemns for its “license proliferation”.

I never thought I would someday write that I long for the stodgy, pedantic W3C of yore, where the smallest detail is meticulously discussed and recorded, excessively so at times—a process I felt was comparable to fingernails across a chalk board. However, I have seen the other side now, on IRC and in the mailing lists, where individuals casually scratch their various itches in public, and the result becomes part of the specification we’ll be stuck with for years.

An end result of such shenanigans is that web page validity no longer means what it used to mean. And is no longer as important, either. In the future, to get a “valid” stamp means that we have to adhere to a small, controlling group of people’s interpretation of what the web should be, and I’m just not willing to go there. Not for a stupid graphic that states “This page is valid in crappy biased markup”; markup, whose only purpose now, from what I can see, is cool, oversexed, overscripted Ajax applications, and stuff Google wants.

Obviously, this state is acceptable to the representatives from Mozilla, Opera, Apple, and Microsoft, not to mention other members of the W3C. If it weren’t, the companies would, I presume, use their collective clout to tell the members of this effort to grow the hell up. No, these companies would rather go off and do their own thing, in competition, rather than in cooperation, and we will face a future of the web, where cross-browser issues are the norm, rather than the exception. Yup, keep your eye on the ACID prize, and just forget about the real world, where people have to make things work.

As for the SVG working group, and the RDFa working group, not to mention MathML and Web Accessibility groups, and others that have tried to incorporate some form of extensibility and usability, not to mention accessibility, into this mishmash of a standard: I admire your patience and determination to look beyond the pettiness in order to appease one small group’s idea of what is an ideal web, but I can’t help thinking that if you bend over backwards too much, you only end up kissing your own ass good-bye.

*Sorry, misread the original IRC entry.

Categories
Diversity RDF W3C

The intent speaks louder than words

Recovered from the Wayback Machine.

I was thinking about taking a shot at writing my own use case or use cases for RDFa in HTML5 until I spotted the recent entry at Last Week in HTML5. The site posts an excerpt from an IRC discussion related to the ongoing exchange about RDFa and HTML5.

* hsivonen is surprised to see Shelley Powers use a pharse like ” most pedantic specification ever derived by man”
hsivonen: (the “by man” part)

annevk: hsivonen, what is special about that part?

hsivonen: annevk: she has a history of pointing out sexism, and expressions like “by man” where ‘man’ means humans in general are generally frowned upon by English-language feminists

annevk: oh, didn’t know that

Rather than respond to any of the arguments and concerns expressed in several comments at Sam’s, or my own long writings on the issue of RDFa and HTML5, the only part of my writing that’s mentioned or referenced in this ongoing discussion is the fact that I used the generic “Man”, to represent humankind.

Actually most English-language feminists aren’t necessarily uptight about the use of “Man” when used in the generic sense, such as in the common phrase, “known to Man”. Or, at a minimum, the use of this common phrase isn’t one of our more pressing concerns. We’re more uptight about our work, writings, and opinions being undermined via the use of irrelevancies. To the true feminist, intent means more than words.

So, to return to the use case: I could spend a considerable amount of time trying to recap the issues related to Qnames and CURIEs, technical concerns versus biases, and generate a longer, thoughtful use case, but unless I use a Word in it, or perhaps a humorous misspelling or funny use of grammar, the work would most likely be disregarded.

Categories
Graphics/CSS W3C

The semantic web gots badges

Congratulations to the W3C for finally reclaiming the semantic web back from the drug industry. Seriously, the new logos are a good idea, and they’re quite attractive.

W3C logo

The only thing that gave me pause about the logos is the terms of use:

  • When used on the Web, the logo must be an active link to http://www.w3.org/2001/sw/
  • The logo must not be used in any manner which implies W3C sponsorship or endorsement of your product, service, or Internet site.
  • The logo may not be used to disparage W3C, its Member organizations, services, or products.
  • The logo must stand alone: it cannot be combined with any other design element such as photography, type, borders, nor can it be incorporated into another logo.

Not disparage the W3C…hmmmm. Taking a cue from my boy, Danny, who interpreted the terms of use thus and thus, I’m promoting the release of the stylish new logos in my own, uniquely Burningbird, way:

Semantic Web

W3C Semantic Web
Microformats site

More interpretations

twist and spin semantic web
2007 The Semantic Web: Do you know where your lawyers are?

Categories
W3C

The whole thing

The Architecture of the World Wide Web, First Edition was just issued as a W3C recommendation. I love that title — it reminds me of Monty Python’s “The Meaning of Life”, volume one.

Interesting bit about URIs in the document. To address the ‘resource as something on the web’ as compared to ‘resource as something that can be discussed on the web’ issue, the document describes a resource thusly:

By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term “resource” is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as “information resources”.

This document is an example of an information resource. It consists of words and punctuation symbols and graphics and other artifacts that can be encoded, with varying degrees of fidelity, into a sequence of bits. There is nothing about the essential information content of this document that cannot in principle be transferred in a representation.

However, our use of the term resource is intentionally more broad. Other things, such as cars and dogs (and, if you’ve printed this document on physical sheets of paper, the artifact that you are holding in your hand), are resources too. They are not information resources, however, because their essence is not information. Although it is possible to describe a great many things about a car or a dog in a sequence of bits, the sum of those things will invariably be an approximation of the essential character of the resource.

The document then gets into URI collision:

By design, a URI identifies one resource. Using the same URI to directly identify different resources produces a URI collision. Collision often imposes a cost in communication due to the effort required to resolve ambiguities.

Suppose, for example, that one organization makes use of a URI to refer to the movie The Sting, and another organization uses the same URI to refer to a discussion forum about The Sting. To a third party, aware of both organizations, this collision creates confusion about what the URI identifies, undermining the value of the URI. If one wanted to talk about the creation date of the resource identified by the URI, for instance, it would not be clear whether this meant “when the movie was created” or “when the discussion forum about the movie was created.”

Social and technical solutions have been devised to help avoid URI collision. However, the success or failure of these different approaches depends on the extent to which there is consensus in the Internet community on abiding by the defining specifications.
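The Sting example is easy to picture in code. The following is a minimal sketch, assuming we already have a list of who uses which URI to mean what; the organization names and example.org URIs are invented, and the function is not taken from the W3C document or any library.

```python
from collections import defaultdict


def find_collisions(claims):
    """Return the URIs that are used for more than one distinct meaning.

    `claims` is an iterable of (organization, uri, meaning) tuples.
    """
    meanings = defaultdict(set)
    for _organization, uri, meaning in claims:
        meanings[uri].add(meaning)
    return {uri: sorted(found)
            for uri, found in meanings.items() if len(found) > 1}


# Hypothetical claims, modeled on the example in the quoted text.
claims = [
    ("Organization A", "http://example.org/the-sting",
     "the movie The Sting"),
    ("Organization B", "http://example.org/the-sting",
     "a discussion forum about The Sting"),
    ("Organization C", "http://example.org/the-sting/forum",
     "a discussion forum about The Sting"),
]

collisions = find_collisions(claims)
```

Only the first URI shows up in `collisions`, because it is the only one asked to identify two different things. Spotting the collision is the simple part; as the quoted text says, avoiding it depends on consensus.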