Annotation

(This document is part of an effort to flesh out use cases for microdata inclusion in HTML5. See the original use case document and the background material document, as well as the email correspondence that best describes this process.)

————–

USE CASE: Allow authors to annotate their documents to highlight the key
parts, e.g. as when a student highlights parts of a printed page, but in a
hypertext-aware fashion.

SCENARIOS:

* Fred writes a page about Napoleon. He can highlight the word Napoleon
in a way that indicates to the reader that that is a person. Fred can
also annotate the page to indicate that Napoleon and France are
related concepts.

—————

Ian has already provided his summary of this use case on the WHATWG mailing list. His summary:

This use case isn’t altogether clear, but if the target audience of the
annotations is human readers (as opposed to machines and readers using
automated processing tools), then it seems like this is already possible
in a number of ways in HTML5.

…

In conclusion, this use case doesn’t seem to need any new changes to the
language.

This use case was submitted by Kingsley Idehen, who said considerably more than was entered into the summary use case. Kingsley wrote:

When writing HTML (by hand or indirectly via a program) I want to
isolate and describe what the content is about in terms of people,
places, and other real-world things. I want to isolate “Napoleon” from a
paragraph or heading, and state that the aforementioned entity is: is
of type “Person” and he is associated with another entity “France”.

The use-case above is like taking a highlighter and making notes while
reading about “Napoleon”. This is what we all do when studying, but when
we were kids, we never actually shared that part of our endeavors since
it was typically the route to competitive advantage i.e., being top
student in the class.

What I state above is antithetical to the essence of the World Wide Web,
as vital infrastructure harnessing collective intelligence.

RDFa is about the ability to share what never used to be shared. It
provides a simple HTML friendly mechanism that enables Web Users or
Developers to describe things using the Entity-Attribute-Value approach
(or Subject, Predicate, Object) without the tedium associated with
RDF/XML (one of the other methods of making statements for the
underlying graph model that is RDF).

This use case could have used some more discussion between Ian and Kingsley, because, in my opinion, Ian’s interpretation doesn’t match what Kingsley wrote.

Kingsley wrote about annotating the information within the publication, as one would use a highlighter, but he didn’t mean that this information actually has to be highlighted and made visible to the person reading the text. I believe he meant that the annotation would be visible to automated processes, which could then make it available either to the individual who made the annotation (most likely at a later time, as notes) or, when aggregated, to others (the latter is my own interpretation).

The question, then, is this: is there a mechanism currently in HTML5 with which one can annotate the data within a piece of writing, in a non-visible manner, and which can then be used to make an assertion, such as that Napoleon is the name of a person, and that the person Napoleon is related to another entity, this one named France (which is the name of a country, and so on)?

So, let me take another try at this use case:

Within a writing published on the web, I want to add annotation into the text to highlight specific facts, but I don't want such highlighting to distract from the text, so I don't want it to be visible. An example of the type of annotation I may make is to highlight the word "Napoleon" and annotate this word with an assertion that Napoleon is a person, and to add further information, that the person, Napoleon, is related to France (a country).

I write on many topics, and so I may make use of several different vocabularies in order to perform my annotation. In addition, I may have to create my own vocabulary if the annotation I want to make doesn't match any of the known and previously published vocabularies. If I do, I'll do so in such a way that it can't conflict with any other vocabulary.
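One common way to get that no-conflict guarantee is to root every term in a URI namespace, so that two vocabularies can both define a "Person" without colliding. The following is a minimal sketch of the idea in Python, not any particular specification's algorithm. The `my` prefix, its example.org namespace, and the `expand` helper are made up for illustration; the `foaf` namespace is the published FOAF one.

```python
# Each vocabulary lives under its own URI namespace.
VOCABULARIES = {
    "foaf": "http://xmlns.com/foaf/0.1/",    # a previously published vocabulary
    "my": "http://example.org/my-vocab#",    # hypothetical personal vocabulary
}


def expand(term: str) -> str:
    """Expand a 'prefix:name' term into a full URI using VOCABULARIES."""
    prefix, sep, name = term.partition(":")
    if not sep or prefix not in VOCABULARIES:
        raise ValueError(f"unknown or missing prefix in {term!r}")
    return VOCABULARIES[prefix] + name


# The same short name in two vocabularies expands to two distinct URIs,
# so my own "Person" can never be mistaken for FOAF's.
published_person = expand("foaf:Person")
my_person = expand("my:Person")
print(published_person)
print(my_person)
```

Because the full URI, and not the short name, identifies the term, I only have to make sure my own namespace is one that nobody else is publishing under.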

Once my text is annotated, I want to be able to access this annotation at a later time, separate from the document. To do this, I'll process each of my writings with an application that will pull out this specialized annotation, for aggregation and later query. In addition, by using a standard metadata annotation technique and model, the data can also be accessed by search engines, making it available to others as well.
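As a rough illustration of that extraction step, the sketch below uses Python's standard `html.parser` module to pull (subject, predicate, object) statements out of a fragment annotated with RDFa-style attributes. It is only a sketch under assumptions: it handles a small subset of the attributes (`about`, `typeof`, `property`, `rel`, `resource`), assumes balanced tags, and is not a conforming RDFa processor. The `ex:` prefix, the `#napoleon` and `#france` identifiers, and the `AnnotationExtractor` class are made up for the example.

```python
from html.parser import HTMLParser

# Hypothetical annotated fragment. The attributes add no visible
# highlighting; a browser renders only the sentence itself.
PAGE = """
<p about="#napoleon" typeof="foaf:Person">
  <span property="foaf:name">Napoleon</span> became Emperor of
  <span rel="ex:relatedTo" resource="#france">France</span>.
</p>
"""


class AnnotationExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from a small
    subset of RDFa-style attributes. Assumes balanced tags."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._stack = []  # one [subject, property, text_parts] per open element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        inherited = self._stack[-1][0] if self._stack else None
        subject = a.get("about") or inherited
        if subject and a.get("typeof"):
            self.triples.append((subject, "rdf:type", a["typeof"]))
        if subject and a.get("rel") and a.get("resource"):
            self.triples.append((subject, a["rel"], a["resource"]))
        self._stack.append([subject, a.get("property"), []])

    def handle_data(self, data):
        # Text counts toward every open element that carries a property.
        for frame in self._stack:
            if frame[1]:
                frame[2].append(data)

    def handle_endtag(self, tag):
        if not self._stack:
            return
        subject, prop, text = self._stack.pop()
        if subject and prop:
            self.triples.append((subject, prop, "".join(text).strip()))


extractor = AnnotationExtractor()
extractor.feed(PAGE)
extractor.close()

# A later query over the extracted data: what is Napoleon related to?
related = [o for s, p, o in extractor.triples
           if s == "#napoleon" and p == "ex:relatedTo"]
print(extractor.triples)
print(related)
```

The point is only that the annotation can be separated from the prose and queried on its own; a real application would use an existing parser for whichever annotation syntax is chosen and would aggregate the results across many documents.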

It would help to get concurrence from Kingsley as to the accuracy of my assessment, but I do feel comfortable that my use case is a closer approximation to what Kingsley meant. If this is so, Ian’s concluding statement about this use case, namely that it requires no changes to HTML5, could be in error.