Categories
HTML5 W3C

Annotation

(This document is part of an effort to flesh out use cases for microdata inclusion in HTML5. See the original use case document, and the background material document as well as the email correspondence that best describes this process.)

————–

USE CASE: Allow authors to annotate their documents to highlight the key
parts, e.g. as when a student highlights parts of a printed page, but in a
hypertext-aware fashion.

SCENARIOS:

* Fred writes a page about Napoleon. He can highlight the word Napoleon
in a way that indicates to the reader that that is a person. Fred can
also annotate the page to indicate that Napoleon and France are
related concepts.

—————

Ian has already provided his summary of this use case on the WhatWG mailing list. His summary:

This use case isn’t altogether clear, but if the target audience of the
annotations is human readers (as opposed to machines and readers using
automated processing tools), then it seems like this is already possible
in a number of ways in HTML5.

In conclusion, this use case doesn’t seem to need any new changes to the
language.

This use case was submitted by Kingsley Idehen, who said considerably more than was entered into the summary use case. Kingsley wrote:

When writing HTML (by hand or indirectly via a program) I want to
isolate and describe what the content is about in terms of people,
places, and other real-world things. I want to isolate “Napoleon” from a
paragraph or heading, and state that the aforementioned entity is: is
of type “Person” and he is associated with another entity “France”.

The use-case above is like taking a highlighter and making notes while
reading about “Napoleon”. This is what we all do when studying, but when
we were kids, we never actually shared that part of our endeavors since
it was typically the route to competitive advantage i.e., being top
student in the class.

What I state above is antithetical to the essence of the World Wide Web,
as vital infrastructure harnessing collective intelligence.

RDFa is about the ability to share what never used to be shared. It
provides a simple HTML friendly mechanism that enables Web Users or
Developers to describe things using the Entity-Attribute-Value approach
(or Subject, Predicate, Object) without the tedium associated with
RDF/XML (one of the other methods of making statements for the
underlying graph model that is RDF).

This use case could have used some more discussion between Ian and Kingsley, because, in my opinion, Ian’s interpretation doesn’t match what Kingsley wrote.

Kingsley wrote about annotating the information within the publication, as one would use a highlighter, but he didn’t mean that this information actually has to be highlighted and made visible to the person reading the text. I believe he meant that the annotation would be visible to automated processes, which could then make it available either to the individual who made the annotation (most likely at a later time, as notes), or perhaps to others when aggregated (the latter is my own interpretation).

The question, then, is whether there is a mechanism currently in HTML5 by which one can annotate the data within a writing, in a non-visible manner, and which can then be used to make an assertion, such as that Napoleon is the name of a person, and that the person Napoleon is related to another entity, this one named France (which is the name of a country, and so on).

So, let me take another try at this use case:

Within a writing published on the web, I want to add annotation into the text to highlight specific facts, but I don't want such highlighting to distract from the text, so I don't want it to be visible. An example of the type of annotation I may make is to highlight the word "Napoleon" and annotate this word with an assertion that Napoleon is a person, and to add further information, that the person, Napoleon, is related to France (a country).

I write on many topics, and so I may make use of several different vocabularies in order to perform my annotation. In addition, I may have to create my own vocabulary if the annotation I want to make doesn't match any of the known and previously published vocabularies. If I do, I'll do so in such a way that there can't be a possible conflict with any other vocabulary.

Once my text is annotated, I want to be able to access this annotation at a later time, separate from the document. To do this, I'll process each of my writings with an application that will pull out this specialized annotation, for aggregation and later query. In addition, by using a standard metadata annotation technique and model, the data can also be accessed by search engines, making it available to others as well.
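To make the extraction step a little more concrete, here is a minimal sketch of what such an application might look like. Everything specific in it is my own assumption for illustration: the `ex:` vocabulary at example.org, the sample markup, and the names `AnnotationExtractor` and `extract_annotations` are all hypothetical. It borrows a handful of RDFa-style attributes (`about`, `typeof`, `property`, `rel`, `resource`), only handles elements that carry an explicit `about`, and assumes every element is explicitly closed. It is not a conforming RDFa processor, just enough to show invisible annotation being pulled out as statements.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the "ex:" vocabulary and the "#napoleon" and
# "#france" identifiers are made up for this sketch.
SAMPLE = """
<p xmlns:ex="http://example.org/vocab#">
  <span about="#napoleon" typeof="ex:Person" property="ex:name">Napoleon</span>
  ruled <span about="#napoleon" rel="ex:relatedTo" resource="#france">France</span>.
</p>
"""


class AnnotationExtractor(HTMLParser):
    """Collects (subject, predicate, object) statements from annotated markup.

    Simplifications: only elements with an explicit `about` attribute are
    considered, and every element is assumed to have a closing tag.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        # One entry per open element: None, or [subject, predicate, text_parts]
        # for an element whose text content is the object of a `property`.
        self._open = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about")
        pending = None
        if subject:
            if "typeof" in a:
                self.triples.append((subject, "rdf:type", a["typeof"]))
            if "rel" in a and "resource" in a:
                self.triples.append((subject, a["rel"], a["resource"]))
            if "property" in a:
                pending = [subject, a["property"], []]
        self._open.append(pending)

    def handle_data(self, data):
        # Text belongs to every enclosing element that is waiting for a literal.
        for entry in self._open:
            if entry is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        entry = self._open.pop()
        if entry is not None:
            subject, predicate, parts = entry
            self.triples.append((subject, predicate, "".join(parts).strip()))


def extract_annotations(markup):
    """Return the list of statements found in `markup`."""
    parser = AnnotationExtractor()
    parser.feed(markup)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_annotations(SAMPLE):
        print(triple)
```

Run against the sample, this yields three statements: that `#napoleon` is of type `ex:Person`, that its name is "Napoleon", and that it is related to `#france`. None of this changes how the paragraph is displayed to a reader. Aggregation would then be a matter of running the same extraction over each writing and storing the results for later query.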

It would help to get concurrence from Kingsley as to the accuracy of my assessment, but I do feel comfortable that my use case is a closer approximation to what Kingsley meant. If this is so, Ian’s concluding statement about this use case, including his claim that it requires no change to HTML5, could be in error.

Categories
HTML5 W3C

Going non-standard

As you may have noticed and will continue to notice, my sites are changing. Sometimes not for the better, as I try something out and it doesn’t work. All of this effort is for my new book, which will include coverage of web page markup, in addition to other technologies, though the book will end up being more of a narrative than the tutorials and how-tos I’ve done in the past.

Currently, this page is set to XHTML5, and I’ve even added a nav element, though not the script that will “trigger” its inclusion in IE. I’m trying to work out how one can use X/HTML5 elements without being forced to incorporate JavaScript, which is not a progressive solution.

Without getting into too many details on the book, it does get into discussions on the future of the web, though I find that when it comes to page markup, I’m having an increasingly difficult time determining what to include. I find I have little faith in the ongoing effort at the W3C and WhatWG on a new HTML, especially after recent discussions in the IRC and mailing lists.

For instance, the web site Last Week in HTML5 snarkily points out IRC entries from Mark Pilgrim, where he suggests* that if Sam Ruby had posted a particular mailing list item, it would be intentionally divisive, but since Chris Wilson from Microsoft posted the item, it’s just plain stupid. I thought at one time Sam and Mark were friends, but obviously those times are in the past. Regardless, such petty bickering only undermines the credibility of the entire effort. Is this demonstrative of a new “hip” professionalism in the web world? If so, I can understand now why some Ruby/Rails folk thought it acceptable to incorporate soft porn in a technical presentation. Frankly, I’m finding this whole “rock star on the edge” thing is getting old.

Immaturity of actions aside, there is obviously a split between the leadership of the HTML effort in the W3C, and the editorial leadership in the WhatWG, and though perhaps this is the “norm” in these efforts, it fills me with dismay about the future of web markup. A Hatfield-McCoy type of feud makes for great Americana, but lousy standards.

What adds to my overall sense of discouragement about the HTML5 effort, is knowing that there is no longer even a pretense of openness in the effort. Today, I read in the whatwg IRC (and continued into the microformats IRC channel) a discussion between Tantek Celik and Ian Hickson, where Ian basically lets Tantek completely control what happens with the so-called “microdata requirements document”—a move which demonstrates Ian’s bias, and an absolute disdain for an entire group’s (RDFa) efforts. Not to mention a disdain for the Creative Commons effort, too, which Ian condemns for its “license proliferation”.

I never thought I would someday write that I long for the stodgy, pedantic W3C of yore, where the smallest detail is meticulously discussed and recorded, excessively so at times—a process I felt was comparable to fingernails across a chalk board. However, I have seen the other side now, on IRC and in the mailing lists, where individuals casually scratch their various itches in public, and the result becomes part of the specification we’ll be stuck with for years.

An end result of such shenanigans is that web page validity no longer means what it used to mean. And is no longer as important, either. In the future, to get a “valid” stamp means that we have to adhere to a small, controlling group of people’s interpretation of what the web should be, and I’m just not willing to go there. Not for a stupid graphic that states “This page is valid in crappy biased markup”; markup, whose only purpose now, from what I can see, is cool, oversexed, overscripted Ajax applications, and stuff Google wants.

Obviously, this state is acceptable to the representatives from Mozilla, Opera, Apple, and Microsoft, not to mention other members of the W3C. If it weren’t, the companies would, I presume, use their collective clout to tell the members of this effort to grow the hell up. No, these companies would rather go off and do their own thing, in competition, rather than in cooperation, and we will face a future of the web, where cross-browser issues are the norm, rather than the exception. Yup, keep your eye on the ACID prize, and just forget about the real world, where people have to make things work.

As for the SVG working group, and the RDFa working group, not to mention MathML and Web Accessibility groups, and others that have tried to incorporate some form of extensibility and usability, not to mention accessibility, into this mishmash of a standard: I admire your patience and determination to look beyond the pettiness in order to appease one small group’s idea of what is an ideal web, but I can’t help thinking that if you bend over backwards too much, you only end up kissing your own ass good-bye.

*Sorry, misread the original IRC entry.

Categories
HTML5

HTML4 is to markup

In an interview at WebScienceMan titled “XHTML Users: Grow up!”, the interviewee, Sitepoint’s Tommy Olsson, answers a question as to whether he likes XHTML with: “Grow up! 🙂 Seriously, XHTML is long dead, due to a decade of horrible abuse. Not even the bleached bones remain.”

Mr. Olsson believes that we should be using HTML 4, strict HTML 4, because HTML5 is still a bit of whimsy, and XHTML is a pile of dead bones. As I wrote in comments, HTML 4 is to markup as 8-track is to music.

[Image: 8-track cartridge]

Categories
HTML5

HTML5: Put up or shut up

Sam Ruby wrote:

I question the presumption implicit in the notions of “the” editor, and “the” spec. I reluctantly accept the notion that any individual spec development process need not employ processes requiring consensus or voting, but I reject any implication, however subtle, of inevitability or entitlement.

Simply put, there needs to be a recourse if a person or a group disagrees with a decision made by the editor of the WHATWG document. That recourse is forking.

I realize that that is a very high bar, and will say that is intentionally so. Simply put, specs don’t write themselves… I don’t care how good you think your idea is, either you need to step up and directly write the spec text yourself, or accept that you need to be persuasive.

Quite simply, that is the most absurd set of statements I have ever read. What Sam is saying is: if you don’t like it, fork, or shut up.

Have to be persuasive? How can one be persuasive when there are underlying biases and prejudices in play that make it impossible to ever…ever persuade the gatekeepers to change their minds? Or even open their minds?

So the alternative that Sam allows us is to fork the entire HTML specification. Unlike some people involved in this discussion, most of us are not employed by large corporations and cannot spend all of our time reading mailing lists or participating in specification work. Most of us have to do other things in order to pay the rent, or buy food.

But we are still dependent on the same specifications, still concerned that what comes out of a group such as the HTML5 working group is the best specification for as many people as possible—not just representatives from one or two companies who control the HTML5 specification development with a fist clad in an arrogance as dense as the thickest iron.

As for contributing to the group, the HTML5 editor did put something out, recently, on the mailing list about other editors. The requirements demanded of these volunteers were such that few of us could even consider applying. I can’t guarantee I have 20+ hours to devote every week. I can’t guarantee that I can fly to meetings with other editors, no, not even once a year. The most I, and others like me, can guarantee is that we would try our best, but keeping the roofs over our heads has to be our first priority. When was the last time the powers-that-be behind the HTML5 effort opened their windows and got a good whiff of our troubled times?

I also resent the assumption that those of us not directly contributing to the editing of a specification are not contributing. Contrary to what Sam seems to believe, we don’t need to be a member of a specification group, or an editor of a specification, to contribute to the overall success of the specification. People who write about the specifications, in books or articles, or who provide tutorials, example applications, libraries, help others—we contribute just as much as those who formally create the specs. The only difference is that our names don’t get listed, we rarely get credit, and evidently, according to Sam, we shouldn’t express any concerns, or frustrations, either.

Well, perhaps that is the way of the world for HTML5, but thankfully it hasn’t been that way for any other web specification I use, including XHTML, CSS, RDF, SVG, and so on. Oh, we still may not be able to influence these specifications, but I’ve not seen any of these groups give so much power over the direction of the specifications to so few. I’ve not once been told, by any of the people behind those specifications, to either put up or shut up.

Categories
HTML5

Extensibility and markup, again and again

Recovered from the Wayback Machine.

Proving that the issues with extensibility will never go away until faced, and resolved:

  • Anne van Kesteren: Concerns that HTML5 does not have distributed extensibility. That is, namespaces. What people seem to want is to extend the browser with hundreds of markup languages. (How this keeps things simple to answer was not something I saw addressed.) You need something else than namespaces for that though, to start with. Also, what is wrong with using XML for this?
  • Sam Ruby: It seems that the distributed extensibility discussion won’t go away like apparently some would hope it would […] It occurs to me that Anne may be intentionally being thick here. “What is wrong with using XML for this?” Come on. I can answer that with two words: IE, and Postel. Next question?
  • Dave Orchard: While I agree with Sam’s assertion that misdirection is going on and IE8 is crucial, I think the real issue is that the anti-distributed extensibility crowd want control over all the languages that could be added into HTML. There’s no changing XML that would make them happy. I think the goal is that the HTML WG becomes the gatekeeper over any new languages that get added into the browser. We’ve seen it with aria-, SVG, MathML. Note that IE8 has a form of namespaces, and Chris Wilson was a supporter of distributed extensibility on the HTML WG list.

I’m not sure we need another form of namespaces. What we need is to address the concept of extensibility, without looking at the mechanics. Is extensibility good? Years ago, I would have been puzzled at even asking this question. Of course extensibility is good. Now I’m not sure that this opinion is shared by one and all. So, perhaps we should ask, Is extensibility bad? From the answers, we might find out where the problems exist, and maybe generate a dialog that results in solutions.