Categories
Specs W3C XHTML/HTML

The HTML5 silly season

Cynthia Shelly released an alternative proposed HTML5 draft that addresses the table summary attribute. The responses to her draft have been less than edifying, and demonstrate rather succinctly most things wrong with the HTML WG.

If you follow along in the thread, you’ll see Apple’s Maciej and IBM’s Sam Ruby go back and forth on protocol, a discussion Maciej ends with a suggestion to focus on salvaging a “proposal” from the work. The thing is, providing alternative text in a specification is the proposal, the only kind that is deemed acceptable to the HTML WG. At least, that’s what we’ve been told in the past.

Not that Cynthia is demanding that the text be used as is. This was a suggested text, addressing how summary could be discussed in the HTML5 specification, in order to ensure proper use. The proposal also removes summary from the obsolete list. Cynthia proposed this alternative text in order to generate discussion, leading to its refinement; to encourage team effort. Simple enough to understand, but then we’re subjected to the typical Ian Hickson disingenuous approach to anything he disagrees with: pretend he doesn’t understand what the proposal is all about.

I couldn’t find any description of what problem this proposal is trying to solve. Could you point me to the description of the issue that is being resolved here? Why is the text currently in the HTML5 spec not considered acceptable middle ground?

It is difficult to evaluate proposals without understanding what problems they are trying to solve.

Incidentally, I believe the process that we are supposed to be following these days is that when there is a problem in the spec, a bug should be filed describing the problem, so that the issue can be tracked. If you could file a bug (or point me to the relevant bug if one is already filed), that would be very helpful.

(At this point I would like to inform my readers: everyone can file a bug, you don’t have to be a member of any W3C organization to do so.)

So, HTML WG team members are told by one of the HTML WG Chairs to provide alternative specification text, while the HTML5 author countermands such a recommendation, with a note that we file bugs, instead. Seriously, I keep expecting the third stooge to enter the scene, stage left.

And he does. The author of validator.nu, worker extraordinaire for Mozilla, Henri Sivonen, puts on his court jester cap to derail even the potential for worthwhile discussion:

Further quotes are from the proposed text--not from Maciej:
> Summary is one way to provide explanatory information about tables  
> that consist of more than just a grid of cells with headers in the  
> first row and headers in the first column.
>
Does this intend to say that using @summary is categorically  
unnecessary when headers appear in the first column and/or first row?  
If so, it would be good to make this clear.
> Such explanatory information should introduce the purpose of the  
> table,
>
Shouldn't the purpose be stated to all readers?
> outline its basic cell structure,
>
Shouldn't this be generated by the AT from the table model?
> The information provided by the summary is needed by users who  
> cannot see the table, but would usually be redundant for those who  
> can.
>
This sentence sticks out as non-spec-like. It doesn't state a  
requirement, so it looks odd in the middle of a paragraph that states  
requirements.
> This must be done in a way that is associated with the table via  
> markup, such that user agents and assistive technology can  
> programmatically determine the relationship.
>
This sentence could make sense in WCAG-like contexts where things are  
defined in terms of what available software happens to support. It  
doesn't make sense in a spec that defines what software must support.  
(Furthermore, "programmatically determine" is a special term from  
other specs but isn't defined as part of the special vocabulary of the  
HTML5 spec.)

The proposed text seems to imply (in the edits done on examples) that  
having the explanation in a paragraph preceding the table isn't  
sufficient without an explicit aria-describedby link (misspelled in  
the proposed text as aria-described-by). Why is that not sufficient?
> When using summary in combination with another technique, authors  
> must not use the duplicate text, but instead use summary for the  
> parts of the description that are only useful to users who cannot  
> see the table.
>
What about duplicating information that AT should be able to voice  
based on the table model?
> <table summary="The table is divided into six columns: Map number,  
> Date, Area or stream with flooding, Reported deaths, Approximate  
> costs (uninflated), and Comments. The rows are grouped by flood  
> types into six subcategories: Regional flood, Flash flood, Ice-jam  
> flood, Storm-surge flood, Dam-failure flood and Mudflow flood." 
>
In this case, the first sentence clearly duplicates information that  
are trivially programmatically determinable by the AT from the table  
model (given proper <th> markup). As for the second sentence, I think  
it would be worth investigating if the salient content of the second  
sentence is also realistically programmatically determinable from the  
table model. On the face of it, discovering the content of the second  
sentence from the table model doesn't seem like an overly hard  
software problem.
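
To make concrete what "programmatically determinable by the AT from the table model (given proper <th> markup)" could mean for the first sentence of that summary, here is a minimal Python sketch. It is not how any real assistive technology works; `HeaderCollector` and `describe_columns` are hypothetical names, and the sketch assumes that `<th>` cells appear only in the header row. It recovers the column names and nothing else: not the purpose of the table, and not the row groupings described in the second sentence.

```python
from html.parser import HTMLParser


class HeaderCollector(HTMLParser):
    """Collects the text of every <th> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._in_th = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._in_th = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_th:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "th" and self._in_th:
            # Collapse runs of whitespace inside the header text.
            self.headers.append(" ".join("".join(self._buffer).split()))
            self._in_th = False


def describe_columns(table_html):
    """Builds a one-sentence column description from the <th> cells.

    Assumption for this sketch: <th> is used only for column headers.
    """
    collector = HeaderCollector()
    collector.feed(table_html)
    collector.close()
    names = ", ".join(collector.headers)
    return f"The table has {len(collector.headers)} columns: {names}."


sample = (
    "<table><tr><th>Map number</th><th>Date</th><th>Reported deaths</th></tr>"
    "<tr><td>1</td><td>1993</td><td>48</td></tr></table>"
)
print(describe_columns(sample))
```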

So, the text of the proposal that Cynthia provides is addressed to humans, which Henri rejects, because Cynthia’s text should be addressed to machines. She discusses declarative markup, and addresses this discussion to people, in order to ensure that the summary attribute is properly used by web page authors and designers. Henri reduces the whole to algorithms, care and feeding of.

This is a perfect lead in to another discussion about HTML5 taking place elsewhere, in the W3C TAG, which has ultimate responsibility for ensuring the many specifications such as HTML5 work in a complementary manner for the web. The focus of the TAG at this time is detailing issues this group has with the current HTML5 draft, a discussion generating a typically mature level of discussion in the WhatWG IRC channel.

One such issue, as I have noted, as others have noted, is the fact that the specification is given in algorithmic terms, rather than as declarative text—based on discussions of a Document Object Model (DOM) with HTML markup given as a distant secondary item (barely covered, and leaving ripples of confusion in its wake).

The current rendering of the specification is considered more precise for the browser companies, for Mozilla, Google, Microsoft, Opera, and Apple, but the precision completely obfuscates the information needed by thousands, perhaps millions of web page authors and designers.

In the past, the main specification would be about the markup, with a secondary document describing the DOM. And oddly enough, this has worked, if we can believe the evidence of our eyes. Evidently, this wasn’t to the taste of the browser companies, who believe that it is more important that their needs be met, rather than the needs of the thousands, perhaps millions of web page authors and designers.

In addition, rather than leave many decisions up to the implementors of the specification, the editor’s draft seeks to specify, in minute detail, how everything is to be handled by implementors. Precise, very precise. Good luck with the 50,000 or so test cases.

So far, I have submitted three HTML5 bugs:

  1. When Web Workers was removed from the spec, orphan references were left – clean up is needed
  2. To remove the Microdata section, as it isn’t necessary, nor widely supported
  3. To allow other namespaced elements in SVG, since the use of these elements is valid within SVG

And I just submitted a bug for table summary. There will be others. Too bad I’m not one of the elite.

Categories
W3C XHTML/HTML

XHTML2 is dead

XHTML2 news on Twitter

I have mixed feelings on this news.

On the one hand, I think it’s a good idea to focus on one X/HTML path.

On the other, I’ve been a part of the HTML WG for a little while now, and I don’t feel entirely happy, or comfortable with many of the decisions for X/HTML5, or for the fact that it is, for all intents and purposes, authored by one person. One person who works for Google, a company that can be aggressively competitive.

Categories
RDF Standards XHTML/HTML

A Loose Set of Notes on RDFa, XHTML, and HTML5

There’s been a great deal of discussion about RDFa, HTML5, and microdata the last few days, on email lists and elsewhere. I wanted to write down notes of the discussions here, for future reference. Those working on issues with RDFa in Drupal 7 should pay particular attention, but the material is relevant to anyone incorporating RDFa.

Shane McCarron released a proposal for RDFa in HTML4, which is based on creating a DTD that extends support for RDFa in HTML4. He does address some issues related to the differences in how certain data is handled in HTML4 and XHTML, but for the most part, his document refers processing issues to the original RDFaSyntax document.

Philip Taylor responded with some questions, specifically about how xml:lang is handled by HTML5 parsers, as compared to XML parsers. His second concern was how to handle XMLLiteral in HTML5, because the assumption is that RDFa extractors in JavaScript would be getting their data from the DOM, not processing the characters in the page.

“If the object of a triple would be an XMLLiteral, and the input to the processor is not well-formed [XML]” – I don’t understand what that means in an HTML context. Is it meant to mean something like “the bytes in the HTML file that correspond to the contents of the relevant element could be parsed as well-formed XML (modulo various namespace declaration issues)”? If so, that seems impossible to implement. The input to the RDFa processor will most likely be a DOM, possibly manipulated by the DOM APIs rather than coming straight from an HTML parser, so it may never have had a byte representation at all.

There’s a lively little sub-thread related to this one issue, but the one response I’ll focus on is from Shane, who replied, “RDFa does not pre-suppose a processing model in which there is a DOM.” The issue of xml:lang is also still under discussion, but I want to move on to new issues.

While the discussion related to Shane’s document was ongoing, Philip released his own first look at RDFa in HTML5. Concern was immediately expressed about Philip’s copying of some of Shane’s material, in order to create a new processing rule section. The concern wasn’t because of any issue to do with copyright, but the problems that can occur when you have two sets of processing rules for the same data and the same underlying data model. No matter how careful you are, at some point the two are likely to diverge, and the underlying data model is corrupted.

Rather than spend time on Philip’s specification directly at this time, I want to focus, instead, on a note he attached to the email entry providing the link to the spec proposal. In it he wrote:

There are several unresolved design issues (e.g. handling of case-sensitivity, use of xmlns:* vs other mechanisms that cause fewer problems, etc) – I haven’t intended to make any decisions on such issues, I’ve just attempted to define the behaviour with sufficient detail that it should make those issues visible.

More on case sensitivity in a moment.

Discussion started a little more slowly for Philip’s document, but is ongoing. In addition, both Philip and Manu Sporny released test suites. Philip’s focuses on highlighting problems when parsing RDFa in HTML as compared to XHTML; the one that Manu posted, created by Shane, focuses on a basic set of test cases for RDFa, generally, but migrated into the RDFa in HTML4 document space.

Returning to Philip’s issue with case sensitivity, I took one of Shane’s RDFa in HTML test cases, and the rdfquery JavaScript from Philip’s test suite, and created pages demonstrating the case sensitivity issue. One such is the following:

<!DOCTYPE HTML PUBLIC "-//ApTest//DTD HTML4+RDFa 1.0//EN" "http://www3.aptest.com/standards/DTD/html4-rdfa-1.dtd">
<html
xmlns:t="http://test1.org/something/"
xmlns:T="http://test2.org/something/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
<title>Test 0011</title>
</head>
<body>
<div about="">
Author: <span property="dc:creator t:apple T:banana">Albert Einstein</span>
<h2 property="dc:title">E = mc<sup>2</sup>: The Most Urgent Problem of Our Time</h2>
</div>
</body>
</html>

Notice the two namespace declarations, one for “t” and one for “T”. Both are used to provide properties for the object being described in the document: t:apple and T:banana. An RDFa application that applies XML rules when parsing the document treats “t” and “T” as two different namespaces. It has no problem with the RDFa annotation.

However, using the rdfquery JavaScript library, which treats “t” and “T” the same because of HTML case insensitivity, an exception results: Malformed CURIE: No namespace binding for T in CURIE T:banana. Stripping away the RDFa aspects, and focusing on the namespaces, you can see how browsers handle namespace case in an HTML document and in a document served up as XHTML. To make matters more interesting, check out the two pages using Opera 10, Firefox 3.5, and the latest Safari. Opera preserves the case, while both Safari and Firefox lowercase the prefix. Even within the HTML world, the browsers handle namespace case in HTML differently. However, all handle the prefixes the same, and correctly, in XHTML. So does the rdfquery JavaScript library, as this test page demonstrates.
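
The failure mode is easy to reproduce outside a browser. The following Python sketch is not the rdfquery code, and `collect_prefixes` and `expand_curie` are hypothetical helpers I made up for illustration. It assumes that an HTML-style parser lowercases attribute names before the RDFa processor sees them, and that the first declaration of a duplicated name wins (the outcome here does not depend on which one wins).

```python
def collect_prefixes(attributes, lowercase_names):
    """Builds a prefix -> namespace map from (name, value) attribute pairs."""
    prefixes = {}
    for name, value in attributes:
        if lowercase_names:
            # Mimics a parser that lowercases attribute names (HTML rules).
            name = name.lower()
        if name.startswith("xmlns:"):
            # Assumption for this sketch: the first declaration wins.
            prefixes.setdefault(name[len("xmlns:"):], value)
    return prefixes


def expand_curie(curie, prefixes):
    """Expands prefix:reference using a case-sensitive prefix lookup."""
    prefix, _, reference = curie.partition(":")
    if prefix not in prefixes:
        raise ValueError(f"No namespace binding for {prefix} in CURIE {curie}")
    return prefixes[prefix] + reference


attrs = [
    ("xmlns:t", "http://test1.org/something/"),
    ("xmlns:T", "http://test2.org/something/"),
    ("xmlns:dc", "http://purl.org/dc/elements/1.1/"),
]

xml_map = collect_prefixes(attrs, lowercase_names=False)
html_map = collect_prefixes(attrs, lowercase_names=True)

print(expand_curie("T:banana", xml_map))   # resolves under XML rules
try:
    expand_curie("T:banana", html_map)     # "T" no longer exists as a prefix
except ValueError as err:
    print(err)
```

Under XML rules both prefixes survive and T:banana resolves against the test2.org namespace. Once the attribute names are lowercased, only “t” is left in the map, and the lookup for “T” fails in the same way as the exception quoted above.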

Returning to the discussion, there is some back and forth on how to handle case sensitivity issues related to HTML, with suggestions varying as widely as:

  1. tossing the RDFa in XHTML spec out and creating a new one
  2. tossing RDFa out in favor of Microdata
  3. creating a best practices document that details the problem and provides appropriate warnings
  4. creating a new RDFa in HTML document (or modifying existing profile document) specifying that all conforming applications must treat prefix names as case insensitive in HTML (possibly cross-referencing the RDFa in XHTML document, which allows case sensitive prefixes)

I am not in favor of the first two options. I do favor the latter two options, though I think the best practices document should strongly recommend using lowercase prefix names, and definitely not using two prefixes that differ only by case. During the discussion, a new conforming RDFa test case was proposed that tests based on case. This has now started its own discussion.

I think the problem of case and namespace prefixes (not to mention xmlns as compared to XMLNS) is very much an edge issue, not a show stopper. However, until a solution is formalized, be aware that xmlns prefix case is handled differently in XHTML and HTML. All else being equal, consider using only lowercase prefixes when embedding RDFa (or any other namespace-based functionality). In addition, do not use XMLNS. Ever. If not for yourself, do it for the kittens.
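
Advice like this is simple enough to check mechanically. Here is a minimal Python sketch of such a check. `check_prefix_declarations` is a hypothetical helper, not part of any RDFa tool I know of, and it assumes you already have the attribute names from the element exactly as written in the source.

```python
def check_prefix_declarations(attribute_names):
    """Returns warnings for namespace declarations that invite case trouble."""
    warnings = []
    seen = {}  # lowercased prefix -> first spelling encountered
    for name in attribute_names:
        head, sep, prefix = name.partition(":")
        if head.lower() != "xmlns" or not sep:
            continue  # not a prefixed namespace declaration
        if head != "xmlns":
            warnings.append(f"{name}: write 'xmlns' in lowercase")
        if prefix != prefix.lower():
            warnings.append(f"{name}: prefix '{prefix}' is not lowercase")
        key = prefix.lower()
        if key in seen and seen[key] != prefix:
            warnings.append(
                f"{name}: prefix '{prefix}' differs from '{seen[key]}' only by case"
            )
        seen.setdefault(key, prefix)
    return warnings


for warning in check_prefix_declarations(["xmlns:t", "xmlns:T", "xmlns:dc"]):
    print(warning)
```

Run against the declarations from the test page above, the sketch flags xmlns:T twice: once for the uppercase prefix, and once for colliding with “t”.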

Speaking of RDFa in HTML issues, there is now a new RDFa in HTML issues wiki page. Knock yourselves out.

Update: a new version of the RDFa in HTML4 profile has been released. It addresses some of the concerns expressed earlier, including the issue of case and XMLLiteral. Though HTML5 doesn’t support DTDs, as HTML4 does, the conformance rules should still be good for HTML5.

Categories
SVG XHTML/HTML

Whipping boy

I noticed a passing Twitter message from Laura Scott. It said, “One word: standards. Firefox follows w3c standards. Internet Explorer does not.” She wrote it in response to another Twitter message from tutu4lu, who was having problems with a web page appearing differently in IE than in Firefox.

It is true that Firefox implements more standards than IE, especially when it comes to some of my favorites, such as SVG. And I appreciate the fact.

Firefox does not necessarily get an A+ for all of its effort, though. In particular, if Microsoft’s lack of implementation of XHTML has been one force against broader implementation of XHTML at web sites, Firefox’s own handling of XML errors in XHTML is another, more subtle force against XHTML.

Here’s an example. I added an ampersand (&) to a URL in one of my posts, which generates an XHTML error. The following are three screen shots from Chrome, Opera, and Safari, respectively, that demonstrate how they handle the error:

[Screenshot: XHTML error in Chrome]
[Screenshot: Opera XHTML error]
[Screenshot: Safari error]
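
For anyone who wants to see this kind of error without breaking a live page, here is a small Python sketch using the standard library XML parser. The URL is a made-up example, not the one from my post, and `is_well_formed` is just a hypothetical helper name. A bare ampersand in an attribute value is not well-formed XML, so a conforming parser has to stop; writing it as `&amp;` fixes the problem.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Returns True if the markup parses as XML, False on a parse error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# A bare "&" in the query string versus the escaped form.
broken = '<p><a href="http://example.com/page?id=1&view=full">link</a></p>'
fixed = '<p><a href="http://example.com/page?id=1&amp;view=full">link</a></p>'

print(is_well_formed(broken))
print(is_well_formed(fixed))

# After parsing, the escaped form yields the URL with a plain ampersand.
print(ET.fromstring(fixed).find("a").get("href"))
```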

Safari and Chrome are both built on WebKit, which handles XHTML errors by parsing, and rendering, the document up to the error. This has the advantage of providing some content, as well as being able to more quickly find the error when you’re debugging.

Opera doesn’t render the document, but it does provide a display of the source with highlighting where the error occurs. This is extremely helpful when you’re debugging a larger document. In addition, Opera also provides an option to render the document in HTML, rather than XHTML, which is helpful for everyone else.

Contrast and compare these screenshots with the following, from Firefox.

[Screenshot: Firefox error handling]

The Firefox XHTML error handling is also known as YSOD, or Yellow Screen of Death. It’s harsh, abrupt, and somewhat punishing in nature, with its sickly yellow background, and bright red text. The message is typically cut off by the edge of the browser window, so one can’t easily see where the error has occurred. It’s most definitely intimidating for readers who accidentally stumble on to an XHTML page currently in a broken state.

All four of the browsers do support the XHTML standard, and all stop processing the XHTML when an error occurs, as is proper. But where Safari/WebKit, Chrome/WebKit, and Opera try to provide a useful web page, Firefox picks up a ruler and gives the owner of the web site a good whacking.

It’s easy to fall into the trap of blaming all web development and design problems on Microsoft and IE, and to use IE as a whipping boy—to the exclusion of looking, critically, at the other browsers in the web space. If the lack of support for XHTML in IE is a primary inhibitor of the spread of XHTML, Firefox’s YSOD has to take the second place prize. Support for XHTML doesn’t end at the parser.

Categories
XHTML/HTML

On the Myths and Realities of XHTML

Recovered from the Wayback Machine.

Tina Holmboe from the XHTML WG has written a concise overview of XHTML titled XHTML—Myths and Realities. She’s provided a nice overview of the markup, including the purpose behind the development of XHTML and the state of XHTML today. The only somewhat jarring note I found about the overview is that it seems Tina went a bit out of her way not to sell XHTML. Perhaps this seeming “you should really need it before using it” push is the reality part of the topic.

I use content negotiation for my sites, serving up XHTML for those browsers and agents that can process XHTML, and HTML for the rest. I’m looking into embedding RDFa in my text in a new iteration of yet another site design, but my main reason for using XHTML is that I like to keep open the possibility of using inline SVG. I also think that support for XHTML seems to be broader than is implied by Tina, but again that could be her trying to downplay any hyperbole about XHTML—there’s hyperbole about XHTML?
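
For those who haven’t set up content negotiation before, the decision boils down to reading the Accept header the client sends. The following Python sketch is a simplified illustration, not the code running on my sites. `pick_content_type` is a hypothetical function, and it only asks whether the client lists application/xhtml+xml with a non-zero q value, rather than ranking every media type by preference.

```python
def pick_content_type(accept_header: str) -> str:
    """Chooses application/xhtml+xml if the client accepts it, else text/html."""
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0  # default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q value as "not accepted"
        if media_type == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml"
    return "text/html"


# An Accept header that lists XHTML, and one that does not.
print(pick_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(pick_content_type("text/html, */*"))
```

The sketch deliberately ignores `*/*`, so an agent that doesn’t name application/xhtml+xml explicitly gets HTML.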

Though I know this is outside of Tina’s overview, I would have liked to see more focus on the differences between the HTML5/WhatWG stuff and XHTML 2.0. It’s confusing that we have one group working supposedly on an “XHTML 5.0”, and another on XHTML 2.0. Especially when one of the main issues to do with XHTML 2.0 was XForms, while a milestone reached with HTML5 recently was the incorporation of Web Forms 2.0—but don’t let the “forms” that appears in both fool you into thinking we have any form of consensus or agreement.

I’m beginning to think that the HTML5 working group should completely and thoroughly remove all support for, and even mention of, XHTML from the HTML5 specification. The group finds extensibility to be anathema, but extensibility through namespaces is the heart and soul of XHTML. Seems to me that any form of XHTML, or nod to XHTML, coming out of the group would be a bastard cousin, at best.

Instead of XHTML coming out of the HTML5 group, perhaps we could look at ways to incorporate the new HTML5 objects via namespace to XHTML, but via the W3C XHTML path. In other words, honor the extensibility of XHTML, accept the necessity of a closed world for HTML5 and have one path for HTML, one separate path for XHTML, with the twain meeting via DOM. After all, it’s only serialization differences between XForms and Web Forms 2.0, right?

Or, conversely, we abandon the separate XHTML 2.0 path, and incorporate and embrace extensibility into HTML5. But I’m not one to bank on pigs flying.

I’m not a markup expert, nor am I involved in developing browsers, so perhaps my view is both simplistic and naive. But I can’t help thinking that the HTML5 working group does not have the mindset or interest in extensibility, and at most, will toss bits of seeming extensibility in to placate the noisy. However, this group’s continuing reference to an “XHTML 5” is confusing when you consider there’s a separate, formal upgrade path for XHTML 2.0. The W3C says there’s nothing to worry about because it’s all just serialization under the skin—but it goes beyond just basic serialization techniques, doesn’t it? If it were just serialization technique differences, would the same topics keep arising in the HTML5 WG threads? I mean, if working with RDF has taught me one thing, it’s that converting between two different forms of serialization is trivial—it’s the underlying model that matters.

Really, the W3C is leaving all of this in a bit of a mess.

However, I both digress and am going off on a tangent. This post was about Tina Holmboe’s XHTML overview, which is excellent and worth a read.

(via Simon Willison)