On the Myths and Realities of XHTML

Recovered from the Wayback Machine.

Tina Holmboe from the XHTML WG has written a concise overview of XHTML titled XHTML—Myths and Realities. She’s provided a nice overview of the markup, including the purpose behind the development of XHTML and the state of XHTML today. The only somewhat jarring note I found about the overview is it seems that Tina went a bit out of her way not to sell XHTML. Perhaps this seeming “you should really need it before using it” push is the reality part of the topic.

I use content negotiation for my sites, serving up XHTML for those browsers and agents that can process XHTML, and HTML for the rest. I’m looking into embedding RDFa in my text in a new iteration of yet another site design, but my main reason for using XHTML is that I like to keep open the possibility of using inline SVG. I also think that support for XHTML is broader than Tina implies, but again that could be her trying to downplay any hyperbole about XHTML. There’s hyperbole about XHTML?
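For anyone curious what that content negotiation amounts to, here is a minimal sketch in Python. It is not the code my sites run; the function names `parse_accept` and `choose_content_type` are made up for illustration, and the rule it applies (serve XHTML only when the agent lists `application/xhtml+xml` with a quality value at least as high as `text/html`) is one reasonable assumption among several.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into a list of (media_type, q) pairs."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    return entries


def choose_content_type(accept_header):
    """Pick XHTML only for agents that explicitly say they accept it."""
    xhtml_q = 0.0
    html_q = 0.0
    for media_type, q in parse_accept(accept_header or ""):
        if media_type == "application/xhtml+xml":
            xhtml_q = q
        elif media_type == "text/html":
            html_q = q
    # Wildcards such as */* are deliberately ignored: an agent that only
    # sends */* has not claimed it can process XHTML.
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"
```

Whatever the function returns goes into the `Content-Type` response header. A server doing this should also send `Vary: Accept`, so that caches don’t hand the XHTML version to an agent that asked for HTML.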

Though I know this is outside of Tina’s overview, I would have liked to see more focus on the differences between the HTML5/WhatWG stuff and XHTML 2.0. It’s confusing that we have one group supposedly working on an “XHTML 5.0”, and another on XHTML 2.0. Especially when one of the main issues to do with XHTML 2.0 was XForms, while a milestone reached with HTML5 recently was the incorporation of Web Forms 2.0. But don’t let the “forms” that appears in both fool you into thinking we have any form of consensus or agreement.

I’m beginning to think that the HTML5 working group should completely and thoroughly remove all support for, and even mention of, XHTML from the HTML5 specification. The group finds extensibility to be anathema, but extensibility through namespaces is the heart and soul of XHTML. Seems to me that any form of XHTML, or nod to XHTML, coming out of the group would be a bastard cousin, at best.

Instead of XHTML coming out of the HTML5 group, perhaps we could look at ways to incorporate the new HTML5 objects via namespace to XHTML, but via the W3C XHTML path. In other words, honor the extensibility of XHTML, accept the necessity of a closed world for HTML5 and have one path for HTML, one separate path for XHTML, with the twain meeting via DOM. After all, it’s only serialization differences between XForms and Web Forms 2.0, right?

Or, conversely, we abandon the separate XHTML 2.0 path, and incorporate and embrace extensibility into HTML5. But I’m not one to bank on pigs flying.

I’m not a markup expert, nor am I involved in developing browsers, so perhaps my view is both simplistic and naive. But I can’t help thinking that the HTML5 working group does not have the mindset or interest in extensibility, and at most, will toss bits of seeming extensibility in to placate the noisy. However, this group’s continuing reference to an “XHTML 5” is confusing when you consider there’s a separate, formal upgrade path for XHTML 2.0. The W3C says there’s nothing to worry about because it’s all just serialization under the skin—but it goes beyond just basic serialization techniques, doesn’t it? If it were just serialization technique differences, would the same topics keep arising in the HTML5 WG threads? I mean, if working with RDF has taught me one thing, it’s that converting between two different forms of serialization is trivial—it’s the underlying model that matters.

Really, the W3C is leaving all of this in a bit of a mess.

However, I both digress and am going off on a tangent. This post was about Tina Holmboe’s XHTML overview, which is excellent and worth a read.

(via Simon Willison)


Distributed Extensibility

While I appreciate Mark Pilgrim’s This week in HTML5 land weekly reports, there’s one underlying thread that occurs every month that Mark doesn’t necessarily touch on: the issue of distributed extensibility. You know, the namespace, XHTML, SVG and MathML et al thing that doesn’t go away.

For instance, catching up on my HTML5 Working Group public archives reading, I found this gem from Chris Wilson of Microsoft:

You are correct, we cannot definitively say why XHTML has not been successful on the Web. However, I do believe that part of that lack of success is due to the less-forgiving XML syntax, and part of it is due to the degradation story (or lack thereof) in browsers and versions that don’t support it. (I don’t want to turn this into a pro/con XML debate either.) Part of its success in the future will be due to the important and focus it is lent by all of the major browsers. Perhaps I am misreading the tea leaves; I don’t see much interest in XHTML’s future from the other browsers. I do think XHTML would have a lot of positives as a basis; however, it does have a few negatives, and it would need to be a universal push if it were to be successful.

I would say that we can definitively state why XHTML has had limited success on the web: lack of implementation and support in IE, one of the web’s major browsers. In addition, none of the other browsers have said that they aren’t interested in supporting XHTML in the future. The fact that Microsoft’s main IE architect would make this statement leads me to believe he should be in politics.

And I’m only up to August in the archives. What other delights await in September and October…


HTML and XHTML and bears, oh my!

James Bennett writes on why HTML is the markup for him. There really isn’t anything to agree or disagree with, because he’s expressing his personal preferences. To him, the fact that you can co-mingle different vocabularies, such as XHTML, SVG, RDF, and MathML, isn’t enough to overcome the draconian error handling (there’s that term again, death to the term). Fair enough: XHTML isn’t for everyone.
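To make the commingling of vocabularies concrete, here is a small sketch using Python’s standard `xml.etree.ElementTree`. The sample page and the helper name `namespaces_used` are invented for illustration; the only thing it shows is that an XML parser keeps each element tied to the namespace it came from.

```python
import xml.etree.ElementTree as ET

# A made-up XHTML page with an inline SVG drawing.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Mixed vocabularies</title></head>
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="40" height="40">
      <circle cx="20" cy="20" r="15"/>
    </svg>
  </body>
</html>"""


def namespaces_used(markup):
    """Return the set of namespace URIs of all elements in the document."""
    root = ET.fromstring(markup)
    found = set()
    for element in root.iter():
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found
```

Calling `namespaces_used(PAGE)` reports both the XHTML and the SVG namespace, which is how a processor can tell an SVG element from an XHTML element that happens to share its name.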

One point of clarification, though: HTML5 isn’t just HTML, it’s also XHTML5. I know that the specification is misleadingly named, and seems to implicitly promise a path away from XHTML in the future, but I’d hate for those who prefer HTML to close that road for the rest of us, somehow helping to remove the option of using XHTML for those who have worked through the XML error handling in order to reach the advantages of a truly open page markup.

Working through the XML processing becomes less of a challenge as time goes on, as tools undertake the “burden” of ensuring proper markup so that we don’t have to be so encumbered. I’ve found the htmLawed Drupal plug-in to be wonderfully adapted to solving so many of the problems I’ve had with character encoding in the past. As for generating proper markup in the post, I can either manage the markup myself, which typically consists of paragraphs and hypertext links, with an occasional image or SVG document; or I can have the filtered HTML option handle the markup, as it seems to respect and not munge SVG documents.

As for site design, every Drupal theme I’ve adapted so far has validated as strict XHTML. Makes my job pretty easy.

The point isn’t that HTML is better than XHTML, or that XHTML is better than HTML. The point is we all have our preferences, and we should expect browsers to properly handle both—now and in the future.

(via Simon)


Sometimes simplicity is the answer

I never realized before that the difficulty with XHTML and allowing comments has a solution so breathtakingly simple that I could kick myself for not having seen it before.

I have configured the htmLawed module to “scrub” comments, but that wasn’t the solution. The solution is not to allow a person to save a comment until they have previewed it. If the input is invalid XHTML, the preview won’t show them the form, or the form’s save button, that they need in order to save the comment.

htmLawed should help with the accidentally invalid XHTML, and preview should help eliminate the deliberately invalid XHTML. We hope.
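The same gate can be made explicit on the server. The sketch below is not how Drupal or htmLawed do it; it is a stand-in, in Python, that checks well-formedness only (not validity against the XHTML DTD), and the function name `is_well_formed_fragment` is my own invention.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed_fragment(comment_markup):
    """Return True if the comment parses as XML once wrapped in a div."""
    # Comments are fragments, so give them a single root element to live in.
    wrapped = '<div xmlns="%s">%s</div>' % (XHTML_NS, comment_markup)
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        return False
    return True
```

A comment would only be offered the save button when this returns `True`. One caveat: no DTD is loaded here, so named entities such as `&nbsp;` are rejected as undefined even though XHTML allows them.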

I’ve turned comments on. We’ll see how it goes.


Yesterday I discovered that the htmLawed module was still allowing the infamous U+FFFF et al through, and submitted a bug. Today, the htmLawed Drupal module was just updated to point to htmLawed source 1.0.9, which neutralizes the illegal Unicode characters that caused so many problems with my WordPress installations.

I am absolutely astonished at how fast and how responsive the htmLawed Drupal module developers are. I submitted a bug yesterday, and it was fixed by today. My comments should now be XHTML safe.
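For the curious, the characters in question are the ones that fall outside the `Char` production of the XML 1.0 specification, which excludes most control characters, the surrogate range, and U+FFFE and U+FFFF. The following Python sketch drops them; it is not htmLawed’s code (htmLawed is PHP, and I haven’t reproduced how it neutralizes them), and `strip_illegal_xml_chars` is just a name I picked.

```python
import re

# Everything NOT matched by the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Remove characters that may not appear in an XML 1.0 document."""
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Run over a comment before it is stored, this removes U+FFFF and its friends while leaving tabs, newlines, and ordinary non-ASCII text alone.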


Run for the web

A gentleman from the W3C was kind enough to point me to a newly tracked issue for the HTML5 working group related to namespaces in HTML5, entered by James Graham. I’m not a player in this game, because I can continue to use XHTML 1.1 until they pry it out of my cold, dead browser. However, it is good to see some concerted effort in adding SVG and MathML to HTML5, as well as XHTML5. The two are nothing more than serialization formats, and it shouldn’t matter which we use. However, as it stands now, the data model changes based on the serialization, and that’s not a particularly good thing.

In the meantime, XHTML is getting more kicks because of the draconian error handling. Seriously, I’d love to know who coined this term, so we can take them out behind the barn. Whether the comment was facetious or not, Ian Hickson’s statement that the great thing about XML’s well-formedness requirements is that this kind of thing can’t happen, because the author would catch this kind of error straight away, is true. Errors don’t creep in, they trumpet for attention. But, to each their own. I’m not a player in this game.