Adventures in XHTML

Recovered from the Wayback Machine.

During the recent lighthearted discussions revolving around IE8 and its faithful companion, Wonder Tag, a second topic thread broke out about XHTML. As is typical whenever XHTML is brought up, the talk circled around to the draconian error handling, or yellow screen of death, that appears when a browser encounters even a small, harmless-seeming discrepancy in a page’s markup.

However, the yellow screen of death is a product of how Firefox deals with problems, not behavior that’s inherent to serving XHTML as application/xhtml+xml. Safari’s error handling is much less extreme, attempting to render all of the ‘good’ markup up to the point where the ‘bad’ markup occurs.

Opera’s error handling is even more friendly. It provides the context of the error, which makes it the best tool for debugging a faulty XHTML page. You might say Opera is to XHTML as Firebug is to JavaScript. The browser also provides an option to reprocess the page as more forgiving HTML.

To return to the discussion I linked earlier, in response to the mention of the draconian error handling, I wrote:

I can agree that the extreme error handling of the page can be intimidating, but it’s no different than a PHP page that’s broken, or a Java application that’s cracked, or any other product that hasn’t been put together right.

To which one of the commenters responded:

I don’t want to get off-topic either but I hear this nonsense a lot. You can’t simply compare a markup language with a programming language. They have very different intended authors (normal people versus programmers) and very different purposes.

I disagree. I believe you can compare a markup language with a programming language. Both are based on technical specifications, and both require an agent to process the text in a specific way to get a usable response. As with PHP or Java, you have to know how to arrange XHTML in order to get something useful. That HTML has a more forgiving processor than XHTML or PHP doesn’t make it less technical, just inherently more ‘loose’, for lack of a better term.

In my opinion, the commenter, Tino Zijdel, was in error on a second point, as well: markup isn’t specific to programmers. In fact, programmers are no better at markup than ‘normal’ people. Case in point: the error pages I’ve shown in this post.

As most of you are aware, I serve my pages up with the application/xhtml+xml MIME type. Those of you who have tried to access this site using IE are also aware that I don’t use content negotiation, which tests whether the browser is capable of processing XHTML and returns text/html if it isn’t.
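For anyone who does want content negotiation, the idea can be sketched in a few lines. This is a minimal illustration in Python, not what runs on this site: the function name `pick_mime_type` is made up for the example, and it assumes the server hands you the raw `Accept` request header as a string.

```python
def pick_mime_type(accept_header):
    """Return application/xhtml+xml only if the client explicitly accepts it.

    Hypothetical helper: a wildcard such as */* is deliberately NOT treated
    as XHTML support, since browsers that cannot parse XHTML send it too.
    """
    xhtml = "application/xhtml+xml"
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not accepted"
        if media_type == xhtml and quality > 0:
            return xhtml
    return "text/html"
```

A browser that lists application/xhtml+xml in its Accept header gets XHTML; one that sends only image types and a wildcard falls back to text/html.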

Before yesterday, I still served up the WordPress administration pages as text/html, rather than application/xhtml+xml. Yesterday I threw the XHTML switch on the administration pages as well, and ended up with some interesting results. For instance, both plug-ins I use that have an options page had bad markup. In fact one, a very popular plug-in that publishes links into a post, had the following errors:

  • The ‘wrap’ class name wasn’t in quotes.
  • Five input fields were not properly terminated.
  • The script element didn’t have a CDATA wrapper.
  • Attributes such as ‘disabled’ and ‘readonly’ were given as standalone names, without values.
  • Two extraneous opening TR tags.
  • One non-terminated TR element.
  • Two terminating label elements without any starting tag.
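Most of the errors in that list are plain XML well-formedness problems, so any XML parser will flag them before a browser does. Here is a minimal sketch using Python’s standard library parser; the function name and the sample fragments are invented for illustration and are not taken from the plug-in in question.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return (True, None, None) if markup parses as XML,
    otherwise (False, line, column) for the first error found."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return False, line, column
    return True, None, None


# Hypothetical fragments mirroring the kinds of errors listed above.
samples = {
    "unquoted class": "<div class=wrap><p>Options</p></div>",
    "unterminated input": '<form><input type="text" name="a"></form>',
    "standalone attribute": '<form><input type="text" disabled /></form>',
    "script without CDATA": "<script>if (a < b) { run(); }</script>",
    "corrected input": '<form><input type="text" disabled="disabled" /></form>',
}

for label, fragment in samples.items():
    ok, line, column = check_well_formed(fragment)
    print(label, "-> well-formed" if ok else "-> error at line %s, column %s" % (line, column))
```

A parser like this only tests well-formedness, not validity against the XHTML DTD, but well-formedness is what triggers the browser error pages.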

For all of that, though, it didn’t take me more than about 15 minutes to fix the page, with a little help from Opera.

The WordPress administration pages work except for the Dashboard, where the version of jQuery that comes with WordPress didn’t seem to handle the Ajax calls that fill the page. I updated jQuery to the latest version, and now the feed from the WordPress weblog shows, but not the other two items. At least, not with Firefox 3 or Safari; all of the content does show with Opera.

The Text Control plug-in had one minor XHTML error in the options page, but even when that was fixed, selecting a new text formatting option in the post doesn’t work–the selection goes back to the default. That one will end up being more challenging to fix, because I haven’t a clue what’s stopping the update.

WordPress does a decent job of generating proper XHTML content when using the default formatting. In fact, the only problem I’ve had, other than when I embed SVG inline, was my own inaccurate use of markup. I used <code> elements, by themselves, when displaying block code. What I should have used is the <code> element wrapped in a <pre> element. When I do, the WordPress default formatting works without problems.

remove_filter('comment_text', 'wpautop', 30);
remove_filter('comment_text', 'wptexturize');
add_filter('comment_text', 'tc_comment');

My error, and the errors of the plug-in creators, all demonstrate that though programmers might be more familiar with the consequences of making a mistake with technical text, we don’t make fewer mistakes than anyone else when it comes to using web page markup. Our only advantage is that we’re not as intimidated by pages with errors. Regardless of how they’re displayed, or of our relative technical expertise, though, these error messages aren’t necessarily a bad thing.

One of the advantages of serving the pages with application/xhtml+xml is that we catch mistakes before we serve the pages up to our readers. We definitely catch the mistakes before we release code that generates badly formed markup, or provide broken options pages to accompany our coded plug-ins. I can’t for the life of me understand why any programmer, web developer, or designer would want less than 100% accuracy from their web pages. That’s tantamount to saying, “Hire me. I write sloppy shit.”

Of course, being able to program can have advantages when working with XHTML, especially with many of today’s applications. WordPress does a good job at working in an XHTML environment, but not a great one. One example of where the application fails, badly, is in the Atom feed.

In Atom, WordPress outputs the HTML type as an attribute to many of the fields:

<summary type="<?php html_type_rss(); ?>">
<![CDATA[<?php the_excerpt_rss(); ?>]]></summary>
<?php if ( !get_option('rss_use_excerpt') ) : ?>

This is all well and good except for one thing: when the type is returned as ‘xhtml’, Atom feeds are supposed to use the following syntax for the content:

<summary type="xhtml"><div xmlns="">

This is an outright error in how the Atom feed is coded in WordPress. I’ve had to correct this in my own feed, and then remember not to overwrite my copy of the code whenever there’s an update. What the code should be doing is testing the type, and then providing the wrapper accordingly.
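The “test the type, then provide the wrapper” logic looks roughly like the sketch below. It is written in Python to keep it self-contained, it is not the WordPress code, and the function name `atom_summary` is hypothetical. It assumes the namespace on the wrapping div is the XHTML namespace, which is what the Atom specification (RFC 4287) calls for, and it escapes the markup for the ‘html’ type, which is equivalent to the CDATA section WordPress uses.

```python
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def atom_summary(body, content_type):
    """Build an Atom summary element for the given content type.

    Hypothetical helper: body is assumed to be well-formed XHTML when
    content_type is 'xhtml', and arbitrary markup or text otherwise.
    """
    if content_type not in ("text", "html", "xhtml"):
        raise ValueError("unsupported Atom content type: %r" % content_type)
    if content_type == "xhtml":
        # XHTML content goes inline, wrapped in a single namespaced div.
        return '<summary type="xhtml"><div xmlns="%s">%s</div></summary>' % (
            XHTML_NS,
            body,
        )
    # 'text' and 'html' content must not contain child elements, so escape it.
    return '<summary type="%s">%s</summary>' % (content_type, escape(body))
```

The same branch would apply to the title and content elements, anywhere the feed template emits a type attribute.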

A second issue with WordPress is more subtle, and has to do with the part of XML I don’t consider myself overly familiar with: character sets and encoding. As soon as I switched on XHTML at my old weblog, I started to have problems with certain characters in my comments, and had to adjust the WordPress comment processing to allow for UTF-8 encoding. As it is, I’m not sure that I’ve covered all the bases, though I haven’t had any recurrence of the initial problems.

However, during the XHTML discussion, Philip Taylor demonstrated another problem in the WP code, in this case sending through a couple of characters that the WP search function did not like.

I checked with one of my two XHTML experts, Jacques Distler (the other being Sam Ruby), and the characters were Unicode, specifically:

utf-8 0xEFBFBE = U+FFFE
utf-8 0xEFBFBF = U+FFFF 

From Jacques I found that Philip likes the U+FFFE and U+FFFF Unicode characters because they’re not part of the W3C’s recommended regular expression for filtering illegal characters.

Unfortunately, protecting against these characters in search as well as comments required code in more than one place and, in fact, hacking into the back end of WordPress. This is not an option available to someone who isn’t a programmer. However, this example doesn’t demonstrate that you have to be a coder to serve pages as XHTML. It demonstrates that applications such as WordPress have a ways to go before being technically, rather than just cosmetically, compliant with XHTML.
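To give a sense of what such a filter involves, here is a minimal sketch in Python. It is not the hack I applied to WordPress. It simply drops every character that falls outside the Char production of the XML 1.0 specification, which excludes U+FFFE and U+FFFF along with most control characters. The function name is made up for the example.

```python
import re

# Anything NOT matched by the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Remove characters that may not appear in an XML 1.0 document."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# The two byte sequences above decode to the offending code points.
assert b"\xef\xbf\xbe".decode("utf-8") == "\ufffe"
assert b"\xef\xbf\xbf".decode("utf-8") == "\uffff"

print(repr(strip_illegal_xml_chars("search\ufffe term\uffff")))
```

The filter has to run on every path by which user text reaches the page (comments, search terms, trackbacks), which is why one regular expression still ends up as changes in several places.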

Having said that, I can almost hear the voices now: Why bother, they say. After all, no one uses XHTML, do they?

Why bother? Well, for one thing, XHTML served as XML provides a way to integrate other XML-based specifications into the page content, including in-line SVG, as well as MathML, and even RDF/XML if we’re so inclined. The point is, serving XHTML as XML provides an open platform on which to build. Otherwise, we’re dependent on committees to hash through what will or will not be allowed into a specification, based on one company or another’s agenda.

We can include SVG in a page using an object element, but we can’t integrate something like SVG and MathML together without the ability to include both inline. We certainly can’t incorporate SVG into the overall structure of the page, at least not easily, using separate files. There is no room in an HTML implementation for all the other XML-based vocabularies, and we can only cram so much into class attributes before the entire infrastructure collapses.
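As an illustration of what inline integration means, the sketch below holds a small XHTML page that mixes SVG and MathML, and uses Python’s standard XML parser to list the namespaces found in it. The page content and the function name are invented for the example. The point is only that an ordinary XML parser handles all three vocabularies in one document without special cases.

```python
import xml.etree.ElementTree as ET

# A made-up page: XHTML with inline SVG and inline MathML.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle and a fraction, inline:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="40" height="40">
      <circle cx="20" cy="20" r="15" />
    </svg>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mfrac><mn>1</mn><mn>2</mn></mfrac>
    </math>
  </body>
</html>"""


def namespaces_used(markup):
    """Return the set of namespace URIs used by elements in the markup."""
    root = ET.fromstring(markup)
    found = set()
    for element in root.iter():
        tag = element.tag
        # ElementTree spells namespaced tags as "{uri}localname".
        if tag.startswith("{"):
            found.add(tag[1:tag.index("}")])
    return found


for uri in sorted(namespaces_used(PAGE)):
    print(uri)
```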

No, we need both: an HTML implementation for those not ready to commit to an XML-based implementation, and XHTML for the rest of us.

During the recent discussions on IE8, several people asked Chris Wilson from Microsoft whether IE8 will support the application/xhtml+xml MIME type. So far, we’ve not had an answer. Whatever the company decides, though, XHTML is not going away. The HTML5 working draft, which was just released, is about a vocabulary, not a specific implementation of that vocabulary. Both HTML and XHTML implementations are covered in the document, though XHTML isn’t covered as fully because most of the aspects of processing XHTML are covered in other documents. At least, that’s what we’re being told.

What’s critical for the HTML5 effort is that browsers support both implementations. Even the smallest mobile device is not going to be so overburdened by the requirements that it can’t consume pages delivered up as proper XHTML. It’s a sure thing that handling clean markup takes fewer resources than handling a mess.

I’d also hate to think we’re willing to trade well-designed and well-constructed web sites for pages filled with missing TR end tags, poorly nested elements, and unquoted class names, just because Microsoft can’t commit to the spec, and Firefox took the “bailing out now!” approach to error handling.


And they’re off

The ACID3 race has begun. Coming around the first lap…

Firefox 3 is in first place, with a commendable lead. Way to burn up the track, foxy!

[image gone]

Coming up from behind, we find the ACID crowd favorite, *Opera!

[image gone]

Winded, but still giving it all she’s got…Safari! (Is that a picture of a cat?)

[image gone]

And in the tail position, dragging, but not dead yet…IE!

[image gone]

The next lap is in six months. Get your bets in now.


*Testing with Opera’s 9.5 beta, we have a new winner, going into the first lap…

[image gone]


MacPorts, Unix, and Graphics

Recovered from the Wayback Machine.

My upcoming book, Painting the Web, includes considerable coverage of technology-enabled graphics. Of course, all graphics are technology enabled, but when I say ‘technology-enabled’ I mean graphics created via command line tools or accessed through a programming language such as PHP.

What to cover wasn’t an easy choice. For instance, how much programming experience should we assume the reader has? Little? Lots? In the end, I aimed the writing at a reader who has had exposure to JavaScript and/or PHP, but who doesn’t have to be either a pro or an expert.

Then there was the issue of the Unix command line, and installation applications for the Mac, such as MacPorts. Even experienced PHP/JavaScript developers may have no exposure to the Unix command line. Yet there is a wealth of resources available, in Linux and on the Mac, for people interested in graphics who are willing to forgo the desktop interface and get their Unix on, as the saying goes.

In the end, I covered these tools but promised the reader that I would provide web pages with up-to-date links to helpful tutorials and resources that could get them up to speed, either on Unix, or in the programming languages used. This includes one of my most used applications, MacPorts, the installation software useful for installing Unix-based applications on our computers.

Why would you be interested in MacPorts, especially if you’re into graphics?

When I was getting ready for Painting the Web, I spent an entire day downloading and installing software I planned to cover in the book on one of my Macs. An entire day, literally dozens of applications, and yet all combined, none of it took over a gigabyte on my hard drive. That’s one of the real advantages of using an application like MacPorts, and of the free and open source applications that can be installed with this tool. In the graphics port area alone you have applications such as GIMP, UFRaw (a RAW editor), Inkscape for vector graphics, the GD graphics library that I use so extensively at this site, libexif for parsing the EXIF section of a photo, and hundreds of other applications, including my favorite, ImageMagick.

Ah, ImageMagick. I can never say enough about ImageMagick. It has got to be one of the most entertaining sets of graphics tools in the living world. Best of all (well, other than it being free), most hosting companies have some version of ImageMagick installed, so you can access the command line tools without having to install them on your own Mac (or Windows machine, as there is a Windows version of ImageMagick). Still, if you can get a local copy on your Mac, installing this application pays for the MacPorts installation, all by itself. When you do install the tool set, make sure to spend time with the online examples, as documentation is a bit light for ImageMagick.

It’s a little ironic that one of the first things I wrote in a book on web graphics was to encourage people interested in graphics to become familiar with the Unix command line. The Unix command line is one of the most non-graphical technologies that exists today. Graphics, though, does not begin and end solely in Photoshop–limiting your tools to those that have a GUI and that are installed with one click of the mouse limits the amount of fun you can have with graphics. And if we’re not having fun, why bother?

  • You will need to install the Apple X11 system using the Mac OS X Install Disc, first. The MacPorts instructions cover this.
  • Next is MacPorts, of course. You may have also heard this application called “DarwinPorts”. The site has a list of ported applications, as well as excellent documentation.
  • Another MacPorts tutorial, providing more of an overview. You can also find an overview of MacPorts at Lockergnome.
  • I don’t use a GUI to MacPorts, but some of you might like one. There are several, including PortAuthority and Porticus. The benefit of a GUI tool is that it can be easier to see, at a glance, what’s installed.
  • One of the advantages of using MacPorts is installing applications that work together, such as the LAMP trifecta: Apache+MySQL+PHP. I found a couple of different tutorials on using MacPorts for installing these three applications: a fairly detailed and involved approach, which might be a little intimidating to new command line users; steps for a Leopard installation. I’m not running Leopard, so I’m not sure how accurate the steps covered are. Frankly, if you don’t need the trifecta, and you’re just playing around with the graphics, I’d get more comfortable with MacPorts and the command line before installing these three. If you want to try some of the PHP-based graphics applications, though, you’ll have to install at least Apache and PHP.
  • One thing about MacPorts is that if there is an application dependency for the application you’re installing, the tool automatically downloads and installs this dependency. I have found that with GIMP, if I use MacPorts to install UFRaw first, it downloads and installs the latest GIMP, and then integrates the two. With this integration, UFRaw pre-processes a RAW photo before passing the photo on to GIMP. Regardless of how you install the tools, you’ll definitely want to be consistent: if you use MacPorts to install UFRaw, don’t use the standalone click installer for GIMP; use MacPorts instead. Otherwise the GIMP application is installed in the wrong place, and UFRaw can’t find it.
  • ImageMagick is also an available port on MacPorts. There are a significant number of dependencies for ImageMagick, so it may take a considerable amount of time to install this application. May I say, though, that the results are worth the effort? Unfortunately, most of the programming language interfaces to ImageMagick are not in ports. For instance, I use iMagick (source), a PHP-based ImageMagick wrapper, which is accessible via PECL, a PHP extension system, but not MacPorts. No worries, though, as these language-based wrappers are typically quite easy to install. If you’re a Ruby user, you’re in luck: RMagick is a MacPorts port.
  • Throughout all of this, even if you use a GUI MacPorts interface tool, at some point, you’re going to be messing with the Terminal application for the Mac. The Terminal provides an interface into the underlying Unix system, and command line. There are tutorials in using the Terminal, including a TidBits tutorial (part 2 and part 3) and several older articles from O’Reilly.
  • There are a ton of Unix command line how-tos, helps, and tutorials. The nice thing about the Unix command line is that the tools you use most rarely change. Benjamin Han has provided several Mac Unix how-tos, this Mac forum thread provides some nice jumping off points, and there are a couple of books for Mac users covering the command line, though I haven’t read any and so can’t provide a recommendation. You might also want to spend some time with shell scripting, especially if you want to package your ImageMagick commands.
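On that last point in the list, packaging ImageMagick commands, here is a rough idea of what a packaged command looks like. The sketch is in Python rather than shell, and it only builds the argument lists for ImageMagick’s convert tool without running anything. The function name, the 200x200 size, and the thumbs directory are all assumptions made for the example.

```python
import posixpath


def thumbnail_commands(paths, size="200x200", out_dir="thumbs"):
    """Build one ImageMagick 'convert' command per image.

    Hypothetical helper: each command resizes the source image to fit
    within `size` and writes it under `out_dir` with the same file name.
    """
    commands = []
    for path in paths:
        name = posixpath.basename(path)
        target = posixpath.join(out_dir, name)
        commands.append(["convert", path, "-resize", size, target])
    return commands


for command in thumbnail_commands(["photos/bird.jpg", "photos/river.png"]):
    print(" ".join(command))
```

To actually run the commands, each list can be handed to `subprocess.run(command, check=True)`, provided ImageMagick is installed and the output directory exists.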

This is a start, and I’ll be adding to this list before I formalize it into a separate reference page. If you know of any other resource that should be included, please drop me a note or leave a comment.

Of course, it goes without saying that even the best laid plans go awry, and you’ll want to back up your hard drive before installing MacPorts and any of the applications. I also recommend searching on “MacPorts” and the application name in Google or Yahoo first. You can sometimes find better ways of installing sets of applications, such as Apache2+PHP5+MySQL. If you’re using Leopard, or running on an Intel-based Mac, you’ll also want to double check that the application does work in your environment.

Happy MacPorting.