December 13th, 2007

What does it take to convert your Wordpress weblog to XHTML?

First, the template has to be valid XHTML. The way to check this is to make sure the page validates as XHTML before you actually start serving it as XHTML. I use an XHTML 1.1 DOCTYPE that supports MathML and SVG:


<!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
    "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">

I also add XHTML, SVG, and XLink namespaces:

<html xmlns="http://www.w3.org/1999/xhtml" 
      xmlns:svg="http://www.w3.org/2000/svg"
      xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en">

When you validate the page, the validator will let you know that the DOCTYPE doesn't match the page's MIME type, but this shouldn't affect the validation process. Just make sure that the validator is treating your page as XHTML.

The validator assumes the page is HTML because the page is served up as HTML at this point. WordPress wants to serve pages up as HTML. In fact, WordPress fights you every step of the way when it comes to serving your page as XHTML. Luckily, there are nice people who build plug-ins to ensure your page is served up as XHTML. However, not every browser supports XHTML. For those limited browsers, we have to serve the pages as HTML. If we don't, the limited browser (that would be IE) has a problem displaying the pages.

Testing to see what a browser can handle is known as content negotiation. There is a way you can implement content negotiation with .htaccess, but this approach doesn't work well with WordPress. Instead, I use the m0n5t3's nest "content negotiation plug-in for WordPress". I install it, activate it, and it manages the content negotiation for me, serving pages as XHTML for browsers that can handle it and as HTML for those that can't (IE).
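To make the idea concrete, here is a minimal sketch of the decision a content negotiation layer has to make. It is written in Python rather than PHP, and it is not the plug-in's actual code: the function names and the rule (serve XHTML only when the browser's Accept header explicitly lists application/xhtml+xml with a non-zero q-value) are assumptions made for illustration.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def parse_accept(header):
    """Return a dict mapping each media type in an Accept header to its q-value."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Pick XHTML only when the browser explicitly lists it with q > 0."""
    prefs = parse_accept(accept_header or "")
    if prefs.get(XHTML, 0.0) > 0.0:
        return XHTML
    return HTML  # wildcard-only browsers such as IE fall back to HTML
```

A browser that advertises application/xhtml+xml gets XHTML. One that sends only a wildcard such as */* gets text/html, which is the safe fallback described above.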

To ensure the comments work, I added the following line to wp-comments-post.php before saving the comment:

$comment_content = mb_convert_encoding($comment_content, "UTF-8","auto");

If you've followed my steps so far, congratulations! You're now serving your pages as XHTML. Now, go back through your archives. Be prepared for:

  • Yellow screen of death for Firefox
  • Opera's polite, "You're F**cked!" elegant gray
  • Safari's, "You're hurting me!" page
  • IE is reading the page as HTML, which means it doesn't care that your page is crappy.

I've had a weblog for years, other pages even longer. I have used old HTML, dated HTML, and good HTML, used badly. This means I have a lot of pages that will break when served as XHTML.

There might be nice, automated applications that can fix all my bad uses of HTML. I've not tried to create such an application, nor have I found one. Instead, I fix pages manually when someone lets me know they've found a broken page. I also have an application that shows me which pages are broken. When I have time, I run it and fix the pages it flags.

The application I use to find bad XHTML pulls the content in from the Wordpress database:


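<?php
// Load the WordPress configuration (which sets up $wpdb) and the validator class.
require_once('./wp-config.php');
require_once('./XhtmlValidator.php');

global $wpdb;

// Fetch the ID and body of every published post, oldest first.
$sql = "SELECT ID, post_content FROM $wpdb->posts
        WHERE post_status = 'publish'
        ORDER BY ID ASC";

$lines = $wpdb->get_results($sql);
if ($lines) {
   foreach ($lines as $line) {
      // Named $post_id so it doesn't overwrite WordPress's global $post.
      $post_id = $line->ID;
      // Wrap the content in a div so the fragment has a single root element.
      $data = "<div>" . $line->post_content . "</div>";
      $XhtmlValidator = new XhtmlValidator();
      if ($XhtmlValidator->validate($data) === false) {
         echo "Post $post_id <br />\n";
         $XhtmlValidator->showErrors();
      }
   }
}

?>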

As you can see from accessing the application, I still have work to do. I make use of a PHP class, XhtmlValidator, from the Akelos Framework. It works nicely. Too nicely.
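For anyone without the Akelos class at hand, a rougher check is possible with nothing but an XML parser. The sketch below is in Python, not PHP, and it is an assumption-laden stand-in rather than an equivalent: it only tests well-formedness (unclosed tags, bare ampersands, bad nesting), not validity against the DTD, and the function names and the (ID, content) input shape are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_fragment(content):
    """Return None if the fragment is well-formed XML, else the parser's error text."""
    # Wrap in a div so the fragment has a single root element.
    wrapped = "<div>" + content + "</div>"
    try:
        ET.fromstring(wrapped)
    except ET.ParseError as err:
        return str(err)
    return None


def find_broken_posts(posts):
    """posts: iterable of (post_id, content) pairs. Returns [(post_id, error), ...]."""
    broken = []
    for post_id, content in posts:
        error = check_fragment(content)
        if error is not None:
            broken.append((post_id, error))
    return broken


if __name__ == "__main__":
    sample = [(1, "<p>Fine</p>"), (2, "<p>Unclosed<br></p>"), (3, "Tom & Jerry")]
    for post_id, error in find_broken_posts(sample):
        print("Post", post_id, "-", error)
```

Because no DTD is loaded, named entities such as &nbsp; are reported as errors here even though the XHTML DTD defines them, so expect this check to flag a few posts that a DTD-aware validator would accept.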

Of course, the upside to all of this is that my new posts are valid XHTML, or I wouldn't be able to publish them. To keep it that way, I turn off WP formatting for those posts that WordPress formats incorrectly. For instance, I can't use WordPress default formatting when I use CODE elements, because WP wants to insert inappropriate paragraph tags.

Is it work? Yes, but when you're done you know, without a doubt, that all your i's are dotted, your t's crossed. You also know that you can add trees.

(SVG image: Christmas Tree, by Aaron Spike)

And cute, cuddly bears.

(SVG image: bear)

And choo-choo trains.

(SVG image: train)

Which, unfortunately, you can't see if you're using IE. They're cute, take my word for it. And semantic, too, thanks to the RDF embedded in the images. All allowed, because the page is served up as XHTML.

(SVG images from Wikipedia. Artists: Aaron Spike, Richard Thompson, and Jarno Vasamaa)

(Per Sam Ruby, HTML5Lib should be able to fix the XHTML.)

October 10th, 2007

  • Open source developers, providers of free or inexpensive shareware applications, those working on open standards and specifications, or providing documentation, tutorials, and help for all of the above: you almost make me believe there is a land over the rainbow, and that it has fairies and unicorns and we never have to wear shoes. I don't thank you often enough, or as much as you deserve.
  • Speaking of which: whoever came up with the original idea for CSS, you deserve chocolates.
  • Everyone is mad at Apple for iPhone, but I don't care: Safari 3 is a wonderful browser. Color management, far out. And Opera? Thanks for standing up for standards. Firefox, you're cool, too, but you need to commit to implementing one spec before you start on others. Oh, and it would be really nice if you didn't crash so much. No, really that would be cool.
  • The WhatWG and (X)HTML5 efforts are, in my opinion, not the best use of resources. We've spent years separating presentation values from page layout, only to turn around and make the same mistake with semantics. Accessibility is in; accessibility is out. Machine versus human semantics; Indent versus blockquote. Hey! Poem markup! SVG isn't 'semantically rich'. When semantics have to be hard coded into the syntax, satisfaction will never be guaranteed. Open models, not new specs. When will they ever learn? When will they e-v-e-r learn.
  • Regarding microformats: Using "rel", "class", and "profile" as the only available means by which to add semantics to markup is the same as using LOLCats to re-define the Bible: it's pidgin markup. "Me class sitting. Me relate chair. Chair relate desk. Me class watching. Me relate windows. Window relate Woman. Woman class running. Woman relate street. Woman class feeling. Feeling relate weather. Weather class cool. Weather class fall. Me class wistful. Me class wishing. Me relate woman." This is my sad attempt to describe my sitting in a chair at my desk, looking out through my open window at a woman jogging along in the wonderfully cool fall weather, wishing I was her instead of me being here at the computer. At some point in time, simplicity breaks down and you want a richer method in which to express your meaning.
  • Chew on this: pictures as data, as well as visual, entities.
  • Canvas is cool, but SVG is better. It's not just because SVG elements become part of the Document Object Model (DOM) and are easily scriptable. It's because we can find SVG similar to what we want, copy it, manipulate it, and we don't have to know any scripting. I wanted images of musical notes and searched on "music notes svg", which led me to this Wikipedia page and this (as well as this) public domain SVG. I copied the SVG file and deleted the SVG creating the bars. No bitmap tool magic was needed to pull the notes separate from the bars. I split the notes into two separate images by copying and pasting the two different elements. I copied the SVG for both into this post, and scaled them into tiny little representations of themselves. Though the browser had to reach to scale them so small, we're not left with tiny little bitmap blobs.

    I did think about using the following image, copied from this resource. Oh look, the original SVG contains metadata defined using RDF/XML. Isn't it marvelous when you can merge rich, well-defined XML vocabularies together? Just like that?

    –svg image–
  • Silverlight: Why? There's nothing in Silverlight 1.0 that doesn't already exist as an open standard, and those standards could be supported for IE applications if Microsoft would just support them. Silverlight as a 2D graphics system? Both SVG and Canvas are 2D graphics systems. Microsoft supports form controls like buttons? Hey! Guess what we've had in HTML for years? Silverlight 1.1 integrates web browser and ASP.NET functionality, which means you can use your Microsoft Visual Studio and Microsoft Expression Web applications to create Rich Internet Applications? Fantastic! It still doesn't change the fact that Microsoft pushed its browser on the same developers it's trying to suck into the Silverlight world, and then abandoned it, and us, for five years, effectively holding up advances in internet development for half a decade.
  • Adobe Flex/AIR: Why? It's true that Flash has done much for us over the years, and we're grateful, but we're ready to move into a new era of open standard applications and, frankly, Adobe, you're rather hit and miss when it comes to 'open' and 'standard'. Take your SVG plug-in. It's cool and we thank you for providing it so that IE users could see what they're missing using a half-assed browser. Now you're going to pull the plug-in and your support for it. Why not open source it, and let the open source community decide if it wants to continue to support it? Is it because, as has been noted elsewhere, you want us to consider converting [our] SVG application to an Adobe Flex® application? Golly, I just love these opportunities to get sucked into another bloated, proprietary application environment. It makes me feel so good when you finally, inevitably, stop.