What does it take to convert your Wordpress weblog to XHTML?
First, the template has to be valid XHTML. One way to check this is to run the page through a validator as XHTML before you actually start serving it as XHTML. I use an XHTML 1.1 DOCTYPE that supports MathML and SVG:
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
"http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
I also add XHTML, SVG, and XLink namespaces:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en">
When you validate the page, the validator will warn you that the DOCTYPE doesn't match the page's MIME type, but this shouldn't affect the validation process. Just make sure that the validator is treating your page as XHTML.
The validator assumes the page is HTML because, at this point, the page is still served as HTML; Wordpress wants to serve pages as HTML. In fact, Wordpress fights you every step of the way when it comes to serving your page as XHTML. Luckily, there are nice people who build plug-ins to ensure your page is served as XHTML. However, not every browser supports XHTML. For those limited browsers, we have to serve the pages as HTML. If we don't, the limited browser (that would be IE) has a problem rendering the pages.
Deciding what to send based on what the browser says it can handle is known as content negotiation. There is a way to implement content negotiation with .htaccess, but this approach doesn't work well with Wordpress. Instead, I use the m0n5t3's nest "content negotiation plug-in for Wordpress". I install it, activate it, and it manages the content negotiation for me, serving pages as XHTML for browsers that can handle it and as HTML for those that can't (IE).
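For anyone curious what content negotiation looks like in practice, here is a minimal sketch in PHP. It is not the plug-in's code, and I'm assuming the simplest possible rule: serve XHTML only when the browser's Accept header explicitly lists application/xhtml+xml with a non-zero q-value. The function name accepts_xhtml is made up for this example.

```php
<?php
// Hypothetical helper (not taken from the plug-in): returns true when the
// Accept header explicitly lists application/xhtml+xml with q > 0.
function accepts_xhtml(string $acceptHeader): bool
{
    foreach (explode(',', $acceptHeader) as $range) {
        // Split "type;param=value;..." into the media type and its parameters.
        $parts = array_map('trim', explode(';', $range));
        $type  = strtolower(array_shift($parts));
        if ($type !== 'application/xhtml+xml') {
            continue;
        }
        $q = 1.0; // q defaults to 1 when it is omitted
        foreach ($parts as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        return $q > 0;
    }
    return false; // type never listed, e.g. a browser that only sends */*
}

$accept = $_SERVER['HTTP_ACCEPT'] ?? '';
$mime   = accepts_xhtml($accept) ? 'application/xhtml+xml' : 'text/html';

// Only send headers when running under a web server, before any output.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header("Content-Type: $mime; charset=UTF-8");
    header('Vary: Accept'); // tell caches the response depends on Accept
}
```

A real plug-in probably does more than this, such as comparing q-values between text/html and application/xhtml+xml or special-casing particular user agents. The sketch only shows the basic decision.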
To ensure the comments work, I added the following line to wp-comments-post.php before saving the comment:
$comment_content = mb_convert_encoding($comment_content, "UTF-8","auto");
If you've followed my steps so far, congratulations! You're now serving your pages as XHTML. Now, go back through your archives. Be prepared for:
- Yellow screen of death for Firefox
- Opera's polite, "You're F**cked!" elegant gray
- Safari's, "You're hurting me!" page
- IE is reading the page as HTML, which means it doesn't care that your page is crappy.
I've had a weblog for years, other pages even longer. I have used old HTML, dated HTML, and good HTML, used badly. This means I have a lot of pages that will break when served as XHTML.
There might be *nice, automated applications that can fix all my bad uses of HTML. I've not tried to create such an application, nor have I found one. Instead, I fix pages manually when someone lets me know they've found a broken page. I also have an application that shows me which pages are broken. I run it when I have time and fix the pages it flags.
The application I use to find bad XHTML pulls the content in from the Wordpress database:
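<?php
// Load the Wordpress configuration and the XhtmlValidator class.
require_once('./wp-config.php');
require_once('./XhtmlValidator.php');

global $wpdb;

// Fetch every published post, oldest first.
$sql = "SELECT ID, post_content FROM $wpdb->posts
        WHERE post_status = 'publish'
        ORDER BY ID ASC";
$lines = $wpdb->get_results($sql);

if ($lines) {
    foreach ($lines as $line) {
        $post = $line->ID;
        // Wrap the content in a single root element so it can be checked as one fragment.
        $data = "<div>" . $line->post_content . "</div>";
        $XhtmlValidator = new XhtmlValidator();
        // Report the post ID and the validator's errors for any post that fails.
        if ($XhtmlValidator->validate($data) === false) {
            echo "Post $post <br />\n";
            $XhtmlValidator->showErrors();
        }
    }
}
?>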
As you can see from accessing the application, I still have work to do. I make use of a PHP class, XhtmlValidator, from the Akelos Framework. It works nicely. Too nicely.
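If you don't have the Akelos class handy, a weaker check can be sketched with PHP's built-in DOM extension. This is my own illustration, not what XhtmlValidator does. It only tests whether a fragment is well-formed XML, which is the kind of error that triggers the browser error pages listed earlier. It doesn't check the markup against the XHTML DTD, and the function name xml_wellformedness_errors is made up for this example.

```php
<?php
// Hypothetical helper: returns a list of XML parse errors for a fragment,
// or an empty array when the fragment is well-formed.
function xml_wellformedness_errors(string $fragment): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    // Wrap the fragment in a single root element in the XHTML namespace,
    // the same trick as the <div> wrapper in the script above.
    $doc->loadXML('<div xmlns="http://www.w3.org/1999/xhtml">' . $fragment . '</div>');

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous libxml error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Usage: an unclosed <br> is fine in old HTML but not in XHTML.
foreach (xml_wellformedness_errors('<p>Old habits<br>die hard</p>') as $message) {
    echo $message, "\n";
}
```

One caveat: because no DTD is loaded, named HTML entities such as &nbsp; are reported as undefined, even though they are legal in XHTML served with a proper DOCTYPE. Whether that counts as a false positive depends on how your pages are parsed.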
Of course, the upside to all of this is that my new posts are valid XHTML, or I wouldn't be able to publish them. To keep it that way, I turn off WP formatting for those posts that Wordpress formats incorrectly. For instance, I can't use the Wordpress default formatting when I use CODE elements, because WP wants to insert inappropriate paragraph tags.
Is it work? Yes, but when you're done you know, without a doubt, that all your i's are dotted, your t's crossed. You also know that you can add trees.
And cute, cuddly bears.
Which, unfortunately, you can't see if you're using IE. They're cute, take my word for it. And semantical, too, thanks to RDF embedded with the image. All allowed, because the page is served up as XHTML.
(SVG images from Wikipedia. Artists: Aaron Spike, Richard Thompson, and Jarno Vasamaa)
(Per Sam Ruby, HTML5Lib should be able to fix the XHTML.)