December 14th, 2007

It's nice when I can recycle concepts from many years ago for new releases of technology. Take, for instance, my concept of iron clouds and the release of Google's Knols.

An iron cloud is a cloud–a resource accessible by anyone, anywhere–that is seemingly open and accessible but has, as its core, a heart of iron: it's owned by a single entity. It is centralized by a single entity, regardless of its physical distribution. To me, there can be no true cloud when there's ownership.

Udi Manber introduces Knols with the following:

The web contains an enormous amount of information, and Google has helped to make that information more easily accessible by providing pretty good search facilities. But not everything is written nor is everything well organized to make it easily discoverable. There are millions of people who possess useful knowledge that they would love to share, and there are billions of people who can benefit from it. We believe that many do not share that knowledge today simply because it is not easy enough to do that. The challenge posed to us by Larry, Sergey and Eric was to find a way to help people share their knowledge. This is our main goal

"We believe that many do not share that knowledge today simply because it is not easy enough to do that." This just isn't true, and it's a rather backhanded insult to another Google property, Blogger. If a person is comfortable enough with the Net, they have all they need to start contributing to the Net, through Blogger, Typepad, Wordpress.com, and so on. There may be experts who refuse to put their material online until they're paid for it, but then again, we have sites like Huffington Post, which provide both payment and a place in which the experts may dabble their little toes.

If this is a snide aside to Wikipedia's reliance on Wiki technology, I don't think any of us has seen that Wikipedia suffers heavily from too many people constrained from contributing. The problems at Wikipedia are based on organization and clannishness, not technology.

In fact, nothing about what Google is saying about Google Knols makes sense, and therefore one has to treat this new 'gift' with suspicion, and indeed, some alarm.

People have been saying that Knols are a way for Google to get back at Wikipedia, but in actuality, they're a way to get back at us. We have dirtied the pristine, perfect field of search, where only the cream floats to the top. We don't use NOFOLLOW on our links, and link indiscriminately, without a care or thought to how the search engine may suffer under our abuse. We toss our own half-baked opinions out into the void and are linked, in turn, to further sully search results. Frankly, we're messy, and muck up the algorithms.

The whole point of RDF/OWL, first, and microformats and even HTML5, was so that we all could eventually annotate our material more properly, helping to make a better, more searchable knowledgebase that expands ever outward over time. We are the cloud. However, rather than trust us to form this knowledgebase on our own, Google has now taken matters into its own hands.

I feel as I did when I was a child and sought to help my grandmother clean up after a holiday meal. I grabbed a dish towel and was reaching for one of the fine china plates when my grandmother, reacting in horror, snatched it out of my grasp and told me to go play with the other children; "before you break something" going unspoken, but understood.

Danny Sullivan, commenting on Knols, writes the following:

Why do Knol? Google vice president of engineering, Udi Manber, who heads the project, told me that is designed to help people put knowledge on the web that doesn't currently exist, which in turn should make search better, since there will be better information out there.

Of course, Google already offers other content creation tools, such as Blogger and Google Page Creator. In addition, there are non-Google tools people already use to publish content, not to mention collaborative tools such as those I named at the opening of this article. Why yet another tool?

Manber said that Knol has a special focus on authors and a collection of tools that Google thinks is unique, and which in turn should encourage both content creation and readership.

"Knol is all about the authors," he said. "We believe that knowing who wrote a knol will significantly help users make better use of web content."

I can feel the plate being snatched as I read these words.

Leaving aside the worrisome effect of 'knowledge' being centered in and controlled by Google, via its search engine and now Knols, Google is making the same calculated mistake with Knols that Microsoft makes with IE: rather than work with the community, using community tools and specifications, it goes its own proprietary path, using its considerable market presence to ensure it becomes a force regardless of the soundness, or rightness, of its approach.

Google also undercuts the more or less altruistic nature of the knowledge web in the past, with promises of remuneration for those who choose to contribute Knols (and not so coincidentally, profiting Google at the same time). It reminds me of what someone told me a year or so ago: that not having ads in my sidebar makes my site look amateurish. I guess the days when people shared knowledge just to share are over.

update

Best title: Google Runs Out of Content to Monetize; Wants You to Build More.

December 13th, 2007

What does it take to convert your Wordpress weblog to XHTML?

First, the template has to be valid XHTML. One way to check is to make sure the page validates as XHTML before you actually start serving it as XHTML. I use an XHTML 1.1 DOCTYPE that supports MathML and SVG:


<!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
    "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">

I also add XHTML, SVG, and XLink namespaces:

<html xmlns="http://www.w3.org/1999/xhtml" 
      xmlns:svg="http://www.w3.org/2000/svg"
      xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en">

When you validate the page, the validator will let you know that the DOCTYPE differs from the page MIME type, but this shouldn't affect the validation process. Just make sure that the validator is treating your page as XHTML.

The Validator assumes the page is HTML because, at this point, the page is still being served as HTML; that is what Wordpress wants to do. In fact, Wordpress fights you every step of the way when it comes to serving your page as XHTML. Luckily, there are nice people who build plug-ins to ensure your page is served as XHTML. However, not every browser supports XHTML. For those limited browsers, we have to serve the pages as HTML. If we don't, the limited browser (that would be IE) has a problem rendering the pages.

Testing to see what a browser can handle is known as content negotiation. There is a way you can implement content negotiation with .htaccess, but this approach doesn't work well with Wordpress. Instead, I use the m0n5t3's nest "content negotiation plug-in for Wordpress". I install it, activate it, and it manages the content negotiation for me, serving pages as XHTML for browsers that can handle it, and as HTML for those that can't (IE).
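For the curious, the core of that negotiation can be sketched in a few lines of PHP. This is only an illustration, not the plug-in's actual code: the function names accepts_xhtml and choose_content_type are made up for this example, and it assumes the decision rests solely on the browser's Accept header.

```php
<?php
// Hypothetical helper (not from the plug-in): returns true when the Accept
// header lists application/xhtml+xml with a non-zero quality value.
function accepts_xhtml(string $acceptHeader): bool
{
    foreach (explode(',', $acceptHeader) as $range) {
        // Split "type;q=0.9" into the media type and its parameters.
        $parts = array_map('trim', explode(';', $range));
        $type = strtolower(array_shift($parts));
        if ($type !== 'application/xhtml+xml') {
            continue;
        }
        $q = 1.0; // A missing q parameter means full preference.
        foreach ($parts as $param) {
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        return $q > 0;
    }
    // Browsers that never mention XHTML (IE, for one) end up here.
    return false;
}

// Hypothetical helper: picks the MIME type to send for this request.
function choose_content_type(string $acceptHeader): string
{
    return accepts_xhtml($acceptHeader) ? 'application/xhtml+xml' : 'text/html';
}
```

In a template, the result would be sent before any output, along the lines of header('Content-Type: ' . choose_content_type($_SERVER['HTTP_ACCEPT'] ?? '') . '; charset=utf-8'). A real plug-in has more to worry about, such as caching and feeds, which this sketch ignores.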

To ensure the comments work, I added the following line to wp-comments-post.php before saving the comment:

$comment_content = mb_convert_encoding($comment_content, "UTF-8","auto");

If you've followed my steps so far, congratulations! You're now serving your pages as XHTML. Now, go back through your archives. Be prepared for:

  • Yellow screen of death for Firefox
  • Opera's polite, "You're F**cked!" elegant gray
  • Safari's, "You're hurting me!" page
  • IE is reading the page as HTML, which means it doesn't care that your page is crappy.

I've had a weblog for years, other pages even longer. I have used old HTML, dated HTML, and good HTML, used badly. This means I have a lot of pages that will break when served as XHTML.

There might be *nice, automated applications that can fix all my bad uses of HTML. I've not tried to create such an application, nor have I found one. Instead, I fix pages manually, based on someone letting me know they've found a broken page. I also have an application I run that shows me which pages are broken. I run this application when I have time, fixing pages.

The application I use to find bad XHTML pulls the content in from the Wordpress database:


<?php
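// Load WordPress (which sets up $wpdb) and the Akelos XHTML validator class.
require_once('./wp-config.php');
require_once('./XhtmlValidator.php');

global $wpdb;

// Fetch the ID and body of every published post, oldest first.
$sql="select ID,post_content from $wpdb->posts 
where post_status = 'publish'
ORDER BY ID ASC ";

$lines = $wpdb->get_results($sql);
if ($lines) {

   foreach ($lines as $line) {
      $post = $line->ID;
      // Wrap the post body in a single root element so it can be checked as a fragment.
      $data = "<div>" . $line->post_content . "</div>";
      $XhtmlValidator = new XhtmlValidator();
      // Report only the posts that fail validation.
      if($XhtmlValidator->validate($data) === false){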
         echo "Post $post <br />\n";
         $XhtmlValidator->showErrors();
      }
    }
}

?>

As you can see from accessing the application, I still have work to do. I make use of a PHP class, XhtmlValidator, from Akelos Framework. It works nicely. Too nicely.
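If you don't have the Akelos class handy, a much cruder check can be sketched with PHP's built-in DOM and libxml functions. This is an assumption-laden stand-in, not what I run: the function name find_wellformedness_errors is invented for the example, it only tests XML well-formedness rather than XHTML validity, and because no DTD is loaded it will also flag named entities such as &nbsp;.

```php
<?php
// Hypothetical helper: returns a list of XML well-formedness errors found
// in a post body, or an empty array when the fragment parses cleanly.
function find_wellformedness_errors(string $fragment): array
{
    // Collect parser errors quietly instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    // Wrap the fragment in one root element, as the script above does.
    $doc = new DOMDocument();
    $doc->loadXML('<div xmlns="http://www.w3.org/1999/xhtml">' . $fragment . '</div>');

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the caller's libxml error handling.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```

Called on each post_content in the loop above, an empty result means the post is at least well-formed; anything else is a list of messages to work through.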

Of course, the upside to all of this is that my new posts are valid XHTML, or I wouldn't be able to publish them. To keep it that way, I turn off WP formatting for those posts that Wordpress formats incorrectly. For instance, I can't use Wordpress default formatting when I use CODE elements, because WP wants to insert inappropriate paragraph tags.

Is it work? Yes, but when you're done you know, without a doubt, that all your i's are dotted, your t's crossed. You also know that you can add trees.

[SVG image: Christmas tree]

And cute, cuddly bears.

[SVG image: bear]

And choo-choo trains.

[SVG image: train]

Which, unfortunately, you can't see if you're using IE. They're cute, take my word for it. And semantical, too, thanks to RDF embedded with the image. All allowed, because the page is served up as XHTML.

(SVG images from Wikipedia. Artists: Aaron Spike, Richard Thompson, and Jarno Vasamaa)

(Per Sam Ruby, HTML5Lib should be able to fix the XHTML.)