RDFaification of Drupal 6

You don’t have to wait for Drupal 7 to RDFaificate your Drupal site. I spent yesterday tweaking my space, and if you access the site now with a tool, such as the Semantic Radar Firefox add-on, you’ll see all sorts of semantic goodness. I used a combination of plug-ins and theme modifications to make my changes, and will probably add to the overall effect over time.

What simplified my RDFa integration is that my site was already being served up as valid XHTML, via a modification to my page.tpl.php file:

<?php
header("Vary: Accept");
if (stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml"))
    header("Content-Type: application/xhtml+xml; charset=utf-8");
else
    header("Content-Type: text/html; charset=utf-8");
?><!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
    "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" 
xml:lang="<?php print $language->language ?>">

The PHP code checks with the user agent accessing the page. If the user agent accepts XHTML, the code returns the pages as XHTML; otherwise, the pages are returned as HTML. However, the DOCTYPE I had been using was a SVG+MathML DOCTYPE, because of my sometimes use of embedded SVG. To validate as XHTML+RDFa, though, you need to use the RDFa DOCTYPE.


<?php
header("Vary: Accept");
if (stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml"))
    header("Content-Type: application/xhtml+xml; charset=utf-8");
else
    header("Content-Type: text/html; charset=utf-8");
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" 
   "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<head profile="http://ns.inria.fr/grddl/rdfa/">

The namespaces in the HTML opening tag don’t reflect all that I’ll use in my pages, just the ones I used for RDFa annotation sprinkled, liberally, throughout the page. When I use embedded SVG, I can just add the SVG namespaces directly into the opening SVG element tag. I could add the namespaces now, but I don’t always use embedded SVG.

One unfortunate consequence of switching DOCTYPEs is that when I do use embedded SVG, the page won’t validate. However, this won’t impact on the user agents and their ability to process the SVG correctly, so I’ll just have to live with the invalidation errors. That’s the joy of DOCTYPEs.

Another change is to the opening HEAD tag, where I added the GRDDL profile. This lets data consuming agents know that I’m, first of all, using RDFa, then secondly, using the latest transform profile for RDFa. After all, once the data is provided, we assume someone will want to do something with the data.

I’m ready, now, to begin adding RDFa annotation. Some of the changes I can make directly to the theme pages, such as adding an attribute/value pair of property=”dc:title” to my header element that references my site’s title (“Burningbird’s RealTech”). I also added annotation within the node, via node.tpl.php, again adding property=”dc:title” to each individual site entry’s title.

Other annotation, though, required either the use of a Drupal module, or custom code. For instance, one change I wanted to make was to add a a property=”dc:subject” to my vocabulary terms. In my template.php file (used to override and extend the theme templating engine), I added a taxonomy term function that will not only append the vocabulary to each term, but also annotate the result with the RDFa dc:subject notation:

// split out taxonomy terms by vocabulary
function burningbirds_print_terms($nid) {
     $vocabularies = taxonomy_get_vocabularies();
     $output = '<ul class="links inline">';
     foreach($vocabularies as $vocabulary) {
       if ($vocabularies) {
         $terms = taxonomy_node_get_terms_by_vocabulary($nid, $vocabulary->vid);
         if ($terms) {
           $links = array();
           $output .= '<li property="dc:subject">' . $vocabulary->name . ': ';
           foreach ($terms as $term) {
             $links[] = l($term->name, taxonomy_term_path($term), array('rel' => 'tag', 'title' => strip_tags($term->description)));
           }
           $output .= implode(', ', $links);
           $output .= '</li>';
         }
       }
     }
     $output .= '</ul>';
     return $output;
}

In the node.tpl.php file, I then replaced the existing print $terms line with a reference to my custom terms display function:

    <div class="taxonomy">
      Tagged: <?php print burningbirds_print_terms($node); ?>
    </div>

Other areas that can be annotated with RDFa in an entry are the author and date, but I didn’t have to code these or modify the theme template directly. Instead, I downloaded and installed the Submitted By module. Once installed and activated, this module provides an “Appearance” field in the content type form, which you can use to modify the “submitted by” line in posts.

By default, the template engine generates a line with the author’s username, linked to their user profile, and the date and time when the entry was created. I modified the field to show the author’s name, without linking to the author profile, since I’m the only author. I also modified the post date to just the date. Time, to me, just isn’t relevant for my site. Adding the appropriate RDFa annotation results in the following pattern:

<span property="dc:creator">[author-name-raw]</span> on [day], <span property="dc:date">[yyyy]-[mm]-[dd]</span>

Now that I’ve annotated several elements in the page with RDFa, I went shopping around at various semantic websites to see what else they were providing by way of semantic markup. At Danny Ayers weblog my Semantic Radar toolbar alerted me to the presence of SIOC (Semantically-Interlinked Online Communities Project) data, one of the recommended data types supported by Yahoo’s SearchMonkey. I did a little research and found the SIOC Drupal module, which I downloaded and installed.

The SOIC module automatically generates SIOC, which can be accessed as a direct RDF export. I gather that the module also adds a link to this metadata via the menu system, but I found this only works with a theme like Garland’s. I wanted to be able to integrate a link in the header of my web pages, to page specific SIOC exports, wherever applicable. I checked the module’s documentation, and elsewhere, but couldn’t find anything on automatically adding this link, so decided to add it myself in my theme.

In Drupal, at least 6.x, you can add a preprocess function that will pre-process web page data before the page is displayed. I had such a pre-process function already, to modify my header to a) remove the RSS 2.0 link, and b) modify the content type meta tag to reflect my XHTML content type. It was a simple matter to modify this code to include a conditional check to see if the page being served is the Drupal front page, and if not, whether the page is presenting a node of type story, blog, or user. If the former, I provided a link to the site’s main SIOC export URL; the later, one specific to the node:

function burningbirds_preprocess_page(&$vars) {

  $head = $vars['head'];
  $node = $vars['node'];
  $head = str_replace("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />",
              "<meta http-equiv=\"Content-Type\" content=\"application/xhtml+xml; charset=UTF-8\" />", $head);
  
  $head = str_replace("<link rel=\"alternate\" type=\"application/rss+xml\" title=\"Burningbird's RealTech RSS\" href=\"http://realtech.burningbird.net/rss.xml\" />\n","",$head);
    
  if (drupal_is_front_page()) {
    $head .= '<link rel="meta" type="application/rdf+xml" title="SIOC" href="http://realtech.burningbird.net/sioc/site" />';
  } else if ($node->type == 'story' || $node->type == 'blog' || $node->type == 'user') {
    $head .= '<link rel="meta" type="application/rdf+xml" title="SIOC" href="http://realtech.burningbird.net/sioc/node/';
    $head .= $node->nid;
    $head .= '" />';
  }

  $vars['head'] = $head;
}

For the node pages, I check to see if the node type is blog, story, or user, as these are the only node types currently supported by the SIOC module. Once this change was in effect, a header link to the SIOC file now appears with the main site page, and with blog, story, and user pages.

This is a start, as I explore other ways to annotate my site with metadata. I also plan on using metadata annotation when I do reviews and other specific types of writing. In addition, I’ll probably add a generic FOAF page, as well as utilize other vocabularies as they present themselves. If you don’t have the Semantic Radar toolbar installed, you can use the W3C RDFa extractor to extract the site’s RDFa. You can see the SIOC by accessing the exporter for the site or an individual entry, such as this story.

Print Friendly, PDF & Email