Oh look it’s not just us Semantic Web dweebs who noticed

A List Apart has a new article out on the Semantics in HTML5. John Allsopp writes

We’ll start by posing the question: “why are we inventing these new elements?” A reasonable answer would be: “because HTML lacks semantic richness, and by adding these elements, we increase the semantic richness of HTML—that can’t be bad, can it?”

By adding these elements, we are addressing the need for greater semantic capability in HTML, but only within a narrow scope. No matter how many elements we bolt on, we will always think of more semantic goodness to add to HTML. And so, having added as many new elements as we like, we still won’t have solved the problem. We don’t need to add specific terms to the vocabulary of HTML, we need to add a mechanism that allows semantic richness to be added to a document as required. In technical terms, we need to make HTML extensible. HTML 5 proposes no mechanism for extensibility.

On reading of which, I hurt my head by banging it, suddenly and with force, against my desk.

RDFaification of Drupal 6

You don’t have to wait for Drupal 7 to RDFaificate your Drupal site. I spent yesterday tweaking my space, and if you access the site now with a tool, such as the Semantic Radar Firefox add-on, you’ll see all sorts of semantic goodness. I used a combination of plug-ins and theme modifications to make my changes, and will probably add to the overall effect over time.

What simplified my RDFa integration is that my site was already being served up as valid XHTML, via a modification to my page.tpl.php file:

<?php
header("Vary: Accept");
if (stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml"))
    header("Content-Type: application/xhtml+xml; charset=utf-8");
else
    header("Content-Type: text/html; charset=utf-8");
?><!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
    "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" 
xml:lang="<?php print $language->language ?>">

The PHP code checks with the user agent accessing the page. If the user agent accepts XHTML, the code returns the pages as XHTML; otherwise, the pages are returned as HTML. However, the DOCTYPE I had been using was a SVG+MathML DOCTYPE, because of my sometimes use of embedded SVG. To validate as XHTML+RDFa, though, you need to use the RDFa DOCTYPE.


<?php
header("Vary: Accept");
if (stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml"))
    header("Content-Type: application/xhtml+xml; charset=utf-8");
else
    header("Content-Type: text/html; charset=utf-8");
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" 
   "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<head profile="http://ns.inria.fr/grddl/rdfa/">

The namespaces in the HTML opening tag don’t reflect all that I’ll use in my pages, just the ones I used for RDFa annotation sprinkled, liberally, throughout the page. When I use embedded SVG, I can just add the SVG namespaces directly into the opening SVG element tag. I could add the namespaces now, but I don’t always use embedded SVG.

One unfortunate consequence of switching DOCTYPEs is that when I do use embedded SVG, the page won’t validate. However, this won’t impact on the user agents and their ability to process the SVG correctly, so I’ll just have to live with the invalidation errors. That’s the joy of DOCTYPEs.

Another change is to the opening HEAD tag, where I added the GRDDL profile. This lets data consuming agents know that I’m, first of all, using RDFa, then secondly, using the latest transform profile for RDFa. After all, once the data is provided, we assume someone will want to do something with the data.

I’m ready, now, to begin adding RDFa annotation. Some of the changes I can make directly to the theme pages, such as adding an attribute/value pair of property=”dc:title” to my header element that references my site’s title (“Burningbird’s RealTech”). I also added annotation within the node, via node.tpl.php, again adding property=”dc:title” to each individual site entry’s title.

Other annotation, though, required either the use of a Drupal module, or custom code. For instance, one change I wanted to make was to add a a property=”dc:subject” to my vocabulary terms. In my template.php file (used to override and extend the theme templating engine), I added a taxonomy term function that will not only append the vocabulary to each term, but also annotate the result with the RDFa dc:subject notation:

// split out taxonomy terms by vocabulary
function burningbirds_print_terms($nid) {
     $vocabularies = taxonomy_get_vocabularies();
     $output = '<ul class="links inline">';
     foreach($vocabularies as $vocabulary) {
       if ($vocabularies) {
         $terms = taxonomy_node_get_terms_by_vocabulary($nid, $vocabulary->vid);
         if ($terms) {
           $links = array();
           $output .= '<li property="dc:subject">' . $vocabulary->name . ': ';
           foreach ($terms as $term) {
             $links[] = l($term->name, taxonomy_term_path($term), array('rel' => 'tag', 'title' => strip_tags($term->description)));
           }
           $output .= implode(', ', $links);
           $output .= '</li>';
         }
       }
     }
     $output .= '</ul>';
     return $output;
}

In the node.tpl.php file, I then replaced the existing print $terms line with a reference to my custom terms display function:

    <div class="taxonomy">
      Tagged: <?php print burningbirds_print_terms($node); ?>
    </div>

Other areas that can be annotated with RDFa in an entry are the author and date, but I didn’t have to code these or modify the theme template directly. Instead, I downloaded and installed the Submitted By module. Once installed and activated, this module provides an “Appearance” field in the content type form, which you can use to modify the “submitted by” line in posts.

By default, the template engine generates a line with the author’s username, linked to their user profile, and the date and time when the entry was created. I modified the field to show the author’s name, without linking to the author profile, since I’m the only author. I also modified the post date to just the date. Time, to me, just isn’t relevant for my site. Adding the appropriate RDFa annotation results in the following pattern:

<span property="dc:creator">[author-name-raw]</span> on [day], <span property="dc:date">[yyyy]-[mm]-[dd]</span>

Now that I’ve annotated several elements in the page with RDFa, I went shopping around at various semantic websites to see what else they were providing by way of semantic markup. At Danny Ayers weblog my Semantic Radar toolbar alerted me to the presence of SIOC (Semantically-Interlinked Online Communities Project) data, one of the recommended data types supported by Yahoo’s SearchMonkey. I did a little research and found the SIOC Drupal module, which I downloaded and installed.

The SOIC module automatically generates SIOC, which can be accessed as a direct RDF export. I gather that the module also adds a link to this metadata via the menu system, but I found this only works with a theme like Garland’s. I wanted to be able to integrate a link in the header of my web pages, to page specific SIOC exports, wherever applicable. I checked the module’s documentation, and elsewhere, but couldn’t find anything on automatically adding this link, so decided to add it myself in my theme.

In Drupal, at least 6.x, you can add a preprocess function that will pre-process web page data before the page is displayed. I had such a pre-process function already, to modify my header to a) remove the RSS 2.0 link, and b) modify the content type meta tag to reflect my XHTML content type. It was a simple matter to modify this code to include a conditional check to see if the page being served is the Drupal front page, and if not, whether the page is presenting a node of type story, blog, or user. If the former, I provided a link to the site’s main SIOC export URL; the later, one specific to the node:

function burningbirds_preprocess_page(&$vars) {

  $head = $vars['head'];
  $node = $vars['node'];
  $head = str_replace("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />",
              "<meta http-equiv=\"Content-Type\" content=\"application/xhtml+xml; charset=UTF-8\" />", $head);
  
  $head = str_replace("<link rel=\"alternate\" type=\"application/rss+xml\" title=\"Burningbird's RealTech RSS\" href=\"http://realtech.burningbird.net/rss.xml\" />\n","",$head);
    
  if (drupal_is_front_page()) {
    $head .= '<link rel="meta" type="application/rdf+xml" title="SIOC" href="http://realtech.burningbird.net/sioc/site" />';
  } else if ($node->type == 'story' || $node->type == 'blog' || $node->type == 'user') {
    $head .= '<link rel="meta" type="application/rdf+xml" title="SIOC" href="http://realtech.burningbird.net/sioc/node/';
    $head .= $node->nid;
    $head .= '" />';
  }

  $vars['head'] = $head;
}

For the node pages, I check to see if the node type is blog, story, or user, as these are the only node types currently supported by the SIOC module. Once this change was in effect, a header link to the SIOC file now appears with the main site page, and with blog, story, and user pages.

This is a start, as I explore other ways to annotate my site with metadata. I also plan on using metadata annotation when I do reviews and other specific types of writing. In addition, I’ll probably add a generic FOAF page, as well as utilize other vocabularies as they present themselves. If you don’t have the Semantic Radar toolbar installed, you can use the W3C RDFa extractor to extract the site’s RDFa. You can see the SIOC by accessing the exporter for the site or an individual entry, such as this story.

How Not to write about the semantic web

How not to attract new semantic web readers, especially among the women. Write the following:

I just thought that this is a smart strategy to make video tutorials about the Semantic Web more appealing to female* or otherwise not so super-tech-savvy* audiences: Just put a Lolcat in it!

Though the author wrote that she matches the “stereotype”, which I guess means women who aren’t tech and like LOLcats, by the time I followed the asterisks, I’d already passed from astonishment to loathing. FYI, I wrote the first book on RDF, babes.

A reference to females was unnecessary. Surprising, too, from the same company featuring an interview with Corinna Bath, author of the thesis, “Towards a De-Gendered Design of Information Technologies”.

Where are the Semantic Web applications

There’s been an increase of interest about the semantic web lately.

One particular question and answer in Danny’s interview has to do with the seeming cultural divisions associated with the semantic web.

Some say: “Europeans have developed the Semantic Web and Americans are going to capitalise it.” What is your opinion?

Danny Ayers: Six months ago I attended the SemTech conference in San José. There were quite a few European folks with solid projects approaching venture capitalists and vice versa. The impression I got was that of a significant culture clash, with the Europeans generally caught on the wrong-foot.

I have also attended most of the Italian SemWeb conferences (SWAP) and there have seen many demos of potentially lucrative applications, which got forgotten once the presenter gained their doctorate.

At the same time, as far as the (Semantic) Web is concerned, national barriers count for nothing. I live in an 8-cat town in Tuscany and work for a UK company, the US-based company OpenLink has an expert in Outer Siberia.

Internationalization of communication aside, I’ve also noticed what seems to be a shift of semantic web research to Europe, while the States focus on, well, Twitter. Google. Facebook. Cloud Computing. I don’t mean that in a derogatory sense—the States seem to be focusing on business based on known, possibly even exhausted technology and knowledge, while Europe is focusing on research for its own sake.

It may not seem like a big deal, but research without an endgame is nothing more than ego wrapped up in white papers. At the same time, a business that is focused purely on the moment isn’t going to take the web in new directions. Instead, we’ll be like the dog chasing its tail—all excited movement that doesn’t really go anywhere.

However, I haven’t been following the semantic web community as closely as I once did, and my view may be skewed because of it. The fact that this question was asked, though, shows I’m not the only one seeing a decidedly cultural bias in focus and interest.

Correlation

noticed a correlation between my last two posts on the lack of women at Ajax Experience and the seeming lack of RDF or semantic web applications. Both are based on perennial questions: Where are the women in technology? Where are the semantic web applications?

Next time I’m asked either, I think I’ll answer that the women in technology are off building RDF-based semantic web applications. Yeah, that’s the ticket.

The women in technology are off building RDF-based semantic web applications. It works better than answering yes, there are women in technology but we’re still not as visible as we should be, and yes there are semantic web and RDF-based applications, but they’re still not as visible as they could be—both of which evidently don’t play well in the dominant technical culture, because the same damn questions keep getting asked, again and again.