Drupal and OpenID

I have been focused on OpenID implementations lately, specifically in WordPress and Drupal. The Drupal effort is for my own sites.

Until this weekend, I had turned off new user registration at my Drupal sites, because I get too many junk user registrations. However, to incorporate OpenID into a Drupal site, you have to allow users to register themselves, regardless of whether they use OpenID or not.

I think this all or nothing approach actually limits the incorporation of OpenID within a Drupal site. If you limit registration to administrator’s only, then people can’t use their OpenIDs unless the administrator gets involved. If you allow people to self-register, there’s nothing to stop the spammy registrations.

I believe that OpenID should be an added, optional field attached to the comment form, allowing one to attach one’s OpenID directly to a comment, which then creates a limited user account within the site specifically for the purposes of commenting. Rather than just providing options to allow a user to register themselves, or not, add another set of options specific to OpenID, and allow us to filter new registrations based on the use of OpenID.

Currently, the new user registration options in Drupal 6x are:

  • Only site administrators can create new user accounts.
  • Visitors can create accounts and no administrator approval is required.
  • Visitors can create accounts but administrator approval is required.

Turn on the latter two options and you’ll get spammy registrations within a day. Not many, but annoying. I believe there should be a fourth and fifth option:

  • Visitors can create accounts using OpenID, only, and no administrator approval is required.
  • Visitors can create accounts using OpenID, only, but administrator approval is required.

With these new options, I could then open up new user registration for OpenID, but without having to allow generic new user registration for the account spammers that seem to be so prevalent with Drupal.

To attempt to implement this customized functionality at my sites, I’ve been playing with Drupal hooks, but the change is a little more extensive than just incorporating a hook handler and a few lines of code, at least for someone who is relatively new to Drupal module development like I am.

Taking the simplest route that I could implement as a stand-alone module, what I’m trying now is to modify the new user registration forms so that only the OpenID registration links display. You’ll see this, currently, in the sidebar if you access the site and you’re not logged in. Unfortunately, you have to click on the OpenID link to open the OpenID field, because I’m still trying to figure out how to remove the OpenID JavaScript that hides the field (there is a function to easily add a JavaScript library, but not one to remove an added library).

With my module-based modifications, rather than a person having to click a link to create a new account, and specify a username and password, they would provide their OpenID, and I would automatically assign them a username via autoregistration. To try my new sidebar module, I decided to turn my Drupal sites into OpenID providers, as well as clients, and use one of them as a test case. Provider functionality is not built in, but there is an OpenID provider module, which I downloaded and activated with my test Drupal installation (MissouriGreen).

I tried my new module and OpenID autoregistration but ran into a problem: the Drupal client does not like either the username or email provided via the Drupal OpenID provider. Why? Because the OpenID identifier used in the registration consists of the URL of the Provider, which is the URL of the Drupal site I used for my test, and the Drupal client does not like my using a URL. In addition, the provider also didn’t provide an email address.

Digging into the client side code, I discovered that the Drupal OpenID client supports an OpenID extension, Simple Registration. Simple Registration provides for an exchange of the 9 most requested information between the OpenID client and provider: nickname, email, full name, dob (date of birth), gender, postcode, country, language, and timezone. With Simple Registration, you can specify which of the items is optional and which mandatory, and the current OpenID client wants nickname and email.

By using Simple Registration on the provider, I could then provide the two things that my Drupal OpenID client wanted: nickname and email. Unfortunately, though, the current version of the OpenID provider doesn’t support Simple Registration. I was a little surprised by this, as I had made an assumption that the Drupal OpenID provider would work with the Drupal OpenID client. However, OpenID is in a state of flux, so such gaps are to be expected.

Further search among the Drupal Modules turned up another module, the Drupal Simple Registration module, which allows one to set the mandatory and optional fields passed as part of the OpenID authentication exchange. The only problem is that the OpenID Provider also doesn’t have any incorporated hooks, which would allow the Simple Registration module to provide the Simple Registration data as part of the response. To add these hooks, the Simple Registration module developer also supplied a patch that can be run against the OpenID Provider code to add the hook.

I applied the patch and opened the module code and confirmed that it had been modified to incorporate the hook. I then tried using the Drupal site as OpenID provider again, but the registration process still failed. Further tests showed that the Simple Registration data still was not being sent.

All I really want to do is test the autoregistration process, so I abandoned the Drupal OpenID provider, and decided to try out some other providers. However, I had no success with either my Yahoo account or my Google GMail account, even though I believed both provided this functionality. The Yahoo account either didn’t send the Simple Registration fields or failed to do so in a manner that the Drupal OpenID client could understand. The Gmail account just failed, completely, with no error message specifying why it failed.

I felt like BarbieOpenID is hard!

I finally decided to use phpMyID, which is a dirt simple, single user OpenID application that we can host, ourselves. I had this installed at one time, pulled it, and have now re-installed at my base burningbird.net root directory. I added the autodiscovery tags to my main web page, and uncommented the lines in the MyID.config.php file for the nickname, full name, and email Simple Registration fields. I then tried “https://burningbird.net” for OpenID autoregistration at RealTech. Eureka! Success.

The new user registration is still currently blocked at creation, but the site now supports autoregistration via OpenID. Unfortunately, though, the registration spammers can still access the full account creation page, so I can still get spammy registrations. However, I believe that this page can be blocked in my mandatory OpenID module, with a little additional work; at least until I can see about possibly creating a module that actually does add the OpenID only options I mentioned above. The people who generate spammy user account registrations could use OpenID themselves, but the process is much more complex, and a lot more controlled at the provider endpoint, so I think this will help me filter out all but the most determined spammy registrations.

Once all of this is working, I’ll see about adding the OpenID login field to the comment form, rather than in the sidebar. If one wonders, though, why there isn’t more use of OpenID, one doesn’t have to search far to find the answers. Luckily for Drupal users, OpenID seems to be an important focus of this week’s DrupalCon in Washington DC, including a specialized Code Sprint.

RDFaification of Drupal 6

You don’t have to wait for Drupal 7 to RDFaificate your Drupal site. I spent yesterday tweaking my space, and if you access the site now with a tool, such as the Semantic Radar Firefox add-on, you’ll see all sorts of semantic goodness. I used a combination of plug-ins and theme modifications to make my changes, and will probably add to the overall effect over time.

What simplified my RDFa integration is that my site was already being served up as valid XHTML, via a modification to my page.tpl.php file:

<?php
header("Vary: Accept");
if (stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml"))
    header("Content-Type: application/xhtml+xml; charset=utf-8");
else
    header("Content-Type: text/html; charset=utf-8");
?><!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN"
    "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" 
xml:lang="<?php print $language->language ?>">

The PHP code checks with the user agent accessing the page. If the user agent accepts XHTML, the code returns the pages as XHTML; otherwise, the pages are returned as HTML. However, the DOCTYPE I had been using was a SVG+MathML DOCTYPE, because of my sometimes use of embedded SVG. To validate as XHTML+RDFa, though, you need to use the RDFa DOCTYPE.


<?php
header("Vary: Accept");
if (stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml"))
    header("Content-Type: application/xhtml+xml; charset=utf-8");
else
    header("Content-Type: text/html; charset=utf-8");
?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" 
   "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<head profile="http://ns.inria.fr/grddl/rdfa/">

The namespaces in the HTML opening tag don’t reflect all that I’ll use in my pages, just the ones I used for RDFa annotation sprinkled, liberally, throughout the page. When I use embedded SVG, I can just add the SVG namespaces directly into the opening SVG element tag. I could add the namespaces now, but I don’t always use embedded SVG.

One unfortunate consequence of switching DOCTYPEs is that when I do use embedded SVG, the page won’t validate. However, this won’t impact on the user agents and their ability to process the SVG correctly, so I’ll just have to live with the invalidation errors. That’s the joy of DOCTYPEs.

Another change is to the opening HEAD tag, where I added the GRDDL profile. This lets data consuming agents know that I’m, first of all, using RDFa, then secondly, using the latest transform profile for RDFa. After all, once the data is provided, we assume someone will want to do something with the data.

I’m ready, now, to begin adding RDFa annotation. Some of the changes I can make directly to the theme pages, such as adding an attribute/value pair of property=”dc:title” to my header element that references my site’s title (“Burningbird’s RealTech”). I also added annotation within the node, via node.tpl.php, again adding property=”dc:title” to each individual site entry’s title.

Other annotation, though, required either the use of a Drupal module, or custom code. For instance, one change I wanted to make was to add a a property=”dc:subject” to my vocabulary terms. In my template.php file (used to override and extend the theme templating engine), I added a taxonomy term function that will not only append the vocabulary to each term, but also annotate the result with the RDFa dc:subject notation:

// split out taxonomy terms by vocabulary
function burningbirds_print_terms($nid) {
     $vocabularies = taxonomy_get_vocabularies();
     $output = '<ul class="links inline">';
     foreach($vocabularies as $vocabulary) {
       if ($vocabularies) {
         $terms = taxonomy_node_get_terms_by_vocabulary($nid, $vocabulary->vid);
         if ($terms) {
           $links = array();
           $output .= '<li property="dc:subject">' . $vocabulary->name . ': ';
           foreach ($terms as $term) {
             $links[] = l($term->name, taxonomy_term_path($term), array('rel' => 'tag', 'title' => strip_tags($term->description)));
           }
           $output .= implode(', ', $links);
           $output .= '</li>';
         }
       }
     }
     $output .= '</ul>';
     return $output;
}

In the node.tpl.php file, I then replaced the existing print $terms line with a reference to my custom terms display function:

    <div class="taxonomy">
      Tagged: <?php print burningbirds_print_terms($node); ?>
    </div>

Other areas that can be annotated with RDFa in an entry are the author and date, but I didn’t have to code these or modify the theme template directly. Instead, I downloaded and installed the Submitted By module. Once installed and activated, this module provides an “Appearance” field in the content type form, which you can use to modify the “submitted by” line in posts.

By default, the template engine generates a line with the author’s username, linked to their user profile, and the date and time when the entry was created. I modified the field to show the author’s name, without linking to the author profile, since I’m the only author. I also modified the post date to just the date. Time, to me, just isn’t relevant for my site. Adding the appropriate RDFa annotation results in the following pattern:

<span property="dc:creator">[author-name-raw]</span> on [day], <span property="dc:date">[yyyy]-[mm]-[dd]</span>

Now that I’ve annotated several elements in the page with RDFa, I went shopping around at various semantic websites to see what else they were providing by way of semantic markup. At Danny Ayers weblog my Semantic Radar toolbar alerted me to the presence of SIOC (Semantically-Interlinked Online Communities Project) data, one of the recommended data types supported by Yahoo’s SearchMonkey. I did a little research and found the SIOC Drupal module, which I downloaded and installed.

The SOIC module automatically generates SIOC, which can be accessed as a direct RDF export. I gather that the module also adds a link to this metadata via the menu system, but I found this only works with a theme like Garland’s. I wanted to be able to integrate a link in the header of my web pages, to page specific SIOC exports, wherever applicable. I checked the module’s documentation, and elsewhere, but couldn’t find anything on automatically adding this link, so decided to add it myself in my theme.

In Drupal, at least 6.x, you can add a preprocess function that will pre-process web page data before the page is displayed. I had such a pre-process function already, to modify my header to a) remove the RSS 2.0 link, and b) modify the content type meta tag to reflect my XHTML content type. It was a simple matter to modify this code to include a conditional check to see if the page being served is the Drupal front page, and if not, whether the page is presenting a node of type story, blog, or user. If the former, I provided a link to the site’s main SIOC export URL; the later, one specific to the node:

function burningbirds_preprocess_page(&$vars) {

  $head = $vars['head'];
  $node = $vars['node'];
  $head = str_replace("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />",
              "<meta http-equiv=\"Content-Type\" content=\"application/xhtml+xml; charset=UTF-8\" />", $head);
  
  $head = str_replace("<link rel=\"alternate\" type=\"application/rss+xml\" title=\"Burningbird's RealTech RSS\" href=\"http://realtech.burningbird.net/rss.xml\" />\n","",$head);
    
  if (drupal_is_front_page()) {
    $head .= '<link rel="meta" type="application/rdf+xml" title="SIOC" href="http://realtech.burningbird.net/sioc/site" />';
  } else if ($node->type == 'story' || $node->type == 'blog' || $node->type == 'user') {
    $head .= '<link rel="meta" type="application/rdf+xml" title="SIOC" href="http://realtech.burningbird.net/sioc/node/';
    $head .= $node->nid;
    $head .= '" />';
  }

  $vars['head'] = $head;
}

For the node pages, I check to see if the node type is blog, story, or user, as these are the only node types currently supported by the SIOC module. Once this change was in effect, a header link to the SIOC file now appears with the main site page, and with blog, story, and user pages.

This is a start, as I explore other ways to annotate my site with metadata. I also plan on using metadata annotation when I do reviews and other specific types of writing. In addition, I’ll probably add a generic FOAF page, as well as utilize other vocabularies as they present themselves. If you don’t have the Semantic Radar toolbar installed, you can use the W3C RDFa extractor to extract the site’s RDFa. You can see the SIOC by accessing the exporter for the site or an individual entry, such as this story.

XHTML and the Forum

An issue I had with supporting XHTML in my WordPress weblogs is that it can be difficult to control others’ input in comments. I solved the issue with Drupal by not supporting comments directly on posts. Instead, I’ve provided a forum, using the Drupal Forum module, here at RealTech. I’ve only provided the one forum for all of my sites to make it simpler for people to register, as well as follow any active discussions.

To ensure that Forum pages don’t create XHTML errors, I modified the PHP to serve pages as XHTML to exclude URLs that have the word “forum” in them. I realize this will impact on any page with ‘forum’ in the title, such as this page. However, it’s unlikely that I’ll be using inline SVG and the word “forum” in the title for the same post.


header("Vary: Accept");
if (stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml"))
    if (stristr($_SERVER["REQUEST_URI"],"forum"))
        header("Content-Type: text/html; charset=utf-8");
    else
        header("Content-Type: application/xhtml+xml; charset=utf-8");
else
    header("Content-Type: text/html; charset=utf-8");

In addition, I turned filtered HTML on for forum entries, as well as installed htmLawed, to ensure that the entries are as clean as possible. Regardless, a problem in the forums won’t take down a post, and that was my main criteria for making this change.

The forums should also provide a much more flexible communication system. You can use your OpenID to register, or just register directly. You can still comment anonymously, though the comments are moderated.

Typically, any person who registered is an authorized user and could create forum topics. Well, I wasn’t quite ready to make that leap of faith. I created a “trusted” user who can create forum topics and will reserve this user classification to people I know. I then adjusted the permissions to enable forum topic creation for trusted users and admins, only.

I’ve created main forum categories. Over time, I imagine I’ll need to adjust the forum categories to be general enough to be useful, without being so general that it’s difficult to find discussions of interest.

XHTML and template.php

Since I use inline SVG in all of my sites, I need to serve my pages up as XHTML. I couldn’t find a Drupal module or option that triggers Drupal to serve the pages up as XHTML, so I added the following code at the very top of the page.tpl.php page:

<?php
header("Vary: Accept");
if (stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml"))
    header("Content-Type: application/xhtml+xml; charset=utf-8");
else
    header("Content-Type: text/html; charset=utf-8");
?>

All this code does is check whether the user agent can support XHTML. If it can, the page is served up as XHTML; otherwise HTML. Using embedded PHP in page.tpl.php for the theme works as desired. However, the content type of the page no longer agrees with the content-type meta element, which is included within the $head variable.

  <head>
    <title><?php print $head_title ?></title>
    <?php print $head ?>
    <?php print $styles ?>
    <?php print $scripts ?>
    <!--[if lt IE 7]>
      <?php print phptemplate_get_ie_styles(); ?>
    <![endif]-->
  </head>

One solution would be to not print out the $head variable and just hard code all of the head entries. However, when you add a new module, such as the RDF module, which adds entries to the $head variable, you either have to hard code these in, also, or you lose functionality.

Instead, I used the template.php file, which is used to override default Drupal behavior. Specifically, I created a variant of _preprocess_page for my subtheme, used this to access $head, and alter it.

function flame_preprocess_page(&$vars) {
  $head = $vars['head'];
  $head = str_replace("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />",
              "<meta http-equiv=\"Content-Type\" content=\"application/xhtml+xml; charset=UTF-8\" />", $head);

  $head = str_replace("<link rel=\"alternate\" type=\"application/rss+xml\" title=\"Burningbird's RealTech RSS\" href=\"http://realtech.burningbird.net/rss.xml\" />\n","",$head);

  $vars['head'] = $head;
}

I am still getting my head around customization in Drupal, and will have a follow-up posting later, after I’ve had more time to work through issues. In the meantime, from my understanding, the _preprocess_page passes in an array of system variables, each of which can be accessed and altered, as I’ve done in the code example. The subtheme naming is essential, otherwise I would overwrite the use of the function within the Garland theme.

There is another theme specific function for overriding variables called _phptemplate_variables, but I gather that was Drupal 5 and the approach I used is Drupal 6.x and forward.

Now, the head entries are as I want them, and modules such as RDF can add their entries without me impacting on their work. More importantly, with the addition of the XHTML support, I can add inline SVG and the SVG will display correctly for those browsers that support inline SVG.

(Note this image is now included using the IMG element. I’m now using WordPress, and WordPress does not allow inline SVG.)

hello world in svg