8 November 2004 Archives

How uncanny that just when I decide to disconnect, my cable internet connection bites the dust. Because of this, my posting may become irregular sooner than I planned. A word of advice: if you’re considering DSL or cable for internet connectivity, think twice about cable. Or maybe just reconsider getting a connection at all, regardless of the technology.

I need to write a new topic post for IT Kitchen, this one on weblogging technology. Yes, this effort is still going on, thanks to a few folks who have said they would write something when they can. I’ll also be writing, though I was recently sidetracked into helping another group, an effort that ended up being one of those ‘bad energy places’ I talked about last week.

What I would like to do at the Kitchen is start a page on the wiki where users can give weblog tool developers feedback about what they would and would not like to see in a tool, on things other than comment spam, which we know continues to be a problem. For instance, I’ve heard people say they don’t like MT’s edit window or WordPress’s edit space, but I’m not sure of the specifics. Is it because there is some HTML exposure? The appearance? The fact that it’s remote?

However, the Kitchen wiki has not been attracting any activity, so contrary to everyone going gaga over wikis lately, as witnessed by the new article at O’Reilly, I’m not sure that a wiki is the best way to get people involved; or maybe its use doesn’t suit this particular effort. Still, we’ll give it a shot.

I have spent a little time looking at other approaches to mapping RDF to a web document created as XHTML, such as GRDDL, which uses XSLT to transform basic concepts from (X)HTML into RDF/XML, with the document itself providing a link to the transform.

(RAP just released a GRDDL parser, though it’s based on PHP 5.x, which means don’t expect it out on the streets too soon.)
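
To make the "link to the transform" part concrete, here is a minimal Python sketch of the discovery step a GRDDL-aware agent performs before running any XSLT. It assumes the conventions in the W3C GRDDL drafts: a profile URI of http://www.w3.org/2003/g/data-view on the head element, and a link element with rel="transformation" pointing at the stylesheet. The sample page and the stylesheet URL (http://example.org/extract.xsl) are hypothetical, and the sketch only finds the transform; it does not apply it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Profile URI used in the GRDDL drafts to flag a page as carrying transforms.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

def transformation_links(xhtml_text):
    """Return the href of each rel="transformation" link in a GRDDL-profiled page."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []  # no GRDDL profile, so any links are not GRDDL transforms
    hrefs = []
    for link in head.findall(f"{{{XHTML_NS}}}link"):
        rels = link.get("rel", "").split()
        href = link.get("href")
        if "transformation" in rels and href:
            hrefs.append(href)
    return hrefs

# Hypothetical page; the stylesheet URL is a placeholder, not a real transform.
sample_page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
  <title>Example</title>
  <link rel="transformation" href="http://example.org/extract.xsl"/>
  <link rel="stylesheet" href="style.css"/>
</head>
<body><p>Hello</p></body>
</html>"""

print(transformation_links(sample_page))
```

The ordinary stylesheet link is ignored, and a page without the profile yields nothing, which keeps an agent from treating every linked file as a transform.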

This works if all you’re doing is pulling out data that can be mapped to valid XHTML structural elements. But it doesn’t work if you want to capture meaning that can’t be constructed from headers and paragraphs, or DIV blocks with specific class names. Still, it meets a criterion of minimal human intervention, which finds favor among Semantic Web developers. If the user is providing a page anyway, we might as well glean its meaning.

However, as we’ve found with Google, which does basically the same thing except it performs its magic after the material is accessed, automated mechanisms only uncover part of the story. This is why people searching on the oddest things come to my site: accidental groupings of words pulled from my pages just happen to match the word combination they’re searching on.

In other words, hoping to discover semantics accidentally only goes so far.

One reason I use a poetry finder as a test of any new semantic web technology or approach is that any solution that helps people find the right subset of poetry won’t do so because of accidental semantics.

Let’s look at two popular RDF vocabularies: RSS and FOAF. RSS is an accidentally semantic application. The same data that drives an application such as a weblogging tool can be used to create RSS without much intervention on the part of the user. I could also use the same mechanism that drives RSS to drive out something like my Post Content vocabulary, PostCon.
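
As an illustration of how little user effort this takes, here is a minimal Python sketch that maps a post record, the kind of data a weblog tool already stores, onto an RSS 1.0 item using the RSS 1.0, RDF, and Dublin Core namespaces. The field names in the post dictionary and the sample values are hypothetical, and a complete RSS 1.0 feed would also need a channel element listing its items, which is left out here.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("rdf", RDF), ("rss", RSS), ("dc", DC)):
    ET.register_namespace(prefix, uri)

def post_to_rss_item(post):
    """Map fields a weblog tool already stores onto an RSS 1.0 item."""
    item = ET.Element(f"{{{RSS}}}item", {f"{{{RDF}}}about": post["permalink"]})
    ET.SubElement(item, f"{{{RSS}}}title").text = post["title"]
    ET.SubElement(item, f"{{{RSS}}}link").text = post["permalink"]
    ET.SubElement(item, f"{{{RSS}}}description").text = post["excerpt"]
    ET.SubElement(item, f"{{{DC}}}date").text = post["published"]
    return item

# Hypothetical post record, as a weblog tool might keep it in its database.
post = {
    "title": "Accidental semantics",
    "permalink": "http://example.org/archives/001.html",
    "excerpt": "Thinking out loud about GRDDL and FOAF.",
    "published": "2004-11-08T10:00:00-05:00",
}

root = ET.Element(f"{{{RDF}}}RDF")
root.append(post_to_rss_item(post))
print(ET.tostring(root, encoding="unicode"))
```

Nothing in the function asks the writer for anything new; every value comes from data the tool already has, which is what makes the semantics "accidental."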

(Some of the information I capture in PostCon, such as the fact that a page has been pulled and why it was pulled, cannot be captured in RSS; RSS implies a specific state for a document: “I exist.”)

FOAF, on the other hand, requires that the user sit down and identify relationships. There really is little or no accidental semantics for this vocabulary, unless you follow some people’s idea that FOAF and blogrolls are one and the same (a hint: they’re not).
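
The contrast shows up clearly in code. In this minimal Python sketch, the only input is a list of people the user has deliberately chosen to say they know; each becomes a foaf:knows statement about a foaf:Person in the FOAF namespace. The names, homepage URL, and the function itself are hypothetical illustrations, not output from any FOAF tool.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)

def foaf_document(my_name, people_i_know):
    """Build FOAF RDF/XML from relationships the user has stated explicitly."""
    root = ET.Element(f"{{{RDF}}}RDF")
    me = ET.SubElement(root, f"{{{FOAF}}}Person")
    ET.SubElement(me, f"{{{FOAF}}}name").text = my_name
    for friend in people_i_know:
        knows = ET.SubElement(me, f"{{{FOAF}}}knows")
        person = ET.SubElement(knows, f"{{{FOAF}}}Person")
        ET.SubElement(person, f"{{{FOAF}}}name").text = friend["name"]
        if friend.get("homepage"):
            ET.SubElement(person, f"{{{FOAF}}}homepage",
                          {f"{{{RDF}}}resource": friend["homepage"]})
    return ET.tostring(root, encoding="unicode")

# Hypothetical data: the user, not the software, decides who goes in this list.
doc = foaf_document("Jane Example", [
    {"name": "Alex Sample", "homepage": "http://example.org/alex/"},
    {"name": "Sam Placeholder"},
])
print(doc)
```

A blogroll could supply the homepage URLs, but not the claim that the user actually knows these people; that part has to come from a human.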

So what drives out the need for FOAF? Well, much of it is driven by people attracted to bright, new, shiny objects. Still, one can see how something like FOAF could be used to drive out systems of social networks, or even *shudder* webs of trust, so there is an added benefit to doing the work for FOAF beyond it being cool and fun.

The key to attracting human intervention, beyond getting someone influential and well known to push it, is to make it easy for the end user (the non-XML, non-RDF end user) to provide the necessary data, and then to give good reasons why they would do so. The problem with this approach, though, is that many Semantic Web technologists don’t want to work on approaches that require the human as an initial part of the equation. Rightfully so: a solution that requires effort from people, and that won’t pay off until critical mass is reached, is not something that’s easy to sell.

Still, I think FOAF has shown a direction to follow: keep it simple and uncomplicated, and perhaps enough people will buy in at first to reach the critical mass needed to bring in others. The question, though, is whether such an approach can attract the interest of the geeks, since it’s not based on XSLT.

With GRDDL, one can attach a class name to a DIV or SPAN element and then use XSLT to generate matching RDF/XML. This removes some of the accidental discovery by explicitly marking something of interest with that DIV element. Moreover, this doesn’t require that the data be kept separate from the document; it would be embedded directly in the document.
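
GRDDL itself would do this with an XSLT stylesheet. Here is the same idea sketched in Python with ElementTree instead of XSLT, so treat it as a stand-in for the transform rather than GRDDL proper. The class names (dc-title, dc-creator, dc-subject), their mapping to Dublin Core properties, the page URI, and the sample poem page are all hypothetical, and the output is N-Triples rather than RDF/XML to keep it short.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical class names an author adds by hand, mapped to Dublin Core properties.
CLASS_TO_PROPERTY = {
    "dc-title": DC + "title",
    "dc-creator": DC + "creator",
    "dc-subject": DC + "subject",
}
MARKED_TAGS = {f"{{{XHTML_NS}}}div", f"{{{XHTML_NS}}}span"}

def nt_literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def extract_triples(xhtml_text, page_uri):
    """Turn class-marked div/span elements into N-Triples about page_uri."""
    triples = []
    for el in ET.fromstring(xhtml_text).iter():
        if el.tag not in MARKED_TAGS:
            continue
        for cls in el.get("class", "").split():
            prop = CLASS_TO_PROPERTY.get(cls)
            if prop:
                text = " ".join("".join(el.itertext()).split())  # collapse whitespace
                triples.append(f"<{page_uri}> <{prop}> {nt_literal(text)} .")
    return triples

sample = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Poem</title></head>
<body>
  <div class="dc-title">An Example Poem</div>
  <p>By <span class="dc-creator">A. Poet</span></p>
  <p>Filed under <span class="dc-subject">winter</span>.</p>
</body>
</html>"""

for line in extract_triples(sample, "http://example.org/poems/example"):
    print(line)
```

Every class attribute in the sample page is placed by hand, which is exactly the burden this puts on the writer.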

However, rather than making this less complicated, the whole thing strikes me as making the discovery of information much more complicated than it need be.

Now, not only would end users have to write the text, they would also have to go through that text and mark specific classes of information on each element within the XHTML. This exposes the end user to the XHTML, unless one starts getting into a fairly complicated user interface.

Still, this is another approach that could be interesting, especially when one considers the use of Markdown and other HTML transforms in weblogging tools. Figuring out how to do something like this and have it map to multiple data models could be challenging.

Don’t mind me, still thinking out loud.