Categories
RDF

Syndication as compared to aggregation

Bobby Masteria writes on common misconceptions about aggregation as compared to syndication. However, though I agree we don’t have to support all feeds, leaving out one of the most widely used is a mistake.

Categories
Technology Web

The whole thing

Recovered from the Wayback Machine.

The Architecture of the World Wide Web, First Edition was just issued as a W3C recommendation. I love that title – it reminds me of Monty Python’s “The Meaning of Life”, volume one.

Interesting bit about URIs in the document. To address the ‘resource as something on the web’ as compared to ‘resource as something that can be discussed on the web’ issue, the document describes a resource thusly:

By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term “resource” is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as “information resources”.

This document is an example of an information resource. It consists of words and punctuation symbols and graphics and other artifacts that can be encoded, with varying degrees of fidelity, into a sequence of bits. There is nothing about the essential information content of this document that cannot in principle be transferred in a representation.

However, our use of the term resource is intentionally more broad. Other things, such as cars and dogs (and, if you’ve printed this document on physical sheets of paper, the artifact that you are holding in your hand), are resources too. They are not information resources, however, because their essence is not information. Although it is possible to describe a great many things about a car or a dog in a sequence of bits, the sum of those things will invariably be an approximation of the essential character of the resource.

The document then gets into URI collision:

By design, a URI identifies one resource. Using the same URI to directly identify different resources produces a URI collision. Collision often imposes a cost in communication due to the effort required to resolve ambiguities.

Suppose, for example, that one organization makes use of a URI to refer to the movie The Sting, and another organization uses the same URI to refer to a discussion forum about The Sting. To a third party, aware of both organizations, this collision creates confusion about what the URI identifies, undermining the value of the URI. If one wanted to talk about the creation date of the resource identified by the URI, for instance, it would not be clear whether this meant “when the movie was created” or “when the discussion forum about the movie was created.”

Social and technical solutions have been devised to help avoid URI collision. However, the success or failure of these different approaches depends on the extent to which there is consensus in the Internet community on abiding by the defining specifications.
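The fix the document implies is simply to mint distinct URIs for distinct resources. A minimal sketch in RDF/XML, using hypothetical example.org URIs and Dublin Core terms standing in for whatever vocabularies the two organizations would actually use:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- one URI identifies the movie itself -->
  <rdf:Description rdf:about="http://example.org/movies/the-sting">
    <dcterms:created>1973</dcterms:created>
  </rdf:Description>
  <!-- a different URI identifies the discussion forum about the movie -->
  <rdf:Description rdf:about="http://example.org/forums/the-sting">
    <dcterms:created>2004</dcterms:created>
  </rdf:Description>
</rdf:RDF>
```

With two URIs, a statement about "the creation date" is unambiguous for each resource.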

Categories
Weblogging

Connectivity

How uncanny that just when I decide to disconnect, my cable internet connectivity bites the dust. Because of this, my posting may become irregular much sooner than planned. A word of advice: if you’re considering DSL or cable for internet connectivity, think twice about cable. Or maybe just reconsider getting the connection at all, regardless of the technology.

I need to do a new topic post on IT Kitchen, this one on weblogging technology. Yes, this is still going on, thanks to a few folks who have said they would write something when they can. I’ll also be writing though I have been sidetracked recently into helping another group–an effort that ended up being one of those ‘bad energy places’ I talked about last week.

What I would like to do at the Kitchen is start a page at the wiki and have users provide feedback to the weblog tool developers about what they would and would not like to see in a tool, beyond comment spam; we know that continues to be a problem. For instance, I’ve heard people say they don’t like MT’s edit window, nor WordPress’ edit space, but I’m not sure of the specifics. Is it because there is some HTML exposure? The appearance? The fact that it’s remote?

However, the Kitchen wiki has not been attracting any activity, so contrary to everyone going gaga over wikis lately, as witnessed in the new article at O’Reilly, I’m not sure that a wiki is the best way to get people involved; or maybe its use doesn’t suit this particular effort. Still, we’ll give it a shot.

Categories
RDF

Why a processor rather than a transform

I have spent a little time looking at other approaches to mapping RDF to a web document created as XHTML; approaches such as GRDDL, which uses XSLT to transform basic concepts from (X)HTML into RDF/XML and then provides a link to the transform.

(RAP just released a GRDDL parser, though it’s based on PHP 5.x, which means don’t expect it out on the streets too soon.)

This works, if all you’re doing is pulling out data that can be mapped to valid XHTML structural elements. But it doesn’t work if you want to capture meaning that can’t be constructed from headers and paragraphs, or DIV blocks with specific class names. Still, it meets one criterion, minimal human intervention, which finds favor among Semantic Web developers. If the user is providing a page anyway, might as well glean the meaning from it.
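For reference, the GRDDL hookup itself is small. A sketch of what the (X)HTML side looks like, assuming a transform stylesheet at a made-up example.org URL:

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>A GRDDL-enabled page</title>
    <!-- an agent fetches this XSLT and applies it to the page to get RDF/XML -->
    <link rel="transformation" href="http://example.org/xhtml2rdf.xsl" />
  </head>
  ...
</html>
```

The profile URI signals that the page participates in GRDDL; the linked XSLT does the actual gleaning.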

However, as we’ve found with Google, which does basically the same thing except it performs its magic after the material is accessed, automated mechanisms only uncover part of the story. This is why I get people searching on the oddest things coming to my site – accidental groupings of words pulled from my pages just happen to match a word combination on which they’re searching.

In other words, hoping to discover semantics accidentally only goes so far.

One reason I use a poetry finder as a test of any new semantic web technology or approach is that any solution that would actually help people find the right sub-set of poetry won’t do so through accidental semantics.

Let’s look at two popular RDF vocabularies: RSS and FOAF. RSS is an accidentally semantic application. The same data that drives an application such as a weblogging tool can be used to create RSS without much intervention on the part of the user. I could also use the same mechanism that drives RSS to drive out something like my Post Content vocabulary, PostCon.

(Though one bit of information I capture in PostCon, the fact that a page has been pulled and why it’s been pulled, cannot be captured in RSS; RSS implies a specific state for a document: “I exist.”)
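To make the “I exist” point concrete, here is a minimal RSS 1.0 item fragment, with an invented example.org post; there is simply no slot in it for saying that a resource has been withdrawn, or why:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/posts/some-post">
    <title>Some post</title>
    <link>http://example.org/posts/some-post</link>
    <!-- an item can only describe a resource that is presumed to be there -->
    <description>A summary of the post.</description>
  </item>
</rdf:RDF>
```

(A full RSS 1.0 feed would also carry a channel element listing its items; this fragment shows only the item to keep the point visible.)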

FOAF, on the other hand, requires that the user sit down and identify relationships. There really is little or no accidental semantics for this vocabulary, unless you follow some people’s idea that FOAF and blogrolls are one and the same (a hint: they’re not).
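A bare-bones FOAF fragment makes the point: the foaf:knows assertion below is something a person had to sit down and state deliberately, and nothing in a blogroll supplies it (the names and address here are invented):

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Jane Example</foaf:name>
    <foaf:mbox rdf:resource="mailto:jane@example.org" />
    <!-- an explicit relationship, not something gleaned from page structure -->
    <foaf:knows>
      <foaf:Person>
        <foaf:name>John Example</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
```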

So what drives out the need for FOAF? Well, much of it is driven by people attracted to bright, new, shiny objects. Still, one can see how something like FOAF could be used to drive out systems of social networks, or even *shudder* webs of trust, so there is an added benefit to doing the work for FOAF beyond it being cool and fun.

The key to attracting human intervention, beyond getting someone influential and well known to push it, is to make it easy for the end user–the non-XML, non-RDF end user–to provide the necessary data, and then to provide good reasons why they would do so. The problem with this approach, though, is that many Semantic Web technologists don’t want to work on approaches that require the human as an initial part of the equation. Rightfully so: a solution that requires effort from people, and that won’t have a payback until critical mass is reached, is not something that’s easy to sell.

Still, I think FOAF has shown a direction to follow – keep it simple, uncomplicated, and perhaps enough people will buy in at first to reach the critical mass needed to bring in others. The question, though, is whether it can attract the interest of the geeks, because it’s not based on XSLT.

With GRDDL, one can attach a class name to a DIV or SPAN element, and then use XSLT to generate matching RDF/XML. This removes some of the accidental discovery by explicitly stating something of interest with that DIV element. More, this doesn’t require that the data be kept separate from the document – it would be embedded directly in the document.
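Roughly, the author marks up the text and the transform maps class names to RDF properties. A hedged before-and-after sketch, with an invented class name, subject URI, and vocabulary namespace:

```xml
<!-- what the author writes in the XHTML -->
<div class="review-rating">4 out of 5</div>

<!-- what the linked XSLT might generate from it -->
<rdf:Description rdf:about="http://example.org/reviews/1"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rev="http://example.org/vocab/review#">
  <rev:rating>4 out of 5</rev:rating>
</rdf:Description>
```

The class name stops the discovery from being accidental, but every such class is one more thing the author has to remember to apply by hand.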

However, rather than making this less complicated, the whole thing strikes me as making the discovery of information much more complicated than it need be.

Now, not only would the end-user have to write the text of a piece, they would have to go through that text and mark specific classes of information on each element within the XHTML. This then exposes the end user to the XHTML, unless one starts getting into a fairly complicated user interface.

Still, this is another approach that could be interesting, especially when one considers the use of Markdown and other HTML transforms used in weblogging tools. How to do something like this and have it map to multiple data models could be challenging.

Don’t mind me, still thinking out loud.

Categories
RDF

I need to keep up more

…with the Semantic Web doings at the W3C, though doing so precludes doing much else at times.

However, not keeping up means that I’m losing important bits of information, such as this bit: Danny Ayers named his new kitten after a proposed new query language for RDF.

Ah well, at least he didn’t name her Ontaria.