The Semantics of Starlings

This weekend I played a bit more with the attachment that allows me to take photos of slides with my digital camera. The ones shown here I took years ago when I lived in Portland, Oregon. The subject is a flock of European Starlings at sunset, just after a storm.

Every year our apartment complex in Portland would be overrun with flocks of starlings that swooped and swirled about, covering the trees and darkening the sky, or whatever part of the sky wasn’t already darkened by the rain clouds that were a part of our life in Portland. Pretty as they may seem from the photos, the starlings were a pest: a species that didn’t belong in the area, and one that would take food and habitat away from native birds. Their waste was corrosive to cars and damaging to buildings and streets; additionally, the birds are known to carry and spread disease.

The apartment would bring in a bird specialist who had this explosive air cannon, which he would shoot at the trees to scare the birds off. (Rather unnerving for tenants in addition to birds.) The starlings would leave for a time, but they always managed to find their way back; they are nothing if not tenacious.

Starlings are flockers, following lead birds almost obsessively, and it was fascinating to watch as one flock of starlings would meet head-on with another flock: thousands of birds racing towards each other in what you would expect to be a collision, but which would instead coalesce into this wonderful ballet of birds flying over and around each other, literally riding the wake each other caused in the air.

What do starlings and their behavior have to do with the Semantic Web? Only that I was reminded of this ‘head-on’ behavior this weekend when I was quietly reading the various entries out at the W3C’s TAG (Technical Architecture Group) email list, all about URIs and resources, and what it all means…and doesn’t mean. The discussions spilled out into weblogging when Tim Bray wrote a posting titled On Resources. Focus on the web of now, Tim says, and document what we have now:

So, explaining the Web-as-it-is would be enough to make me happy. Clearly, we should have an eye to the future, and, in writing down the architecture, try to avoid making life difficult for any others who are working to make something new and important involving the Web. Obvious examples are the Semantic-Web and Web-Services efforts.

But at the end of the day, the success criterion for me is having the success criteria for the Web-as-it-is explained clearly and convincingly.

In other words, the focus of TAG should be on what exists now, not what might exist some day. As for resources and their identifiers — those pesky little devils — Tim wrote:

We could just not talk about resources in the Architecture document. That wouldn’t get in the way of any software that I know of. But I suspect that this would impair the document’s usefulness as people paged frantically back and forth trying to figure out what URIs identify. Perhaps there’s a middle ground, where we say that the nature of resources is outside the scope of this document, aside from the fact that they are what is named by URIs.

Tim Berners-Lee wasn’t particularly happy with Tim Bray’s essay. In the TAG email list he wrote:

You say that the TAG should concentrate on the web as it has been before the semantic web and web services, and that you will be happy if the architecture works for that, even if it does not work for web services and semantic web.

That is a pity, partly because the web is no good unless it can be a sound foundation for the semantic web and web services too. WSDL (ed. Web Services Description Language) and RDF (ed. Resource Description Framework) have real serious issues on the table, working groups which need a consistent framework.

At first glance, it seems as if the two Tims were at opposite ends of a circle — the web of the now versus the web of the future. Mathematically defining resources as compared to basically ignoring the concept as one that can’t be effectively defined. One could then assume that their opinions cancel each other out, leaving us a big fat zero in understanding. On the contrary: like the two flocks of starlings converging together from opposite directions — resulting in a thing of great beauty and great destructiveness — Tim B and TimBL have articulated the dichotomy behind the debate of what is a ‘resource’, and how is it identified within the Semantic Web (as introduced earlier this week). But that’s a dry summation — what they’ve really done is articulate the challenges of the Semantic Web:

To be a Semantic Web, it must be mechanical, and therefore precise, mathematical, and ultimately unambiguous. But to be a Semantic Web, it must also encapsulate meaning, context, and embrace ambiguity. Ignore the discontinuities, embrace the discontinuities.

What does this all mean? If a resource is defined to be anything, including something abstract, then how can it have an identifier on the web, in the form of a URI? But if a resource within the context of the Semantic Web is defined to be something on the web, then how can it not have a URI? If we limit resources to things on the web, how can we identify things as disparate as a person, a galaxy, and an abstraction such as a metaphor in a poem? And how can one global set of URIs work for all items, at all granularities?

If a resource is a representation of something, and one that exists on the web, then software can be designed with an assumption that if you access the URI, something is returned. But can all ‘resources’ of interest within the Semantic Web be represented with something on the web, and identified by a URI? What about peace — can it be constrained within a representation? It’s hard enough identifying it in ‘real life’, how would it be represented on the web? How about you and I? Can we be represented on the web?

Questions! So many questions. And topmost in your mind might be: Why should I care?

Frankly, I’m not sure you should care about this debate and the Semantic Web — you cannot eat it, sleep with it, or use it to rear your young. However, you might care because what’s being discussed is the scope of what will be a part of the architecture of the Semantic Web. If the Internet and the Web, and all of its simple hyperlinkness has invaded your life to a degree now, how much more so will it if it becomes richer, more complex, and more meaningful?

I personally care about this debate because I want to make sure my metaphor, my syllogism, and my analogy are represented effectively or my own Turing Test for the Semantic Web will never come about. I don’t want these abstract concepts to be discarded because they can’t be mathematically defined.

“What we need to understand may only be expressible in a language that we do not know.”

Anthony Judge

That would be a pity.

In the title, I introduced FOAF, and you might be wondering where this simple RDF-based vocabulary fits into this grand debate. I could wish that the membership of the W3C wasn’t so averse to webloggers — our seeming arrogance and assumptions of our importance on the web, and our messiness — because the issues the TAG members are discussing are related to what’s happening with FOAF among the weblogging community. It is a microcosm of the Semantic Web, with its rich possibilities and its many ambiguities and misunderstandings.

To return to FOAF: FOAF represents both people and relationships, the former being concrete but difficult to physically put on the web, the latter being an abstract concept.

Me. The representation of “me” in this context is that which is described in a FOAF file. I am identified primarily by a hash of my email address — in this case, in this microcosm, I am known as:

cd2b130288f7c417b7321fb51d240d570c520720

You may call me “2b” or “Bb” for short.

In addition, my current FOAF file has a property defined within it — knows — and the object of this property is another person — Simon St. Laurent. In this file, I say, “I know Simon St. Laurent”, and to identify Simon within whatever FOAF system might exist, I use the hash of his email address:

65d7213063e1836b1581de81793bfcb9ad596974

I suppose you could call Simon “e183” for short.
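As a rough sketch of where an identifier like these comes from: FOAF's mbox_sha1sum property holds the SHA-1 digest of a mailbox URI, with the "mailto:" prefix included, written out as 40 hexadecimal characters. The Python below assumes that recipe. The function name and the address are placeholders of mine, not anyone's real details.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """Return the SHA-1 hex digest of the mailto: URI for an address."""
    # The hash is taken over the whole URI, not the bare address.
    mailbox_uri = "mailto:" + email.strip()
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder address; substitute a real one to get a real identifier.
    digest = mbox_sha1sum("someone@example.com")
    print(len(digest), digest)
```

The hash only runs one way: anyone who already has the address can recompute the digest and match it, but the digest alone does not hand the address over to a harvester.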

Both Tims should be unhappy with my FOAF file, I would think, following from their arguments described earlier. For instance, there is a resource with a representation on the web (myself), but there is no URI for it. Not only that: we’re not completely sure what the resource is, but we’re not ignoring it, either.

Within a Semantic Web of moving parts and grinding bits, FOAF doesn’t fit.

Tim Bray took on an action item recently to draft language surrounding information resources as compared to resources. As he wrote in another TAG email:

Many existing Web servers and clients (for example web browsers) do not have any notion of what the Resource identified by a URI is. However, humans and Semantic Web software are strongly concerned with this issue. Some resources are perceived as falling into a class called “Information Resources”. That is to say, they are on-line units of electronic information or service. Examples would include a photograph, a news story, and a weather forecast for Oaxaca. Other resources named by URIs may exist entirely apart from the Web. Examples include an edition of some book identified by urn:isbn:0-395-36341-1, a person identified in an RDF assertion using http://example.com/foaf#Dan, and an XML namespace such as http://www.w3.org/1999/02/22-rdf-syntax-ns#. The Web may be used to obtain representations of both kinds of resources.

What Tim is saying is that either a resource exists on the web, or a representation of the object exists on the web. If the URI has an associated protocol, such as the FOAF identifier given in Tim’s example, its representation is accessible on the web even if the resource itself isn’t.

Or is it?

Not one FOAF file I have seen uses a URI to represent the person. They either don’t use anything, or they use what is known as a blank node identifier, which is only relevant to the file. However, the lack of a URI hasn’t adversely affected FOAF, because each person within the context of FOAF is identified through two alternative keys: the mbox_sha1sum, which is the hashed representation of my email address; and/or the URL (URI) of my FOAF file, http://burningbird.net/foaf.rdf.
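To make the workaround concrete, here is a minimal sketch of a FOAF file of the kind described: no URI for either person, only a hash and a foaf:knows link. The markup is a hypothetical reconstruction built from the two hashes quoted earlier, not a copy of any real file. The Python that reads it uses only the standard library's XML parser rather than a real RDF toolkit, so it handles this one layout and nothing more; the helper names are my own.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical FOAF document: neither Person has an rdf:about URI,
# so both are blank nodes, keyed only by their hashed mailbox.
FOAF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <foaf:Person>
    <foaf:mbox_sha1sum>cd2b130288f7c417b7321fb51d240d570c520720</foaf:mbox_sha1sum>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Simon St. Laurent</foaf:name>
        <foaf:mbox_sha1sum>65d7213063e1836b1581de81793bfcb9ad596974</foaf:mbox_sha1sum>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
"""


def person_key(person):
    """Return the hash that stands in for a URI, or None if absent."""
    return person.findtext(f"{{{FOAF}}}mbox_sha1sum")


def knows_pairs(xml_text):
    """List (who, whom) hash pairs for each foaf:knows in top-level Persons."""
    root = ET.fromstring(xml_text)
    pairs = []
    for person in root.findall(f"{{{FOAF}}}Person"):
        for friend in person.findall(f"{{{FOAF}}}knows/{{{FOAF}}}Person"):
            pairs.append((person_key(person), person_key(friend)))
    return pairs


if __name__ == "__main__":
    for who, whom in knows_pairs(FOAF_XML):
        print(who, "knows", whom)
```

Note that the link runs in one direction only: nothing in this file says whether the second hash knows the first.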

Neither key is officially a URI of my representation within the context of either the existing or the semantic web. We have simply worked around the debate and issue of how one can identify a representation of ourselves on the Net by not using an identifier. We could use an analogy of parents arguing about proper diet, while we hungry children raid the fridge and eat all the pie. We could, but there’s no URI for analogies, either, and therefore they must not have a proper place in a proper discussion of the proper Semantic Web.

Then there’s the issue of what’s being identified: is the person the resource? Or is the FOAF information the resource, and I’m defined by many such? Additionally, FOAF files also denote other resources (or what we’re assuming are resources because they are, after all, defined within the Resource Description Framework), and they’re parameterized, if that’s the right word, by using the RDF property ‘knows’. However, we don’t have a good understanding of what it is we’re defining with ‘knows’. Is it denoting a relationship? Or is it nothing more than an acknowledgment that I literally know who Simon St. Laurent is? Is Simon a friend, because he’s in a FOAF file? If so, then what would happen if I weren’t in Simon’s FOAF file?

If I were to remove Simon from my FOAF file, am I disavowing the friendship? Or am I ‘pretending’ that I don’t ‘know’ Simon? With FOAF, we not only assert the truth, we assert a lie, because I know Simon, and whether or not he is in my FOAF file doesn’t change this. If I don’t list him, am I lying by omission? What does it mean to be in one of these files? What does it mean when you are not?

Will you ‘feel’ it, when you’re not?

The fact is that FOAF is being used as a representation of something; we’re making assertions about something, but we’re not sure of what. Whatever it is, though, it’s loaded with connotations.

Within FOAF we’re representing information about ourselves, but it’s not us — too flat, too two-dimensional to be a representation of us. Additionally, we’re representing relationships with other people, but we’re each bringing our own interpretations of these relationships along for the ride. In other words, we’re making assertions of relationships and attaching social context to them.

In the RDF Concepts working draft, there was a section that discussed the social context of assertions. It is the one and only section of all the RDF documents that brings up the issue of social context about RDF statements. The only one. And of course, it is this section that the Semantic Web Architecture recommended be struck.

Why the cut? From the meeting minutes where the recommendation arose, it would seem to come back to our old debate of URI and context. There was too much confusion about what was meant by ‘identify’, by URI, by resource. As Dan Brickley said in IRC notes:

21:10:54 [bwmscribe]
authoritive definition of URI’s: i.e. who gets to say what a URI denotes
21:12:48 [danbri]
something like “RDF graphs have propositional content. Their meaning is fixed by a bunch of hairy stuff only partly understood and documented (eg. implicit theory of reference associated with URIs). Minor health warning. The End.”

But that doesn’t stop the confusion — ignoring the concept of ‘resource’, postponing the issue of identity, and ignoring social context because it’s too hard to define, won’t prevent problems when people act to fill the void that’s left. As Kendall Clark wrote:

This way of carrying on the social meaning debate was unlikely to lead to a satisfactory resolution, since it was possible to strike the problematic language without solving or addressing the substantive issues which animate the debate in the first place.

Consider FOAF files again: Marc Canter and Eric Sigler are working on this thing that Marc is calling a “PeopleAggregator”. From bits and pieces I’ve picked up at their weblogs, in emails, and in comments elsewhere, this application will be able to create and consume and maintain FOAF files as well as networks of interlinked people who ‘know’ each other, as defined in these files. More, if someone within the network designates you a ‘friend’ in their FOAF file, the PeopleAggregator sends you an email asking for some form of confirmation.

(Again, this is based on casual discussion in comments and may be incorrect in whole or part.)

Rather than the network of friends being maintained behind walls à la Friendster, it’s out in the open with decentralized FOAF files that anyone can read. Now, what will become the social context of the relationships denoted as resources within these FOAF files? And what can be the social consequences of same?

Personally, I expect the first “Technorati of FOAF popularity” before the year is out. I wonder, what crown will we give to the man and woman voted most popular? Prom king and queen? I also wonder, how soon will we get emails saying, “Please remove me from your FOAF file; you don’t really know me”? How soon will we get emails saying, “Why am I not in your FOAF file?”

If you doubt this, then look no further for proof than the plain, ordinary, unsemantic hypertext links that form our blogrolls. Remember public delinking, and how in the past this has been used as a measure of censorship, and as a form of punishment and control? I’ve been delinked, publicly and privately, from friend and foe, and believe me when I say there is more to this than a simple hypertext link, and the removal thereof.

Remember also the discussions of the power that these links provide within this communication medium because — as Clay Shirky has demonstrated with his power laws — those with a disproportionate share of weblogging links also have a disproportionate share of attention, and even respect?

Power and pain, reward and punishment, all encapsulated in a simple hypertext link, in a simple blogroll — what can happen within the socially explosive context of FOAF?

Both Tims might say that the FOAF example isn’t relevant — weblogging is its own problem and isn’t really representative of the web as a whole. After all, there are billions of pages on the web, and only about a half million webloggers, if that.

But webloggers are becoming the Semantic Web lab rats — through our curiosity and our interest, we’re the first to test these Semantic Web tools outside of labs and universities. We’re the ones that propagate the data and the technologies. When faced with confusion, we’ll wing it. We did so with RSS 1.0, we’re doing so with Pie/Echo/Atom and now we’re continuing the trend with FOAF.

FOAF is becoming the bastard child that grew from the seeds that fell between the cracks of W3C debates or were discarded with all the other messy ‘touchy-feely’ stuff, such as social context surrounding URIs. It’s the wolf child tempered in the pack, surviving on an existence of “keep what works, throw out the rest”. One can’t blame it, then, if it, and we, don’t behave properly when invited to the Semantic Web tea.

And the more I look at these photos the more I think some are upside down. I can’t really tell for sure, the slides aren’t properly marked — but the images are pretty and my representing them upside down on the web doesn’t stop the birds from flying.