FOAF, Flocking, and the Semantics of Starlings

Recovered from the Wayback Machine.

This weekend I played a bit more with the attachment that allows me to take photos of slides with my digital camera. The ones shown here I took years ago when I lived in Portland, Oregon. The subject is a flock of European Starlings at sunset, just after a storm.

Every year our apartment complex in Portland would be overrun with flocks of starlings that swooped and swirled about, covering the trees and darkening the sky — whatever part of the sky wasn’t already darkened by the rain clouds that were a part of our life in Portland. Pretty as they may seem from the photos, the starlings were a pest — a species that didn’t belong in the area, and one that would take food and habitat away from native birds. Their waste was corrosive to cars, and damaging to buildings and streets; additionally, the birds are known to carry and spread disease.

swallows1.jpg

The apartment would bring in a bird specialist who had this explosive air cannon, which he would shoot at the trees to scare the birds off. (Rather unnerving for tenants in addition to birds.) The starlings would leave for a time, but they always managed to find their way back; they are nothing if not tenacious.

Starlings are flockers, following lead birds almost obsessively, and it was fascinating to watch as one flock of starlings would meet head-on with another flock — thousands of birds racing towards each other in what you would expect to be a collision, but which would coalesce into this wonderful ballet of birds flying over and around each other, literally riding the wake in the air each other caused.

What do starlings and their behavior have to do with the Semantic Web? Only in that I was reminded of this ‘head-on’ behavior this weekend when I was quietly reading the various entries out at the W3C’s TAG (Technical Architecture Group) email list — all about URIs and resources, and what it all means…and doesn’t mean. The discussions spilled out into weblogging when Tim Bray wrote a posting titled On Resources. Focus on the web of now, Tim says, and document what we have now:

So, explaining the Web-as-it-is would be enough to make me happy. Clearly, we should have an eye to the future, and, in writing down the architecture, try to avoid making life difficult for any others who are working to make something new and important involving the Web. Obvious examples are the Semantic-Web and Web-Services efforts.

But at the end of the day, the success criterion for me is having the success criteria for the Web-as-it-is explained clearly and convincingly.

In other words, the focus of TAG should be on what exists now, not what might exist some day. As for resources and their identifiers — those pesky little devils — Tim wrote:

We could just not talk about resources in the Architecture document. That wouldn’t get in the way of any software that I know of. But I suspect that this would impair the document’s usefulness as people paged frantically back and forth trying to figure out what URIs identify. Perhaps there’s a middle ground, where we say that the nature of resources is outside the scope of this document, aside from the fact that they are what is named by URIs.

Tim Berners-Lee wasn’t particularly happy with Tim Bray’s essay. In the TAG email list he wrote:

You say that the TAG should concentrate on the web as it has been before the semantic web and web services, and that you will be happy if the architecture works for that, even if it does not work for web services and semantic web.

That is a pity, partly because the web is no good unless it can be a sound foundation for the semantic web and web services too. WSDL (ed. Web Services Description Language) and RDF (ed. Resource Description Framework) have real serious issues on the table, working groups which need a consistent framework.

At first glance, it seems as if the two Tims were at opposite ends of a circle — the web of the now versus the web of the future, mathematically defining resources versus basically ignoring the concept as one that can’t be effectively defined. One could then assume that their opinions cancel each other out, leaving us a big fat zero in understanding. On the contrary: like the two flocks of starlings converging from opposite directions — resulting in a thing of great beauty and great destructiveness — Tim B and TimBL have articulated the dichotomy behind the debate of what a ‘resource’ is, and how it is identified within the Semantic Web (as introduced earlier this week). But that’s a dry summation — what they’ve really done is articulate the challenges of the Semantic Web:

To be a Semantic Web, it must be mechanical, and therefore precise, mathematical, and ultimately unambiguous. But to be a Semantic Web, it must also encapsulate meaning, context, and embrace ambiguity. Ignore the discontinuities, embrace the discontinuities.

swallows2.jpg

What does this all mean? If a resource is defined to be anything, including something abstract, then how can it have an identifier on the web, in the form of a URI? But if a resource within the context of the Semantic Web is defined to be something on the web, then how can it not have a URI? If we limit resources to things on the web, how can we identify things as disparate as a person, a galaxy, and an abstraction such as a metaphor in a poem? And how can one global set of URIs work for all items, at all granularities?

If a resource is a representation of something, and one that exists on the web, then software can be designed with an assumption that if you access the URI, something is returned. But can all ‘resources’ of interest within the Semantic Web be represented with something on the web, and identified by a URI? What about peace — can it be constrained within a representation? It’s hard enough identifying it in ‘real life’, how would it be represented on the web? How about you and I? Can we be represented on the web?

Questions! So many questions. And topmost in your mind might be: Why should I care?

Frankly, I’m not sure you should care about this debate and the Semantic Web — you cannot eat it, sleep with it, or use it to rear your young. However, you might care because what’s being discussed is the scope of what will be a part of the architecture of the Semantic Web. If the Internet and the Web, and all of its simple hyperlinkness has invaded your life to a degree now, how much more so will it if it becomes richer, more complex, and more meaningful?

I personally care about this debate because I want to make sure my metaphor, my syllogism, and my analogy are represented effectively or my own Turing Test for the Semantic Web will never come about. I don’t want these abstract concepts to be discarded because they can’t be mathematically defined.

“What we need to understand may only be expressible in a language that we do not know.”

Anthony Judge

That would be a pity.

swallows4.jpg

In the title, I introduced FOAF, and you might be wondering where this simple RDF-based vocabulary fits into this grand debate. I could wish that the membership of the W3C wasn’t so averse to webloggers — our seeming arrogance and assumptions of our importance on the web, and our messiness — because the issues the TAG members are discussing are related to what’s happening with FOAF among the weblogging community. It is a microcosm of the Semantic Web, with its rich possibilities and its many ambiguities and misunderstandings.

To return to FOAF: FOAF represents both people and relationships, the former being concrete but difficult to physically put on the web, the latter being an abstract concept.

Me. The representation of “me” in this context is that which is described in a FOAF file. I am identified primarily by a hash of my email address — in this case, in this microcosm, I am known as:

cd2b130288f7c417b7321fb51d240d570c520720

You may call me “2b” or “Bb” for short.

In addition, my current FOAF file has a property defined within it — knows — and the object of this property is another person — Simon St. Laurent. In this file, I say, “I know Simon St. Laurent”, and to identify Simon within whatever FOAF system might exist, I use the hash of his email address:

65d7213063e1836b1581de81793bfcb9ad596974

I suppose you could call Simon “e183” for short.
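For the curious, here is a minimal sketch of how such a hash can be produced. It assumes the convention described in the FOAF vocabulary, where the SHA-1 digest is taken over the full mailto: URI rather than the bare address. The function name and the example address are made up for illustration; this is not the code any particular FOAF tool uses.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """Return the SHA-1 hex digest of the mailto: URI for an email address.

    Assumption: the digest covers "mailto:" plus the address, as the FOAF
    vocabulary describes for mbox_sha1sum.
    """
    mailto_uri = "mailto:" + email.strip()
    return hashlib.sha1(mailto_uri.encode("utf-8")).hexdigest()


# Hypothetical address, used only for illustration.
digest = mbox_sha1sum("someone@example.com")
print(digest)       # a string of hexadecimal characters
print(len(digest))  # SHA-1 digests are 40 hex characters long
```

The point of the hash is that two files can agree they are talking about the same person without either of them publishing the address itself.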

Both Tims should be unhappy with my FOAF file, I would think, following from their arguments described earlier. For instance, there is a resource with a representation on the web — myself — but there is no URI for it; not only that, we’re not completely sure what the resource is, but we’re not ignoring it, either.

Within a Semantic Web of moving parts and grinding bits, FOAF doesn’t fit.

Tim Bray took on an action item recently to draft language surrounding information resources as compared to resources. As he wrote in another TAG email:

Many existing Web servers and clients (for example web browsers) do not have any notion of what the Resource identified by a URI is. However, humans and Semantic Web software are strongly concerned with this issue. Some resources are perceived as falling into a class called “Information Resources”. That is to say, they are on-line units of electronic information or service. Examples would include a photograph, a news story, and a weather forecast for Oaxaca. Other resources named by URIs may exist entirely apart from the Web. Examples include an edition of some book identified by urn:isbn:0-395-36341-1, a person identified in an RDF assertion using http://example.com/foaf#Dan, and an XML namespace such as http://www.w3.org/1999/02/22-rdf-syntax-ns#. The Web may be used to obtain representations of both kinds of resources.

What Tim is saying is that either a resource exists on the web, or a representation of the object exists on the web. If the URI has an associated protocol, such as the FOAF identifier given in Tim’s example, then a representation of the resource is accessible on the web, even if the resource itself isn’t.

Or is it?

Not one FOAF file I have seen uses a URI to represent the person. They either don’t use anything, or they use what is known as a blank node identifier, which is only relevant to the file. However, the lack of a URI hasn’t adversely affected FOAF, because each person within the context of FOAF is identified through two alternative keys: the mbox_sha1sum, which is the hashed representation of the person’s email address; and/or the URL (URI) of the person’s FOAF file, which in my case is http://burningbird.net/foaf.rdf.
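To make the workaround concrete, here is a rough sketch of how a consumer might join descriptions of the same person from different files using the hash as the key, with no person URI involved. The dictionaries stand in for parsed FOAF data; the function name, the field names, and the second file URL are assumptions made up for this example.

```python
from collections import defaultdict


def merge_by_sha1sum(descriptions):
    """Group person descriptions that share an mbox_sha1sum value."""
    merged = defaultdict(lambda: {"names": set(), "sources": set()})
    for d in descriptions:
        key = d.get("mbox_sha1sum")
        if key is None:
            continue  # no shared key, so nothing to join on
        merged[key]["names"].add(d["name"])
        merged[key]["sources"].add(d["source"])
    return dict(merged)


# The two hashes are the ones quoted in this post; the second source URL is hypothetical.
descriptions = [
    {"mbox_sha1sum": "cd2b130288f7c417b7321fb51d240d570c520720",
     "name": "Bb", "source": "http://burningbird.net/foaf.rdf"},
    {"mbox_sha1sum": "cd2b130288f7c417b7321fb51d240d570c520720",
     "name": "2b", "source": "http://example.org/someone-else/foaf.rdf"},
    {"mbox_sha1sum": "65d7213063e1836b1581de81793bfcb9ad596974",
     "name": "Simon St. Laurent", "source": "http://burningbird.net/foaf.rdf"},
]

for key, person in merge_by_sha1sum(descriptions).items():
    print(key[:8], sorted(person["names"]), len(person["sources"]))
```

Nothing in the sketch says what the merged record is a record of, which is rather the point.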

Neither key is officially a URI of my representation within the context of either the existing or the semantic web. We have simply worked around the debate over how one can identify a representation of ourselves on the Net by not using an identifier. We could use an analogy of parents arguing about proper diet, while we hungry children raid the fridge and eat all the pie. We could, but there’s no URI for analogies, either, and therefore they must not have a proper place in a proper discussion of the proper Semantic Web.

Then there’s the issue of what’s being identified — is the person the resource? Or is the FOAF information the resource, and I’m defined by many such? Additionally, FOAF files also denote other resources — or what we’re assuming are resources because they are, after all, defined within the Resource Description Framework — and they’re parameterized, if that’s the right word, by using the RDF property ‘knows’. However, we don’t have a good understanding of what it is we’re defining with ‘knows’. Is it denoting a relationship? Or is it nothing more than an acknowledgment that I literally know who Simon St. Laurent is? Is Simon a friend, because he’s in a FOAF file? If so, then what would happen if I weren’t in Simon’s FOAF file?

If I were to remove Simon from my FOAF file, am I disavowing the friendship? Or am I ‘pretending’ that I don’t ‘know’ Simon? With FOAF, we not only assert truth, we assert a lie, because I know Simon, and his being in my FOAF file, or not, doesn’t change this. If I don’t list him, am I lying by omission? What does it mean to be in one of these files? What does it mean when you are not?

Will you ‘feel’ it, when you’re not?
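A small sketch shows how little the data itself says about these questions. It treats each ‘knows’ statement as a one-way pair taken from one person’s file. The set contents, the short names, and the helper functions are assumptions for illustration only.

```python
# Each pair (a, b) records that a's FOAF file lists b under 'knows'.
knows = {
    ("Bb", "Simon"),  # from this post: my file lists Simon
}


def lists(a, b):
    """True if a's file contains a 'knows' statement about b."""
    return (a, b) in knows


def reciprocal(a, b):
    """True only if both files list each other."""
    return lists(a, b) and lists(b, a)


print(lists("Bb", "Simon"))       # True
print(lists("Simon", "Bb"))       # False: not listed, which is not the same as "does not know"
print(reciprocal("Bb", "Simon"))  # False
```

The software can report that a statement is absent. It cannot report what the absence means, and that gap is where the social context rushes in.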

The fact is that FOAF is being used as a representation of something; we’re making assertions about something, but we’re not sure of what. Whatever it is, though, it’s loaded with connotations.

Within FOAF we’re representing information about ourselves, but it’s not us — too flat, too two-dimensional to be a representation of us. Additionally, we’re representing relationships with other people, but we’re each bringing our own interpretations of these relationships along for the ride. In other words, we’re making assertions of relationships and attaching social context to them.

In the RDF Concepts working draft, there was a section that discussed the social context of assertions. It is the one and only section of all the RDF documents that brings up the issue of social context about RDF statements. The only one. And of course, it is this section that the Semantic Web Architecture recommended be struck.

Why the cut? From the meeting minutes where the recommendation arose, it would seem to come back to our old debate of URI and context. There was too much confusion about what was meant by ‘identify’, by URI, by resource. As Dan Brickley said in IRC notes:

21:10:54 [bwmscribe]
authoritive definition of URI’s: i.e. who gets to say what a URI denotes
21:12:48 [danbri]
something like “RDF graphs have propositional content. Their meaning is fixed by a bunch of hairy stuff only partly understood and documented (eg. implicit theory of reference associated with URIs). Minor health warning. The End.”

But that doesn’t stop the confusion — ignoring the concept of ‘resource’, postponing the issue of identity, and ignoring social context because it’s too hard to define, won’t prevent problems when people act to fill the void that’s left. As Kendall Clark wrote:

This way of carrying on the social meaning debate was unlikely to lead to a satisfactory resolution, since it was possible to strike the problematic language without solving or addressing the substantive issues which animate the debate in the first place.

swallows5.jpg

Consider FOAF files again: Marc Canter and Eric Sigler are working on this thing that Marc is calling a “PeopleAggregator”. From bits and pieces I’ve picked up at their weblogs, in emails, and in comments elsewhere, this application will be able to create, consume, and maintain FOAF files as well as networks of interlinked people who ‘know’ each other, as defined in these files. More, if someone within the network designates you a ‘friend’ in their FOAF file, the PeopleAggregator sends you an email asking for some form of confirmation.

(Again, this is based on casual discussion in comments and may be incorrect in whole or part.)

Rather than the network of friends being maintained behind walls à la Friendster, it’s out in the open with decentralized FOAF files that anyone can read. Now, what will become the social context of the relationships denoted as resources within these FOAF files? And what can be the social consequences of same?

Personally, I expect the first ‘Technorati of FOAF popularity’ before the year is out. I wonder, what crown will we give to the man and woman voted most popular? Prom king and queen? I also wonder, how soon will we get emails saying, “Please remove me from your FOAF file — you don’t really know me.” How soon will we get emails saying, “Why am I not in your FOAF file?”

If you doubt this, then look no further for proof than the plain, ordinary, unsemantic hypertext links that form our blogrolls. Remember public delinking, and how in the past this has been used as a measure of censorship, and as a form of punishment and control? I’ve been delinked, publicly and privately, from friend and foe, and believe me when I say there is more to this than a simple hypertext link, and the removal thereof.

Remember also the discussions of the power that these links provide within this communication medium because — as Clay Shirky has demonstrated with his power laws — those with a disproportionate share of weblogging links also have a disproportionate share of attention, and even respect?

Power and pain, reward and punishment, all encapsulated in a simple hypertext link, in a simple blogroll — what can happen within the socially explosive context of FOAF?

Both Tims might say that the FOAF example isn’t relevant — weblogging is its own problem and isn’t really representative of the web as a whole. After all, there are billions of pages on the web, and only about a half million webloggers, if that.

But webloggers are becoming the Semantic Web lab rats — through our curiosity and our interest, we’re the first to test these Semantic Web tools outside of labs and universities. We’re the ones that propagate the data and the technologies. When faced with confusion, we’ll wing it. We did so with RSS 1.0, we’re doing so with Pie/Echo/Atom and now we’re continuing the trend with FOAF.

FOAF is becoming the bastard child that grew from the seeds that fell between the cracks of W3C debates or were discarded with all the other messy ‘touchy-feely’ stuff, such as social context surrounding URIs. It’s the wolf child tempered in the pack, surviving on an existence of “keep what works, throw out the rest”. One can’t blame it, then, if it, and we, don’t behave properly when invited to the Semantic Web tea.

And the more I look at these photos the more I think some are upside down. I can’t really tell for sure, the slides aren’t properly marked — but the images are pretty and my representing them upside down on the web doesn’t stop the birds from flying.

swallows3.jpg

Categories
Semantics

The Semantics of Starlings

This weekend I played a bit more with the attachment that allows me to take photos of slides with my digital camera. The ones shown here I took years ago when I lived in Portland, Oregon. The subject is a flock of European Starlings at sunset, just after a storm.

Every year our apartment complex in Portland would be overrun with flocks of starlings that swooped and swirled about, covering the trees and darkening the sky — whatever that part of the sky that wasn’t already darkened by the rain clouds that were a part of our life in Portland. Pretty as they may seem from the photos, the Starlings were a pest — a species that didn’t belong in the area, and one that would take food and habitat away from native birds. Their waste was corrosive to cars, and damaging to buildings and streets; additionally, the birds are known to carry and spread disease.

swallows1.jpg

The apartment would bring in a bird specialist who had this explosive air cannon, which he would shoot at the trees to scare the birds off. (Rather unnerving for tenants in addition to birds.) The starlings would leave for a time, but they always managed to find their way back; they are nothing if not tenacious.

Starlings are a flocker, following lead birds almost obsessively, and it was fascinating to watch as one flock of starlings would meet head-on with another flock — thousands of birds racing towards each other in what you would expect would be a collision, but would coalesce into this wonderful ballet of birds flying over and around each other, literally riding the wake in the air each other caused.

What do starlings and their behavior have to do with the Semantic Web? Only in that, I was reminded of this ‘heads on’ behavior this weekend when I was quietly reading the various entries out at the W3C’s TAG (Technical Architecture) email list — all about URIs and resources, and what it all means…and doesn’t mean. The discussions spilled out into weblogging when Tim Bray wrote a posting titled On Resources. Focus on the web of now, Tim says, and document what we have now:

So, explaining the Web-as-it-is would be enough to make me happy. Clearly, we should have an eye to the future, and, in writing down the architecture, try to avoid making life difficult for any others who are working to make something new and important involving the Web. Obvious examples are the Semantic-Web and Web-Services efforts.

But at the end of the day, the success criterion for me is having the success criteria for the Web-as-it-is explained clearly and convincingly.

In other words, the focus of TAG should be on what exists now, not what might exist some day. As for resources and their identifiers — those pesky little devils — Tim wrote:

We could just not talk about resources in the Architecture document. That wouldn’t get in the way of any software that I know of. But I suspect that this would impair the document’s usefulness as people paged frantically back and forth trying to figure out what URIs identify. Perhaps there’s a middle ground, where we say that the nature of resources is outside the scope of this document, aside from the fact that they are what is named by URIs.

Tim Berners-Lee wasn’t particularly happy with Tim Bray’s essay. In the TAG email list he wrote:

You say that the TAG should concentrate on the web as it has been
before the semantic web and web services, and that you will be happy if the architecture works for that, even if it does not work for web
services and semantic web.

That is a pity, partly because the web is no good unless it can be a
sound foundation for the semantic web and web services too. WSDL (ed. Web Services Description Language) and RDF (ed. Resource Description Language) have real serious issues on the table, working groups which need a consistent framework.

At first glance, it seems as if the two Tims were at opposite ends of a circle — the web of the now versus the web of the future. Mathematically defining resources as compared to basically ignoring the concept as one that can’t be effectively defined. One could then assume that their opinions cancel each other out, leaving us a big fat zero in understanding. On the contrary: like the two flocks of starlings converging together from opposite directions — resulting in a thing of great beauty and great destructiveness — Tim B and TimBL have articulated the dichotomy behind the debate of what is a ‘resource’, and how is it identified within the Semantic Web (as introduced earlier this week). But that’s a dry summation — what they’ve really done is articulate the challenges of the Semantic Web:

To be a Semantic Web, it must be mechanical, and therefore precise, mathematical, and ultimately unambiguous. But to be a Semantic Web, it must also encapsulate meaning, context, and embrace ambiguity. Ignore the discontinuities, embrace the discontinuities.

What does this all mean? If a resource is defined to be anything, including something abstract then how can it have an identifier on the web, in the form of a URI? But if a resource within the context of the Semantic Web is defined to be something on the web, then how can it not have a URI? If we limit resources to things on the web, how can we identify things as disparate as a person, a galaxy, and an abstraction such as a metaphor in a poem? And how can one global set of URIs work for all items, at all granularities?

If a resource is a representation of something, and one that exists on the web, then software can be designed with an assumption that if you access the URI, something is returned. But can all ‘resources’ of interest within the Semantic Web be represented with something on the web, and identified by a URI? What about peace — can it be constrained within a representation? It’s hard enough identifying it in ‘real life’, how would it be represented on the web? How about you and I? Can we be represented on the web?

Questions! So many questions. And topmost in your mind might be: Why should I care?

Frankly, I’m not sure you should care about this debate and the Semantic Web — you cannot eat it, sleep with it, or use it to rear your young. However, you might care because what’s being discussed is the scope of what will be a part of the architecture of the Semantic Web. If the Internet and the Web, and all of its simple hyperlinkness has invaded your life to a degree now, how much more so will it if it becomes richer, more complex, and more meaningful?

I personally care about this debate because I want to make sure my metaphor, my syllogism, and my analogy are represented effectively or my own Turing Test for the Semantic Web will never come about. I don’t want these abstract concepts to be discarded because they can’t be mathematically defined.

“What we need to understand may only be expressible in a language that we do not know.”

Anthony Judge

That would be a pity.

In the title, I introduced FOAF, and you might be wondering where this simple RDF-based vocabulary fits into this grand debate. I could wish that the membership of the W3C wasn’t so averse to webloggers — our seeming arrogance and assumptions of our importance on the web, and our messiness — because the issues the TAG members are discussing are related to what’s happening with FOAF among the weblogging community. It is a microcosm of the Semantic Web, with its rich possibilities and its many ambiguities and misunderstandings.

To return to FOAF: FOAF represents both people and relationships, the former being concrete but difficult to physically put on the web, the latter being an abstract concept.

Me. The representation of “me” in this context is that which is described in a FOAF file. I am identified primarily by a hash of my email address — in this case, in this microcosm, I am known as:

cd2b130288f7c417b7321fb51d240d570c520720

You may call me “2b” of “Bb” for short.

In addition, my current FOAF file has a property defined within it — knows — and the object of this property is another person — Simon St. Laurent. In this file, I say, “I know Simon St. Laurent”, and to identify Simon within whatever FOAF system might exist, I use the hash of his email address:

65d7213063e1836b1581de81793bfcb9ad596974

I suppose you could call Simon “e183” for short.

Both Tims should be unhappy with my FOAF file, I would think, following from their arguments described earlier. For instance, there is a resource with a representation on the web — myself — but there is no URI for it; not only that we’re not completely sure what the resource is, but we’re not ignoring it, either.

Within a Semantic Web of moving parts and grinding bits, FOAF doesn’t fit.

Tim Bray took on an action item recently to draft language surrounding information resources as compared to resources. As he wrote in another TAG email:

Many existing Web servers and clients (for example web browsers) do not have any notion of what the Resource identified by a URI is. However, humans and Semantic Web software are strongly concerned with this issue. Some resources are perceived as falling into a class called “Information Resources”. That is to say, they are on-line units of electronic information or service. Examples would include a photograph, a news story, and a weather forecast for Oaxaca. Other resources named by URIs may exist entirely apart from the Web. Examples include an edition of some book identified by urn:isbn:0-395-36341-1, a person identified in an RDF assertion using http://example.com/foaf#Dan, and an XML namespace such as http://www.w3.org/1999/02/22-rdf-syntax-ns#. The
Web may be used to obtain representations of both kinds of resources.

What Tim is saying is that either a resource exists on the web, or a representation of the object exists on the web. If the URI has an associated protocol, such as the FOAF identifier given in Tim’s example, it’s representation is accessible on the web if it isn’t itself.

Or is it?

Not one FOAF file I have seen uses a URI to represent the person. They either don’t use anything, or they use what is known as a blank node identifier, which is only relevant to the file. However, the lack of a URI hasn’t impacted adversely in FOAF because each person identified within the context of FOAF is done so through two alternative keys: the mbox_sha1sum, which is the hashed representation of my email address; and/or the URL (URI) of my FOAF file — http://burningbird.net/foaf.rdf.

Neither key is officially a URI of my representation within the context of either the existing or the semantic web. We have simply worked around the debate and issue of how can one identify a representation of ourselves on the Net by not using an identifier. We could use an analogy of parents arguing about proper diet, while we hungry children raid the fridge and eat all the pie. We could, but there’s no URI for analogies, either, and therefore must not have a proper place in a proper discussion of the proper Semantic Web.

Then there’s the issue of what’s being identified — is the person the resource? Or is the FOAF information the resource, and I’m defined by many such? Additionally, FOAF files also denote other resources — or what we’re assuming are resources because they are, after all, defined within the Resource Description Framework — and they’re parameterized, if that’s the right word, by using the RDF property ‘knows’. However, we don’t have a good understanding of what it is we’re defining with ‘knows’. Is it denoting a relationship? Or is it nothing more than an acknowledgment that I literally know who Simon St. Laurent is? Is Simon a friend, because he’s in a FOAF file? If so then, what were to happen if I wasn’t in Simon’s FOAF file?

If I were to remove Simon from my FOAF file, am I disavowing the friendship? Or am I ‘pretending’ that I don’t ‘know’ Simon? With FOAF, we not only assert the truth, we assert a lie because I know Simon, and him not being in my FOAF file or not doesn’t change this. If I don’t list him, am I lying by omission? What does it mean to be in one of these files? What does it mean when you are not?

Will you ‘feel’ it, when you’re not?

The fact is that FOAF is being used as a representation of something, we’re making assertions about something but we’re not sure of what. Whatever it is, though, it’s loaded with connotations.

Within FOAF we’re representing information about ourselves, but it’s not us — too flat, too two-dimensional to be a representation of us. Additionally, we’re representing relationships with other people, but we’re each bringing our own interpretations of these relationships along for the ride. In other words, we’re making assertions of relationships and attaching social context to them.

In the RDF Concepts working draft, there was a section that discussed the social context of assertions. It is the one and only section of all the RDF documents that brings up the issue of social context about RDF statements. The only one. And of course, it is this section that the Semantic Web Architecture recommended be struck.

Why the cut? From the meeting minutes where the recommendation arose, it would seem to come back to our old debate of URI and context. There was too much confusion about what was meant by ‘identify’, by URI, by resource. As Dan Brickley said in IRC notes:

21:10:54 [bwmscribe]
authoritive definition of URI’s: i.e. who gets to say what a URI denotes
21:12:48 [danbri]
something like “RDF graphs have propositional content. Their meaning is fixed by a bunch of hairy stuff only partly understood and documented (eg. implicit theory of reference associated with URIs). Minor health warning. The End.”

But that doesn’t stop the confusion — ignoring the concept of ‘resource’, postponing the issue of identity, and ignoring social context because it’s too hard to define, won’t prevent problems when people act to fill the void that’s left. As Kendall Clark wrote:

This way of carrying on the social meaning debate was unlikely to lead to a satisfactory resolution, since it was possible to strike the problematic language without solving or addressing the substantive issues which animate the debate in the first place.

Consider FOAF files again: Marc Canter and Eric Sigler are working on this thing that Marc is calling a “PeopleAggregator”. From bits and pieces I’ve picked up at their weblogs, in emails, and in comments elsewhere, this application will be able to create, consume, and maintain FOAF files as well as networks of interlinked people who ‘know’ each other, as defined in these files. More, if someone within the network designates you a ‘friend’ in their FOAF file, the PeopleAggregator sends you an email asking for some form of confirmation.

(Again, this is based on casual discussion in comments and may be incorrect in whole or part.)

Rather than the network of friends being maintained behind walls à la Friendster, it’s out in the open with decentralized FOAF files that anyone can read. Now, what will become the social context of the relationships denoted as resources within these FOAF files? And what can be the social consequences of same?

Personally, I expect the first “Technorati of FOAF popularity” before the year is out. I wonder, what crown will we give to the man and woman voted most popular? Prom king and queen? I also wonder, how soon will we get emails saying, “Please remove me from your FOAF file — you don’t really know me”? How soon will we get emails saying, “Why am I not in your FOAF file?”

If you doubt this, then look no further for proof than the plain, ordinary, unsemantic hypertext links that form our blogrolls. Remember public delinking, and how in the past this has been used as a measure of censorship, and as a form of punishment and control? I’ve been delinked, publicly and privately, from friend and foe, and believe me when I say there is more to this than a simple hypertext link, and the removal thereof.

Remember also the discussions of the power that these links provide within this communication medium because — as Clay Shirky has demonstrated with his power laws — those with a disproportionate share of weblogging links also have a disproportionate share of attention, and even respect?

Power and pain, reward and punishment, all encapsulated in a simple hypertext link, in a simple blogroll — what can happen within the socially explosive context of FOAF?

Both Tims might say that the FOAF example isn’t relevant — weblogging is its own problem and isn’t really representative of the web as a whole. After all, there are billions of pages on the web, and only about a half million webloggers, if that.

But webloggers are becoming the Semantic Web lab rats — through our curiosity and our interest, we’re the first to test these Semantic Web tools outside of labs and universities. We’re the ones that propagate the data and the technologies. When faced with confusion, we’ll wing it. We did so with RSS 1.0, we’re doing so with Pie/Echo/Atom and now we’re continuing the trend with FOAF.

FOAF is becoming the bastard child that grew from the seeds that fell between the cracks of W3C debates or were discarded with all the other messy ‘touchy-feely’ stuff, such as social context surrounding URIs. It’s the wolf child tempered in the pack, surviving on an existence of “keep what works, throw out the rest”. One can’t blame it, then, if it, and we, don’t behave properly when invited to the Semantic Web tea.

And the more I look at these photos the more I think some are upside down. I can’t really tell for sure, the slides aren’t properly marked — but the images are pretty and my representing them upside down on the web doesn’t stop the birds from flying.

 

Categories
Semantics

Context and Meaning

In the comments to FOAF Girl!, Joseph Duemer wrote:

The idea that a link represents “friendship” is so bizarre it had to come from some geek’s stunted view of social relations.

A link is an association, the literary sense of that word, but only the context of the link can provide the meaning, the implication, the “spin.” To engineer friendliness into a link, even on a blogroll, demonstrates a profoundly impoverished social imagination. How come so much web psychology seems to have been theorized by seventeen year old boys who don’t get out enough?

Obvious disdain for the premise aside, Joseph has reached the heart of the Semantic Web: how do we capture the context of information? It is the context, not the data itself, that brings in semantics.

Lately, with FOAF and other uses of RDF, there is an assumption that if we just capture enough metadata, identified uniquely by URI (Uniform Resource Identifier) and throw it together, documented with RDF/XML, we’ll have the Semantic Web. More, if we just create enough web services that use this data, we’ll have the Semantic Web. The Semantic Web is about technology.

This couldn’t be further from the truth — a link is a link is a link. The Semantic Web isn’t about technology, it’s about people and about communication.

If we say that the link is a unique representation of a person, and that the existence of this link within a FOAF file denotes that this person is a ‘friend’ of the person who put it there, without understanding the context behind this link being within this file or representing this person, then all we have is a pairing between what could be a useful link and an ambiguous and somewhat overused term.

As serendipity, my old friend, would have it, Kendall Clark’s new article at xml.com, titled “Social Meaning and the Cult of Tim”, covers a debate that has much of this issue at its core. The debate is between Pat Hayes, the man behind the semantics documents for RDF, and Tim Berners-Lee, the head of the W3C and the founder of today’s Web.

Ostensibly, the debate is about URIs, but looking closely, reading the words, it is nothing less than the issue of this ‘context’ mentioned earlier, the semantics, if you will, behind the Semantic Web.

(In addition, in his article, Clark not only introduces the topic of this conversation, he also introduces the concept that there is an unspoken rule, that one does not publicly criticize TimBL because it is on TimBL’s reputation that the Semantic Web will be built. I was unaware that such a taboo existed, and if it does, it’s ridiculous — anything of such far-reaching importance and impact as the Semantic Web cannot exist or depend on any one person. If debates are being weighed and measured based on TimBL’s agreement and disagreement, then the debaters are flawed, and should retire from the debates and stay home. Growing tomatoes or some such thing.)

TimBL has good arguments, but so does Pat, who impressed me when I was writing Practical RDF, and continues to impress me with the arguments I see him making in this debate, and in other related ones.

[The debate starts here, and continues on via a thread between TimBL and Pat (follow reply links at the bottom, note others also join in with excellent insight, but follow the TimBL/Pat Hayes thread all the way through, first). Another debate to read starts at this point, and covers much of the same issues.]

This debate rages around the concept of a unique URI identifying, or dare I say, ‘denoting’ a single resource on the web, and that a resource must have a unique URI in order for it to be part of the Semantic Web. This is a basic concept in RDF, and is one that allows us to create RDF vocabularies that can work together.

Consider this: Within RDF/XML there is a URI that represents me. It’s used in FOAF files, but could also be used in RSS files, in Creative Commons licenses, in any vocabulary that includes URIs representing a unique person. When you see this URI, you assume it is a representation for me, and since I’m unique, it’s unique.

Ultimately TimBL wants to fix this as a given, a standard, a law if you will. Eventually there will be a global set of URIs identifying unique resources on the Web and this will form the basis of the Semantic Web. If I read Tim correctly, context cannot enter into the equation because this makes the Semantic Web difficult to engineer.

Pat responds with:

…I insist that this stipulation of identifying one thing
isn’t sensible or even desireable. Well, at least, unless that word
“identify” means something different from “refer to” or “name” or
“denote” . What might indeed be true is that in many circumstances,
a URI somehow provides access to information which is sufficient to
enable someone or something to uniquely identify a particular thing
(that the representation accessed via that URI is in some sense
about), but even there the thing identified might vary between
contexts (such as when we use someones email address to refer to the
person) without harm. This kind of ambiguity resolved by context is
at the very basis of human communication: it works in human life, it
works on the Web, it will work on the semantic Web. Why do you want
to try to legislate it out of existence? You will not be able to, any
more than you will be able to stop people falling in love. All that
your ‘ideal design’ will accomplish is to make the architectural
pronouncements of the W3C more and more out of line with the way that
the Web is actually being used by real people.

TimBL responds with:

No. We are defining the semantic web NOT to work like natural language, but to work like mathematics.

Any system of mathematics has to be able to use symbols to denote things in the universe of discourse. You as a philosopher can perhaps handle a mathematics in which symbols denote whatever anyone likes at any point, but I as an engineer find it less useful.

When Pat writes, This kind of ambiguity resolved by context is at the very basis of human communication: it works in human life…, TimBL responds with, Yes, with natural language and peotry(sic) to which Pat replies, Never mind poetry, it works for all communication.

TimBL defends the concept of a global identifier system, primarily because it’s necessary from an engineering perspective. Pat doesn’t necessarily disagree that it’s necessary; what he disagrees with is the assumption that there is a truth to this. As he writes:

We seem to be at cross purposes. Im not saying that the ‘unique
identification’ condition is an unattainable ideal: Im saying that it
doesn’t make sense, that it isn’t true, and that it could not
possibly be true. Im saying that it is *crazy*.

At first you might think that Pat is following the theoretical too much — the way of the semantician, the way of the linguist — until you start to see what exactly he’s saying. And he’s right. Pat is right.

What Pat is saying is that the concept of a URI identifying a single resource works, but it’s broken; that’s alright, because it works, but don’t make any additional assumptions of truth based on this. In other words, there is a URI (the URL — the address you type into the browser, or the permalink) for this page, and this page can be identified by this URI. For the most part, as long as you and I agree on this, we can work together and create vocabularies and technologies that work together. But the concept is flawed because it does not take into account the context of the URI — that thing that Joseph pointed out in my comments. The “spin”.

It is the importance of the context that Pat defends, and it is this very context that TimBL says must not be taken into account, otherwise the system can’t be engineered. But it is the context that sets the assumptions we can make, and to use a universal set of assumptions is just as meaningless as to depend on a universal set of identifiers regardless of the context of their use.

Pat expanded on this in another thread:

BTW, the current usage of “resource” in the SW specifications is
vacuous: a SW Resource can be anything whatsoever, real or imaginary,
on or off the Web, in the past or future, of any nature, with or
without a URI. So to claim that all SW resources ‘contain’ a Web
resource sounds like it would also have to be vacuous or else would
be obviously false (depending on what a ‘web resource’ is, which of
course I have no idea about, this never having been defined or
elucidated anywhere.)

I have no idea what an interface to an object could possibly be.
What kinds of interface do the following objects have: a grain of
sand, a galaxy, an imaginary detective ?

There’s the key, the understanding — what kind of global system can assimilate objects of such differing contexts as that of the micro (the sand), the macro (the galaxy), and the virtual (the imaginary detective)? No one system can, but no one system has to. What Pat is saying is that the current system works ‘good enough’. It works though it’s based on a broken premise of a global system of identifiers that can denote any one thing regardless of context. It’s okay that it works, and it’s okay that it’s broken — but don’t base laws and assumptions on a broken premise. Don’t attach meaning to the system, just use it.

This returns again to what Joseph said. How can we ‘denote’ a friend, just through a link labeled as such? It doesn’t take into account the context of the label and the system. By taking a set of links in a blogroll and creating a FOAF file and saying we ‘know’ each of the people listed in this blogroll, we’re taking the links out of context — a link in a blogroll is not the same thing as making a statement that I know this person, or this person is my friend. The only statement I’m making, taking into account ‘context’, is that I listed this person in my blogroll for some unspecified reason of which no one can truly make an assumption.

There may be an assumption that I did so because I read them, or that I like to read the person, or that I like the person. But that’s all this is, an assumption. Without the context behind my reasoning why I put links into my blogroll, how can one then extrapolate out that these links should then go into a FOAF file? Or vice versa? How can one extrapolate that the context of the FOAF file and the context of the blogroll are the same? Because the same identifiers are used in each?

Yes, the syntactic string representing a person may be the same in the FOAF file as in the blogroll, and because of this and the use of RDF, we can ‘technically’ combine and extrapolate this information — but without the context surrounding the use of the identifier in each case, you can’t make an assumption that the one ‘means’ the same as the other. In other words, you can’t extrapolate, meaningfully, from my URI appearing in a FOAF file to my URI appearing in a blogroll, because neither is ‘me’ — only me as I am represented within the context of each vocabulary.
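As a rough sketch of the point, and nothing more than that: the snippet below merges two tiny sets of statements that happen to use the same identifier. The URIs, the `ex:listedIn` property, and the helper names are all made up for illustration; only the idea of carrying the source alongside each statement comes from the discussion above.

```python
# Hypothetical identifiers, invented for this sketch.
ME = "http://example.org/people#me"
SOMEONE = "http://example.org/people#someone"

sources = {
    # What a FOAF file might say (foaf:knows abbreviated as a plain string).
    "foaf": [(SOMEONE, "foaf:knows", ME)],
    # What a blogroll might say; ex:listedIn is an invented property.
    "blogroll": [(ME, "ex:listedIn", "http://example.org/someone/blogroll")],
}

def merge_without_context(sources):
    """Pool every triple into one set; where each came from is lost."""
    return {t for triples in sources.values() for t in triples}

def merge_with_context(sources):
    """Keep the source name as a fourth element of each statement."""
    return {
        (s, p, o, name)
        for name, triples in sources.items()
        for (s, p, o) in triples
    }

def statements_about(statements, uri):
    """Return every statement that mentions the URI as subject or object."""
    return sorted(st for st in statements if uri in (st[0], st[2]))

flat = merge_without_context(sources)
quads = merge_with_context(sources)

# Both merges find the same identifier in both places, but only the second
# can still tell a FOAF assertion apart from a blogroll entry.
for statement in statements_about(quads, ME):
    print(statement)
```

The flat merge is what we get "technically" for free. The second version only records where a statement came from; it says nothing about why the statement was made, which is the part no merge can recover.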

Pat wrote (and I can’t find the exact email message):

How can one consider a link to be ‘the person’, when it is nothing more than a proxy, a representation that the Semantic Web requires because we have no other way to represent the person within the Semantic Web.

Within the Semantic Web, a URI is a proxy for, a representation of, something that can’t be represented any other way due to the limitations of the medium. And that’s okay, because it works. But, if I read Pat correctly, don’t add any additional ‘meaning’ to this representation other than the fact that it is a representation. To do so will perpetuate the broken premise.

FOAF can’t represent a friend, or a relationship, directly. What it can do is provide a proxy for an association between two people, as marked by one of these people, and labeled as friend, acquaintance, co-worker or whatever. It is not the actual relationship itself, and to see it as such, to treat it as such — to make it real because of this association in the file — removes the context of the FOAF file, which could have significant impact on the truth of the assertion.

Because I am listed as a ‘friend’ in AKMA’s FOAF file, does this make it real? You can’t assume it’s real just because it says so, in a FOAF file, with a link, representing a URI. It may be real — it is real to me because of my association and appreciation and affection I feel for both AKMA and his wife, Margaret — but the link, and the existence of the file, don’t make it real.

Moreover, you can’t extrapolate any additional meaning out of the FOAF file other than what is narrowly defined within the context of the vocabulary — FOAF shows that one person is making a statement that they know another person. Nothing more. Nothing less.

Pat isn’t being arbitrary, he’s making a critical point: we can only assume so much from a URI within the context of an RDF vocabulary. To make additional assumptions is as false as to assume that the contexts of FOAF and a blogroll overlap, and that my relationship to one person in one context must be the same as my relationship to this person in another.

Dammit, I’m not saying this well, but it’s the very use of FOAF for other things outside of the context of this specific RDF vocabulary that forms the basis — in my interpretation, and I could be wrong — for Pat’s continued and persistent argument about the URI of an object being a proxy for that object, and that a URI has context. To ignore the context is to literally throw out the true semantics, leaving nothing in its place but a smarter, but still dumb, web.

Smarter web is okay, but I want a semantic web! I don’t care if the Semantic Web works for the technology if it doesn’t also work for the people.

By seeing the URI as a representation of an object that transcends context, we then erroneously make extrapolations, such as FOAF and the blogroll — harmless in this case, but not so harmless when you start bringing in issues of trust, and see FOAF as the basis of the web of trust.

What Pat is arguing about, the point he is trying to make, forms the basis of what is happening now. We only have a few RDF/XML vocabularies in wider use and already we’re seeing abuse because of making assertions based on flawed premises. This isn’t a semantics argument or an esoteric debate between philosophers — this is real stuff being implemented, a perpetuation of a premise that’s flawed.

I have more to say on this, later. I must read all the notes, think on this further. If I’ve misrepresented either TimBL or Pat, my apologies to both and blame it on my interest and excitement about this debate.

Stay tuned.

Archive with comments at Wayback Machine

Categories
Specs

Well FOAF you too!

Recovered from the Wayback Machine

It would seem that there are folks out and about playing with RDF, in particular FOAF, a Friend-of-a-Friend RDF vocabulary. Mark Pilgrim’s playing with it. So are Sam Ruby and Phil.

Phil had some problems with the original FOAF file generated for him by the FOAF-o-matic in that it includes blank nodes — equivalent to a subject-predicate-object (noun-property-value) that doesn’t have a specified subject. He provided his own ‘label’ to the nodes so that they then wouldn’t be blank.

Usually a blank node is used when a label doesn’t serve a purpose or doesn’t yet exist. For instance, I might use a blank node (these used to be called anonymous nodes) to represent a “location” object. I don’t really care about accessing the location, I want to access the location’s parts: the city, the state, the zip code. I only use an object to group these items schematically, but I’m not interested in actually accessing the grouping directly.

If I decide to have multiple locations and I do want to identify them individually, then I would add labels to the nodes for the locations and they would no longer be blank.
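To make the distinction concrete, here is a minimal sketch in plain Python, with statements held as simple tuples rather than in any particular RDF library. The `EX` namespace, the `blank_node` and `location_triples` helpers, and the sample address values are all invented for illustration; the `_:` prefix follows the usual convention for writing blank node identifiers.

```python
from itertools import count

EX = "http://example.org/terms#"   # made-up vocabulary namespace
_blank_ids = count(1)

def blank_node():
    """Return a fresh identifier that only has meaning inside this graph."""
    return f"_:b{next(_blank_ids)}"

def is_blank(node):
    """Blank nodes are written with a leading '_:' by convention."""
    return node.startswith("_:")

def location_triples(person, city, state, zip_code, label=None):
    """Group the parts of a location under one node.

    With no label the grouping node stays blank; with a label it can be
    referred to directly from elsewhere.
    """
    loc = label if label is not None else blank_node()
    return [
        (person, EX + "location", loc),
        (loc, EX + "city", city),
        (loc, EX + "state", state),
        (loc, EX + "zip", zip_code),
    ]

me = "http://example.org/people#me"

# One location I never need to point at: the node is blank.
anonymous = location_triples(me, "Portland", "OR", "00000")

# A location I do want to identify individually: the node gets a label.
labeled = location_triples(
    me, "Springfield", "XX", "11111",
    label="http://example.org/places#home",
)
```

Either way the city, state, and zip code are reachable by walking from the person through the location node. The label only matters once something outside this small graph needs to name the location itself.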

One assumes the FOAF designers didn’t see one accessing specific persons as much as one would access the attributes of a person: name, SSN, etc.

In relational database systems, the concept behind a blank node is analogous to dummy keys or auto-generated identifiers given to uniquely identify a row in a database table. This identifier is mainly used by the database system, rarely by applications built against it, and never directly by people.
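The analogy can be sketched with Python's built-in `sqlite3` module. The table layout and the sample row are assumptions made up for this example; the point is only that the database hands out the key and nobody types it in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 'id' is never supplied by the application; SQLite generates it.
conn.execute(
    "CREATE TABLE location ("
    " id INTEGER PRIMARY KEY,"
    " city TEXT, state TEXT, zip TEXT)"
)

cur = conn.execute(
    "INSERT INTO location (city, state, zip) VALUES (?, ?, ?)",
    ("Portland", "OR", "00000"),
)
generated_id = cur.lastrowid  # the auto-generated key for the new row

# The key is useful for joins and lookups inside the system,
# but it carries no meaning of its own.
row = conn.execute(
    "SELECT city, state FROM location WHERE id = ?", (generated_id,)
).fetchone()
print(generated_id, row)
```

Like a blank node identifier, the generated key is only good inside the system that issued it; another database is free to give the same number to a completely different row.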

Anyway, back to FOAF. Friend of a friend. The purpose behind FOAF is increasing our knowledge about people in a community, according to an article by Edd Dumbill. I’m most interested in FOAF because of the possibility of using it to build a complex web of trust based on the idea of this person knows someone, who knows someone else, who knows someone else, who knows someone else, who knows you, and so on. If you know and trust me and I know and trust Phil and he knows and trusts Joe down the road, you’re more likely to trust Joe because of this indirect relationship than if you just found him by happenstance.

FOAF becomes more usable, as with most RDF, when data from the various FOAF files are parsed and merged into a common data source, and then the recursive querying can occur. Who knows this Joe? Well, Phil knows Joe. I don’t know Phil, so who knows Phil? Shelley knows Phil, and on and on. It’s handy being able to query for Edd’s email address and nickname with FOAF, but it’s handier knowing who Edd trusts.
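A small sketch of that merge-then-query idea, assuming each FOAF file has already been parsed into (subject, property, object) tuples. The short names standing in for people and the `merge` and `chain_of_knows` helpers are hypothetical; only the `foaf:knows` property URI is taken from the FOAF vocabulary itself.

```python
from collections import deque

KNOWS = "http://xmlns.com/foaf/0.1/knows"

def merge(*files):
    """Pool the triples from several parsed FOAF files into one set."""
    merged = set()
    for triples in files:
        merged.update(triples)
    return merged

def chain_of_knows(graph, start, target):
    """Breadth-first search along foaf:knows.

    Returns the shortest chain of people from start to target,
    or None if no chain exists.
    """
    neighbours = {}
    for s, p, o in graph:
        if p == KNOWS:
            neighbours.setdefault(s, []).append(o)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for person in sorted(neighbours.get(path[-1], [])):
            if person not in seen:
                seen.add(person)
                queue.append(path + [person])
    return None

# Three separate files, each saying only what its owner asserts.
your_file = {("you", KNOWS, "shelley")}
shelley_file = {("shelley", KNOWS, "phil")}
phil_file = {("phil", KNOWS, "joe")}

graph = merge(your_file, shelley_file, phil_file)
print(chain_of_knows(graph, "you", "joe"))
# ['you', 'shelley', 'phil', 'joe']
```

Note that the search follows "knows" and nothing else. Whether a chain of "knows" statements adds up to trust is a judgment the query can't make.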

FOAF files are easy to generate and fairly easy to consume with any number of RDF APIs and tools (in Perl, Java, Python, PHP, and so on).

It’s an interesting vocabulary with some potentially interesting uses. I’ll be curious to see what uses the weblogging community comes up with in its current explorations.