I love you 25% of the time

Oooo. This is fun. *claps hands*

David Weinberger asks:

Let’s say I want to express in an RDF triple not simply that A relates to B, but the degree of A’s relationship to B. E.g.:

Bill is 85% committed to Mary

The tint of paint called Purple Dawn is 30% red

Frenchie is 75% likely to beat Lefty

Niagara Falls is 80% in Canada

Other than making up a set of 100 different relationships (e.g., “is in 1%,” “is in 2%,” etc.), how can that crucial bit of metadata about the relationship be captured in RDF?

In my opinion, there is no one way to record a percentage in RDF. Claiming otherwise is the same as saying that being faithful to a lover 50% of the time is equivalent to eating only 50% of a banana split.

So let’s take just one of the examples David gives us: Niagara Falls is 80% in Canada. At first glance, if we wanted to limit ourselves to recording this fact, using one and only one triple, we could do the following:

Niagara Falls — has an 80% existence — in Canada.

That records the fact. If I were specifically looking this information up, I would have it. The only problem is, that’s all I would have. I could continue this, as David says, with an 81% existence, and an 82% existence, and so on. How tedious. Humans don’t work this way. We don’t memorize every single number in existence. No, we memorize ten digits and devise a numeric system to derive the rest, learning how to use this number system instead of memorizing all possible numbers.

What we need is a way of capturing that ability to derive new concepts from existing facts using a set of triples in the form: subject predicate object.

Rather than dive straight into the triples, let’s look at the question from the perspective of David being David, me being me, and this being April 2006. In other words, let’s look at what David is really saying when he gives the sentence: Niagara Falls is 80% in Canada.

When David said Niagara Falls is 80% in Canada, what he’s saying, in an assumed shorthand way, is the following:

Niagara Falls exists 80% in Canada.

This statement was made in 2006.

Canada is a country.
A country is a political entity, which may or may not have a fixed physical location.

Niagara Falls is a physical entity.
Niagara Falls has a physical location.
Niagara Falls has an area, bounded by longitude and latitude.

Niagara Falls’ physical location has a northern terminus latitude of ____.
Niagara Falls’ physical location has a southern terminus latitude of ____.
Niagara Falls’ physical location has a western terminus longitude of _____.
Niagara Falls’ physical location has an eastern terminus longitude of _____.

In 2006 Canada’s southernmost border is at latitude ____.
In 2006 Canada’s westernmost border is at longitude ____.
In 2006 Canada’s northernmost border is at latitude ____.
In 2006 Canada’s easternmost border is at longitude ____.

Why all of the different sentences? Because there’s more to the statement “Niagara Falls is 80% in Canada” than first appears from just the words. We want to capture not only the essence of the words, but also the assumptions and inferences that we, as humans, make based on the words.

Given David’s statement that Niagara Falls is 80% in Canada, what can we infer?

That the statement about Niagara Falls being 80% in Canada was made in 2006.
That Niagara Falls has an area bordered by such and such latitude and such and such longitude. This is a physical, fixed location (though not immutable).
That in 2006, Canada has an area bordered by such and such latitude and such and such longitude. This is a mutable, political border, though rarely changing.

Based on all of these, we can determine that 80% of Niagara Falls is in Canada.
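To make that derivation concrete, here is a rough Python sketch. It assumes both the falls and the country can be treated as simple rectangles measured in degrees, which is the same "broadly speaking" simplification used in this post. The Box type, the function names, and every number below are made up for illustration; none of them are real coordinates.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Rectangular extent in degrees (a hypothetical simplification)."""
    south: float  # southernmost latitude
    north: float  # northernmost latitude
    west: float   # westernmost longitude
    east: float   # easternmost longitude


def area(box: Box) -> float:
    # Flat "length times width" in square degrees; ignores Earth's curvature.
    return max(0.0, box.north - box.south) * max(0.0, box.east - box.west)


def fraction_inside(inner: Box, outer: Box) -> float:
    """Share of `inner`'s area that also falls within `outer`."""
    overlap = Box(
        south=max(inner.south, outer.south),
        north=min(inner.north, outer.north),
        west=max(inner.west, outer.west),
        east=min(inner.east, outer.east),
    )
    total = area(inner)
    return area(overlap) / total if total else 0.0


# Made-up numbers, chosen only so the arithmetic is easy to follow.
falls = Box(south=0.0, north=10.0, west=0.0, east=10.0)
country = Box(south=2.0, north=50.0, west=-40.0, east=30.0)
print(f"{fraction_inside(falls, country):.0%}")  # prints 80%
```

The percentage is never stored anywhere. It falls out of the boundaries, so if either boundary changes, the answer changes with it.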

The semantic web means capturing information so that we can make inferences based on conclusions. Since wetware is still experimental, and we haven’t yet created machines that can build inferences without a little help from us’ons, we provide enough of the other details to reach a point where we can infer all the facts from a given statement.

Therefore, we have the following triples (using English syntax rather than Turtle or some other mechanistic format, since I’m writing for people not machines right at the moment):

A geographical object has a physical existence at a point in time.
A geographical object’s physical existence can be measured in area.
The area of a geographical object’s physical existence is found by taking the length of one side and multiplying it by the length of the other (broadly speaking).
The length of one side can be found by finding the difference of its boundaries, as measured by its southern and northern latitudes.
The length of the other side can be found by finding the difference of its boundaries, as measured by its western and eastern longitudes.

A geopolitical object is also a geographical object.
A country is a geopolitical object.
Canada is a country.

Canada’s 2006 border has a northernmost latitude of ____.
Canada’s 2006 border has a southernmost latitude of ____.
Canada’s 2006 border has a westernmost longitude of _____.
Canada’s 2006 border has an easternmost longitude of ______.

Niagara Falls has a northernmost latitude of ______.
Niagara Falls has a southernmost latitude of ______.
Niagara Falls has a westernmost longitude of _____.
Niagara Falls has an easternmost longitude of _____.
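For the machine-minded, the class statements in that list can be sketched as plain Python tuples, with one tiny inference rule applied until nothing new turns up. This is only an illustration of the idea, not a real RDF toolkit: the names (subClassOf, type, infer_types) are hypothetical labels loosely modeled on how RDF Schema talks about classes.

```python
# Each triple is (subject, predicate, object); all names are illustrative.
triples = {
    ("GeopoliticalObject", "subClassOf", "GeographicalObject"),
    ("Country", "subClassOf", "GeopoliticalObject"),
    ("Canada", "type", "Country"),
    ("NiagaraFalls", "type", "GeographicalObject"),
}


def infer_types(facts):
    """Repeatedly apply: X type A, A subClassOf B  =>  X type B."""
    facts = set(facts)
    while True:
        new = {
            (x, "type", b)
            for (x, p1, a) in facts if p1 == "type"
            for (a2, p2, b) in facts if p2 == "subClassOf" and a2 == a
        } - facts
        if not new:
            return facts  # nothing left to derive
        facts |= new


closed = infer_types(triples)
print(("Canada", "type", "GeographicalObject") in closed)  # True
```

Nobody ever stated that Canada is a geographical object. It follows from the other triples, which is what lets Canada have an area we can compare against the falls.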

Seems like a lot, but this is actually capturing what David is saying; he just doesn’t know he’s saying it. If we just recorded the fact Niagara Falls is 80% in Canada, we would be leaving all the important bits behind.

There are better schema folk than I, and they can, most likely, come up with better triples. The point is that RDF doesn’t just record facts. We have existing models that do a dandy job of recording facts. Given an infinitely long, one-dimensional flat plane where all facts have a single point of existence, we have systems that can capture snapshots of this plane far more efficiently than RDF.

Consider instead a model of knowledge that consists of an infinite number of finite planes of information, intersecting infinitely. That’s RDF’s space: recording these points of intersection.