Context and Meaning

In the comments to FOAF Girl!, Joseph Duemer wrote:

The idea that a link represents “friendship” is so bizarre it had to come from some geek’s stunted view of social relations.

A link is an association, the literary sense of that word, but only the context of the link can provide the meaning, the implication, the “spin.” To engineer friendliness into a link, even on a blogroll, demonstrates a profoundly impoverished social imagination. How come so much web psychology seems to have been theorized by seventeen year old boys who don’t get out enough?

Obvious disdain for the premise aside, Joseph has reached what is the heart, the most key element, of the Semantic Web — how do we capture the context of information, because it is the context, not the data itself, that brings in semantics.

Lately, with FOAF and other uses of RDF, there is an assumption that if we just capture enough metadata, identified uniquely by URI (Uniform Resource Identifier) and throw it together, documented with RDF/XML, we’ll have the Semantic Web. More, if we just create enough web services that use this data, we’ll have the Semantic Web. The Semantic Web is about technology.

This couldn’t be further from the truth — a link is a link is a link. The Semantic Web isn’t about technology, it’s about people and about communication.

When we say that a link is a unique representation of a person, and that the existence of this link within a FOAF file denotes that this person is a ‘friend’ of whoever put it there, without understanding the context behind the link being in the file or representing the person, all we have is a pairing between what could be a useful link and an ambiguous and somewhat overused term.

As serendipity, my old friend, would have it, Kendall Clark’s new article at xml.com, titled “Social Meaning and the Cult of Tim”, covers a debate that has much of this issue at its core. The debate is between Pat Hayes, the man behind the semantics documents for RDF, and Tim Berners-Lee, the head of the W3C and the founder of today’s Web.

Ostensibly, the debate is about URIs, but looking closely, reading the words, it is nothing less than the issue of this ‘context’ mentioned earlier: the semantics, if you will, behind the Semantic Web.

(In addition, in his article, Clark not only introduces the topic of this conversation, he also introduces the concept that there is an unspoken rule that one does not publicly criticize TimBL, because it is on TimBL’s reputation that the Semantic Web will be built. I was unaware that such a taboo existed and if it does, it’s ridiculous: anything of such far-reaching importance and impact as the Semantic Web cannot exist or depend on any one person. If debates are being weighed and measured based on TimBL’s agreement and disagreement, then the debaters are flawed, and should retire from the debates and stay home. Growing tomatoes or some such thing.)

TimBL has good arguments, but so does Pat, who impressed me when I was writing Practical RDF, and continues to impress me with the arguments I see him making in this debate, and in other related ones.

[The debate starts here, and continues on via a thread between TimBL and Pat (follow reply links at the bottom, note others also join in with excellent insight, but follow the TimBL/Pat Hayes thread all the way through, first). Another debate to read starts at this point, and covers much of the same issues.]

This debate rages around the concept of a unique URI identifying, or dare I say, ‘denoting’ a single resource on the web, and that a resource must have a unique URI in order for it to be part of the Semantic Web. This is a basic concept in RDF, and is one that allows us to create RDF vocabularies that can work together.

Consider this: Within RDF/XML there is a URI that represents me. It’s used in FOAF files, but could also be used in RSS files, in Creative Commons licenses, in any vocabulary that includes URIs representing a unique person. When you see this URI, you assume it is a representation for me, and since I’m unique, it’s unique.

Ultimately TimBL wants to fix this as a given, a standard, a law if you will. Eventually there will be a global set of URIs identifying unique resources on the Web and this will form the basis of the Semantic Web. If I read Tim correctly, context cannot enter into the equation because this makes the Semantic Web difficult to engineer.

Pat responds with:

…I insist that this stipulation of identifying one thing
isn’t sensible or even desireable. Well, at least, unless that word
“identify” means something different from “refer to” or “name” or
“denote” . What might indeed be true is that in many circumstances,
a URI somehow provides access to information which is sufficient to
enable someone or something to uniquely identify a particular thing
(that the representation accessed via that URI is in some sense
about), but even there the thing identified might vary between
contexts (such as when we use someones email address to refer to the
person) without harm. This kind of ambiguity resolved by context is
at the very basis of human communication: it works in human life, it
works on the Web, it will work on the semantic Web. Why do you want
to try to legislate it out of existence? You will not be able to, any
more than you will be able to stop people falling in love. All that
your ‘ideal design’ will accomplish is to make the architectural
pronouncements of the W3C more and more out of line with the way that
the Web is actually being used by real people.

TimBL responds with:

No. We are defining the semantic web NOT to work like natural language, but to work like mathematics.

Any system of mathematics has to be able to use symbols to denote things in the universe of discourse. You as a philosopher can perhaps handle a mathematics in which symbols denote whatever anyone likes at any point, but I as an engineer find it less useful.

When Pat writes, This kind of ambiguity resolved by context is at the very basis of human communication: it works in human life…, TimBL responds with, Yes, with natural language and peotry(sic) to which Pat replies, Never mind poetry, it works for all communication.

TimBL defends the concept of a global identifier system, primarily because it’s necessary from an engineering perspective. Pat doesn’t necessarily disagree that it’s necessary; what he disputes is the assumption that there is a truth to this. As he writes:

We seem to be at cross purposes. Im not saying that the ‘unique
identification’ condition is an unattainable ideal: Im saying that it
doesn’t make sense, that it isn’t true, and that it could not
possibly be true. Im saying that it is *crazy*.

At first you might think that Pat is following the theoretical too much — the way of the semantician, the way of the linguist — until you start to see what exactly he’s saying. And he’s right. Pat is right.

What Pat is saying is that the concept of a URI identifying a single resource works, but it’s broken; but that’s alright, because it works, but don’t make any additional assumptions of truth based on this. In other words, there is a URI (the URL, the address you type into the browser, or the permalink) for this page, and this page can be identified by this URI. For the most part, as long as you and I agree on this, we can work together and create vocabularies and technologies that work together. But the concept is flawed because it does not take into account the context of the URI, that thing that Joseph pointed out in my comments. The “spin”.

It is the importance of the context that Pat defends, and it is this very context that TimBL says must not be taken into account, otherwise the system can’t be engineered. But it is the context that sets the assumptions we can make, and to use a universal set of assumptions is just as meaningless as to depend on a universal set of identifiers regardless of the context of their use.

Pat expanded on this in another thread:

BTW, the current usage of “resource” in the SW specifications is
vacuous: a SW Resource can be anything whatsoever, real or imaginary,
on or off the Web, in the past or future, of any nature, with or
without a URI. So to claim that all SW resources ‘contain’ a Web
resource sounds like it would also have to be vacuous or else would
be obviously false (depending on what a ‘web resource’ is, which of
course I have no idea about, this never having been defined or
elucidated anywhere.)

I have no idea what an interface to an object could possibly be.
What kinds of interface do the following objects have: a grain of
sand, a galaxy, an imaginary detective ?

There’s the key, the understanding — what kind of global system can assimilate objects of such differing contexts as that of the micro (the sand), the macro (the galaxy), and the virtual (the imaginary detective)? No one system can, but no one system has to. What Pat is saying is that the current system works ‘good enough’. It works though it’s based on a broken premise of a global system of identifiers that can denote any one thing regardless of context. It’s okay that it works, and it’s okay that it’s broken — but don’t base laws and assumptions on a broken premise. Don’t attach meaning to the system, just use it.

This returns again to what Joseph said. How can we ‘denote’ a friend, just through a link labeled as such? It doesn’t take into account the context of the label and the system. By taking a set of links in a blogroll and creating a FOAF file and saying we ‘know’ each of the people listed in this blogroll, we’re taking the links out of context — a link in a blogroll is not the same thing as making a statement that I know this person, or this person is my friend. The only statement I’m making, taking into account ‘context’, is that I listed this person in my blogroll for some unspecified reason of which no one can truly make an assumption.

There may be an assumption that I did so because I read them, or that I like to read the person, or that I like the person. But that’s all this is, an assumption. Without the context behind my reasoning why I put links into my blogroll, how can one then extrapolate out that these links should then go into a FOAF file? Or vice versa? How can one extrapolate that the context of the FOAF file and the context of the blogroll are the same? Because the same identifiers are used in each?

Yes, the syntactic string representing a person may be the same in the FOAF file as in the blogroll, and because of this and the use of RDF, we can ‘technically’ combine and extrapolate this information — but without the context surrounding the use of the identifier in each case, you can’t make an assumption that the one ‘means’ the same as the other. In other words, you can’t extrapolate, meaningfully, from my URI appearing in a FOAF file to my URI appearing in a blogroll, because neither is ‘me’ — only me as I am represented within the context of each vocabulary.
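One way to picture the difference between a technical merge and a meaningful one is a small Python sketch. Everything in it is hypothetical: the two URIs, the predicate labels, and the context names are invented for illustration and come from no real FOAF file or blogroll. Each statement carries a fourth element naming the context it was made in; the first function throws that away, the second keeps it.

```python
# Made-up identifiers, for illustration only.
PERSON_A = "http://example.org/people/a"
PERSON_B = "http://example.org/people/b"

# Each statement: (subject, predicate, object, context it was asserted in).
statements = [
    (PERSON_A, "knows", PERSON_B, "a-foaf-file"),
    (PERSON_A, "links-to", PERSON_B, "a-blogroll"),
]

def merge_without_context(quads):
    """Keep only who is connected to whom; predicate and context are discarded."""
    return {(s, o) for s, _p, o, _ctx in quads}

def group_by_context(quads):
    """Keep each statement inside the context it came from."""
    grouped = {}
    for s, p, o, ctx in quads:
        grouped.setdefault(ctx, []).append((s, p, o))
    return grouped

flattened = merge_without_context(statements)
grouped = group_by_context(statements)

print(flattened)   # a single undifferentiated pairing
print(grouped)     # two statements, each still tied to where it was made
```

The flattened set says only that the two identifiers are paired. The grouped version still records that one statement came from a FOAF file and the other from a blogroll, which is the part the merge cannot recover once it is gone.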

Pat wrote (and I can’t find the exact email message):

How can one consider a link to be ‘the person’, when it is nothing more than a proxy, a representation that the Semantic Web requires because we have no other way to represent the person within the Semantic Web.

Within the Semantic Web, a URI is a proxy for, a representation of, something that can’t be represented any other way due to the limitations of the medium. And that’s okay, because it works. But, if I read Pat correctly, don’t add any additional ‘meaning’ to this representation other than the fact that it is a representation. To do so will perpetuate the broken premise.

FOAF can’t represent a friend, or a relationship, directly. What it can do is provide a proxy for an association between two people, as marked by one of these people, and labeled as friend, acquaintance, co-worker or whatever. It is not the actual relationship itself, and to see it as such, to treat it as such, to make it real because of this association in the file, removes the context of the FOAF file, which could have significant impact on the truth of the assertion.

Because I am listed as a ‘friend’ in AKMA’s FOAF file, does this make it real? You can’t assume it’s real just because it says so, in a FOAF file, with a link, representing a URI. It may be real — it is real to me because of my association and appreciation and affection I feel for both AKMA and his wife, Margaret — but the link, and the existence of the file, don’t make it real.

Moreover, you can’t extrapolate any additional meaning out of the FOAF file other than what is narrowly defined within the context of the vocabulary — FOAF shows that one person is making a statement that they know another person. Nothing more. Nothing less.
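As a rough illustration of how narrow that statement is, here is a minimal Python sketch that reads a FOAF-style document and pulls out only the "knows" assertions. The document is invented (the names are placeholders), and the sketch uses Python's standard `xml.etree.ElementTree` as a plain XML reader, not an RDF toolkit, so it only handles this one nesting pattern.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"

# Invented FOAF document; the people are placeholders.
FOAF_DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Person A</foaf:name>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Person B</foaf:name>
      </foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Person C</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
"""

def knows_statements(xml_text):
    """Return (who, 'knows', whom) for each foaf:knows found in the document."""
    root = ET.fromstring(xml_text)
    found = []
    for person in root.findall(f"{{{FOAF}}}Person"):
        who = person.findtext(f"{{{FOAF}}}name")
        for known in person.findall(f"{{{FOAF}}}knows/{{{FOAF}}}Person"):
            found.append((who, "knows", known.findtext(f"{{{FOAF}}}name")))
    return found

claims = knows_statements(FOAF_DOC)
for claim in claims:
    print(claim)
```

What comes out is a list of claims made by the file's author. Nothing in the output says whether the claim is mutual, current, or what kind of acquaintance it is.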

Pat isn’t being arbitrary, he’s making a critical point: we can only assume so much from a URI within the context of an RDF vocabulary. To make additional assumptions is as false as assuming that the context of FOAF and a blogroll overlap, and that my relationship to a person in one context must be the same as my relationship to that person in another.

Dammit, I’m not saying this well, but it’s the very use of FOAF for other things outside of the context of this specific RDF vocabulary that forms the basis — in my interpretation, and I could be wrong — for Pat’s continued and persistent argument about the URI of an object being a proxy for that object, and that a URI has context. To ignore the context is to literally throw out the true semantics, leaving nothing in its place but a smarter, but still dumb, web.

Smarter web is okay, but I want a semantic web! I don’t care if the Semantic Web works for the technology if it doesn’t also work for the people.

By seeing the URI as a representation of an object that transcends context, we then erroneously make extrapolations, such as FOAF and the blogroll — harmless in this case, but not so harmless when you start bringing in issues of trust, and see FOAF as the basis of the web of trust.

What Pat is arguing about, the point he is trying to make, forms the basis of what is happening now. We only have a few RDF/XML vocabularies in wider use and already we’re seeing abuse because of making assertions based on flawed premises. This isn’t a semantics argument or an esoteric debate between philosophers; this is real stuff being implemented, a perpetuation of a premise that’s flawed.

I have more to say on this, later. I must read all the notes, think on this further. If I’ve misrepresented either TimBL or Pat, my apologies to both and blame it on my interest and excitement about this debate.

Stay tuned.

Archive with comments at Wayback Machine

It’s all angle brackets

In his recent post, Mark Pilgrim writes that he is amazed, bordering on appalled, at the reaction to his posting about the CITE tag. I was a bit surprised myself because the posting wasn’t necessarily about revolutionary uses of technology. However, what Mark did do, in just a few words, was hit the hot spot in several debates: XML versus HTML, machine readability versus human readability, the semantic web, RDF, and any combination of these topics. And for the cherry to complete this semantic sundae, he threw in some code. If his post were fishing instead of writing, it would be equivalent to using five different fishing poles, each with a different lure. And did he come home with a catch.

Semantics. People start talking semantics, and each person doesn’t understand what the other people mean by semantics, and therein lies the wonderful irony that seems to weave in and out of the web. Semantics is all about meaning, but exactly what does it ‘mean’? We have no problems with small ‘s’ semantics in the everyday world, but put semantics on the web, and it becomes big ‘S’ Semantics.

Mark uses CITE as an example of semantic markup in HTML. He has a point: CITE does carry with it meaning — that which is marked up with this tag is a ‘citation’. By defining the context of the element, we can, for example, discriminate between hypertext links that are just ‘links’ and links that are associated with citations.

I went back to one of my postings and added CITE to specific URLs that I wanted to designate as citations. With an itty bitty Perl CGI app, I can find all the citations in the page — as shown here. Embed the CITE within a hypertext link and I can also easily associate those citations with the author’s post, as shown here.

By using CITE in conjunction with a hypertext link, I attach special significance to the link, something I can’t really do with just a straight hypertext link tag, as shown here. CITE provides context for the link. Context provides meaning, and meaning is semantics. Works nicely.
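The kind of extraction described above can be sketched in a few lines. This is a Python stand-in using the standard library's `html.parser`, not the Perl CGI app from the post, and the sample markup and URLs are invented. It collects every CITE element and, when the CITE sits inside a hypertext link, the link's address along with it.

```python
from html.parser import HTMLParser

class CiteFinder(HTMLParser):
    """Collect the text of each <cite> and the href of any enclosing <a>."""

    def __init__(self):
        super().__init__()
        self.link_stack = []   # href values of the <a> elements currently open
        self.in_cite = False
        self.buffer = []
        self.citations = []    # list of (cited text, href or None)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_stack.append(dict(attrs).get("href"))
        elif tag == "cite":
            self.in_cite = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "cite" and self.in_cite:
            href = self.link_stack[-1] if self.link_stack else None
            self.citations.append(("".join(self.buffer).strip(), href))
            self.in_cite = False
        elif tag == "a" and self.link_stack:
            self.link_stack.pop()

    def handle_data(self, data):
        if self.in_cite:
            self.buffer.append(data)

# Invented sample page: one plain link, one linked citation, one bare citation.
SAMPLE = """
<p>A plain link to <a href="http://example.com/plain">somewhere</a>.</p>
<p>As <a href="http://example.com/post"><cite>Some Author</cite></a> wrote...</p>
<p>A bare <cite>Unlinked Source</cite> works too.</p>
"""

finder = CiteFinder()
finder.feed(SAMPLE)
for text, href in finder.citations:
    print(text, href)
```

The plain link never shows up in the results, which is the point: the CITE element is what separates a citation from a link that is just a link.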

However, and you knew there was a however, I am a greedy person. I want to know more, and at some point HTML just doesn’t have the items that can convey the ‘meaning’ that I’m after.

Sure, I can create little bots that go out and scrape HTML and return with all sorts of data. I can then create a huge database and push this data into it. And once I have mined all that data, I can then create these huge, twistie, complex algorithms and set myself up as a competitor for Google. I mean, all that’s missing is someone to do the graphics for me for holidays, and such.

But, you see, that’s not what I’m after. I’d like to be able to associate new and even more complex forms of ‘meaning’ to web resources without having to store huge amounts of data, or to create ever increasingly complex algorithms, including finding devious ways of filtering out what amounts to “weblog spam”.

Ultimately, I want to record and find meaning without having to get VC funding, first.

That’s when something like RDF/XML enters the picture. Of course, you knew I was going to bring in RDF/XML — look to your left. The cover on the book doesn’t say “Practical meaning in a loosely connected environment filled with lots of data”. It says “Practical RDF”.

Let’s say I want to be able to find out Creative Commons license information for a specific posting. I could put this information into meta tags, or try and scrape it from the HTML. However, by embedding the information into RDF/XML, which is then embedded in the HTML, I can easily use one of my RDF APIs, such as my RDF PHP-based Query-o-Matic Lite, to pull the information out about the license — such as the required license information. Since I also store the RSS channel information within the page, I can also query this information.
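A hedged sketch of that idea follows, in Python and not the PHP Query-o-Matic. It assumes the RDF/XML block sits inside an HTML comment, that it uses the `rdf:` prefix, and that the license terms follow the `http://web.resource.org/cc/` vocabulary (`License`, `requires`); the page, the post URL, and the license URI are all invented. It is plain XML handling with the standard library, not a real RDF query.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC = "http://web.resource.org/cc/"   # assumed license vocabulary namespace

# Invented page with an RDF/XML block tucked into an HTML comment.
PAGE = """<html><head><title>A posting</title>
<!--
<rdf:RDF xmlns="http://web.resource.org/cc/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Work rdf:about="http://example.com/archives/000001.php">
    <license rdf:resource="http://example.com/licenses/sample"/>
  </Work>
  <License rdf:about="http://example.com/licenses/sample">
    <requires rdf:resource="http://web.resource.org/cc/Attribution"/>
    <requires rdf:resource="http://web.resource.org/cc/Notice"/>
  </License>
</rdf:RDF>
-->
</head><body><p>Post text.</p></body></html>
"""

def license_requirements(html_text):
    """Map each license URI in the embedded RDF/XML to what it requires."""
    match = re.search(r"<rdf:RDF\b.*?</rdf:RDF>", html_text, re.DOTALL)
    if match is None:
        return {}
    root = ET.fromstring(match.group(0))
    result = {}
    for lic in root.findall(f"{{{CC}}}License"):
        uri = lic.get(f"{{{RDF}}}about")
        result[uri] = [req.get(f"{{{RDF}}}resource")
                       for req in lic.findall(f"{{{CC}}}requires")]
    return result

requirements = license_requirements(PAGE)
print(requirements)
```

No scraping of visible text is involved: the license terms are read from markup that was written to be read by a machine in the first place.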

Of course, I could get this RSS channel information directly from my RDF/RSS file, but I’d rather get specific information for a specific resource than my current running list of aggregated items.

The point of all of this, besides having a little fun with Perl and PHP and various forms of markup, is that all of this stuff is data and all of this stuff can record ‘meaning’, at least some forms of meaning. RDF/XML doesn’t replace the ‘meaning’ that HTML provides — it just adds a way to record new meanings that HTML can’t, or doesn’t provide.

I agree with Jon Udell: there’s no need for either/or propositions in the world of Semantic markup. It’s really nothing more than angle brackets, data, and a few rules depending on the specific markup used. Add a smidgeon of code and there you have it: rich, meaningful data. Sure beats the heck out of a web consisting purely of Adobe PDF and Macromedia Flash files; all we’d have then is a bunch of loosely connected black holes.

(g’zipped and tarred file with itty bitty Perl CGI apps used as examples — requires HTML::BuildTree. g’zipped and tarred file of RDF Query-o-Matic files. Requires PHP XML classes from Source Forge.)

Archived at Wayback Machine

RDF: As simple as A, B, C

When I demonstrated a very simplified RDF/RSS model last week, in the comments attached to the post, Ziv asked the following question:

One question of an RDF newbie: Why do we need that (rdf:Description) element? Why can’t we simply put the @rdf:about attribute on the (item)?

As I started to answer the question in the comments, I kept finding myself taking the question deeper and deeper into the meanings of RDF:

-The rdf:about attribute can’t be used directly on a property led to

-The RDF/XML follows a striped XML syntax of class/property/class/property, regardless of shortcuts led to

-The striped XML syntax is based on the pattern of node-edge-node in RDF led to

-The node-edge-node of RDF is based on a model

All of which led me to a truly definitive question about RDF — why? Why the use of rdf:about here rather than there. Why the syntax? Why the model? After all, XML is a piece of cake — an element here, an attribute there, slam dunk in some text and hey now, we got data. Why make things more complex than they need to be?

Why? Because XML is great about collecting data but is lousy about recording knowledge. There is no facility inherent within the plain vanilla flavor of XML that allows one to write or read assertions in such a way that these assertions (read this as ‘statements’) can be machine produced and machine-readable. And the machines need all the help they can get.

We humans don’t need a rigorous model to communicate. We have phonemes that form words that make up a vocabulary, members of which are then used to form sentences through the use of this really irritating set of rules called “grammar”. We’re programmed to apply these rules through years of instruction, using a neural networking technique called ‘education’. When programming is finished, and after passing certain quality assurance tests, we’re set upon the world. Once loosed from the constraints of the lab, we promptly and as quickly as possible throw out much of what we’ve learned in favor of imagination, creativity, and a dangerous little nugget called innovation.

I love it.

Dorothea wants to discuss her specific mindset related to ‘sexism’ and the concept of sexiness and uses a new word: grunch. This word doesn’t exist, but we as humans adapt to it, add it to our vocabulary (phonemes: grrr + unch). In future writings based on Dorothea’s original discussion, we know what grunch is. Humans adapt.

In 1986, Hans Gabler made 2000 ‘corrections’ to James Joyce’s Ulysses. Well, thank goodness he did because nobody read it the way it was, all those grammatical errors and typos kept getting in the way. Most likely no one even heard of this book until Mr. Gabler took it in hand. As grateful as I am, though, I have recently discovered an even better re-write of this classic: Ulysses for Dummies.

I digress. XML and RDF.

With XML I can record pieces of data such as date, an excerpt, a title, author, category and so on. The structure of the markup allows machines to read these individual facts, to verify that the recording meets certain simple rules. But what if I want a little more than just plain facts? What if I want to be able to take these facts out for a spin, kick the tires, check under the hood?

I have a web page. Facts about this page are: title, URL, date edited, category, and author.

Page has title. Page has URL. Page has edit date. Page has author.

Tarzan has Jane. Jane has Cheeta. Cheeta has banana. A pattern is beginning to emerge.

Every sentence has a subject and a predicate. The subject is the focus of the sentence, and the predicate says something about the subject. These two basic components work remarkably well in allowing us to communicate, to share amazingly complex knowledge.

Returning to RDF and XML, using straight XML is equivalent to only allowing communication with one verb — To Have. Following this, an XML translation of the previous paragraph would be:

Sentence has subject. Sentence has predicate. Sentence has focus. Subject has focus. Predicate has information. Subject has information. Predicate has subject. Components have power. Communication has components. We have each other.

As you can see, after a time, the simplicity breaks down — we need to increase our capabilities, even though doing so adds complexity.

Enter RDF, providing a structure and a meta-language to XML, a grammar if you will.

RDF has one pattern: (subject)(predicate)(object). However, this pattern gives us the tools to record data in such a way that knowledge can be inferred mechanically, merged via a well understood and defined logic with other knowledge, and so on. The subject is the noun, the focus of the statement; the predicate says something about the subject; the object is what is said.

Taking the test paragraph, it can be re-written into the following RDF-like statements:

(Sentence) (has a component)(which is a subject)
(Sentence) (has a component)(which is a predicate)

— no, no, don’t worry — it does get better

(The subject)(is the focus of)(the sentence)
(The subject)(is described by)(the predicate)
(Sentence Components)(enable)(communication)
(Sentence Components)(enable sharing)(of knowledge)

By providing the ability to record this subject-predicate-object pattern, RDF allows us to expand on the depth of information we gather. The more complex the information, the deeper the pattern is applied, but it is still this triple. In a graphical context, the subject-predicate-object form into a node-edge-node that allows us to build new statements on previously occurring ones.
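The triple pattern is simple enough to mimic with ordinary tuples. The Python sketch below restates the sentence statements from above as (subject, predicate, object) tuples and adds two helpers whose names, `match` and `merge`, are made up for this illustration: one looks statements up by pattern, the other combines two sets of statements.

```python
# The test paragraph as (subject, predicate, object) tuples.
triples = [
    ("sentence", "has a component", "subject"),
    ("sentence", "has a component", "predicate"),
    ("subject", "is the focus of", "sentence"),
    ("subject", "is described by", "predicate"),
    ("sentence components", "enable", "communication"),
    ("sentence components", "enable sharing of", "knowledge"),
]

def match(statements, s=None, p=None, o=None):
    """Return the statements fitting the pattern; None acts as a wildcard."""
    return [t for t in statements
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def merge(*graphs):
    """Combine graphs by set union; identical statements collapse into one."""
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged

# A second, invented set of statements that overlaps with the first.
more = [
    ("subject", "is described by", "predicate"),
    ("knowledge", "is built from", "statements"),
]

about_subject = match(triples, s="subject")
points_at_predicate = match(triples, o="predicate")
combined = merge(triples, more)

print(about_subject)
print(points_at_predicate)
print(len(combined))
```

Because every statement has the same three-part shape, merging needs no schema negotiation: the duplicate statement simply collapses, and the new one joins the rest.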

The focus OF the sentence IS the subject DESCRIBED BY the predicate WHICH IS a component OF a sentence. If we take the capitalized words in this sentence as the predicates, the graphical notation of this could be: node-predicate-node-predicate-node-predicate-node-predicate-node-predicate-node. Nothing more than a repetition of our friend the triple, connected end to end.

Representing this within XML requires a set of syntactic rules that ensure we don’t accidentally shove a predicate next to a predicate and so on. There are rules for how to identify a subject, and how to add a predicate. There are rules for how to repeat properties (predicate-object pairs), and how to group properties. There are even rules for how to create a statement about a statement (known in RDF as ‘reification’, though I prefer ‘RDF’s Big Ugly’, myself). But fundamentally the rules break down into nothing more than node-edge-node-edge-node, forming a particularly interesting XML pattern called The Striped RDF/XML syntax.

Rules that basically say that predicates can’t be nested directly beneath predicates (edges next to edges) or that whole node-edge-node thing gets blown out of the water. And rules that state when an rdf:about attribute can be applied. In my simplified RDF/RSS, the rdf:about attribute can’t be applied directly to the ITEM element because ITEM in this instance is acting as a predicate, with an implied URI of “item”; it can’t act as a new subject, too. Edge-edge.

So, with a little tweaking (adding the subject within a generic RDF resource statement, as in example 1, or using a shortcut as in example 2), the rules are met and the knowledge can be processed.
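The striping rule can be shown with a toy reader. The Python sketch below is an assumption-laden illustration, not a real RDF/XML parser: it handles only nested elements and the rdf:about attribute, invents blank-node labels such as `_:b1`, and ignores every other part of the syntax. It walks the XML treating alternate levels as nodes and edges, and refuses an rdf:about on an edge.

```python
import xml.etree.ElementTree as ET
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
_blank_ids = count(1)   # labels for nodes that carry no rdf:about

def uri(tag):
    """Turn ElementTree's '{namespace}local' form into a plain URI string."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag

def node(elem, triples):
    """Read elem as a node; its children are edges. Returns the node's id."""
    subject = elem.get(ABOUT) or f"_:b{next(_blank_ids)}"
    if elem.tag != f"{{{RDF}}}Description":
        triples.append((subject, RDF + "type", uri(elem.tag)))
    for prop in elem:
        if prop.get(ABOUT) is not None:
            raise ValueError("rdf:about on a property element: edge next to edge")
        children = list(prop)
        if children:      # the edge points at a nested node
            for child in children:
                triples.append((subject, uri(prop.tag), node(child, triples)))
        else:             # the edge points at a literal value
            triples.append((subject, uri(prop.tag), (prop.text or "").strip()))
    return subject

def parse(xml_text):
    triples = []
    for top in ET.fromstring(xml_text):
        node(top, triples)
    return triples

GOOD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://weblog.burningbird.net/">
    <title>Burningbird</title>
    <item>
      <rdf:Description rdf:about="http://weblog.burningbird.net/archives/000514.php">
        <title>Myths about RDF/RSS</title>
      </rdf:Description>
    </item>
  </channel>
</rdf:RDF>"""

# Same data, but with rdf:about placed directly on the item edge.
BAD = GOOD.replace("<item>", '<item rdf:about="http://weblog.burningbird.net/archives/000514.php">')

good_triples = parse(GOOD)
for triple in good_triples:
    print(triple)
```

Calling `parse(BAD)` raises the `ValueError`, because in that document the item element is asked to be an edge and a subject at the same time.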

(Check out the example RDF files with the RDF Validator to see a graphical demonstration of node-edge-node.)

Once you’ve described one data set with these rules, inferences can be made to other data sets made with the same rules.

As an example, RSS is nothing more than a quick news blurb that gets consumed in less than 24 hours and doesn’t persist. The power of RDF isn’t necessary for RSS used by aggregators, primarily because the data doesn’t persist and one thing about the search for knowledge: it does require that the bits of the knowledge stick around long enough to be discovered.

However, RSS captures a rich set of information about a specific web page or weblog posting: the author and creation date, as well as category, and possibly even links to other resources. What a pity to put this into a form that will only be thrown away.

Well, who says it has to be thrown away? We’s all bosses here, we is. If I says to keep it, I’s boss, and you listen up or Bird be real angry, she will. Real angry. Hissy fit angry.

I modified my individual weblog posting archives to include a bit of RDF in the header that contains the same information used to produce the RSS files that aggregators so callously consume and toss aside. Since this modification was in the template, this RDF is generated for each page automatically. And once persisted in the archive page, it’s there for anyone to discover, providing a richer set of data than just that assumed with keywords pulled from the text.

In this RDF is an identification of the author, an entity which is rounded out by a FOAF (Friend-of-a-Friend) RDF file; knowledge of me, who I am, adds depth and categorization to my Book Recommendation list RDF, and so on and on — a vicious cycle of knowledge acquisition.

(Archived page and comments at Wayback Machine)

RSS: Proof is in the implementation

Sam Ruby had taken a first shot at RSS 2.0 with an RSS document demonstrating the new, simplified RSS syntax. No evidence of RDF, RSS version, no RDF Seq.

Mark expanded on this with what looks to be the same specification, different examples and the use of included HTML (rdf:parseType="Literal" in RDF terms). (Correct me if I misread this, Mark.)

Since Sam has published an example of his version, allow me to work with the assumption that whatever works with his proposed RSS 2.0 should work with Mark’s, with the addition of HTML literals.

In this weblog page, I have PHP processing for the Book recommendation list. I copied the page and modified it to process Sam’s new proposed RSS file. You can see it in action here. The process took me about 10 minutes because the SHIFT key on my laptop doesn’t work well, and I am using vi to make the edits.

Now, I want to show you something. Here is my MT generated RDF/RSS file. Taking this and Sam’s and Mark’s proposed RSS 2.0, I came up with a simplified RDF/RSS syntax, seen in this file and also duplicated here:

<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="http://weblog.burningbird.net/">
<title>Burningbird</title>
<link>http://weblog.burningbird.net/</link>
<description></description>

<item>
<rdf:Description rdf:about="http://weblog.burningbird.net/archives/000514.php">
<link>http://weblog.burningbird.net/archives/000514.php</link>
<title>Myths about RDF/RSS</title>
<description>Lots of discussion about the direction that RSS is going to take, which I think is good. However, the first thing that
happens any time a conversation about RSS occurs is people start questioning the use of RDF within the…</description>
<dc:subject>Technology</dc:subject>
<dc:creator>shelley</dc:creator>
<dc:date>2002-09-06T00:53:16-06:00</dc:date>
</rdf:Description>
</item>

<item>
<rdf:Description rdf:about="http://weblog.burningbird.net/archives/000515.php">
<link>http://weblog.burningbird.net/archives/000515.php</link>
<title>ThreadNeedle Status</title>
<description>I provided a status on ThreadNeedle at the QuickTopic discussion group. I wish I had toys for you to play with, but no
such luck. To those who were counting on this technology, my apologies for not having it for…</description>
<dc:subject>Technology</dc:subject>
<dc:creator>shelley</dc:creator>
<dc:date>2002-09-06T00:19:28-06:00</dc:date>
</rdf:Description>
</item>

</channel>

</rdf:RDF>

Differences are:

 

  1. RDF element rather than RSS
  2. No versioning – not necessary with the concept of namespaces
  3. Use of namespaces to differentiate modules
  4. Surrounding the ITEM’s properties with a RDF:Description. The ITEM can have either literal data or XML elements that should be parsed. By using RDF:Description, I’m giving a hint to the processors that what follows is XML data to be parsed for new elements, so turn off literal text processing optimization, and use the more memory and CPU intensive XML parser, please.

Notice that there is no RDF:Seq in this RDF/RSS version. Why? You don’t have to use the Seq element for valid RDF. I believe Seq was used with RSS 1.0 because the originators of RSS 1.0 wanted to provide ordering information to the tool builders. However, this really seems to be an absolute sticking point with everyone. Fine. Dump it.
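If a tool still wants ordering information once Seq is gone, one option is to sort the items on dc:date. What follows is a minimal Python sketch of that idea, offered as an assumption about how a tool builder might cope rather than anything taken from the RSS 1.0 specification or from the PHP behind these pages. The function name items_newest_first and the trimmed-down feed are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace URIs as declared on the rdf:RDF element of the feed.
RSS = "{http://purl.org/rss/1.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Trimmed copy of the feed, with the items deliberately out of date order.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://weblog.burningbird.net/">
<item>
<rdf:Description rdf:about="http://weblog.burningbird.net/archives/000515.php">
<title>ThreadNeedle Status</title>
<dc:date>2002-09-06T00:19:28-06:00</dc:date>
</rdf:Description>
</item>
<item>
<rdf:Description rdf:about="http://weblog.burningbird.net/archives/000514.php">
<title>Myths about RDF/RSS</title>
<dc:date>2002-09-06T00:53:16-06:00</dc:date>
</rdf:Description>
</item>
</channel>
</rdf:RDF>"""


def items_newest_first(xml_text):
    """Hypothetical helper: return item titles sorted by dc:date, newest first."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter(RSS + "item"):
        # Search descendants so the rdf:Description wrapper does not matter.
        title = item.find(".//" + RSS + "title")
        date = item.find(".//" + DC + "date")
        rows.append((datetime.fromisoformat(date.text), title.text))
    rows.sort(reverse=True)
    return [title for _, title in rows]


titles = items_newest_first(FEED)
print(titles)
```

The sort key is the date itself, so the order of the items in the file stops carrying any meaning, which is the job Seq was doing.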

Run my new RDF/RSS through the RDF validator (here), and you’ll see it’s valid RDF.

Now, I created a third copy of my weblog page with the PHP processing and had it parse and print out this new RSS file. The changes necessary? I changed DC:DATE to DC:CREATOR — I wanted to print out the latter not the former. Here’s the new page.

Next, I copied the PHP page and had the code process my original RDF/RSS 1.0 file, the one that’s generated automatically from MovableType. Changes to the code? Nada. Not one single change other than the name of the RDF file. Time to make change? 4 seconds. See the new page here.

Now, all of these pages (including this one) use PHP-based XML processing to process the data (xml_parser). No specialized RSS or RDF APIs. Pure XML processing. And it took me about, well, honestly, probably a couple of hours to write the original code for my Books RDF/RSS application. That darn shift key you know.
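For anyone who wants to see what pure XML processing of this data can look like, here is a minimal sketch. It is in Python rather than the PHP these pages run on, and it is an illustration under assumptions, not the actual page code. The function name read_items and the two trimmed sample feeds are invented for the example, and the RSS 1.0 sample leaves out the items/Seq table of contents for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared on the rdf:RDF element.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RSS = "{http://purl.org/rss/1.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

HEADER = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns:dc="http://purl.org/dc/elements/1.1/" '
          'xmlns="http://purl.org/rss/1.0/">')

# Layout from this post: the item wraps an rdf:Description.
NEW_STYLE = HEADER + """
<channel rdf:about="http://weblog.burningbird.net/">
<item>
<rdf:Description rdf:about="http://weblog.burningbird.net/archives/000514.php">
<title>Myths about RDF/RSS</title>
<dc:creator>shelley</dc:creator>
</rdf:Description>
</item>
</channel>
</rdf:RDF>"""

# RSS 1.0 layout (trimmed): the item carries rdf:about itself.
RSS10_STYLE = HEADER + """
<channel rdf:about="http://weblog.burningbird.net/">
<title>Burningbird</title>
</channel>
<item rdf:about="http://weblog.burningbird.net/archives/000514.php">
<title>Myths about RDF/RSS</title>
<dc:creator>shelley</dc:creator>
</item>
</rdf:RDF>"""


def read_items(xml_text):
    """Hypothetical helper: return (url, title, creator) for every item."""
    results = []
    for item in ET.fromstring(xml_text).iter(RSS + "item"):
        # Properties live on the nested rdf:Description if there is one,
        # otherwise directly on the item.
        desc = item.find(RDF + "Description")
        holder = desc if desc is not None else item
        results.append((
            holder.get(RDF + "about"),
            holder.findtext(RSS + "title"),
            holder.findtext(DC + "creator"),
        ))
    return results


print(read_items(NEW_STYLE))
print(read_items(RSS10_STYLE))
```

Both calls go through the same function with a generic XML parser and no RDF library, and the only thing that changes between them is the input.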

I’m not trying to downplay others’ concerns or existing work or effort, and I realize that I have a better understanding of RDF than most of you (not bragging, but give me this as an accepted premise for discussion purposes at this moment) and that this gives me an edge when working with RDF.

What I’m trying to show is that keeping RDF in the RSS specification doesn’t necessarily mean that simplified processing is impossible, or that we can’t use ‘regular’ XML tools, or that there will be a huge burden on tool writers.

We don’t have to keep Seq if it really bothers everyone. Let’s work this change. Let’s. Let us work this change. I like that phrase, don’t you?

By keeping RDF in RSS now (and really, are those changes I made to the proposed RSS 2.0 so hard to swallow?) we keep the door open for the benefits that will be accrued some day when RDF does have broader use.

I guess what I’m trying to show, demonstrate, prove is that RDF doesn’t have to make things arbitrarily complicated, or confusing. That we can write documentation that clarifies those few bits of RDF in the specification so that it isn’t complicated for folks writing or reading this stuff by hand (or processing it with various languages).

I’m hoping with this demonstration that I’ll convince a few of you that we can keep the door open on this discussion rather than arbitrarily throwing RDF out. RDF is, I’d like to gently remind you all, a specification that’s been in the works for years by some of the best markup minds in the business. And as easy as it is to criticize the RDF working group for taking time, remember that they’re trying to create a specification that will stand the test of time, rather than break with every version, as we had with HTML.

Mark provided a summary of the RSS issue, and I know that this discussion has been going on for years. And I know that there are a lot of people who say, let’s just fork. But folks, this didn’t work for SQL and QUEL (remember QUEL?) years ago when the decision was being made about which query format to use when accessing relational database data. I really do want to see these specs come together, with members and players from all sides.

And I’ll also be honest and say that I really don’t want to see this owned by any private company or person. Sorry, but I just can’t accept this, it goes against everything I believe in. I am not belittling Dave’s and Userland’s contribution to RSS. I realize that Userland popularized RSS and a debt is owed.

What I am asking is that Dave become part of a team working on this, a team that’s open to people who literally have something to contribute on this issue, each with an equal vote. Yes, people like me, like Mark, like Sam, Jon, Joe, Bill — all the people who have something to contribute to make this specification rock. And hopefully prevent something like this from happening again in the future.

Am I too late though? Is the decision made? Can’t we talk?

Where’s the fire?

(Archived page and comments at Wayback Machine)