Categories
Semantics Technology

RDF: As simple as A, B, C

When I demonstrated a very simplified RDF/RSS model last week, Ziv asked the following question in the comments attached to the post:

One question of an RDF newbie: Why do we need that (rdf:Description) element? Why can’t we simply put the @rdf:about attribute on the (item)?

As I started to answer the question in the comments, I kept finding myself taking the question deeper and deeper into the meanings of RDF:

- The rdf:about attribute can’t be used directly on a property, which led to

- The RDF/XML follows a striped XML syntax of class/property/class/property, regardless of shortcuts, which led to

- The striped XML syntax is based on the pattern of node-edge-node in RDF, which led to

- The node-edge-node of RDF is based on a model

All of which led me to a truly definitive question about RDF: why? Why the use of rdf:about here rather than there? Why the syntax? Why the model? After all, XML is a piece of cake: an element here, an attribute there, slam dunk in some text and hey now, we got data. Why make things more complex than they need to be?

Why? Because XML is great at collecting data but lousy at recording knowledge. There is no facility inherent within the plain vanilla flavor of XML that allows one to write or read assertions in such a way that these assertions (read this as ‘statements’) can be machine-produced and machine-readable. And the machines need all the help they can get.

We humans don’t need a rigorous model to communicate. We have phonemes that form words that make up a vocabulary, members of which are then used to form sentences through the use of this really irritating set of rules called “grammar”. We’re programmed to apply these rules through years of instruction, using a neural networking technique called ‘education’. When programming is finished, and after passing certain quality assurance tests, we’re set upon the world. Once loosed from the constraints of the lab, we promptly and as quickly as possible throw out much of what we’ve learned in favor of imagination, creativity, and a dangerous little nugget called innovation.

I love it.

Dorothea wants to discuss her specific mindset related to ‘sexism’ and the concept of sexiness and uses a new word: grunch. This word doesn’t exist, but we as humans adapt to it, add it to our vocabulary (phonemes: grrr + unch). In future writings based on Dorothea’s original discussion, we know what grunch is. Humans adapt.

In 1986, Hans Gabler made 2000 ‘corrections’ to James Joyce’s Ulysses. Well, thank goodness he did because nobody read it the way it was, all those grammatical errors and typos kept getting in the way. Most likely no one even heard of this book until Mr. Gabler took it in hand. As grateful as I am, though, I have recently discovered an even better re-write of this classic: Ulysses for Dummies.

I digress. XML and RDF.

With XML I can record pieces of data such as date, an excerpt, a title, author, category and so on. The structure of the markup allows machines to read these individual facts, to verify that the recording meets certain simple rules. But what if I want a little more than just plain facts? What if I want to be able to take these facts out for a spin, kick the tires, check under the hood?

I have a web page. Facts about this page are: title, URL, date edited, category, and author.

Page has title. Page has URL. Page has edit date. Page has author.

Tarzan has Jane. Jane has Cheeta. Cheeta has banana. A pattern is beginning to emerge.

Every sentence has a subject and a predicate. The subject is the focus of the sentence, and the predicate says something about the subject. These two basic components work remarkably well in allowing us to communicate, to share amazingly complex knowledge.

Returning to RDF and XML, using straight XML is equivalent to only allowing communication with one verb — To Have. Following this, an XML translation of the previous paragraph would be:

Sentence has subject. Sentence has predicate. Sentence has focus. Subject has focus. Predicate has information. Subject has information. Predicate has subject. Components have power. Communication has components. We have each other.

As you can see, after a time, the simplicity breaks down — we need to increase our capabilities, even though doing so adds complexity.

Enter RDF, providing a structure and a meta-language to XML, a grammar if you will.

RDF has one pattern: (subject)(predicate)(object). However, this pattern gives us the tools to record data in such a way that knowledge can be inferred mechanically, merged via a well understood and defined logic with other knowledge, and so on. The subject is the noun, the focus of the statement; the predicate says something about the subject; the object is what is said.

Taking the test paragraph, it can be re-written into the following RDF-like statements:

(Sentence) (has a component)(which is a subject)
(Sentence) (has a component)(which is a predicate)

— no, no, don’t worry — it does get better

(The subject)(is the focus of)(the sentence)
(The subject)(is described by)(the predicate)
(Sentence Components)(enable)(communication)
(Sentence Components)(enable sharing)(of knowledge)
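
Statements like these map naturally onto three-item tuples. Here is a minimal sketch in Python, not RDF itself: the names `Triple`, `statements`, and `about` are made up for illustration, and the plain strings stand in for what real RDF would identify with URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str


# The RDF-like statements from the text, recorded as data.
statements = [
    Triple("sentence", "has a component", "subject"),
    Triple("sentence", "has a component", "predicate"),
    Triple("subject", "is the focus of", "sentence"),
    Triple("subject", "is described by", "predicate"),
    Triple("sentence components", "enable", "communication"),
    Triple("sentence components", "enable sharing of", "knowledge"),
]


def about(subject, triples):
    """Return every (predicate, object) pair recorded for one subject."""
    return [(t.predicate, t.obj) for t in triples if t.subject == subject]


if __name__ == "__main__":
    for predicate, obj in about("subject", statements):
        print(f"subject --{predicate}--> {obj}")
```

Because every statement has the same shape, a machine can ask "what do we know about the subject?" without knowing anything about sentences or grammar.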

By providing the ability to record this subject-predicate-object pattern, RDF allows us to expand on the depth of information we gather. The more complex the information, the deeper the pattern is applied, but it is still this triple. In a graphical context, the subject-predicate-object triple forms a node-edge-node pattern that allows us to build new statements on previously occurring ones.

The focus OF the sentence IS the subject DESCRIBED BY the predicate WHICH IS a component OF a sentence. If we treat the capitalized words in this sentence as the predicates, the graphical notation could be: node-predicate-node-predicate-node-predicate-node-predicate-node-predicate-node. Nothing more than a repetition of our friend the triple, connected end to end.
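
Reading that sentence as an alternating list of nodes and predicates, each node-predicate-node window is one triple, and the object of one triple is the subject of the next. A rough sketch in Python, where the function name `chain_to_triples` and the list `chain` are my own illustration rather than anything defined by RDF:

```python
def chain_to_triples(chain):
    """Split an alternating node, edge, node, ... list into triples.

    The object of each triple doubles as the subject of the next one.
    """
    if len(chain) < 3 or len(chain) % 2 == 0:
        raise ValueError("a chain must start and end with a node")
    return [
        (chain[i], chain[i + 1], chain[i + 2])
        for i in range(0, len(chain) - 2, 2)
    ]


# The example sentence, with the capitalized words as the edges.
chain = [
    "focus", "OF", "sentence", "IS", "subject", "DESCRIBED BY",
    "predicate", "WHICH IS", "component", "OF", "sentence",
]

if __name__ == "__main__":
    for subject, predicate, obj in chain_to_triples(chain):
        print(f"({subject}) ({predicate}) ({obj})")
```

A chain that ends on an edge, or that puts two edges side by side, has no valid split into triples, which is the situation the syntax rules exist to prevent.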

Representing this within XML requires a set of syntactic rules that ensure we don’t accidentally shove a predicate next to a predicate and so on. There are rules for how to identify a subject, and how to add a predicate. There are rules for how to repeat properties (predicate-object pairs), and how to group properties. There are even rules for how to create a statement about a statement (known in RDF as ‘reification’, though I prefer ‘RDF’s Big Ugly’, myself). But fundamentally the rules break down into nothing more than node-edge-node-edge-node, forming a particularly interesting XML pattern called The Striped RDF/XML syntax.

Rules that basically say that predicates can’t be nested directly beneath predicates (edges next to edges), or that whole node-edge-node thing gets blown out of the water. And rules that state when an rdf:about attribute can be applied. In my simplified RDF/RSS, the rdf:about attribute can’t be applied directly to the ITEM element because ITEM in this instance is acting as a predicate, with an implied URI of “item”; it can’t act as a new subject, too. Edge-edge.

So, with a little tweaking (adding the subject within a generic RDF resource statement, as in example 1, or using a shortcut as in example 2), the rules are met and the knowledge can be processed.
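
The edge-edge rule is easy to sketch as a check. What follows is a rough illustration in Python and not how the RDF Validator works: it assumes a strictly striped document where the children of rdf:RDF are nodes, their children are properties, and so on, and it ignores shortcuts such as rdf:parseType that change the striping. The `ex:` vocabulary, the example.org URIs, and the function name `misplaced_about` are all hypothetical; the two sample documents are not my actual example files.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ABOUT = f"{{{RDF_NS}}}about"


def misplaced_about(root):
    """Return tags of property elements (edges) that wrongly carry rdf:about.

    Children of the rdf:RDF root are treated as nodes, their children as
    properties, the level below that as nodes again, and so on.
    """
    errors = []

    def visit(element, is_node):
        if not is_node and RDF_ABOUT in element.attrib:
            errors.append(element.tag)
        for child in element:
            visit(child, not is_node)

    for top in root:
        visit(top, True)
    return errors


# Hypothetical feed: rdf:about sits directly on the item property (edge-edge).
BAD = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="http://example.org/terms#">
  <ex:channel rdf:about="http://example.org/weblog">
    <ex:item rdf:about="http://example.org/post1"/>
  </ex:channel>
</rdf:RDF>"""

# Same data with the subject wrapped in a generic rdf:Description node.
GOOD = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="http://example.org/terms#">
  <ex:channel rdf:about="http://example.org/weblog">
    <ex:item>
      <rdf:Description rdf:about="http://example.org/post1"/>
    </ex:item>
  </ex:channel>
</rdf:RDF>"""

if __name__ == "__main__":
    print(misplaced_about(ET.fromstring(BAD)))
    print(misplaced_about(ET.fromstring(GOOD)))
```

In the first document the item element is asked to be both an edge and a new subject; in the second, the extra rdf:Description element restores the node-edge-node stripes.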

(Check out the example RDF files with the RDF Validator to see a graphical demonstration of node-edge-node.)

Once you’ve described one data set with these rules, inferences can be made to other data sets made with the same rules.

As an example, RSS is nothing more than a quick news blurb that gets consumed in less than 24 hours and doesn’t persist. The power of RDF isn’t necessary for RSS used by aggregators, primarily because the data doesn’t persist, and one thing about the search for knowledge is that it requires the bits of knowledge to stick around long enough to be discovered.

However, RSS captures a rich set of information about a specific web page or weblog posting: the author and creation date, as well as category, and possibly even links to other resources. What a pity to put this into a form that will only be thrown away.

Well, who says it has to be thrown away? We’s all bosses here, we is. If I says to keep it, I’s boss, and you listen up or Bird be real angry, she will. Real angry. Hissy fit angry.

I modified my individual weblog posting archives to include a bit of RDF in the header that contains the same information used to produce the RSS files that aggregators so callously consume and toss aside. Since this modification was in the template, this RDF is generated for each page automatically. And once persisted in the archive page, it’s there for anyone to discover, providing a richer set of data than just that assumed with keywords pulled from the text.

In this RDF is an identification of the author, an entity which is rounded out by a FOAF (Friend-of-a-Friend) RDF file; knowledge of me, who I am, adds depth and categorization to my Book Recommendation list RDF, and so on and on — a vicious cycle of knowledge acquisition.
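
To make the merging idea concrete, here is a small sketch in Python, assuming each data set is just a set of triples. Everything in it is illustrative: the example.org URIs, the short predicate names, the author name, and the helper functions `objects` and `author_names` are placeholders, not the contents of my real archive or FOAF files.

```python
# Triples that might be embedded in an archived weblog page (hypothetical).
post_data = {
    ("http://example.org/post1", "title", "RDF: As simple as A, B, C"),
    ("http://example.org/post1", "author", "http://example.org/people#me"),
}

# Triples that might come from a separate FOAF file (hypothetical).
foaf_data = {
    ("http://example.org/people#me", "name", "Example Author"),
    ("http://example.org/people#me", "recommends", "http://example.org/books#1"),
}


def objects(graph, subject, predicate):
    """Return every object recorded for a subject and predicate pair."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def author_names(graph, post):
    """Follow post -> author -> name across whatever triples are present."""
    names = set()
    for author in objects(graph, post, "author"):
        names |= objects(graph, author, "name")
    return names


# Merging two sets of triples is plain set union; the shared author
# identifier is what connects the two sources.
merged = post_data | foaf_data

if __name__ == "__main__":
    print(author_names(post_data, "http://example.org/post1"))
    print(author_names(merged, "http://example.org/post1"))
```

Neither source alone can name the author of the post; the answer only appears once the two are merged, and that only works if the archived data is still around to be merged.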

(Archived page and comments at Wayback Machine)

Categories
Specs

RSS Continues

Recovered from the Wayback Machine.

Regarding RSS, Dave will be releasing his version of RSS tomorrow.

Ben Hammersley has taken on the responsibility to try and get feedback to Dave about the Userland RSS. Good job, Ben, and good on you for taking this on. I’ll even forgive you for eating Marmite for this one.

You can view my opinion of Userland’s RSS within the comments attached to Ben’s postings. And that’s the last thing I have to say on this particular variation of RSS.

Categories
Just Shelley

All work and no play

Recovered from the Wayback Machine.

I have been a busy little worker lately. I spent all weekend reviewing the hard copy of the Unix Power Tools book — all one thousand pages of it — looking for problems, as well as pulls for the book’s web site.

I also made a stab at my first chapter for my online book, but I’m very unhappy with it. Very. The problem with reading wonderful writing by truly great authors is that my own writing suffers, dramatically, in comparison. Everything I write lately just sits on the page, flat, dejected, and suffering. If there were such a thing as a gun for words, I would shoot each of mine and give them a quick and pain-free end.

I took a break from writing today to interview at two different consulting companies. If all goes well, I should be back in the land of the employed by month’s end.

Between company appointments, as I was sitting at the computer trying to think of something less than dismal to write into the weblog, my cat Zoe wanted up on my lap for snuggles. Considering that I always interview in a black suit, I wasn’t too happy about her jumping up and getting silvery hairs all over me. I snapped at her, yelling at her to get off my lap.

She left the room and when I went looking for her later, I found her curled up in a small, sad, hurt little ball of fur on my chair down in the living room. What does she know of work? What does she know of suits? All she knows is that I yelled at her just for coming in for snuggles. I felt like such a heel.

She’s sitting on my lap now. She says Hi to everyone.

Zoe

Categories
Specs

RSS and disappointment

Recovered from the Wayback Machine.

I am disappointed.

I am disappointed that the work I did yesterday to show that RDF can work well within a simplified RSS environment is for naught because assumptions have already been made, decisions sealed. Jon Udell writes, paraphrasing Sam Ruby: “Assuming that the RSS core is now frozen….” Why is there an assumption that the core is frozen? Why is there an assumption that Userland owns RSS 2.0? Because Dave Winer says so? Because a few (a very few) other people say so?

What of the community, who must continue to be faced with issues of two different RSS specifications; who will have to face the difficulties inherent with this again in the future?

I’m disappointed because assumptions have been made that the efforts of the RSS 1.0 working group and Userland can never merge. The result of this assumption is that those who wish to write or read RSS in the future must bear the burden of both groups’ lack of cooperation.

I am disappointed because we were starting to see such good questions from the user community — questions such as those that appeared in the comments attached to my postings. Questions that allow us to define why some of these issues are important to many of us. Questions and comments that serve to make technologists take a good hard look at what we arrogantly decide is ‘good’ for the community.

Both RSS groups have been working far too long in a vacuum, and this week the lid got popped and fresh air came in. And I have never seen groups, normally so diametrically opposed, work together so well as these two did this week, trying to put that lid back on as quickly as possible.

I am disappointed that the RDF working group didn’t join the debate and benefit from such an open discourse with the user community, in addition to taking this opportunity to clarify much of the confusion and complexity about RDF. However, the debate was so short, the working group may not even be aware that it happened.

Categories
Weblogging

Note from Management

Recovered from the Wayback Machine.

I get private email communications all the time based on my postings, and most are great.

Sometimes people will write because I have made a typo or a grammatical error, and I really appreciate this. I prefer not to make these kinds of mistakes, but can get excited when I write and not notice the problem at first. These kinds of emails are very helpful.

Sometimes people will send gentle notes to let me know I’ve gone over the edge, I’ve lost my perspective, or I’ve been unnecessarily rough. Again, I appreciate this. I am nothing if not a passionate person, but I genuinely don’t want to be mean or cruel, or pedantic or tiresome. Only a friend would take the time to let me know that I’m heading in a direction they know I’ll regret at some point.

Sometimes people will want to agree or disagree with what I write and want to chat offline. Well, I consider this a treat. I am a richer person by hearing your views, and being allowed to discuss mine. Most sincere thanks for this gift of time you give me.

However, there are times when I get people who want to say hurtful, vicious, demeaning, and abusive things offline. By doing so, they can dump on me but still maintain a persona of sweetness and light with the world. This passive-aggressive technique is, to me, about one of the most despicable things a person can do.

When people (a very few people) indulge in these sorts of emails, it leaves me tired, hurt, and very touchy. Then I react online and the rest of my readers haven’t a clue why I’m so cranky, or why I’m reacting so strongly to certain events. Or worse, they wonder why I am lashing out at such a generous and kind-hearted person.

I don’t like getting emails that tell me that I’m sick, I’m sad, no wonder I’m single because I’m such a bitch and nobody would have me, I’m a loser, I have no life, or today’s particular treasure which stated that I blamed this person for all the problems in my life, and that this was pathetic.

Say what? No offense to any of you, but none of you have that kind of power over me. But these kinds of emails wear me down.

So, here’s my new plan. I’ve sent the following reply to the sender of the recent email:

No more. If you want to talk with me, do it in public. No more of these personal attacks in my email. If you’re so proud of what you have to say to me, say it in public.

I have received abuse from this person for months. Next time I get an email from this same source, it goes online. And this person is more than welcome to print anything I say privately online if they wish. I am not ashamed of what I write.

There is a difference between disagreeing with a person and abusing them. And I’m tired of being abused.

Thanks for your time. End of management memo.