Categories
Semantics Technology

RDF: As simple as A, B, C

When I demonstrated a very simplified RDF/RSS model last week, in the comments attached to the post, Ziv asked the following question:

One question of an RDF newbie: Why do we need that (rdf:Description) element? Why can’t we simply put the @rdf:about attribute on the (item)?

As I started to answer the question in the comments, I kept finding myself taking the question deeper and deeper into the meanings of RDF:

-The rdf:about attribute can’t be used directly on a property led to

-The RDF/XML follows a striped XML syntax of class/property/class/property, regardless of shortcuts led to

-The striped XML syntax is based on the pattern of node-edge-node in RDF led to

-The node-edge-node of RDF is based on a model

All of which led me to a truly definitive question about RDF: why? Why the use of rdf:about here rather than there? Why the syntax? Why the model? After all, XML is a piece of cake: an element here, an attribute there, slam dunk in some text and hey now, we got data. Why make things more complex than they need to be?

Why? Because XML is great at collecting data but lousy at recording knowledge. There is no facility inherent within the plain vanilla flavor of XML that allows one to write or read assertions in such a way that these assertions (read this as ‘statements’) can be machine-produced and machine-readable. And the machines need all the help they can get.

We humans don’t need a rigorous model to communicate. We have phonemes that form words that make up a vocabulary, members of which are then used to form sentences through the use of this really irritating set of rules called “grammar”. We’re programmed to apply these rules through years of instruction, using a neural networking technique called ‘education’. When programming is finished, and after passing certain quality assurance tests, we’re set upon the world. Once loosed from the constraints of the lab, we promptly and as quickly as possible throw out much of what we’ve learned in favor of imagination, creativity, and a dangerous little nugget called innovation.

I love it.

Dorothea wants to discuss her specific mindset related to ‘sexism’ and the concept of sexiness and uses a new word: grunch. This word doesn’t exist, but we as humans adapt to it, add it to our vocabulary (phonemes: grrr + unch). In future writings based on Dorothea’s original discussion, we know what grunch is. Humans adapt.

In 1986, Hans Gabler made 2000 ‘corrections’ to James Joyce’s Ulysses. Well, thank goodness he did because nobody read it the way it was, all those grammatical errors and typos kept getting in the way. Most likely no one even heard of this book until Mr. Gabler took it in hand. As grateful as I am, though, I have recently discovered an even better re-write of this classic: Ulysses for Dummies.

I digress. XML and RDF.

With XML I can record pieces of data such as date, an excerpt, a title, author, category and so on. The structure of the markup allows machines to read these individual facts, to verify that the recording meets certain simple rules. But what if I want a little more than just plain facts? What if I want to be able to take these facts out for a spin, kick the tires, check under the hood?

I have a web page. Facts about this page are: title, URL, date edited, category, and author.

Page has title. Page has URL. Page has edit date. Page has author.

Tarzan has Jane. Jane has Cheeta. Cheeta has banana. A pattern is beginning to emerge.

Every sentence has a subject and a predicate. The subject is the focus of the sentence, and the predicate says something about the subject. These two basic components work remarkably well in allowing us to communicate, to share amazingly complex knowledge.

Returning to RDF and XML, using straight XML is equivalent to only allowing communication with one verb — To Have. Following this, an XML translation of the previous paragraph would be:

Sentence has subject. Sentence has predicate. Sentence has focus. Subject has focus. Predicate has information. Subject has information. Predicate has subject. Components have power. Communication has components. We have each other.

As you can see, after a time, the simplicity breaks down — we need to increase our capabilities, even though doing so adds complexity.

Enter RDF, providing a structure and a meta-language to XML, a grammar if you will.

RDF has one pattern: (subject)(predicate)(object). However, this pattern gives us the tools to record data in such a way that knowledge can be inferred mechanically, merged via a well understood and defined logic with other knowledge, and so on. The subject is the noun, the focus of the statement; the predicate says something about the subject; the object is what is said.

Taking the test paragraph, it can be re-written into the following RDF-like statements:

(Sentence) (has a component)(which is a subject)
(Sentence) (has a component)(which is a predicate)

— no, no, don’t worry — it does get better

(The subject)(is the focus of)(the sentence)
(The subject)(is described by)(the predicate)
(Sentence Components)(enable)(communication)
(Sentence Components)(enable sharing)(of knowledge)

By providing the ability to record this subject-predicate-object pattern, RDF allows us to expand on the depth of information we gather. The more complex the information, the deeper the pattern is applied, but it is still this triple. In a graphical context, the subject-predicate-object form into a node-edge-node that allows us to build new statements on previously occurring ones.
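The statements above can be sketched in a few lines of Python. This is only an illustration, assuming each statement is stored as a plain (subject, predicate, object) tuple; the names `triples` and `describe` are made up for the example, and real RDF would use URIs rather than bare strings.

```python
# Each statement is one (subject, predicate, object) tuple.
# The strings are illustrative labels; real RDF would use URIs.
triples = [
    ("sentence", "has a component", "subject"),
    ("sentence", "has a component", "predicate"),
    ("subject", "is the focus of", "sentence"),
    ("subject", "is described by", "predicate"),
    ("sentence components", "enable", "communication"),
    ("sentence components", "enable sharing of", "knowledge"),
]


def describe(subject, statements):
    """Return every (predicate, object) pair recorded for one subject."""
    return [(p, o) for s, p, o in statements if s == subject]


for predicate, obj in describe("subject", triples):
    print(f"subject --{predicate}--> {obj}")
```

Because every statement has the same three-part shape, one small function is enough to ask what is known about any subject.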

The focus OF the sentence IS the subject DESCRIBED BY the predicate WHICH IS a component OF a sentence. If we treat the capitalized words in this sentence as the predicates, the graphical notation of this could be: node-predicate-node-predicate-node-predicate-node-predicate-node-predicate-node. Nothing more than a repetition of our friend the triple, connected end to end.
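As a rough sketch of that end-to-end chaining, the sentence can be written as an alternating list of nodes and edges and cut back into triples. The function name `chain_to_triples` and the list `path` are hypothetical, chosen only for this illustration.

```python
def chain_to_triples(path):
    """Split an alternating node, edge, node, ... list into triples."""
    if len(path) < 3 or len(path) % 2 == 0:
        raise ValueError("a chain must start and end with a node")
    # Step two items at a time so each object becomes the next subject.
    return [(path[i], path[i + 1], path[i + 2]) for i in range(0, len(path) - 2, 2)]


# The capitalized words are the edges; everything else is a node.
path = [
    "focus", "OF", "sentence", "IS", "subject", "DESCRIBED BY",
    "predicate", "WHICH IS", "component", "OF", "sentence",
]

for triple in chain_to_triples(path):
    print(triple)
```

The object of one triple is reused as the subject of the next, which is all that "connected end to end" means here.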

Representing this within XML requires a set of syntactic rules that ensure we don’t accidentally shove a predicate next to a predicate and so on. There are rules for how to identify a subject, and how to add a predicate. There are rules for how to repeat properties (predicate-object pairs), and how to group properties. There are even rules for how to create a statement about a statement (known in RDF as ‘reification’, though I prefer ‘RDF’s Big Ugly’, myself). But fundamentally the rules break down into nothing more than node-edge-node-edge-node, forming a particularly interesting XML pattern called The Striped RDF/XML syntax.

Rules that basically say that predicates can’t be nested directly beneath predicates (edges next to edges) or that whole node-edge-node thing gets blown out of the water. And rules that state when an rdf:about attribute can be applied. In my simplified RDF/RSS, the rdf:about attribute can’t be applied directly to the ITEM element because ITEM in this instance is acting as a predicate, with an implied URI of “item”; it can’t act as a new subject, too. Edge-edge.

So, with a little tweaking (adding the subject within a generic RDF resource statement, as in example 1, or using a shortcut as in example 2), the rules are met and the knowledge can be processed.
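To make the edge-edge problem concrete, here is a small Python sketch that walks an XML document and flags any element sitting in a property position that also carries rdf:about. It assumes strict striping (node, property, node, property) and ignores the RDF/XML shortcuts, so it is a teaching aid and not a validator. The function name `misplaced_about` and the example.org documents are invented for the illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ABOUT = f"{{{RDF_NS}}}about"


def misplaced_about(root):
    """Return tags of property elements that carry rdf:about.

    Assumes strict striping: children of rdf:RDF are nodes, their
    children are properties, the next level down is nodes again.
    """
    problems = []

    def visit(element, is_node):
        if not is_node and RDF_ABOUT in element.attrib:
            problems.append(element.tag)
        for child in element:
            visit(child, not is_node)

    for node in root:
        visit(node, True)
    return problems


# rdf:about placed directly on the item property: edge-edge.
BAD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <item rdf:about="http://example.org/1"><title>One</title></item>
  </channel>
</rdf:RDF>"""

# The same data with the subject wrapped in rdf:Description.
GOOD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <item>
      <rdf:Description rdf:about="http://example.org/1">
        <title>One</title>
      </rdf:Description>
    </item>
  </channel>
</rdf:RDF>"""

print(misplaced_about(ET.fromstring(BAD)))
print(misplaced_about(ET.fromstring(GOOD)))
```

The first document reports the item element; the second, with the extra rdf:Description stripe, reports nothing.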

(Check out the example RDF files with the RDF Validator to see a graphical demonstration of node-edge-node.)

Once you’ve described one data set with these rules, inferences can be made about other data sets described with the same rules.

As an example, RSS is nothing more than a quick news blurb that gets consumed in less than 24 hours and doesn’t persist. The power of RDF isn’t necessary for RSS used by aggregators, primarily because the data doesn’t persist, and the search for knowledge requires that the bits of knowledge stick around long enough to be discovered.

However, RSS captures a rich set of information about a specific web page or weblog posting: the author and creation date, as well as category, and possibly even links to other resources. What a pity to put this into a form that will only be thrown away.

Well, who says it has to be thrown away? We’s all bosses here, we is. If I says to keep it, I’s boss, and you listen up or Bird be real angry, she will. Real angry. Hissy fit angry.

I modified my individual weblog posting archives to include a bit of RDF in the header that contains the same information used to produce the RSS files that aggregators so callously consume and toss aside. Since this modification was in the template, this RDF is generated for each page automatically. And once persisted in the archive page, it’s there for anyone to discover, providing a richer set of data than just that assumed with keywords pulled from the text.

In this RDF is an identification of the author, an entity which is rounded out by a FOAF (Friend-of-a-Friend) RDF file; knowledge of me, who I am, adds depth and categorization to my Book Recommendation list RDF, and so on and on — a vicious cycle of knowledge acquisition.
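A minimal sketch of how that cycle could work, assuming the archive page, the FOAF file, and the book list have each already been reduced to triples. All URIs and property names below are placeholders, not the ones used on this site, and `works_by_name` is a made-up helper.

```python
# Hypothetical triples: the URIs and property names are placeholders.
post_facts = {
    ("http://example.org/archives/1", "creator", "http://example.org/people/shelley"),
    ("http://example.org/archives/1", "subject", "Technology"),
}
foaf_facts = {
    ("http://example.org/people/shelley", "name", "Shelley"),
}
book_facts = {
    ("http://example.org/books/list", "creator", "http://example.org/people/shelley"),
}

# Merging is a plain set union: identical URIs line up automatically.
merged = post_facts | foaf_facts | book_facts


def works_by_name(name, triples):
    """Find everything created by the person with the given name."""
    people = {s for s, p, o in triples if p == "name" and o == name}
    return sorted(s for s, p, o in triples if p == "creator" and o in people)


print(works_by_name("Shelley", merged))
```

None of the three sources alone can answer "what has Shelley written?", but the merged set can, because the author URI is shared between them.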

(Archived page and comments at Wayback Machine)

Categories
Semantics Technology

RSS: Proof is in the implementation

Sam Ruby had taken a first shot at RSS 2.0 with an RSS document demonstrating the new, simplified RSS syntax. No evidence of RDF, RSS version, no RDF Seq.

Mark expanded on this with what looks to be the same specification, different examples and the use of included HTML (parseType=“Literal” in RDF terms). (Correct me if I misread this, Mark.)

Since Sam has published an example of his version, allow me to work with the assumption that whatever works with his proposed RSS 2.0 should work with Mark’s, with the addition of HTML literals.

In this weblog page, I have PHP processing for the Book recommendation list. I copied the page and modified it to process Sam’s new proposed RSS file. You can see it in action here. The process took me about 10 minutes because the SHIFT key on my laptop doesn’t work well, and I am using vi to make the edits.

Now, I want to show you something. Here is my MT generated RDF/RSS file. Taking this and Sam’s and Mark’s proposed RSS 2.0, I came up with a simplified RDF/RSS syntax, seen in this file and also duplicated here:

<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="http://weblog.burningbird.net/">
<title>Burningbird</title>
<link>http://weblog.burningbird.net/</link>
<description></description>

<item>
<rdf:Description rdf:about="http://weblog.burningbird.net/archives/000514.php">
<link>http://weblog.burningbird.net/archives/000514.php</link>
<title>Myths about RDF/RSS</title>
<description>Lots of discussion about the direction that RSS is going to take, which I think is good. However, the first thing that
happens any time a conversation about RSS occurs is people start questioning the use of RDF within the…</description>
<dc:subject>Technology</dc:subject>
<dc:creator>shelley</dc:creator>
<dc:date>2002-09-06T00:53:16-06:00</dc:date>
</rdf:Description>
</item>

<item>
<rdf:Description rdf:about="http://weblog.burningbird.net/archives/000515.php">
<link>http://weblog.burningbird.net/archives/000515.php</link>
<title>ThreadNeedle Status</title>
<description>I provided a status on ThreadNeedle at the QuickTopic discussion group. I wish I had toys for you to play with, but no
such luck. To those who were counting on this technology, my apologies for not having it for…</description>
<dc:subject>Technology</dc:subject>
<dc:creator>shelley</dc:creator>
<dc:date>2002-09-06T00:19:28-06:00</dc:date>
</rdf:Description>
</item>

</channel>

</rdf:RDF>

Differences are:

 

  1. RDF element rather than RSS
  2. No versioning – not necessary with the concept of namespaces
  3. Use of namespaces to differentiate modules
  4. Surrounding the ITEM’s properties with an RDF:Description. The ITEM can have either literal data or XML elements that should be parsed. By using RDF:Description, I’m giving a hint to the processors that what follows is XML data to be parsed for new elements, so turn off literal text processing optimization, and use the more memory and CPU intensive XML parser, please.

Notice that there is no RDF:Seq in this RDF/RSS version. Why? You don’t have to use the Seq element for valid RDF. I believe Seq was used with RSS 1.0 because the originators of RSS 1.0 wanted to provide ordering information to the tool builders. However, this really seems to be an absolute sticking point with everyone. Fine. Dump it.

Run my new RDF/RSS through the RDF validator (here), and you’ll see it’s valid RDF.
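For a feel of what the validator's graph view is showing, here is a rough Python sketch that pulls node-edge-node triples out of a cut-down version of the file, with no Seq anywhere. It handles only the subset used here (rdf:about on nodes, properties holding either text or one nested node) and is not a real RDF/XML parser; `striped_triples` and the example.org sample are assumptions for the illustration.

```python
import xml.etree.ElementTree as ET

RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"


def striped_triples(node):
    """Yield (subject, predicate, object) for a node element and its stripes.

    Handles only the subset used in the simplified file: rdf:about on
    nodes, and properties holding either text or a nested node.
    """
    subject = node.get(RDF_ABOUT)
    for prop in node:
        children = list(prop)
        if children:
            # Property wraps a node: the object is that node's URI.
            for child in children:
                yield (subject, prop.tag, child.get(RDF_ABOUT))
                yield from striped_triples(child)
        else:
            # Property holds plain text: the object is a literal.
            yield (subject, prop.tag, (prop.text or "").strip())


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example</title>
    <item>
      <rdf:Description rdf:about="http://example.org/1">
        <title>First post</title>
      </rdf:Description>
    </item>
  </channel>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
triples = [t for node in root for t in striped_triples(node)]
for triple in triples:
    print(triple)
```

The channel gets a title and an item edge pointing at the post, and the post gets its own title, which is the same shape the full file produces for each posting.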

Now, I created a third copy of my weblog page with the PHP processing and had it parse and print out this new RSS file. The changes necessary? I changed DC:DATE to DC:CREATOR — I wanted to print out the latter not the former. Here’s the new page.

Next, I copied the PHP page and had the code process my original RDF/RSS 1.0 file, the one that’s generated automatically from MovableType. Changes to the code? Nada. Not one single change other than the name of the RDF file. Time to make change? 4 seconds. See the new page here.

Now, all of these pages (including this one) use PHP-based XML processing to process the data (xml_parser). No specialized RSS or RDF APIs. Pure XML processing. And it took me about, well, honestly, probably a couple of hours to write the original code for my Books RDF/RSS application. That darn shift key you know.
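My pages do this in PHP, but the same idea can be sketched in Python with its Expat-based event parser, which is the closest analogue I know of. This is not the code running on this site: the `collect` function and the sample feed are assumptions, and the element names are matched exactly as written in the file, prefix and all.

```python
import xml.parsers.expat


def collect(xml_text, wanted):
    """Collect the text of elements named in `wanted` using event callbacks."""
    found = []
    buffer = []

    def start(name, attrs):
        # A new element begins: forget any text seen so far.
        buffer.clear()

    def end(name):
        if name in wanted:
            found.append((name, "".join(buffer).strip()))
        buffer.clear()

    def chars(data):
        buffer.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return found


FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/">
    <title>Example</title>
    <item>
      <rdf:Description rdf:about="http://example.org/1">
        <title>First post</title>
        <dc:creator>shelley</dc:creator>
        <dc:date>2002-09-06T00:53:16-06:00</dc:date>
      </rdf:Description>
    </item>
  </channel>
</rdf:RDF>"""

for name, text in collect(FEED, {"title", "dc:creator"}):
    print(name, "=", text)
```

Switching from dc:creator to dc:date means changing one string in the `wanted` set, which is about the size of the change I described above. The parser never needs to know that rdf:Description is there.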

I’m not trying to downplay others’ concerns or existing work or effort, and I realize that I have a better understanding of RDF than most of you (not bragging, but grant me this as a given for discussion purposes at this moment) and that this gives me an edge when working with RDF.

What I’m trying to show is that keeping RDF in the RSS specification doesn’t necessarily mean that simplified processing is impossible, that we can’t use ‘regular’ XML tools, or that there will be a huge burden on tool writers.

We don’t have to keep Seq if it really bothers everyone. Let’s work this change. Let’s. Let us work this change. I like that phrase, don’t you?

By keeping RDF in RSS now (and really, are those changes I made to the proposed RSS 2.0 so hard to swallow?), we keep the door open for the benefits that will be accrued some day when RDF does have broader use.

I guess what I’m trying to show, demonstrate, prove is that RDF doesn’t have to make things arbitrarily complicated, or confusing. That we can write documentation that clarifies those few bits of RDF in the specification so that it isn’t complicated for folks writing or reading this stuff by hand (or processing it with various languages).

I’m hoping with this demonstration that I’ll convince a few of you that we can keep the door open on this discussion rather than arbitrarily throwing RDF out. RDF is a specification that, I’d like to gently remind you all, has been in the works for years by some of the best markup minds in the business. And as easy as it is to criticize the RDF working group for taking time, remember that they’re trying to create a specification that will stand the test of time, rather than break with every version, as we had with HTML.

Mark provided a summary of the RSS issue, and I know that this discussion has been going on for years. And I know that there are a lot of people who say, let’s just fork. But folks, this didn’t work for SQL and QUEL (remember QUEL?) years ago when the decision was being made about which query format to use when accessing relational database data. I really do want to see these specs come together, with members and players from all sides.

And I’ll also be honest and say that I really don’t want to see this owned by any private company or person. Sorry, but I just can’t accept this; it goes against everything I believe in. I am not belittling Dave’s and Userland’s contribution to RSS. I realize that Userland popularized RSS and a debt is owed.

What I am asking is that Dave become part of a team working on this, a team that’s open to people who literally have something to contribute on this issue, each with an equal vote. Yes, people like me, like Mark, like Sam, Jon, Joe, Bill — all the people who have something to contribute to make this specification rock. And hopefully prevent something like this from happening again in the future.

Am I too late though? Is the decision made? Can’t we talk?

Where’s the fire?

(Archived page and comments at Wayback Machine)

Categories
Diversity RDF Technology

Outside even among the outsiders

Recovered from the Wayback Machine.

Warning: Big time rant. Male/Female thing. Read at own risk.

Being a woman trying to find a place among the techie guys isn’t easy, particularly since the areas of technology of interest to me rarely have other women participants. Don’t have to believe me, take a look at the RSS-Dev group, the RDF interest groups, most of the W3C working groups and so on.

Sometimes the group participation has been good. I’m rather partial to the RDF working group because in the newsgroups, they always worked with me. However, in a lot of groups, particularly the RSS-Dev group, I am for the most part ignored. That’s not a lot of fun. It seems no matter what I do, I don’t have the respect of a lot of the players. Not all players; there’s good people hereabouts that never ‘held’ me being a woman against me.

(Me not laying down a 100+ lines of code a day they might hold against me, but not being a woman. And I can live with this.)

The seemingly winless battle for respect over the last few years probably accounts for over 50% of my recent burnout. I’m not sure if any of you understand what it’s like not being sure if the reason you’re ignored in most of these groups is because you’re a woman, or an idiot. I guess I would prefer to think it was because I’m a woman. I seem to do okay on my jobs, and I’ve had some pretty tough technical jobs. But you just don’t know, and it eats at you. All the time. Takes your confidence and just tears it apart.

After I returned from my last trip, I felt renewed and ready to take on challenges again, especially after coming back to be met with the generosity of so many of you, helping me keep this weblog and my sites going. I started my work again with RDF, which I really do love. In particular, I started participating on Internet-related groups again — something I’m more than a bit wary of.

When things got bad at one email group I took the moderator up on his request to start another group, and started Bloggers Unlimited, and it grew. It’s now at 7698 members.

The conversations started out pretty good. There was a quiet time in the middle, but for the most part, consistent discussion. It’s a bit too techy for the audience at times, but manageable.

However, I began to notice a distinctive behavior pattern with this group. There was a very strong dominant male presence, which I know left me feeling pushed out of most of the conversations. When the group fell silent for a few days, and then started up again, another member, a male member, was given credit for rejuvenating the group; and here is me, taking quiet pride in thinking I was the one that had sparked it back to life.

What was worse is that most of the comments I made were ignored. I began to feel invisible. The same old feeling of inadequacy. We had some crankiness among the male members a bit early on, but it smoothed out, and the group went back on track. Again, I hoped I helped on this and I suppose this is a nurturing female type of thing, but I didn’t want to be the nurturing female in this one act play.

I started questioning myself: Is it just me? Am I asking dumb questions?

I decided to get another party’s opinion, and asked Liz today if she noticed this. Was I being paranoid? Did I have a valid concern? She responded with this posting after first giving me heads up and asking if I wanted to respond instead. I declined. Liz wrote:

 

Here’s how the story goes, so far as I can see:

a) Shelley posts an interesting query about the semantic web
b) A discussion begins, with posts from a number of people with interesting ideas
c) Shelley responds with questions and ideas, at the same time that predictable people begin posting predictable rants about predictable topics (RSS, for example. OPML. what constitutes an ad hominem attack. yada, yada, yada.)
d) Shelley’s points are essentially ignored in favor of the same-old-same-old peacocking and posturing among the boys.
e) Shelley gets mad.
f) Shelley gets noticed only because she got mad.
g) People like me unsubscribe because the signal-to-noise ratio is getting worse by the second, and they’d rather read blogs than wade through cross-posts and arguments.

 

I was somewhat relieved to feel vindicated in my read of the group responses, because Liz is not one to call out sexism, either lightly or easily.

On the other hand, though, I was more than a little discouraged to see her comment about me getting mad, because I’ve taken such care on the list not to be mad, to stay calm, even when baited. And I have been baited. Not just in the list but in emails.

Why won’t I take such and such down? Why won’t I hold such and such to task? Well, if I want to be walked on, that’s my problem.

When Liz talked in her posting about rather reading Jeneane and Halley’s comments, I know that she’s making a point about being among people that appreciate each other. And I understand this. However, the impact on me is that I feel left out among both the men and the women. That I have no place with either group.

So where does this leave me?

Most likely bowing out on the groups, though I’m continuing my RDF work here in my weblog, with just my readers who are interested. I most likely will not get involved in any of these groups in the future. I am disappointed at the guys in the list (not all, just some) who seem to have little regard for what I say (and I still have to live with that old worry, now, whether it’s because I’m a woman, or because I’m making stupid comments.)

But I’m also disappointed at the women in the group. Why didn’t they speak out? Why did I have to speak out, alone? Do they know how hard it is to be the only woman talking in these groups?

Where were they when I needed them?

I have some very bad stuff going on in my life now, which I’m not going to talk about here because it’s deeply personal and, respectfully, lovingly, none of your business. But I don’t have the energy to fight these battles now. I may not ever again in the future.

I’m not walking away from the tech again. I am enjoying my interaction with those who are interested in the RDF Poetry Finder. It may not be sexy lines of code, at least not yet; but this could be the first weblog-based group participation in a project that involves both technical and non-technical people, and it’s a really fun project. At least, I hope so.

When we’re finished, we’ll be able to offer it as a search engine implementation to sites such as Plagiarist and other literature, writing, and poetry related sites. Perhaps even Project Gutenberg. It’s a difference. A small difference, but a difference.

It’s not changing the face of the Web, or even of Google — but it’s a start. It may not be sexy, but it’s doable. I guess when it is up and running, and we can all look back and bask in the glow of our efforts, then that question I have about my worth in technology will be answered. Because it’s not going to get answered in email forums where the women stay silent, and the jerks dominate.

I will say this, though: social software is never going to fly if there isn’t some way to control the peacocks, as Liz called them, and if the peahens don’t stop standing in the shadows.

Update:

I hope that the participants in the RDF Poetry Finder are not put off by this posting. Believe me when I say this wasn’t written lightly, and I’m aware it will make people uncomfortable. But it was something I had to say. And, note: I am also aware that I could be wrong in my interpretation — touchy I might be, but at least I try to be honest with myself.

Well, I think.

Categories
RDF

Why RSS?

Recovered from the Wayback Machine.

Dave asked the question: Why is RSS 1.0 called RSS? Since I’m writing a book on RDF and just finished my chapter on RSS (no, that wasn’t the one I lost), I feel qualified to answer this.

Dave, the RSS in RSS 1.0 stands for “RDF Site Summary”. The RSS in previous versions of RSS stood for “Rich Site Summary”.

Though Dave didn’t specifically ask this, I will: why involve RDF? And my answer is: for the exact same reason we build databases based on the relational data model rather than create our own storage scheme for each business data need – expediency.

By using an accepted and agreed on data model to define and store the data, a wide variety of tools and APIs can process the data without having to be rewritten for each specific application. The relational data model provides this for traditional databases; RDF provides this for XML.

By bringing RSS into compliance with the RDF specifications, you can (as I did yesterday) process an RSS document using the same pre-built APIs, services, and applications used to process RDF/XML defining other business processes. This processing reuse allows folks to focus on the unique needs of the business and the business data, rather than on the mechanics of how to process, store, or generate the XML.

This is no different than being able to store many different types of business data in an Oracle database and then access it using SQL.
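The reuse argument can be shown with a toy Python example: one generic query function, written once, works over feed data and book data alike because both share the triple shape. The function `match` and both data sets are invented for the illustration and stand in for real RDF APIs.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Two unrelated applications, both reduced to the same data model.
feed = [
    ("post/1", "creator", "shelley"),
    ("post/1", "title", "Why RSS?"),
]
books = [
    ("book/1", "creator", "shelley"),
    ("book/1", "title", "Example Book"),
]

# The same function answers questions about either data set, or both.
print(match(feed, predicate="title"))
print(match(feed + books, predicate="creator", obj="shelley"))
```

Nothing in `match` knows about RSS or books, much as SQL knows nothing about what a particular table holds.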

Categories
RDF Writing

Break time

Break time on the book. I’m currently working on the RSS chapters, and haven’t I been careful when discussing the history of RSS. I’m also finding that I like the RDF working group’s new specification split. Either I’m getting a feel for their reasoning or I’m getting rummy from trying to meet an editorial mark Monday evening. Either way, it works.

You hate cat pictures? Hate really cute cat pictures? Then don’t go here (thanks to Head Lemur for heads up). Speaking of cat pictures, I’m still waiting patiently for cat pictures from someone in the community who shall go nameless (but you know who you are).

Dave helped Tara Grubb with a new web site, URL to be announced. I think this was a nice thing to do, but paused over:

 

As I was putting together the initial blogroll, I decided to link to Howard Coble, her opponent. I wondered how Tara would feel about it. I just walked her through the new site, and when I explained this part she literally shreiked with delight.

 

Hmmm.

I’ve started my own process of determining my vote. For senator, I’ll most likely vote for Jean Carnahan, though I don’t care for the widow rule, myself. However, she’s preferable to James Talent, and the Libertarian candidate Tamara Millay is stressing the rights of citizens to bear arms a little too much for my taste (and the balanced budget amendment has no place in the Constitution). Her position on the issues (Tara, you need one of these) has good points, but they seem to lean a little heavily on the side of the Social Darwinists for my taste.

Frankly, none of the candidates is a blinding flash and a deafening roar. (We’ll see how many old time SciFi readers there are in the audience.)

You know, if I lived in one place long enough, I think I’d run for Congress. No, seriously. I’m fairly confident that no party would have me, but I could have such fun in Congress!

Just think of the possibilities….

(And on that note, back to the book.)