Categories
RDF

Pickin’ and Choosin’

Recovered from the Wayback Machine.

I have to laugh when I read statements about the lack of technology related to RDF. I am at a point (past it really) with my book where I’m having to go through this huge list I’ve compiled of RDF tools, APIs, and applications and determine which to keep, and which to drop out of the book.

A technology book focusing on a small and finite subject can be exhaustive in its coverage of the subject, including every tool and every API. However, with a more general topic such as RDF, the key is to pick implementations that meet certain criteria: they are unique and interesting; are best of breed; and are up-to-date, relevant, relatively bug-free, and relatively easy to install and use. Most of all, when determining what to include and what not to include, the writer has to keep the audience and their needs and interests in mind.

As an example of meeting my audience’s needs, in the book I’m covering various aspects of using RDF as ‘data’: queries, RDF as a data store, and RDF and databases such as MySQL. However, I only briefly mention Guha’s rdfDB, a pure RDF-based database. Why? Because there’s been no activity associated with it in some time; it’s still in beta; it’s been ported to only a few operating systems; and it doesn’t fit mainstream technology.

That “doesn’t fit mainstream technology” was a major influence in my decision to include, or not include, RDF implementations. For instance, I’m not covering any ‘C’ RDF API in the book, though ‘C’ has formed the backbone for applications development for years. The reason is that C is seldom used for web-based applications, and RDF is nothing if not web related. Additionally, C applications can be the most operating system dependent, and the most temperamental to install and configure. I don’t know about other developers, but I’m just not interested in playing Makefile tweak games any more. Been there, done that.

I was thinking this morning that, in some ways, my coverage of RDF reflects how our industry has changed so much in the last few years. Monolithic, standalone, linear, closed, function-based applications have been replaced by lightweight, modular, open, and object-based applications, usually hosted on the Web.

Applications built with ‘C’ and FORTRAN have given way, in most cases, to applications written with Java, Perl, Python, and C++. Distributed has replaced centralized. Open has replaced closed and secretive bits of binary blobs that only work within one environment, and only if you ask pretty please.

Social software has replaced data entry sheets. Documentation is no longer a dirty word. Anyone can peer beneath the covers, and the users have stopped being intimidated by the developers.

And anyone can be heard via “this-is-not-an-outline-dammit” weblogs.


Newest RDF goodies and challenges

Recovered from the Wayback Machine.

I spent the last several days reading through the six RDF documents currently under final review. During the last few days I acted the minor irritant to some members of the W3C RDF Working Group, primarily getting clarification on some confusing or complex aspects of the documents. I also spent time trying to bring concerns expressed by fellow webloggers to the attention of the group members — trying to bring the viewpoint of non-RDF folks to the RDF table. I find myself in the interesting position of being an RDF supporter who doesn’t necessarily support all aspects of RDF. Which means I can’t claim kinship with any ‘side’ in the RDF debate.

Here are the items I couldn’t cover in the article (due to space considerations), and their possible impacts:

Containers: Containers such as Bag, Seq, and Alt are still included, but without additional semantics attached. What does this mean? It means that a container is a grouping of resources, but the RDF specification attaches no additional assumptions about how container elements relate to each other, or how applications process the data. Elements in a Bag are treated no differently, by the specification, than elements in a Seq. It’s up to each individual application to determine what ‘Bag’ or ‘Seq’ means.

Sound familiar? Can we all say “HTML”?

RSS 1.0, the same RSS 1.0 generated by most weblogging tools, uses a container to group items: a Seq (sequence). I have never been particularly happy with this, as I believe that ordering or other processing should result from the data rather than the structure, such as using the posting date to determine sequence of display. In addition, containers within RSS add redundancy. Individual items are contained in Items, which is contained in Channel. Added redundancy is equivalent to added complexity.
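As a rough illustration of ordering from the data rather than from the structure, here is a minimal Python sketch. The item titles and dates are invented for the example and are not taken from any real feed; the point is only that a consumer can derive display order from a date property without any Seq wrapper.

```python
from datetime import date

# Hypothetical feed items; titles and dates are made up for illustration.
items = [
    {"title": "Older post", "date": date(2002, 11, 15)},
    {"title": "Newest post", "date": date(2002, 11, 18)},
    {"title": "Middle post", "date": date(2002, 11, 16)},
]

# The display order comes from the posting date carried by each item,
# not from the position of the item inside a container.
newest_first = sorted(items, key=lambda item: item["date"], reverse=True)
titles = [item["title"] for item in newest_first]
```

An aggregator that preferred oldest-first display would simply drop `reverse=True`; the data stays the same either way.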

In fact, we can simplify RSS, as I tried to demonstrate once before. Unfortunately, this example won’t validate as RSS.

However, this one does.

By putting the onus of semantics (the behavior, if you will) of containers on to the applications, there will be differing results based on different applications’ interpretations of ‘Bag’ and ‘Seq’ or ‘Alt’. The RSS 1.0 group can specify that RSS 1.0 uses a Container, but there is no guarantee that the data within the container will be processed in any specific way by any specific aggregator, or generated a specific way by a specific application.

That’s what happens when precision of meaning is lost…or deliberately withheld.

(I think I’m going to change my Movable Type RSS template to support the new and improved SORSS — Shelley’s Own RSS. I could unite disparate RSS sides under one banner. Instead of a king, a Queen, Dark and Beautiful. All will Love me and Despair…. )

nodeID: For those of you who ran into problems with blank nodes (bnodes) when trying to work with RDF, you’re going to love this: the WG has created the nodeID that allows you to apply a label to blank nodes. This means you can use whatever you want as a bnode label and the document will validate. No more having to figure out how to create a fake URI in order to specifically access the relationship denoted by a bnode.

Collection: A new container-like construct has been created: the Collection. This is used with groups of resources to create a list-like structure. However, just as with Containers and Reification, there is no assumption about the semantics of a Collection; that is up to each application. If I use this in my own applications, it will only be because there’s an absolute need for it, and I can’t avoid its use.

datatype: There’s a new RDF attribute, rdf:datatype, that can be attached to a property to give a specific datatype URI reference, usually to XML Schema datatypes. Adding in support for datatyping is a goodness, but I’m absolutely appalled that the Working Group is adding this in at the property instance, rather than into the vocabulary definition. This means that one person can define create-date as an integer, the number of seconds since January 1, 1972. Another person could define create-date as a Schema date, with a value of 1999-10-01. Both would be valid instances of the same RDF vocabulary.

Of course, the designers can provide documentation about what format is to be used, but I would prefer something other than the honor system.
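To make the concern concrete, here is a small Python sketch that models typed literals as (value, datatype URI) pairs. The create-date property URI, the document URIs, and the integer value are hypothetical; only the XML Schema namespace is real. It shows two statements using the same property with different datatypes, and a check an application would have to write for itself because the vocabulary does not enforce one.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical vocabulary term, used only for this illustration.
CREATE_DATE = "http://example.org/vocab#create-date"

# Each statement is (subject, predicate, (lexical value, datatype URI)).
statements = [
    # One author stores a count of seconds (the number itself is arbitrary).
    ("http://example.org/doc1", CREATE_DATE, ("875750400", XSD + "integer")),
    # Another author stores a Schema date.
    ("http://example.org/doc2", CREATE_DATE, ("1999-10-01", XSD + "date")),
]

def datatypes_used(predicate, triples):
    """Return the set of datatype URIs attached to one predicate's values."""
    return {datatype for _, p, (_, datatype) in triples if p == predicate}

used = datatypes_used(CREATE_DATE, statements)
# More than one datatype for the same property means consumers must
# branch on the datatype of every instance they read.
is_consistent = len(used) == 1
```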

Embedding: The WG did come out specifically on the issue of embedding RDF in HTML and XHTML: don’t do it; use link instead.

There’s a lot more, but you’ll have to buy the book. Literally.


C2C Datahead

Recovered from the Wayback Machine.

Dorothea received an email from Simon St. Laurent, the editor of my RDF book. I appreciate her respect for Simon and match it with considerable respect of my own, which will cause him no end of discomfort, I’m sure. However, I have to push back at the sentence:

But Simon really is cool, one of the sadly few voices for document-oriented XML howling in the vast wilderness of C2C (computer-to-computer) dataheads.

It is the C2C ‘dataheads’ that ensure that XML documents don’t document crap for all of their cleanliness and pristine eloquence. It is the C2C ‘dataheads’ that provide the proofs behind the seemingly simple XML vocabularies to ensure that the data documented within them is always consistent and reliable. And it is this particular C2C datahead that spent several days this last week locked in debate, difficult debate, with members of the RDF Working Group, the XML community, the weblogging community, and others, trying my best to ensure that I understand the concerns of the non-RDF community; that RDF/XML is as simple as it can be, or else that we work with the XML community to come up with a feasible alternative; that the RDF specification documents are comprehensive and clear; and that I understand the concepts and semantics of RDF well enough that I may write cleanly about them. Perhaps even cleanly enough for the D2D markup heads.

Of course, this was a lot more work than writing out “RDF/XML sucks”. I think next time I won’t go through this effort. When someone says, “RDF/XML sucks”, I’ll respond with “No it doesn’t” and leave it at that.


RDF Query-O-Matic light

Recovered from the Wayback Machine.

I slaved away this afternoon, persevering in my work in spite of numerous obstacles (sunshine, cat on lap, languor) to bring you RDF Query-o-Matic Light – the PHP-based RDFQL machine. A grueling six or so lines of code. I sit in exhaustion on my stool, fanning myself with old green bar computer paper.

Speaking of stools, that reminds me of another nursery rhyme associated with RDF.

Little Miss Muffet, sat on a tuffet,
Eating her curds and whey;
Along came a spider,
Who sat down beside her
And frightened Miss Muffet away.

Chances are, the stool referenced in this rhyme was a three legged one, similar to the milk stools still used today. Three is the perfect number of legs for a stool: just enough legs to provide stability, but without the need for the additional material for an extraneous fourth leg.

Returning to the subject of RDF, it, like the milk stool, is based on the principle that ‘three’ is the magic number – in this case three pieces of information are all that’s needed in order to fully define a single bit of knowledge. With fewer than three, all you have is fact without context; with more, you’re being redundant.

Of the three pieces of information, the first is the RDF subject. After all, when discussing a property such as name, it can belong to a dog, cat, book, plant, person, car, nation, or insect. To make finite an infinite universe, you must set boundaries, and that’s what subject does for RDF.

The second piece of information is the predicate, more commonly thought of as the RDF ‘property’. There are many facts about any individual subject; for instance, I have a sex, a height, a hair color, eye color, degree, relationships, and so on. To focus on that aspect of me that we’re interested in at any one point in time, we need to specifically focus on one of my ‘properties’.

If you look at the intersection of ‘subject’ and ‘property’, you’ll find the final bit of information quietly waiting to be discovered – the value of the property. X marks the spot.

I am me. I have a name (Shelley Powers). I have a height (close to six feet). I have an attitude (sweet tempered and quite easy going). Together, these bits of knowledge form a picture, and that picture is me.

All from RDF triples strung together in precise ways.
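As a minimal sketch of the idea, the statements above can be written as (subject, predicate, value) tuples in Python. The URIs are hypothetical placeholders, not part of any published vocabulary; the sketch only shows how three pieces of information per statement add up to a picture of one subject.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    value: str

# Hypothetical identifiers, invented for this illustration.
ME = "http://example.org/people#shelley"
EX = "http://example.org/terms#"

statements = [
    Triple(ME, EX + "name", "Shelley Powers"),
    Triple(ME, EX + "height", "close to six feet"),
    Triple(ME, EX + "attitude", "sweet tempered and quite easy going"),
]

def describe(subject, triples):
    """Gather every property/value pair recorded for one subject."""
    return {t.predicate: t.value for t in triples if t.subject == subject}

picture = describe(ME, statements)
```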

On to the new version of the RDF Query-o-Matic, the PHP-based Query-o-Matic Light. This version, like the JSP version, can apply a valid RDFQL query against a valid RDF file, printing out a target value. However, there are some minor syntactic differences between the two.

The PHP classes that provide the functionality for Light (PHP XML rdql) include the file name as well as explicit namespace use within the query, rather than as separate elements. For instance, the following query will access titles from all elements contained within my resume.rdf file – a file with an experimental resume RDF vocabulary:

SELECT ?b
FROM <http://weblog.burningbird.net/resume.rdf>
WHERE (?a, <bbd:title>, ?b)
USING bbd for <http://www.burningbird.net/resume_schema#>

The first line is the same SELECT clause, as discussed in the last RDFQL posting, but this is followed by a FROM clause, which lists the RDF file’s URL within angle brackets. Following is the WHERE clause containing the query, and again, this is no different than the JSP version, except that an alias is used instead of the full namespace. The namespace itself is listed in the last clause, delimited with the USING keyword.
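The alias mechanism described above can be sketched in a few lines of Python. This is not how the PHP classes implement it (their internals are not shown here); the function name `expand_alias` is invented, and the sketch only illustrates the substitution that the USING clause implies.

```python
import re

def expand_alias(term, namespaces):
    """Expand <alias:local> into a full URI reference when the alias is
    declared; return any other term unchanged."""
    match = re.fullmatch(r"<([^:<>]+):([^<>]*)>", term)
    if match and match.group(1) in namespaces:
        return "<" + namespaces[match.group(1)] + match.group(2) + ">"
    return term

# The alias declared in the USING clause of the query above.
namespaces = {"bbd": "http://www.burningbird.net/resume_schema#"}

# The triple pattern from the WHERE clause, one term at a time.
pattern = ["?a", "<bbd:title>", "?b"]
expanded = [expand_alias(term, namespaces) for term in pattern]
```

Variables such as `?a` pass through untouched, and a full URI in angle brackets is left alone because its scheme (`http`) is not a declared alias.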

Regardless of some syntactic differences, the query still returns the same result.

Taking the Light version of Query-o-matic out for a spin, I went looking for more complex queries, and found one in Phil’s Comments RDF. Though deceptively simple looking, Phil’s RDF file, in fact any RSS 1.0 RDF file, has one nasty little complication: containers.

An RDF container is an RDF object that groups related items together, usually with some implied processing as to order. An RDF container can group ordered items (SEQ), alternative items (ALT), or just a collection of unordered items (BAG). An RDF container is also a bit of a bugger when it comes to processing or generating RDF, which is one reason they lack popularity.

However, the key to overcoming the difficulties associated with containers is the same as the one used with RDFQL queries – work with it one step at a time.

Container elements can be accessed individually by knowing that each item appears as an object in a (subject, predicate, object) triple with a predicate of TYPE (http://www.w3.org/1999/02/22-rdf-syntax-ns#type using the namespace). To access all container elements using RDFQL, you would need to have a WHERE clause similar to:

(?subject, <rdf:type>, "http://purl.org/rss/1.0/item")

This will return all container elements within the RDF document for the JSP version of Query-o-Matic, but not the Light version. The PHP version doesn’t allow for literals (the “http://purl.org/rss/1.0/item” value) directly within the query triple. Instead, you use a filter, designated by the keyword AND:

WHERE (?subject, <rdf:type>, ?object)
AND ?object=="http://purl.org/rss/1.0/item"

This triple query filters the elements returned, giving us a target set of subjects that are equal to all of the container elements in the document. With Phil’s comments RDF/RSS file, this is all the comments.

Once we have the container elements, the subject values are then passed into the next triple query, to access the DESCRIPTION property for each (the description holds the actual comment in RDF/RSS Comments). The value of the DESCRIPTION predicate is our target value, which gets printed out.

Pulling this all together, the query to access all of the actual comment text in the RDF document is:

SELECT ?desc
FROM <http://philringnalda.com/comments.rdf>
WHERE (?subject, <rdf:type>, ?object),
(?subject, <rss:description>, ?desc)
AND ?object=="http://purl.org/rss/1.0/item"
USING rdf for <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
rss for <http://purl.org/rss/1.0/>

The mapped values are the subjects: the ?subject variable appears in both triples, so the subjects found in the first triple query are passed as subjects to the next.
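The chaining can be sketched with a tiny pattern matcher in Python. This is an illustration of the general technique only, not the code behind either Query-o-Matic; the sample triples, subject URIs, and comment text are invented, and the `keep` function stands in for the AND filter.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RSS = "http://purl.org/rss/1.0/"

# Made-up sample data standing in for a comments feed.
triples = [
    ("http://example.org/c1", RDF_TYPE, RSS + "item"),
    ("http://example.org/c1", RSS + "description", "First comment"),
    ("http://example.org/c2", RDF_TYPE, RSS + "item"),
    ("http://example.org/c2", RSS + "description", "Second comment"),
    ("http://example.org/feed", RDF_TYPE, RSS + "channel"),
    ("http://example.org/feed", RSS + "description", "Channel blurb"),
]

def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def query(patterns, data, keep=lambda bindings: True):
    """Apply patterns in order, carrying bindings from one to the next."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in data:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [bindings for bindings in solutions if keep(bindings)]

results = query(
    [("?subject", RDF_TYPE, "?object"),
     ("?subject", RSS + "description", "?desc")],
    triples,
    keep=lambda b: b["?object"] == RSS + "item",  # the AND filter
)
comments = [b["?desc"] for b in results]
```

Because `?subject` is already bound when the second pattern runs, only descriptions belonging to the subjects found by the first pattern survive, and the filter then drops the channel.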

Check out the results.

I’m actually not fond of container elements myself, precisely because there are processing semantics integrated into the element – a sequence is assumed to be an ordered list of items, while a bag is not. I would rather provide the information necessary to order elements – such as date or some other characteristic – and then let the tool creators decide how they want the elements ordered.

Regardless, the trick to working with container elements is to use the TYPE predicate to discover the container elements, pull the subject associated with each, and then use these with relatively standard RDFQL for the rest of the query.

You can use both the JSP-based Query-o-matic and the PHP-based Query-o-Matic Light to try out different queries on whatever valid RDF documents you know of. Documentation for the RDFQL syntax used with the JSP based version can be found here, and the RDFQL syntax for the Light version can be found here. Remember that though there are syntactic differences between the two, the actual RDFQL used in the WHERE clause is logically the same – one or more chained triples, with the results of the first triple being passed to the second and so on.

Now that I have my query engines and can test my RDFQL, the next step is to pull these queries into an actual application, covered in the next of these essays into RDF and RDFQL.

To try the JSP Query-o-matic yourself, download and install Jena into your own environment. The actual o-matic JSP page can be downloaded here.

To try out o-Matic light, download and install the PHP XML classes. The PHP I used can be downloaded here.

Remember, these are for fun. So, have fun.

 


The saga of RDF continues

Recovered from the Wayback Machine.

The posting I wrote on Friday about RDF has triggered much debate (in posting and at xml-dev), which is a goodness. I think it’s also triggered much misinterpretation and misunderstanding, which is what happens when a debate occurs across threads of mailing lists and weblog comments.

There have been attempts to summarize the debate, such as at O’Reilly Network and at Joe Gregorio’s, but I’m not going to attempt to summarize it myself. Why? I have a viewpoint in this, and this would slant my summary. I’d rather just provide the links and let you form your own summation.

However, I do want to clarify something with my own position.

First, I’m not speaking for the RDF Working Group, in any way. I am giving my own viewpoints and opinions, which the WG may not agree with. No one can speak for the WG members but the members themselves.

Additionally, I do not discount the complexity and difficulty inherent in RDF. I am aware, all too aware, of how complex the RDF Model documents can be. I know that there is much of the lab and not enough of the real world associated with the effort. And I’m not trying to dismiss people’s concerns with the model or the RDF/XML serialization when I say that we need to release the RDF specification rather than start over.

When I say that I don’t have problems with the RDF/XML, people should be aware that this is because I spent an enormous amount of time with the RDF specifications learning the core of the RDF model. I then spent a considerable amount of time learning how RDF is serialized with RDF/XML. I will now spend a significant amount of time reading through the newly released specifications to see where my understanding differs from the newest releases.

All of this has taken time and effort. I do not deny this.

I also don’t deny the importance of people being able to read and write RDF/XML. However, my interpretation of XML has been, and continues to be, that it’s a mechanical language rather than a biological one, and that it must be accurate, consistent, and reliable in a mechanical sense first and foremost. Within these constraints, though, we should work to make the syntax as biologically understandable as possible.

Ultimately, I’m not trying to defend RDF/XML as much as I’m trying to generate understanding that the problems people are having with RDF/XML aren’t consistent, and may not necessarily be problems with RDF/XML at all.

Tim Bray created RPV, which makes the RDF triples easier to read, but Simon St. Laurent says he doesn’t think in triples. Simon, on the other hand, is more concerned that RDF is having a deleterious effect on XML directly, as witnessed by the discussions about QNames and URIs. These are two separate interpretations of “what’s wrong”, and lumping them all together into vague generalizations such as “RDF is ugly” or “RDF/XML is ugly” won’t help anyone.

Because of the discussions in the last week, I am re-visiting the chapters I wrote on the RDF specification for the Practical RDF book, coming in with a fresh perspective, and a better understanding of what the heck I need to write about. Unfortunately, I know enough after this weekend to be aware that this is going to be the most difficult technical writing task I have ever had. Can I clarify RDF and RDF/XML to the point that everyone understands both equally?

Exactly how does one achieve the impossible in 10,000 words, or less?


Posted by Bb at November 18, 2002 10:32 AM