RDF Poetry Finder: Pieces of the Puzzle

Recovered from the Wayback Machine.

First in a multi-part series focusing on RDF (Resource Description Framework) and poetry and demonstrating two-way integration between art and technology. No prior experience with either RDF or poetry is required.

Recently, Simon St. Laurent wrote a weblog essay titled “The (data) medium is the message”, in which he discusses the influence of the data container on the data. He uses the analogy of the newspaper and television as media for delivering information, which technically makes them the same type of container: both deliver information. However, the format and quantity of information differ enormously between the two:

To some degree, you can get the same information from different media sources, but no one expects television to be a reading of newspaper stories or the newspaper to be a transcript of the nightly news on TV. Both are containers for information, but the shape of the container inevitably affects the way the information is both produced and consumed.

Developers tend to disregard this lesson from the real world, approaching the problem of data and container from a purely programming perspective based on an assumption of passive data. The assumption becomes that it doesn’t matter what the container is, because one can always manipulate the data to fit; we’ll just use technology to transform the data from a relational database to an XML document to RDF to an object store, and so on. However, this passive-data, programmatic approach to managing data almost always requires more effort than using the appropriate data container would, and the transforms between containers require compromises that may not always work cleanly.

In his essay, Simon wrote that the best approach to managing data is to first understand that it isn’t passive, and to work with its native structure and respect its natural state. Most importantly, working with data means using the appropriate container for the data.

As examples of matching data to container, data that requires a great deal of flexibility and that has recursive structures is a good fit for XML; while unordered data requiring a great deal of processing is a better fit for relational databases and so on.

Coming from a strong data background, I agree with Simon on the active nature of data, and thought his essay was both thoughtful and compelling. However, what caught my interest most about it was his interpretation of the nature of RDF data. Simon described it this way: “RDF feels like ‘puzzle’ data to me, interlocking pieces which form larger pictures when assembled.” This is, in my opinion, one of the best descriptions of RDF I’ve yet seen, and I’ve seen a few.

Interlocking pieces, which form larger pictures when assembled. In addition to describing RDF data, this phrase could also be used to describe the data model underlying semantics; after all, semantics is the process of discovering meaning behind combinations of symbols — finding the big picture from the sum of the parts.

This parallelism of data model between RDF and semantics is to be expected because the purpose behind RDF is to provide a model on which to build the semantic web. Unfortunately, though, somewhere along the way, we became fixated on RDF’s serialization (transformation) to XML and lost sight of RDF’s power to describe complex structures, the big picture mentioned earlier.

While working on the book Practical RDF, I had difficulty discovering uses of RDF that I felt demonstrated this capability. I was familiar with the two most popular uses of RDF/XML: RSS (RDF Site Summary) and FOAF (Friend of a Friend). I also created my own vocabularies, for Threadneedle (a way of threading conversations online) and for PostCon (an online post-content management system). However, while all of these vocabularies are useful and workable, to me none of them fully captured the essence of RDF: a model of data that can only be described as a complex concept rather than a simple fact.

For instance, taking a closer look at RSS and FOAF:

At its simplest, RDF is a way of recording statements consisting of a subject, a predicate, and an object, known as the RDF triple. I know a person: I (subject) know (predicate) a person (object). Triples can also be ‘chained’ when the object of one statement forms the subject of another, as in: I know a person who has a cat. In this example, the object of the first statement, the person, becomes the subject of the second, the owner of the cat.
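The idea of a triple, and of chaining triples, can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the Triple type, the identifiers "me", "person1", and "cat1", and the predicate names are hypothetical and are not part of any RDF vocabulary or library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# "I know a person who has a cat", written as two statements.
# All identifiers here are made up for the example.
triples = [
    Triple("me", "knows", "person1"),
    Triple("person1", "hasPet", "cat1"),
]


def chained(statements):
    """Return (first, second) pairs where first's object is second's subject."""
    return [
        (a, b)
        for a in statements
        for b in statements
        if a.obj == b.subject
    ]


links = chained(triples)
for first, second in links:
    print(first.subject, first.predicate, first.obj,
          "->", second.predicate, second.obj)
```

The person is the object of the first statement and the subject of the second, which is all that ‘chaining’ means here.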

Within the FOAF vocabulary, I know a person, and this person has a name; this person has an email address; this person has their own FOAF file, which in turn lists the people they know, and so on. No matter how you record these statements, whether in an RDF directed graph, in an RDF/XML file, or using another notation such as N-Triples, the nature of the statements doesn’t change; the underlying RDF model assures this.
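To make the point about notation concrete, here is a rough sketch that writes a small set of FOAF-style statements out as N-Triples lines and reads them back, checking that the statements are unchanged. The example.org URIs and the name are invented for illustration, and the reader below handles only the very simple subset of N-Triples that this sketch produces (URIs and plain quoted literals); it is not a general parser.

```python
import re

FOAF = "http://xmlns.com/foaf/0.1/"

# Statements as (subject, predicate, object). Objects that start with a
# double quote are literals; everything else is treated as a URI.
# The example.org resources are hypothetical.
statements = {
    ("http://example.org/me", FOAF + "knows", "http://example.org/friend"),
    ("http://example.org/friend", FOAF + "name", '"A Friend"'),
}


def to_ntriples(stmts):
    """Write each statement as one '<s> <p> o .' line."""
    lines = []
    for s, p, o in sorted(stmts):
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Matches only the simple lines produced by to_ntriples above.
LINE = re.compile(r'^<([^>]*)> <([^>]*)> (?:<([^>]*)>|("[^"]*")) \.$')


def from_ntriples(text):
    """Read the simple lines back into a set of statements."""
    result = set()
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        s, p, uri, literal = match.groups()
        result.add((s, p, uri if uri is not None else literal))
    return result


text = to_ntriples(statements)
round_trip = from_ntriples(text)
print(text)
print(round_trip == statements)
```

The set of statements is the same before and after the trip through text, which is the property the RDF model is meant to guarantee across notations.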

The same underlying principles work with RSS. A brief synopsis of the postings and essays I write to this weblog is output to a file, which is then accessed by the tools my readers use to determine that I (and others) have updated, and what has been written. Within this file, the source of the information is described, including the source’s primary URL, name, and so on. Other statements follow, such as those describing the individual items, each of which has a unique URL, a unique title, and so on.

The data in the RSS file is described using RDF/XML but, as with FOAF, I could easily record the statements in another allowable RDF format, such as N-Triples, and again the validity of the statements wouldn’t change. The model ensures this.

FOAF and RSS share other similarities beyond just those imposed by the underlying RDF model. Both record knowledge about a top-level object, either a person or a channel; both then record information about items related to that top-level object, in a strongly hierarchical relationship.

A FOAF file lists information about its subject, the person whom the FOAF file describes. It also links that person to other people. They, too, may know people, and this association can continue in a hierarchy of ‘A knows B’ until a FOAF file is reached in which a person lists only people who don’t have FOAF files themselves, and no further traversal is possible.
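The traversal described above can be sketched as a simple breadth-first walk. The dictionary below stands in for a set of FOAF files, with each key a person who has a file and each value the people that file says they know; the names A through D and the reachable function are hypothetical, and real FOAF files would of course have to be fetched and parsed.

```python
from collections import deque

# Hypothetical stand-in for FOAF files: person -> people they list as known.
# "D" has no entry, meaning D has no FOAF file of their own.
foaf_files = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
}


def reachable(start, files):
    """Follow 'knows' links from start until no further files can be read."""
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        # A person without a FOAF file contributes no further links.
        for friend in files.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen - {start}


print(sorted(reachable("A", foaf_files)))
```

Tracking the people already seen also keeps the walk from looping when two people list each other.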

An RSS file lists information about a channel, such as this weblog. It also lists information about items contained within the weblog, such as the individual postings. Newer changes proposed to the RSS specification take this breakdown of information further by listing comments under individual items, and eventually we’ll see trackback entries recorded in RSS. With the addition of trackback to RSS, weblog postings can be related to other weblog postings, and so on. Literally, ‘A knows B’, until, again, there is no further RSS object to traverse.

From an RDF semantics point of view, FOAF does to some degree provide the ability to capture and record information that would be difficult to discover just by searching for specific pieces of the data. Without FOAF, it would be difficult to determine whether someone such as Leigh Dodds knows someone else, such as Edd Dumbill, other than by searching on both their names and hoping to find something in a web page somewhere that validates this assumption. Within the relationship there is a hint of interlocking pieces and a bigger picture.

RSS, on the other hand, provides no clues to some bigger picture within the data it encompasses, and makes no use of the richness of RDF semantics. I have referred to it as a ‘brain dead’ data model, and before the RSS fans in the audience lynch me, allow me to explain.

RSS is a convenience. Sources of information such as this weblog can generate RSS files or feeds. You, as the source reader, can subscribe to a feed using an RSS aggregator (a tool that grabs the feed information and organizes it into one spot). With the aggregator, you’ll be notified of updates, shown abstracts or even the entire items.

The RSS business model states that my RSS file contains a reference to this writing, including the title, the author, an excerpt, the date and time it was written, and the category. However, this same information is nothing more than a repetition of the information contained in the individual writing page. There is nothing in the RSS file that enhances the discovery of information about that thing being described.

What’s more, the RSS files only contain a specified number of items — next update, the oldest item drops off the page. Not only is the information simple and repetitious, it’s temporary at that. So the components of the RSS specification, rather than combining to describe a more complex concept, provide nothing more than a snapshot in time, abbreviated for easier consumption.
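The ‘oldest item drops off’ behavior is easy to picture as a fixed-size window over the postings. The sketch below is only an analogy using Python’s bounded deque; the limit of three items and the post titles are assumptions made for the example, not values taken from any RSS specification or weblog tool.

```python
from collections import deque

MAX_ITEMS = 3  # assumed feed size, chosen only for illustration

# A bounded deque discards from the far end once it is full.
feed = deque(maxlen=MAX_ITEMS)

for title in ["post 1", "post 2", "post 3", "post 4"]:
    feed.appendleft(title)  # newest item goes to the front of the feed

current = list(feed)
print(current)
```

Once the fourth post is added, the first is no longer in the feed at all, which is why the feed is a snapshot in time and not a record.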

Of course, the RSS business model can be changed and the data persisted as well as enhanced, but then it would no longer be RSS. It would be something else.

This isn’t to say the RSS specification isn’t important or useful; it is. RSS aggregators allow people to see, at a glance, that their favorite sources have written something new, on what subject, and when. It is a fantastic convenience, but it is nothing more than a convenience. There is no complex semantics associated with RSS, hence my use of ‘brain dead’ to describe the underlying data structure. In fact, the structure of RSS, which consists of flexible data in recursive structures, is a perfect fit for XML, but not necessarily RDF/XML.

Even FOAF, for all of its ability to enhance discovery of information about a person and the people they know, doesn’t really provide much sophistication. This is deliberate on the part of the original creators, who wanted to keep the vocabulary simple. You can find out who a person knows, but not in what context, and without the context, the information associated with ‘knows’ is limited.

From my FOAF file, you can read that I know Danny Ayers and Mark Pilgrim. Well, that ‘knows’ could be anything from I’ve met them online, have exchanged emails, and we read each other’s weblogs (true), to we were once torrid lovers (untrue). That’s quite a range implied by that ‘knows’. The maximum information that can be gained from the richer aspects of FOAF is that Person A knows Person B. And that’s it.

Because of this deliberate simplification, I use the term ‘brain dead’ with FOAF, but with a caveat: FOAF was deliberately created to be simple, and it could easily be enhanced to a much higher level of sophistication, by the FOAF originators or by others, if they so choose.

My own efforts in creating an RDF vocabulary don’t fare much better. Threadneedle could be used to discover and persist the threads of an Internet-based conversation, resulting in a hierarchical structure somewhat comparable to FOAF but capturing the interaction of a group momentarily self-formed about a specific topic at a specific time. There is some semantic richness to this vocabulary, but again, no new information is inferred, just existing communication threads discovered.

PostCon does provide information that would be difficult to discover by other means, such as the movement history of a web resource, or why it was pulled from the server. However, this information isn’t necessarily sophisticated, as much as it just doesn’t exist. Current web technologies don’t have a way to persist this type of information, and PostCon supplies that persistence. Nice, but not quite a semantic cigar.

Again, as with FOAF and RSS, these implementations are useful and very handy, but they aren’t the brass ring of RDF semantic richness I hoped to discover. They are not examples of data demonstrating the complex nature of semantic data, the … interlocking pieces which form larger pictures when assembled.

Of course, RDF provides usefulness beyond just discovering complex concepts. First of all, it is based on a formalized model, which ensures that its data is consistent regardless of business use. No small thing, this. In addition, its incorporation of namespaces allows data from many sources to be combined, and vocabularies to be enhanced while still ensuring backwards compatibility. Additionally, I have found the APIs and the simple RDF triple-based queries to be quite an easy way of manipulating data in XML documents, even more so than pure XML-based query mechanisms. Based on this, I still use RDF for any XML vocabularies I create. But it’s not the same as using RDF’s rich semantics capability, especially when used to build an ontology that incorporates the inferential rules necessary to discover “concepts” rather than just “facts”.

I was beginning to think I would never find what I felt to be a perfect candidate for RDF. However, this all changed, by accident, when I started doing something new in my weblog. Something poetic.


Next: The Beginnings of a Beautiful Friendship
