Categories
RDF

RDF Poetry Finder: Pieces of the Puzzle

Recovered from the Wayback Machine.

First in a multi-part series focusing on RDF (Resource Description Framework) and poetry and demonstrating two-way integration between art and technology. No prior experience with either RDF or poetry is required.

Recently, Simon St. Laurent wrote a weblog essay titled The (data) medium is the message, in which he discusses the influence of the data container on the data. He uses the analogy of the newspaper and television as mediums for delivering information, which makes them technically the same type of container: both deliver information. However, the format and quantity of information differ enormously between the two:

To some degree, you can get the same information from different media sources, but no one expects television to be a reading of newspaper stories or the newspaper to be a transcript of the nightly news on TV. Both are containers for information, but the shape of the container inevitably affects the way the information is both produced and consumed.

Developers tend to disregard this lesson from the real world, approaching the problem of data and container from a purely programming perspective based on an assumption of passive data. The assumption becomes that it doesn’t matter what the container is, because one can always manipulate the data to fit; we’ll just use technology to transform the data from a relational database to an XML document to RDF to an object store and so on. However, this passive data/programmatic approach to managing data almost always requires more effort than using the appropriate data container would, and the transforms between containers require compromises that may not always work cleanly.

In his essay, Simon wrote that the best approach to managing data is to first understand that it isn’t passive, and to work with its native structure and respect its natural state. Most importantly, working with data means using the appropriate container for the data.

As examples of matching data to container, data that requires a great deal of flexibility and that has recursive structures is a good fit for XML; while unordered data requiring a great deal of processing is a better fit for relational databases and so on.

Coming from a strong data background, I agree with Simon on the active nature of data, and thought his essay was both thoughtful and compelling. However, what caught my interest most about it was his interpretation of the nature of RDF data. Simon described it as, RDF feels like ‘puzzle’ data to me, interlocking pieces which form larger pictures when assembled. This is, in my opinion, one of the best descriptions of RDF I’ve yet seen, and I’ve seen a few.

Interlocking pieces, which form larger pictures when assembled. In addition to describing RDF data, this phrase could also be used to describe the data model underlying semantics; after all, semantics is the process of discovering meaning behind combinations of symbols — finding the big picture from the sum of the parts.

This parallelism of data model between RDF and semantics is to be expected because the purpose behind RDF is to provide a model on which to build the semantic web. Unfortunately, though, somewhere along the way, we became fixated on RDF’s serialization (transformation) to XML and lost sight of RDF’s power to describe complex structures, the big picture mentioned earlier.

While working on the book, Practical RDF, I had difficulty discovering uses of RDF that I felt demonstrated this capability. I was familiar with the two most popular uses of RDF/XML — RSS (RDF Site Summary) and FOAF (Friend of a Friend). I also created my own vocabularies, for Threadneedle (a way of threading conversations online), as well as PostCon (an online post-content management system). However, while all of these vocabularies are useful and workable, to me none of them captured, fully, the essence of RDF — a model of data that can only be described as complex concept rather than simple fact.

For instance, taking a closer look at RSS and FOAF:

At its simplest, RDF is a way of recording statements consisting of a subject, a predicate, and an object, known as the RDF triple. I know a person. I (subject) know (predicate) a person (object). The triples can also be ‘chained’ when the object of one statement forms the subject of another, as in: I know a person who has a cat. With this example, the object of the first statement, the person, becomes the subject of the second, the owner of the cat.
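To make the triple and the chaining a little more concrete, here is a minimal sketch in plain Python. It is only an illustration: the names "me", "knows", "person", "hasPet", and "cat" are made up for this example and don't come from any real RDF vocabulary, and a real application would use URIs and an RDF library rather than bare strings.

```python
# Each statement is a (subject, predicate, object) triple.
# All names are hypothetical, chosen only to mirror "I know a person who has a cat".
triples = [
    ("me", "knows", "person"),
    ("person", "hasPet", "cat"),
]


def chain(triples, start):
    """Follow statements from `start`: the object of one statement
    becomes the subject of the next, until nothing more is said."""
    path = []
    seen = set()
    current = start
    while current not in seen:
        seen.add(current)  # guard against circular statements
        step = next((t for t in triples if t[0] == current), None)
        if step is None:
            break  # nothing is stated about `current`
        path.append(step)
        current = step[2]
    return path


for subject, predicate, obj in chain(triples, "me"):
    print(subject, predicate, obj)
```

The sketch follows only the first statement it finds for each subject, which is enough to show the chaining; a real graph would branch.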

Within the FOAF vocabulary, I know a person, and this person has a name; this person has an email address; this person has their own FOAF file, which, in turn, lists the people they know, and so on. No matter how you record these statements (in an RDF directed graph, in an RDF/XML file, or using another notation, such as N-Triples), it doesn’t change the nature of the statements, which is assured by the underlying RDF model.
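As a rough sketch of that point, the following Python writes a handful of FOAF-style statements out as N-Triples lines and reads them back, ending up with exactly the same set of statements. The FOAF namespace and the knows, name, and mbox properties are real; the example.org people and the email address are assumptions invented for the illustration. The tiny reader handles only URIs and plain literals without escapes, so it is nowhere near a full N-Triples parser.

```python
import re

FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://example.org/people#me"          # hypothetical URI
FRIEND = "http://example.org/people#friend"  # hypothetical URI

# An object is tagged as either a URI ("iri") or a plain literal ("lit").
statements = {
    (ME, FOAF + "knows", ("iri", FRIEND)),
    (FRIEND, FOAF + "name", ("lit", "A Person")),
    (FRIEND, FOAF + "mbox", ("iri", "mailto:person@example.org")),
}


def to_ntriples(statements):
    """Write each statement as one '<s> <p> o .' line."""
    lines = []
    for s, p, (kind, value) in sorted(statements):
        obj = f"<{value}>" if kind == "iri" else f'"{value}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Simplified line pattern: URI subject, URI predicate, URI or plain literal object.
LINE = re.compile(r'^<([^>]*)> <([^>]*)> (?:<([^>]*)>|"([^"]*)") \.$')


def from_ntriples(text):
    """Read the simplified N-Triples lines back into a set of statements."""
    result = set()
    for line in text.splitlines():
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        s, p, iri, lit = m.groups()
        result.add((s, p, ("iri", iri) if iri is not None else ("lit", lit)))
    return result


text = to_ntriples(statements)
print(text)
print(from_ntriples(text) == statements)
```

The notation changes; the statements do not.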

The same underlying principles work with RSS. A brief synopsis of the postings/essays I write to this weblog is output to a file, which is then accessed by tools my readers use to determine that I (and others) have updated, and what I have written. Within this file, the source of the information is described, including the source’s primary URL, name, and so on. Following are other statements, such as the individual items, each of which has a unique URL, and a unique title, and so on.

The data in the RSS file is described using RDF/XML, but, as with FOAF, I could easily record the statements as another allowable RDF format, N-Triples, and again, the validity of the statements isn’t changed. The model ensures this.

FOAF and RSS share other similarities beyond just those imposed by the underlying RDF model. Both record knowledge about a top-level object, either a person or a channel; both then record information about items related to that top-level object, in a strongly hierarchical relationship.

A FOAF file lists information about the subject, the person whom the FOAF file describes. It also links the person to other people. They, too, may know people, and this association can continue in a hierarchy of ‘A knows B’ until a FOAF file is reached wherein a person lists only people that don’t have FOAF files themselves and no further traversals are possible.
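That ‘A knows B’ traversal can be sketched in a few lines of Python. This is an assumption-laden toy: the dictionary stands in for fetching real FOAF files, the single-letter names are invented, and a person with no entry in the dictionary is taken to have no FOAF file, which is where the traversal stops.

```python
from collections import deque

# Hypothetical data: each key is a person with a FOAF file, and the value
# is the list of people that file says they know. D and E have no file.
foaf_files = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
}


def traverse(start, foaf_files):
    """Breadth-first walk of 'knows' links, returning people in the
    order they were discovered."""
    seen = [start]
    queue = deque([start])
    while queue:
        person = queue.popleft()
        # No FOAF file means no further links to follow from this person.
        for friend in foaf_files.get(person, []):
            if friend not in seen:
                seen.append(friend)
                queue.append(friend)
    return seen


print(traverse("A", foaf_files))  # ['A', 'B', 'C', 'D', 'E']
```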

An RSS file lists information about a channel, such as this weblog. It also lists information about items contained within the weblog, such as the individual postings. Newer changes proposed to the RSS specification are taking this breakdown of information further, by listing out comments under individual items, and eventually we’ll see trackback entries recorded in RSS. With the addition of trackback into RSS, weblog postings can be related to other weblog postings, and so on. Literally, ‘A knows B’, until, again, there is no further RSS object to traverse.

From an RDF semantics point of view, to some degree FOAF does provide the ability to capture and record information that would be difficult to discover just by searching for specific pieces of the data. Without FOAF, it would be difficult to determine if someone such as Leigh Dodds knows someone else, such as Edd Dumbill, other than by searching on both their names and hoping to find something in a web page somewhere that validates this assumption. Within the relationship there is a hint of interlocking pieces and a bigger picture.

RSS, on the other hand, provides no clues to some bigger picture within the data it encompasses, and makes no use of the richness of RDF semantics. I have referred to it as a ‘brain dead’ data model, and before the RSS fans in the audience lynch me, allow me to explain.

RSS is a convenience. Sources of information such as this weblog can generate RSS files or feeds. You, as the source reader, can subscribe to a feed using an RSS aggregator (a tool that grabs the feed information and organizes it into one spot). With the aggregator, you’ll be notified of updates, shown abstracts or even the entire items.

The RSS business model states that my RSS file contains a reference to this writing, including the title, the author, an excerpt, the date and time it was written, and the category. However, this same information is nothing more than a repetition of the information contained in the individual writing page. There is nothing in the RSS file that enhances the discovery of information about that thing being described.

What’s more, the RSS files only contain a specified number of items — next update, the oldest item drops off the page. Not only is the information simple and repetitious, it’s temporary at that. So the components of the RSS specification, rather than combining to describe a more complex concept, provide nothing more than a snapshot in time, abbreviated for easier consumption.
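The ‘oldest item drops off’ behavior is easy to picture with a fixed-size queue. The sketch below is only a model of the idea, not of any real feed generator: the limit of three items and the post titles are made up.

```python
from collections import deque

MAX_ITEMS = 3  # hypothetical feed size; real feeds pick their own limit

# Newest items sit at the left; once the feed is full,
# each new item pushes the oldest one off the right end.
feed = deque(maxlen=MAX_ITEMS)

for title in ["Post 1", "Post 2", "Post 3", "Post 4"]:
    feed.appendleft(title)

print(list(feed))  # ['Post 4', 'Post 3', 'Post 2']
```

Nothing in the feed remembers that "Post 1" ever existed, which is the snapshot-in-time quality described above.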

Of course, the RSS business model can be changed and the data persisted as well as enhanced, but then it would no longer be RSS. It would be something else.

This isn’t to say the RSS specification isn’t important or useful; it is. RSS aggregators allow people to see, at a glance, that their favorite sources have written something new, on what subject and when. It is a fantastic convenience…but it is nothing more than a convenience. There is no complex semantics associated with RSS, hence my use of ‘brain dead’ to describe the underlying data structure. In fact, the structure of RSS, which consists of flexible data in recursive structures, is a perfect fit for XML, but not necessarily RDF/XML.

Even FOAF for all of its ability to enhance discovery of information about a person and the people they know doesn’t really provide much sophistication — deliberately on the part of the original creators who wanted to keep the vocabulary simple. You can find out who a person knows, but not in what context, and without the context, the information associated with ‘knows’ is limited.

From my FOAF file, you can read that I know Danny Ayers and Mark Pilgrim. Well, that ‘knows’ could be anything from I’ve met them online and have exchanged emails and we read each other’s weblogs (true), to we were once torrid lovers (untrue). That’s quite a range implied with that ‘knows’. The maximum information that can be gained from the richer aspects of FOAF is that Person A knows Person B. And that’s it.

Because of this deliberate simplification, I use the term ‘brain dead’ with FOAF, but with a caveat: FOAF was created to be simple deliberately, and could easily be enhanced to a much higher level of sophistication on the part of the FOAF originators if they or others choose.

My own efforts in creating an RDF vocabulary don’t fare much better. Threadneedle could be used to discover and persist the threads of an Internet-based conversation, resulting in a hierarchical structure somewhat comparable to FOAF but capturing the interaction of a group momentarily self-formed about a specific topic at a specific time. There is some semantic richness to this vocabulary, but again, no new information is inferred, just existing communication threads discovered.

PostCon does provide information that would be difficult to discover by other means, such as the movement history of a web resource, or why it was pulled from the server. However, this information isn’t necessarily sophisticated, as much as it just doesn’t exist. Current web technologies don’t have a way to persist this type of information, and PostCon supplies that persistence. Nice, but not quite a semantic cigar.

Again, as with FOAF and RSS, these implementations are useful and very handy, but they aren’t the brass ring of RDF semantic richness I hoped to discover. They are not examples of data demonstrating the complex nature of semantic data, the … interlocking pieces which form larger pictures when assembled.

Of course, RDF provides usefulness beyond just discovering complex concepts. First of all, it is based on a formalized model, which does ensure that its data is consistent regardless of business use. No small thing, this. In addition, its incorporation of namespaces allows data from many sources to be combined, and vocabularies to be enhanced and still ensure backwards compatibility. Additionally, I have found the APIs and the simple RDF triple based queries to be quite an easy way of manipulating data in XML documents, even more so than pure XML based query mechanisms. Based on this, I still use RDF for any XML vocabularies I create. But it’s not the same as using RDF’s rich semantics capability, especially when used to build an ontology that incorporates the inferential rules necessary to discover “concepts” rather than just “facts”.
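For a feel of what a simple triple-based query looks like, here is a small Python sketch. It is not the API of any particular RDF toolkit; the query function, the predicate names "hasItem" and "title", and the data are all invented for the example. The idea is just that you fill in the parts of the triple you know and leave the rest open.

```python
# Hypothetical statements about a weblog and two of its postings.
triples = [
    ("weblog", "hasItem", "post-1"),
    ("weblog", "hasItem", "post-2"),
    ("post-1", "title", "First things first"),
    ("post-2", "title", "Knowing which trail to walk"),
]


def query(triples, subject=None, predicate=None, obj=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


# Everything the weblog contains:
print(query(triples, subject="weblog", predicate="hasItem"))

# Which posting has this title?
print(query(triples, predicate="title", obj="First things first"))
```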

I was beginning to think I would never find what I felt to be a perfect candidate for RDF. However, this all changed, by accident, when I started doing something new in my weblog. Something poetic.

 

Next: The Beginnings of a Beautiful Friendship

Categories
Photography RDF

Knowing which trail to walk

Recovered from the Wayback Machine

Today we tried a new park called the Forest 44 Conservancy, which is part of the Missouri conservation effort. It’s an interesting place very close to home and bordered by a large horse farm. Because it’s conservation land, the trail was lightly developed; from the nature of the trail, the park isn’t used that much. The day was lovely, but the only people we met were a couple on horses.

lonewalk.jpg

We were accompanied by sound the entire trip, including red-wing blackbirds, cardinals, meadowlarks, and so on. The trail traversed both forest and meadow, including wetland with one larger pond and a couple of smaller ones, and a stream.

The main meadow had a pond that was full of goldfish. Goldfish? Are they native to Missouri?

meadow2.jpg

In the forested part of the walk, we were surrounded by a crackling sound as small things scurried about under the dead leaves from last fall. It sounded like we were walking in a bowl of Rice Krispies.

At one point, my roommate, who was walking ahead of me, scared something that ran directly in front of me, a small, round brown thing, I have no idea what. Moved fast, though. Incredibly fast.

Another area of the forest had several ant mounds, a colony that must have been in that area of the land for years. Centuries? We walked especially carefully in that section. (I can post photos if there’s interest.)

mstream2.jpg

This is a good trail to walk. It was peaceful, tranquil.

The RSS trail, that’s not a good trail to walk. Not after seeing the CSS barbs against Mark Pilgrim and Zeldman. Not after this thread. And too many others. No matter the facts, no matter how quietly one wants to discuss this topic, no matter how objective you can be, there is no successful resolution to the ‘problem’ of RSS.

The advice to me is to ignore it, and write about something else, something positive. Find my lighthouse, as Mark says. This trail, the walk, that’s a start. And I’m quite excited to see other people interested in the RDF Poetry Finder — I usually don’t get this interest from my readers when I talk about RDF. This is a little more than great. WOot!

So, pretty pics tonight. Peaceful trails tonight. And RDF and poetry next.

Update:

Shit! Can’t we ever go for a walk in the Missouri wilderness without becoming lunch for some critter that rides home with us? I learned my lesson from last year and was dressed in long cotton pants, thick socks, and a long-sleeved cotton shirt. Roommate, who wore a tank top and shorts…well, he didn’t fare so well.

rickity.jpg

Categories
RDF

RSS and what’s the use?

Recovered from the Wayback Machine.

Actually, I should realize that I would be wasting my time discussing RSS, as no one will care, and no minds will change, and it won’t stop people getting into flame wars if I happen to mention RSS in a posting.

Pictures. I should just stick with pictures.

Categories
RDF

First things first

Recovered from the Wayback Machine.

I have been working on a series of essays and matching examples and implementations that combine art and technology, human communication and the Internet. Specifically they focus on RDF and poetry, of all things.

The essays will, hopefully, demonstrate where the complexity of RDF (Resource Description Framework) and ontologies shine and traditional keyword technology fails — within the metaphorical richness of poetry. I’m demonstrating how one can search on concepts, not just keywords, and get a listing of poems that incorporate the concepts, regardless of the actual words used in the poem.

For instance, the ‘bridge’ is many times used as a metaphor for a variety of complex concepts, such as a person facing change within themselves. By defining bridge as metaphor for this concept, one could attach the ‘bridge’ metaphor to a poem that doesn’t even contain the word ‘bridge’, but does contain the concept, though using a different metaphor.
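A toy version of the idea, sketched in Python, might look like the following. To be clear about the assumptions: this is not the Poetry Finder itself, the poem lines are invented placeholders rather than real poems, and the predicates "usesMetaphor" and "standsFor" are names made up for the illustration. It only shows how a search on a concept can find a poem that a keyword search misses.

```python
# Invented placeholder lines, not real poems.
poems = {
    "poem-1": "The old bridge hums above the river",
    "poem-2": "She lingers at the doorway, coat in hand",
    "poem-3": "The storm took the orchard in one night",
}

# Hypothetical statements linking poems to metaphors and metaphors to concepts.
triples = [
    ("poem-1", "usesMetaphor", "bridge"),
    ("poem-2", "usesMetaphor", "doorway"),
    ("poem-3", "usesMetaphor", "storm"),
    ("bridge", "standsFor", "personal change"),
    ("doorway", "standsFor", "personal change"),
    ("storm", "standsFor", "grief"),
]


def keyword_search(word):
    """Find poems whose text literally contains the word."""
    return sorted(pid for pid, text in poems.items() if word.lower() in text.lower())


def concept_search(concept):
    """Find poems that use any metaphor standing for the concept."""
    metaphors = {s for s, p, o in triples if p == "standsFor" and o == concept}
    return sorted({s for s, p, o in triples if p == "usesMetaphor" and o in metaphors})


print(keyword_search("bridge"))           # ['poem-1']
print(concept_search("personal change"))  # ['poem-1', 'poem-2']
```

The second poem never mentions a bridge, but it turns up anyway because its own metaphor is recorded as standing for the same concept.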

After being burned out for so long, I’ve had a lot of fun working on what I call My “Poetry Finder” for want of a better term. What I particularly like about this work is it allows me to combine my interest in technology with art, particularly writing, something I’ve not been able to do before. I’m having fun. Rusty fun.

As an aside in the article, I was going to discuss, briefly, my disappointment that so much about RDF is focused on RSS and FOAF, both of which I’ve referred to as ‘brain dead’ data models. I don’t use this term to insult these highly useful and popular specifications, but to demonstrate that using RDF for a simple hierarchy of items, parent to child, isn’t a good representation of the richness of RDF and an associated ontology built on RDF. The only semantics associated with either RSS or FOAF is from the data’s inclusion in an RSS or FOAF file; there is no other semantics associated with either of these specifications. I feel the points are good, and demonstrative.

However, lately, I’ve found myself reluctant to even mention RSS, because doing so invites a flame war into my comments that has little to do with what I originally wrote.

When I write to this weblog, or elsewhere, it’s more than a collection of keywords randomly stuck together. When I use ‘RSS’ in a weblog posting, it’s within the context of the larger body of writing, not specifically associated with whatever one’s feelings are about RSS at that point in time. I’m reminded of a dog’s interpretation of how we speak, when I see the fixation on keywords within our weblogs at times:

 

blah blah blah, blahty, blah, RSS

blah, blah blah blippy blag ramble blah RSS blah blah

 

It can be more than a little disappointing when you write something and the comments take off on a tangent having little to do with what you write. Some would say that this is the richness of this medium — that it opens new doors to communication. True, and I’ve seen, and been pleased by, rich discussions in my comments that were inspired by the original writing, but not necessarily referencing it.

However, in the case of RSS, there is no inspiration involved — people see “RSS” and that’s all she wrote. Next thing you know, wars start, flames begin, and the whole thing about who ‘owns’ RSS, or who has ‘ruined’ RSS begins anew.

Not with my RDF and poetry work. I’ve spent too much time on these to allow them to be used as springboards for yet another mud slinging session. So I have two choices:

1. I can forego including anything referring to RSS in the essays. However, the inclusion of the material is demonstrative of some key points I want to make. In effect, by removing my writing on RSS, I would be censoring myself because I don’t want ill-mannered behavior in my comments. This is not acceptable.

2. Use a lightning rod. By this I mean get the discussion about RSS over and done with before going into the RDF and poetry writing. Make it clear that now is the time to discuss these things, get them out of our system. Not in my work on RDF/poetry.

So this is fair warning: today I’m wading into the politics surrounding RSS. Most of you won’t care, or will be tired of the discussion. You’ll most likely want to bypass my postings related to these topics. Fair enough. Please stop by after this weekend when I promise to write on other things, more pics, blogshares, and, especially, my RDF/poetry work.

I know that some people will be disappointed that I’m covering this topic. And I’m not going to bitch if it gets ugly, because I’m an old hand at this, I know what to expect. This discussion is a lose/lose, and most likely nothing will be resolved. However, my point isn’t resolution as much as it is exhaustion. Strike now, and then forever hold your peace.

More later today.

Categories
Weblogging

Back to important stuff

Recovered from the Wayback Machine.

Well, enough of the poetry stuff, let’s get back to the important stuff — blogshares!

I’m getting shares in weblogs as gifts from folks. I’m new to this subterranean culture — is there some etiquette to blogshare giving? Is giving a person one share of your weblog’s stock a new calling card of capitalism?

And I’m now the second best player for May, and I have no idea of why. It’s fun, though. Especially torturing Dorothea with lyrics from old 80’s shows, until she cracks under the pressure and sells me shares in her weblog. I am a ruthless entrepreneur.

I have something fun I’ve been working on the last couple of days I need to finish, but other stuff keeps intruding. There’s fun stuff, such as exactly what Evil Twin would do with a course called Nature, Nuture, Nonsense. And then there’s less fun stuff, such as an accusation about non-objective coverage of RSS on the part of O’Reilly authors. Such as myself. With this one, though, I think my best bet is to focus on fun, rather than tired, worn out fights.