Categories
RDF

RDF Poetry Finder: The beginnings of a beautiful friendship

Recovered from the Wayback Machine.

I enjoy posting photographs to the weblog, and usually accompany them with a story or a short and appropriate note. A couple of months ago I started something new — posting photos accompanied by poems that complemented the ‘story’ I wanted to tell with the photo. Explaining my new approach, I wrote:

I started pairing my photographs with poems I found on the Internet as a way of playing with the mood of the photograph, and to discover new poems and new poets. It is fast becoming a favorite hobby, and is very effective at relieving stress, anger, and sadness. (Which is why I found myself spending a lot of time with it the last few weeks.)

I’ll look at a photograph and write down my first impressions of it: what it means to me, why I like it or not, and what I was trying to say with it when I took it. From this, I’ll gather select keywords and use these to search for a poem at a site, such as Plagiarist or the Academy of American Poets. I’ll wander about through the results until finding the poem that best connects.

What I found over time, though, is that it is uncommonly difficult to find a poem online using the ‘traditional’ search techniques — so much of poetry is based on imagery and metaphor, and metaphors are the ultimate destroyer of search engines. True Google busters.

For instance, take a look at the following photograph:

This photo shows a bridge, a river, and some vegetation, such as weeds, grasses, and trees. Now, I can search for poetry related to bridges, river, and trees and find some interesting results, but none of them matches what I see.

The photo shows a bridge, a river, and bushes, but what I see is change in my life, with the bridge being the path I must take, while the river is the path I want to take. In the photo, the bridge is more substantial than the river, but I see it reflected in the water in wavy lines, making it less real than the seemingly solid, glassy surface of the water. This dichotomy of images and reality represents my internal struggle over the direction the change in my life will take.

Yes, all of that from one simple photo.

Now, try putting all of that into Google and see what you get. You get something like this. There’s a dream analysis in the results that’s rather interesting, but not the poem I’m looking for.

The bridge, as metaphor, can represent meeting a challenge, facing responsibility, leaving someone important, even death. It can also mean facing change in one’s life, and if I look through enough poems that have the word ‘bridge’ in them, I’m sure to find one that represents the complex concept I’m looking for. After a time. But what if I want to find a poem that focuses on a particular concept, but uses something else other than a bridge as metaphor? Other metaphors representing a change in life are the eagle, tree, a wave, and I’ve even seen cheese used as a metaphor for a resistance to change. (Here if you must.)

Do I then search for all metaphors for the concepts I’m seeking, and then search for these metaphors among the poems? I wanted an opportunity to get exposed to new poetry, but even the most ardent poetry lover would weary of this over a time.

No matter how sophisticated the search engine, even our personal pet Google, the success of the queries breaks down whenever you move beyond searches for specific facts. When you look for something more complex, such as a concept, the most you can do is input all of the words that best represent the concept, and then start the long, arduous refining process.

During my search for poetry I tried this, and then varied the results by focusing on a specific artist and then looking for a poem among their other works. The effort became frustrating at times because I couldn’t find what I was looking for and the poems would run together into this mess o’ words. At that point, I would sell the search for artistic truth short and resort to ‘bridge’, ‘river’, and ‘trees’, giving in to the inevitable limitations of the traditional, hierarchically structured, keyword-based discovery that is the existing web.

With poetry, the traditional web fails. Even with Google. Even the Google of the future.

Of course, not all poems incorporate imagery, or use metaphors; however, there are underlying concepts incorporated into the poem, assumptions and history and interpretations that blow holes in the search engine bots as sure as a shotgun blows holes in the wily fox caught in the hen house. For instance, take the following poem:

 

Mary Tyler Moore
moved out today.
A big orange truck
came and took her away.

 

So what does it mean? Exactly what it says: a neighbor who looks remarkably like Mary Tyler Moore moved out today, and a moving company that uses large orange painted trucks moved her. Now, if someone wants a poem about Mary Tyler Moore, or a big orange moving truck, why here I am.

We have focused so much of our attention on RSS and FOAF and other RDF-based vocabularies such as these. But given access to Google, I can find all the information that’s contained in RSS files. Given Google, I can most likely find all the information that’s contained in FOAF files, too.

What I wrote in the weblog is available here, at the weblog. If I were to stop generating an RSS file for this weblog tomorrow, the information would still exist. You would just have to visit here, rather than use your aggregator.

If someone was looking for weblogs that post on specific topics, again the information is here. And there are web bots looking for certain topics already, such as references to Ashcroft, or Iraq.

Same with FOAF — anything I’m willing to expose in a FOAF file is already exposed. Want to know if Mark Pilgrim knows me, and I know him? Easy, search on our names.

The information contained in RSS and FOAF files isn’t hidden behind imagery, isn’t obfuscated behind metaphor. The information these files record is the bits and pieces at which Google is so good, but we’re looking for that information that can only appear when the bits and pieces are assembled into a whole.

The web is positively dripping with facts, and there’s no better tool to find these facts than Google and the other search engines. We don’t necessarily need the semantic web to uncover the facts, though it can help. No, we really need the semantic web to uncover the complex concepts, the information that can only be found when many pieces are pulled together in a meaningful way. To discover the conversations. The subtle rumor and innuendo. The play on words and the analogies.

The poetry.

The title of this essay in the series is The Beginnings of a Beautiful Friendship, and, hopefully, you can now see how the semantic web can be a friend to poetry. But you might be wondering how poetry can be a friend to the semantic web?

The odd thing about poetry and the web is that as much as poetry isn’t a fit for the traditional web, it’s an ideal fit for the semantic web. Consider the dictionary definition of semantics: not the one given for artificial intelligence and programming, but the one given for linguistics. Semantics is the study of the meaning of language. Who pursues meaning in language more diligently than the poet?

What’s been missing from the effort on RDF and the semantic web is the poets. On the committees and in the interest groups we have the mathematician, the logician, the computational linguist, the semantician, the artificial intelligence specialist, and the computer engineer. But we don’t have the poet, and that’s a pity.

After all, who else through history has been more focused on meaning than the poet? Except perhaps the priest or the philosopher, and the former is more worried about souls while the latter spreads their interests too thinly.

 

Next: The Technician Sleeps while the Poet Speaks

Categories
RDF

RDF Poetry Finder: Pieces of the Puzzle

Recovered from the Wayback Machine.

First in a multi-part series focusing on RDF (Resource Description Framework) and poetry and demonstrating two-way integration between art and technology. No prior experience with either RDF or poetry is required.

Recently, Simon St. Laurent wrote a weblog essay titled The (data) medium is the message, in which he discusses the influence of the data container on the data. He uses the analogy of the newspaper and television as mediums for delivering information, which makes them technically the same type of container — both deliver information. However the format and quantity of information differs enormously between the two:

To some degree, you can get the same information from different media sources, but no one expects television to be a reading of newspaper stories or the newspaper to be a transcript of the nightly news on TV. Both are containers for information, but the shape of the container inevitably affects the way the information is both produced and consumed.

Developers tend to disregard this lesson from the real world, approaching the problem of data and container from a purely programming perspective based on an assumption of passive data. The assumption becomes it doesn’t matter what the container is, one can always manipulate the data to fit; we’ll just use technology to transform the data from a relational database to an XML document to RDF to an object store and so on. However, this passive data/programmatic approach to managing data almost always requires effort beyond that required using the appropriate data container; and the transforms between the data require compromises that may not always work cleanly.

In his essay, Simon wrote that the best approach to managing data is to first understand that it isn’t passive, and to work with its native structure, respect its natural state. Most importantly, working with data means using the appropriate container for the data.

As examples of matching data to container, data that requires a great deal of flexibility and that has recursive structures is a good fit for XML; while unordered data requiring a great deal of processing is a better fit for relational databases and so on.

Coming from a strong data background, I agree with Simon on the active nature of data, and thought his essay was both thoughtful and compelling. However, what caught my interest most about it was his interpretation of the nature of RDF data. Simon described it this way: “RDF feels like ‘puzzle’ data to me, interlocking pieces which form larger pictures when assembled.” This is, in my opinion, one of the best descriptions of RDF I’ve yet seen, and I’ve seen a few.

Interlocking pieces, which form larger pictures when assembled. In addition to describing RDF data, this phrase could also be used to describe the data model underlying semantics; after all, semantics is the process of discovering meaning behind combinations of symbols — finding the big picture from the sum of the parts.

This parallelism of data model between RDF and semantics is to be expected because the purpose behind RDF is to provide a model on which to build the semantic web. Unfortunately, though, somewhere along the way, we became fixated on RDF’s serialization (transformation) to XML and lost sight of RDF’s power to describe complex structures, the big picture mentioned earlier.

While working on the book, Practical RDF, I had difficulty discovering uses of RDF that I felt demonstrated this capability. I was familiar with the two most popular uses of RDF/XML — RSS (RDF Site Summary) and FOAF (Friend of a Friend). I also created my own vocabularies, for Threadneedle (a way of threading conversations online), as well as PostCon (an online post-content management system). However, while all of these vocabularies are useful and workable, to me none of them captured, fully, the essence of RDF — a model of data that can only be described as complex concept rather than simple fact.

For instance, taking a closer look at RSS and FOAF:

At its simplest, RDF is a way of recording statements consisting of a subject, a predicate, and an object, known as the RDF triple. I know a person. I (subject) know (predicate) a person (object). The triples can also be ‘chained’ when the object of one statement forms the subject of another, as in: I know a person who has a cat. With this example, the object of the first statement, the person, becomes the subject of the second, the owner of the cat.
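A rough sketch of that chaining, using plain Python tuples rather than any RDF library. The identifiers `me`, `person1`, and `cat1`, and the predicates `knows` and `hasPet`, are invented for illustration and don't come from a real vocabulary.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers are illustrative placeholders, not a real vocabulary.
triples = [
    ("me", "knows", "person1"),
    ("person1", "hasPet", "cat1"),
]


def chained(statements):
    """Return pairs of statements where the object of the first
    statement is the subject of the second."""
    return [
        (first, second)
        for first in statements
        for second in statements
        if first is not second and first[2] == second[0]
    ]


links = chained(triples)
for first, second in links:
    print(first, "->", second)
```

Here the only chain found is the one from the sentence: the person I know is also the subject of the statement about the cat.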

Within the FOAF vocabulary, I know a person, and this person has a name; this person has an email address; this person has their own FOAF file, which, in turn, lists the people they know, and so on. No matter how you record these statements (in an RDF directed graph, in an RDF/XML file, or using another notation, such as N-Triples), it doesn’t change the nature of the statements, which is assured by the underlying RDF model.
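To make the point about notation concrete, here is a small sketch that writes two FOAF-style statements out as N-Triples lines. The FOAF namespace is the real one; the `example.org` URIs and the name are placeholders, and the writer is deliberately naive (it does no escaping and handles only URIs and plain literals).

```python
# The FOAF namespace is real; the example.org URIs and the name are placeholders.
FOAF = "http://xmlns.com/foaf/0.1/"

statements = [
    ("http://example.org/people#me", FOAF + "knows", "http://example.org/people#friend"),
    ("http://example.org/people#friend", FOAF + "name", '"A Friend"'),
]


def to_ntriples(stmts):
    """Write each statement as one N-Triples line: <s> <p> <o> ."""
    lines = []
    for s, p, o in stmts:
        # Objects that start with a quote are treated as plain literals;
        # everything else is treated as a URI reference.
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(statements))
```

The tuples in `statements` and the lines that get printed say the same thing; only the notation differs.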

The same underlying principles work with RSS. Brief synopses of the postings/essays I write to this weblog are output to a file, which is then accessed by tools my readers use to determine that I (and others) have updated, and what I have written. Within this file, the source of the information is described, including the source’s primary URL, name, and so on. Following are other statements, such as the individual items, each of which has a unique URL, and a unique title, and so on.

The data in the RSS file is described using RDF/XML, but, as with FOAF, I could easily record the statements as another allowable RDF format, N-Triples, and again, the validity of the statements isn’t changed. The model ensures this.

FOAF and RSS share other similarities beyond just those imposed by the underlying RDF model. Both record knowledge about a top-level object, either a person or a channel; both then record information about items related to that top-level object, in a strongly hierarchical relationship.

A FOAF file lists information about the subject, the person whom the FOAF file describes. It also links the person to other people. Those people may also know people, and this association can continue in a hierarchy of ‘A knows B’ until a FOAF file is reached wherein a person lists only people that don’t have FOAF files themselves and no further traversals are possible.
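That traversal can be sketched as a simple breadth-first walk. The dictionary below is hypothetical stand-in data, not real FOAF: each key is a person who has a FOAF file, and the list holds the people that file says they know.

```python
from collections import deque

# Hypothetical data: each key stands for a person with a FOAF file,
# and the list holds the people that file says they know.
# People who appear only in the lists have no FOAF file of their own.
foaf_files = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
}


def reachable(start, files):
    """Breadth-first walk over 'knows' links, stopping at people
    who have no FOAF file to follow."""
    seen = [start]
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for other in files.get(person, []):
            if other not in seen:
                seen.append(other)
                queue.append(other)
    return seen


print(reachable("A", foaf_files))
```

The walk ends at `D`, who is known by `B` but has no file of their own to follow.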

An RSS file lists information about a channel, such as this weblog. It also lists information about items contained within the weblog, such as the individual postings. Newer changes proposed to the RSS specification are taking this breakdown of information further, by listing out comments under individual items, and eventually we’ll see trackback entries recorded in RSS. With the addition of trackback into RSS, weblog postings can be related to other weblog postings, and so on. Literally, ‘A knows B’, until, again, there is no further RSS object to traverse.

From an RDF semantics point of view, to some degree FOAF does provide the ability to capture and record information that would be difficult to discover just by searching for specific pieces of the data. Without FOAF, it would be difficult to determine if someone such as Leigh Dodds knows someone else, such as Edd Dumbill, other than by searching on both their names and hoping to find something in a web page somewhere that validates this assumption. Within the relationship there is a hint of interlocking pieces and a bigger picture.

RSS, on the other hand, provides no clues to some bigger picture within the data it encompasses, and makes no use of the richness of RDF semantics. I have referred to it as a ‘brain dead’ data model, and before the RSS fans in the audience lynch me, allow me to explain.

RSS is a convenience. Sources of information such as this weblog can generate RSS files or feeds. You, as the source reader, can subscribe to a feed using an RSS aggregator (a tool that grabs the feed information and organizes it into one spot). With the aggregator, you’ll be notified of updates, shown abstracts or even the entire items.

The RSS business model states that my RSS file contains a reference to this writing, including the title, the author, an excerpt, the date and time it was written, and the category. However, this same information is nothing more than a repetition of the information contained in the individual writing page. There is nothing in the RSS file that enhances the discovery of information about that thing being described.

What’s more, the RSS files only contain a specified number of items — next update, the oldest item drops off the page. Not only is the information simple and repetitious, it’s temporary at that. So the components of the RSS specification, rather than combining to describe a more complex concept, provide nothing more than a snapshot in time, abbreviated for easier consumption.
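The drop-off behaviour is easy to picture with a fixed-size queue. This is only an illustration of the idea, not how any particular weblog tool builds its feed; the limit of three items and the post titles are arbitrary.

```python
from collections import deque

# A feed that keeps only the newest N items; N = 3 is an arbitrary choice.
feed = deque(maxlen=3)

for title in ["post 1", "post 2", "post 3", "post 4"]:
    # Newest item goes on the front; once the feed is full,
    # the oldest item silently falls off the other end.
    feed.appendleft(title)

print(list(feed))
```

After the fourth post is added, "post 1" is gone from the feed, though it still exists on the weblog itself.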

Of course, the RSS business model can be changed and the data persisted as well as enhanced, but then it would no longer be RSS. It would be something else.

This isn’t to say the RSS specification isn’t important or useful; it is. RSS aggregators allow people to see, at a glance, that their favorite sources have written something new, on what subject and when. It is a fantastic convenience…but it is nothing more than a convenience. There is no complex semantics associated with RSS, hence my use of ‘brain dead’ to describe the underlying data structure. In fact, the structure of RSS, which consists of flexible data in recursive structures, is a perfect fit for XML, but not necessarily RDF/XML.

Even FOAF for all of its ability to enhance discovery of information about a person and the people they know doesn’t really provide much sophistication — deliberately on the part of the original creators who wanted to keep the vocabulary simple. You can find out who a person knows, but not in what context, and without the context, the information associated with ‘knows’ is limited.

From my FOAF file, you can read that I know Danny Ayers and Mark Pilgrim. Well, that ‘knows’ could be anything from I’ve met them online and have exchanged emails and we read each other’s weblogs (true), to we were once torrid lovers (untrue). That’s quite a range implied with that ‘knows’. The maximum information that can be gained from the richer aspects of FOAF is that Person A knows Person B. And that’s it.

Because of this deliberate simplification, I use the term ‘brain dead’ with FOAF, but with a caveat: FOAF was created to be simple deliberately, and could easily be enhanced to a much higher level of sophistication on the part of the FOAF originators if they or others choose.

My own efforts in creating an RDF vocabulary don’t fare much better. Threadneedle could be used to discover and persist the threads of an Internet-based conversation, resulting in a hierarchical structure somewhat comparable to FOAF but capturing the interaction of a group momentarily self-formed about a specific topic at a specific time. There is some semantic richness to this vocabulary, but again, no new information is inferred, just existing communication threads discovered.

PostCon does provide information that would be difficult to discover by other means, such as the movement history of a web resource, or why it was pulled from the server. However, this information isn’t necessarily sophisticated, as much as it just doesn’t exist. Current web technologies don’t have a way to persist this type of information, and PostCon supplies that persistence. Nice, but not quite a semantic cigar.

Again, as with FOAF and RSS, these implementations are useful and very handy, but they aren’t the brass ring of RDF semantic richness I hoped to discover. They are not examples of data demonstrating the complex nature of semantic data, the … interlocking pieces which form larger pictures when assembled.

Of course, RDF provides usefulness beyond just discovering complex concepts. First of all, it is based on a formalized model, which does ensure that its data is consistent regardless of business use. No small thing, this. In addition, its incorporation of namespaces allows data from many sources to be combined, and vocabularies to be enhanced while still ensuring backwards compatibility. Additionally, I have found the APIs and the simple RDF triple-based queries to be quite an easy way of manipulating data in XML documents, even more so than pure XML-based query mechanisms. Based on this, I still use RDF for any XML vocabularies I create. But it’s not the same as using RDF’s rich semantics capability, especially when used to build an ontology that incorporates the inferential rules necessary to discover “concepts” rather than just “facts”.
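By a simple triple-based query I mean something along these lines: give a pattern for subject, predicate, and object, leave the parts you don't care about open, and get back every statement that fits. The sketch below uses plain Python and made-up data; it is not the API of any particular RDF toolkit.

```python
# Illustrative triples; the subjects and predicates are placeholders.
triples = [
    ("post1", "title", "First things first"),
    ("post1", "category", "RDF"),
    ("post2", "category", "Photography"),
]


def match(statements, s=None, p=None, o=None):
    """Return the statements that fit the pattern; None acts as a wildcard."""
    return [
        t for t in statements
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Every statement that assigns a category, whatever the post or value.
print(match(triples, p="category"))
```

The same function answers "everything about post1" with `match(triples, s="post1")`, without needing to know how the data was nested in the original document.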

I was beginning to think I would never find what I felt to be a perfect candidate for RDF. However, this all changed, by accident, when I started doing something new in my weblog. Something poetic.

 

Next: The Beginnings of a Beautiful Friendship

Categories
Photography RDF

Knowing which trail to walk

Recovered from the Wayback Machine

Today we tried a new park called the Forest 44 Conservancy, which is part of the Missouri conservation effort. It’s an interesting place very close to home and bordered by a large horse farm. Because it’s conservation land, the trail was lightly developed; from the nature of the trail, the park isn’t used that much. The day was lovely, but the only people we met were a couple on horses.

lonewalk.jpg

We were accompanied by sound the entire trip, including red-wing blackbirds, cardinals, meadowlarks, and so on. The trail traversed both forest and meadow, including wetland with one larger pond and a couple of smaller ones, and a stream.

The main meadow had a pond that was full of goldfish. Goldfish? Are they native to Missouri?

meadow2.jpg

In the forested part of the walk, we were surrounded by a crackling sound as small things scurried about under the dead leaves from last fall. It sounded like we were walking in a bowl of Rice Krispies.

At one point, my roommate, who was walking ahead of me, scared something that ran directly in front of me, a small, round brown thing, I have no idea what. Moved fast, though. Incredibly fast.

Another area of the forest had several ant mounds, a colony that must have been in that area of the land for years. Centuries? We walked especially carefully in that section. (I can post photos if there’s interest.)

mstream2.jpg

This is a good trail to walk. It was peaceful, tranquil.

The RSS trail, that’s not a good trail to walk. Not after seeing the CSS barbs against Mark Pilgrim and Zeldman. Not after this thread. And too many others. No matter the facts, no matter how quietly one wants to discuss this topic, no matter how objective you can be, there is no successful resolution to the ‘problem’ of RSS.

The advice to me is to ignore it, and write about something else, something positive. Find my lighthouse, as Mark says. This trail, the walk, that’s a start. And I’m quite excited to see other people interested in the RDF Poetry Finder — I usually don’t get this interest from my readers when I talk about RDF. This is a little more than great. WOot!

So, pretty pics tonight. Peaceful trails tonight. And RDF and poetry next.

Update:

Shit! Can’t we ever go for a walk in the Missouri wilderness without becoming lunch for some critter that rides home with us? I learned my lesson from last year and was dressed in long cotton pants, thick socks, and a long-sleeved cotton shirt. Roommate, who wore a tank top and shorts…well, he didn’t fare so well.

rickity.jpg

Categories
RDF

RSS and what’s the use?

Recovered from the Wayback Machine.

Actually, I should realize that I would be wasting my time discussing RSS, as no one will care, and no minds will change, and it won’t stop people getting into flame wars if I happen to mention RSS in a posting.

Pictures. I should just stick with pictures.

Categories
RDF

First things first

Recovered from the Wayback Machine.

I have been working on a series of essays and matching examples and implementations that combine art and technology, human communication and the Internet. Specifically they focus on RDF and poetry, of all things.

The essays will, hopefully, demonstrate where the complexity of RDF (Resource Description Framework) and ontologies shine and traditional keyword technology fails — within the metaphorical richness of poetry. I’m demonstrating how one can search on concepts, not just keywords, and get a listing of poems that incorporate the concepts, regardless of the actual words used in the poem.

For instance, the ‘bridge’ is many times used as a metaphor for a variety of complex concepts, such as a person facing change within themselves. By defining bridge as metaphor for this concept, one could attach the ‘bridge’ metaphor to a poem that doesn’t even contain the word ‘bridge’, but does contain the concept, though using a different metaphor.
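As a very rough sketch of the idea, and not the Poetry Finder itself: suppose each concept is linked to the metaphors that can carry it, and each poem is tagged with the metaphors a reader found in it. Everything below (the concept name, the metaphor sets, the poem identifiers) is hypothetical placeholder data.

```python
# All data here is hypothetical: the concept, the metaphor sets, the poem ids.
metaphors_for = {
    "change in life": {"bridge", "eagle", "tree", "wave"},
}

# Each poem is tagged with the metaphors a reader found in it,
# not with the literal words it happens to contain.
poem_metaphors = {
    "poem-1": {"bridge"},
    "poem-2": {"wave"},
    "poem-3": {"rose"},
}


def find_poems(concept):
    """Return the poems that share at least one metaphor with the concept."""
    wanted = metaphors_for.get(concept, set())
    return sorted(p for p, used in poem_metaphors.items() if used & wanted)


print(find_poems("change in life"))
```

The second poem is found even though it never mentions a bridge, because the search goes through the concept rather than the keyword.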

After being burned out for so long, I’ve had a lot of fun working on what I call my “Poetry Finder” for want of a better term. What I particularly like about this work is that it allows me to combine my interest in technology with art, particularly writing, something I’ve not been able to do before. I’m having fun. Rusty fun.

As an aside in the article, I was going to discuss, briefly, my disappointment that so much about RDF is focused on RSS and FOAF, both of which I’ve referred to as ‘brain dead’ data models. I don’t use this term to insult these highly useful and popular specifications, but to demonstrate that using RDF for a simple hierarchy of items, parent to child, isn’t a good representation of the richness of RDF and an associated ontology built on RDF. The only semantics associated with either RSS or FOAF come from the data’s inclusion in an RSS or FOAF file; there are no other semantics associated with either of these specifications. I feel the points are good, and demonstrative.

However, lately, I’ve found myself reluctant to even mention RSS, because doing so invites a flame war into my comments that has little to do with what I originally wrote.

When I write to this weblog, or elsewhere, it’s more than a collection of keywords randomly stuck together. When I use ‘RSS’ in a weblog posting, it’s within the context of the larger body of writing, not specifically associated with whatever one’s feelings are about RSS at that point in time. I’m reminded of a dog’s interpretation of how we speak, when I see the fixation on keywords within our weblogs at times:

 

blah blah blah, blahty, blah, RSS

blah, blah blah blippy blag ramble blah RSS blah blah

 

It can be more than a little disappointing when you write something and the comments take off on a tangent having little to do with what you write. Some would say that this is the richness of this medium — that it opens new doors to communication. True, and I’ve seen, and been pleased by, rich discussions in my comments that were inspired by the original writing, but not necessarily referencing it.

However, in the case of RSS, there is no inspiration involved — people see “RSS” and that’s all she wrote. Next thing you know, wars start, flames begin, and the whole thing about who ‘owns’ RSS, or who has ‘ruined’ RSS begins anew.

Not with my RDF and poetry work. I’ve spent too much time on these to allow them to be used as springboards for yet another mud slinging session. So I have two choices:

1. I can forego including anything referring to RSS in the essays. However, the inclusion of the material is demonstrative of some key points I want to make. In effect, by removing my writing on RSS, I would be censoring myself because I don’t want ill-mannered behavior in my comments. This is not acceptable.

2. Use a lightning rod. By this I mean get the discussion about RSS over and done with before going into the RDF and poetry writing. Make it clear that now is the time to discuss these things, get them out of our system. Not in my work on RDF/poetry.

So this is fair warning: today I’m wading into the politics surrounding RSS. Most of you won’t care, or will be tired of the discussion. You’ll most likely want to bypass my postings related to these topics. Fair enough. Please stop by after this weekend when I promise to write on other things, more pics, blogshares, and, especially, my RDF/poetry work.

I know that some people will be disappointed that I’m covering this topic. And I’m not going to bitch if it gets ugly, because I’m an old hand at this, I know what to expect. This discussion is a lose/lose, and most likely nothing will be resolved. However, my point isn’t resolution as much as it is exhaustion. Strike now, and then forever hold your peace.

More later today.