The RDF Query-O-Matic

Recovered from the Wayback Machine.

Note that my current server does not support Tomcat-based applications. The Java-based Query-o-Matic is disabled until I can move it to the appropriate environment.

I created a small application, the RDF Query-o-Matic, using Java and HP’s Jena (a Java RDF API), and hosted it on my Tomcat server. The Query-o-Matic accepts the name of an RDF file (any valid RDF file), and an RDFQL (RDF Query Language) query, and will print out a test value found as a result of that query. I created the tool as a way of testing queries without having to go back into my code as I work.

You don’t have to be a techie, or a programmer, or familiar with RDF or even XML to work with RDFQL, as the Query-o-Matic will demonstrate. All you need is a bit of logic, and a familiarity with old nursery rhymes.

Taking it one step at a time…

RDF is a meta-model of information, similar to the relational data model. RDF/XML is a way of serializing the model information, as one would use a relational database to store relational data. Carrying the analogy to its natural conclusion, as SQL is to relational data, RDFQL is to RDF data.

RDFQL is actually not that complex. The key is remembering that every ‘statement’ in an RDF file is made up of a subject, a predicate (property), and a value (the object). If you view Mark Pilgrim’s FOAF file in graphical format, using the RDF Validator (access here), the predicate (property) always appears on an arc: the subject is to the left of the arc, and the value of the predicate, the object, is to the right. Every RDF statement can be broken down into one of these <subject, predicate, object> triples.
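To make the triple idea concrete, here is a minimal Python sketch that stores a few statements as plain (subject, predicate, object) records. It is not how Jena or the Query-o-Matic represents RDF; the `Triple` class and the `_:friend1` entry with its name are invented for illustration, and only the f8dy identifier, the FOAF namespace, and the name Mark Pilgrim come from the examples in this post.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject --predicate--> object."""
    subject: str
    predicate: str
    object: str


FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://www.diveintomark.org/public/foaf.rdf#f8dy"

# Assumption: a tiny hand-written stand-in for the real FOAF file.
statements = [
    Triple(ME, FOAF + "name", "Mark Pilgrim"),
    Triple(ME, FOAF + "knows", "_:friend1"),               # object is another subject
    Triple("_:friend1", FOAF + "name", "Example Friend"),  # invented person
]

# Print each statement the way the validator graph draws it:
# subject on the left, predicate on the arc, object on the right.
for s in statements:
    print(f"{s.subject} --[{s.predicate}]--> {s.object}")
```

Note that the object of the second statement is itself the subject of the third; that overlap is what the chained queries later in this post rely on.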

RDF queries are nothing more than patterns based on this triple. This might sound confusing, but not if you take the queries one step at a time.

For instance, if I want to access and print out all of the NAME elements in Mark Pilgrim’s FOAF file, I would use a query like the following:

select ?name where (?subject, <http://xmlns.com/foaf/0.1/name>, ?name)

In this query, the SELECT clause (‘select ?name’) names the variable I’ll read from the results; the rest of the query, the WHERE clause, holds the actual pattern. In this instance, I don’t care what the subject is, so I’m using a placeholder, ?subject, that’s basically ignored. It’s followed by the predicate that drives the query, in this case the NAME element. Since all elements in RDF belong to a namespace, I’m prefixing the element with its namespace and enclosing the whole within angle brackets.

The angle brackets are used to distinguish an element from a literal value.

Following the predicate is another placeholder, this one for the name element’s value (i.e. the actual names).

The whole is entered into the Query-o-Matic as follows:

URL: http://www.diveintomark.org/public/foaf.rdf

query: select ?name where (?subject, <http://xmlns.com/foaf/0.1/name>, ?name)

value to print: name

View the result.

Let’s say I want to refine the query – I only want the value of ‘name’ for the subject f8dy. I would then need to modify the query to add the subject as well as the predicate:

URL: http://www.diveintomark.org/public/foaf.rdf

query: select ?name where (<http://www.diveintomark.org/public/foaf.rdf#f8dy>, <http://xmlns.com/foaf/0.1/name>, ?name)

value to print: name

This time only one value is returned (assuming Mark’s RDF file hasn’t changed): Mark Pilgrim.
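The two queries so far can be imitated with a few lines of Python. This is only a sketch of the pattern-matching idea, not the Jena query engine: the `match` and `select` helpers are hypothetical, the sample triples are hand-written stand-ins for the real FOAF file, and "Example Friend" is an invented name.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://www.diveintomark.org/public/foaf.rdf#f8dy"

# Assumption: invented sample data standing in for the real file.
triples = [
    (ME, FOAF + "name", "Mark Pilgrim"),
    (ME, FOAF + "knows", "_:b1"),
    ("_:b1", FOAF + "name", "Example Friend"),
]


def match(pattern, triple):
    """Return variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable matches anything, but must stay consistent.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            # A fixed term must match exactly.
            return None
    return bindings


def select(var, pattern, data):
    """Collect the value bound to `var` for every triple that fits."""
    results = []
    for triple in data:
        bindings = match(pattern, triple)
        if bindings is not None:
            results.append(bindings[var])
    return results


# select ?name where (?subject, <foaf:name>, ?name)
all_names = select("?name", ("?subject", FOAF + "name", "?name"), triples)

# select ?name where (<...#f8dy>, <foaf:name>, ?name)
my_name = select("?name", (ME, FOAF + "name", "?name"), triples)

print(all_names)  # ['Mark Pilgrim', 'Example Friend']
print(my_name)    # ['Mark Pilgrim']
```

Replacing the ?subject placeholder with a fixed identifier is the only difference between the two calls, which is exactly the refinement made in the second query.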

Well, this is great for finding all elements of a certain type, or if you’re accessing a specific statement given a subject. But what if you want to find all elements of a certain type that have a specific relationship with another element? After all, the power of RDF is the ability to record statements and relate these same statements to one another.

Piece of cake. All you have to remember is an old, old nursery rhyme:

The itsy bitsy spider
Crawled up the water spout
Down came the rain
And washed the spider out
Out came the sun
And dried up all the rain
And the itsy bitsy spider
Crawled up the spout again

If you sang this as a kid (or sing this song with your own kids), you would play out the motion of the spider climbing by placing your hands together, the small finger of your right hand against the thumb of your left, and the small finger of your left against the right thumb. As you sing the song, you twist your hands, keeping the top two digits in contact, bringing up the bottom in a circular motion, re-joining these digits at the top. You would repeat this action, twisting on the top digits, bringing up the bottom and so on, never breaking the contact between the two hands.

The objective with your hands during this song was to always keep contact between the two and still have motion. That’s the basic foundation of more complex queries in RDFQL: mapping one element of one triple to an element of another triple, in a chained path that eventually gets you from point A all the way to point Z.

As an example, within Mark’s FOAF file, he has listed a group of people that he ‘knows’, each of whom has a NAME. To print out just the names of these people, we’ll need to adjust the query to find each statement that has ‘knows’ as its predicate, and then use the object of that statement as the subject of the next triple. This gets us a list of people whom Mark knows. To get their actual names, the NAME element is then used as the predicate of the second triple, to refine the result.

Well, this one definitely needs an example:

URL: http://www.diveintomark.org/public/foaf.rdf

query: select ?name where
(?a, <http://xmlns.com/foaf/0.1/knows>, ?object),
(?object, <http://xmlns.com/foaf/0.1/name>, ?name)

value to print: name

In this query, the first triple returns statements where the predicate is the ‘knows’ element: all known people. The results of this triple are then passed to the next. In the second triple, the object of the first triple (the identifier, as it were, of each individual person) is the subject of the new triple. On its own, this would return all of the properties for each of the known people. Since we’re only interested in the ‘name’ property, we further refine the query to only return the name values, which are printed out.

Check out the results.

The key to this query working is that not all objects (property values) are literal values; sometimes they can be subjects, too, as occurs with the ‘knows’ relationship in FOAF. These objects can then be plugged in as the subject of a new triple (note the ?object variable that appears in both triples), and the results combined to return not just ‘names’ of people, but names of people that Mark knows.

Just like walking that spider up the wall.
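The chained query can be sketched in Python as well, by feeding the bindings from one triple pattern into the next. Again this is an illustration under stated assumptions rather than what Jena does internally: `extend` and `query` are hypothetical helpers, and the two friends and their names are invented sample data.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://www.diveintomark.org/public/foaf.rdf#f8dy"

# Assumption: invented sample data standing in for the real file.
triples = [
    (ME, FOAF + "name", "Mark Pilgrim"),
    (ME, FOAF + "knows", "_:b1"),
    (ME, FOAF + "knows", "_:b2"),
    ("_:b1", FOAF + "name", "Example Friend One"),
    ("_:b2", FOAF + "name", "Example Friend Two"),
]


def extend(bindings, pattern, triple):
    """Extend existing bindings so the pattern fits the triple, else None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable bound by an earlier pattern must keep its value;
            # this is the contact point between the two triples.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(var, patterns, data):
    """Apply each pattern in turn, carrying bindings forward."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in data:
                extended = extend(bindings, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [bindings[var] for bindings in solutions]


known = query("?name", [
    ("?a", FOAF + "knows", "?object"),
    ("?object", FOAF + "name", "?name"),
], triples)

print(known)  # ['Example Friend One', 'Example Friend Two']
```

Mark’s own name is left out of the result because his identifier never appears as the object of a ‘knows’ statement in this sample, so ?object is never bound to it.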

Of course, not all queries are going to be as straightforward as they are in the FOAF example, and the next installment on RDFQL will take a look at additional and increasingly complex examples. In addition, the Tomcat/Java Query-o-Matic will be joined by its PHP cousin: Query-o-matic Light.