Sometimes you feel like RDF, sometimes you don’t

Semaview came out with this illustrated RDF vs XML graphic showing the ‘differences’ between RDF and XML. At least, one assumes this is the purpose of a graphic so titled. This might be confusing for people who assume RDF is XML, which isn’t entirely true: RDF is a model, and RDF/XML is one serialization of that model.

(Still, when you have XML on one side and RDF/XML on the other, one does wonder where the concept of ‘versus’ enters the picture.)

Based on this illustration, Leigh Dodds asked the question:

I’m working with RDF tools now, but thats because FOAF is an RDF vocabulary. I’m just using the right tools for the job. If I was given a task to design a new system I don’t have any feel for why I might choose RDF over XML. I haven’t had that “aha” moment yet.

Personally, I doubt there will ever be an ‘aha’ moment associated with any W3C specification, but that’s beside the point.

In response to Leigh’s question, I wrote the following in his comments:

Pat Hayes actually grabbed a quote from the book and posted at the RDF WG core mail list about RDF’s usefulness:

“RDF is a technique to record statements about resources so that machines can easily pick up the statements. Not only that, but RDF is based on a domain-neutral model that allows one set of statements to be merged with another set of statements, even though the information contained in each set of statements may differ dramatically.”

XML gives us the format to record domain-neutral data, but RDF gives us the methodology to record complete domain-neutral statements — data in action as it were.

Ontologies are then domain-specific views built on top of the domain-neutral model that is RDF.

It’s all layers. Taking a cross-section:

Knowledge can be split into domain-specific views (ontology) based on complete statements (RDF) consisting of separate pieces of syntactically valid data (XML).

Since the first moment that XML appeared a few years back, the first thing I looked for, as a data person rather than a markup person, was a data model making use of XML. To me, XML would never be anything more than interesting data formatted in an interesting manner; without rules to help that data form some cohesive pattern, beyond the rules defined for each individual XML vocabulary, I doubted its usefulness. Still, the bugger caught on and achieved widespread use.

Such waste: all that machine-accessible data and absolutely no way of merging it into one data store in any meaningful way. Worse, we have to create algorithms to manage each specific XML vocabulary rather than having one set for all vocabularies.

To me, all these XML vocabularies are equivalent to throwing out our relational databases and going back to proprietary data structures in each of our applications. Change jobs, learn an all-new structure. Buy a new application, and face a huge learning curve just understanding the underlying data and how it’s interrelated.

I knew when RDF came out that it was the missing link between XML as bits and pieces of data and XML as information, even though RDF wasn’t necessarily created specifically to be used as a model for XML.

(Some would say that the marriage between RDF and XML was a shotgun wedding at best. I don’t care. The cake was good and the band played on; I had a good time.)

In an effort to answer Leigh, Dorothea Salo wrote:

First. If you must end up with something XML-valid, don’t bother with RDF. Just don’t. Yes, you can restrict the RDF/XML you produce to a specific syntax form; you just can’t expect anything you receive to be similarly restricted, because RDF/XML-generating tools can’t be made to give a damn about which form they output of the many possible syntax forms of a given set of RDF/XML statements.

What Dorothea is referencing specifically is that RDF/XML allows different forms of XML for the same type of RDF construct, which means that the same RDF model can be serialized in four different forms, and each would be an accurate and valid rendition of that model. True, but that doesn’t change the fact that all of the XML is valid, and that it can be restricted through DTDs and XML Schemas and still be valid RDF/XML.
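To make the "same model, different XML" point concrete, here is a rough sketch in Python using only the standard library. It reads two of the RDF/XML syntax forms (a property written as a child element, and the same property abbreviated as an attribute) and shows that both reduce to the same statement. The `http://example.org/wine#` namespace, the `triples` helper, and the sample documents are assumptions made up for illustration, and the helper understands only these two forms, nothing close to the full RDF/XML grammar.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
WINE_NS = "http://example.org/wine#"  # hypothetical vocabulary namespace

HEADER = f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:wine="{WINE_NS}">'

# Form 1: the property is a child element of rdf:Description.
PROPERTY_ELEMENT_FORM = HEADER + f"""
  <rdf:Description rdf:about="{WINE_NS}chianti">
    <wine:category>red</wine:category>
  </rdf:Description>
</rdf:RDF>"""

# Form 2: the same property abbreviated as an attribute.
PROPERTY_ATTRIBUTE_FORM = HEADER + f"""
  <rdf:Description rdf:about="{WINE_NS}chianti" wine:category="red"/>
</rdf:RDF>"""


def triples(xml_text):
    """Return (subject, predicate, literal) tuples for the two forms above.

    Illustrative only: handles rdf:Description nodes with rdf:about,
    literal property elements, and literal property attributes.
    """
    rdf_prefix = "{" + RDF_NS + "}"
    found = set()
    root = ET.fromstring(xml_text)
    for node in root.findall(rdf_prefix + "Description"):
        subject = node.get(rdf_prefix + "about")
        # Attributes outside the rdf: namespace are property attributes.
        for name, value in node.attrib.items():
            if not name.startswith(rdf_prefix):
                found.add((subject, name, value))
        # Child elements are property elements with literal content.
        for child in node:
            found.add((subject, child.tag, (child.text or "").strip()))
    return found


same_model = triples(PROPERTY_ELEMENT_FORM) == triples(PROPERTY_ATTRIBUTE_FORM)
print(same_model)
```

The two documents differ as XML, so a schema written for one form would reject the other, which is Dorothea's complaint. As statements they are identical, which is mine.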

However, Dorothea is right in that RDF is not magic pixie dust. RDF is nothing more than a way of recording domain-neutral statements in such a way that they can be merged with other domain-neutral statements, each statement adding to the others in a mounting knowledge base.
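A minimal sketch of what that merging amounts to, assuming we treat each statement as a plain (subject, predicate, object) tuple and each set of statements as a Python set. The URIs, property names, and the `merge` and `describe` helpers are all hypothetical; real RDF toolkits do considerably more (blank nodes, datatypes, and so on).

```python
WINE = "http://example.org/wine#"  # hypothetical namespace

# Statements recorded by one source (say, a vineyard).
vineyard_statements = {
    (WINE + "chianti", WINE + "category", "red"),
    (WINE + "chianti", WINE + "region", "Tuscany"),
}

# Statements recorded independently by another source (say, a critic).
critic_statements = {
    (WINE + "chianti", WINE + "category", "red"),
    (WINE + "chianti", WINE + "pairsWith", "pecorino"),
}


def merge(*graphs):
    """Merge statement sets: a set union, so duplicates collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Everything the merged knowledge base says about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


knowledge_base = merge(vineyard_statements, critic_statements)
for predicate, value in describe(knowledge_base, WINE + "chianti"):
    print(predicate, value)
```

Neither source needed to know about the other's vocabulary for the merge to work; the shared statement shape is what does the job.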

She is also right when she says:

Computers only know what you tell ’em. They don’t automagically know foo from bar any more than humans do. Inference only gets you so far. Sure, it might be further than we’ve been yet; I’m inclined to think so, myself. At some point, though, somebody’s got to know what the bits of the vocabularies mean, and all the inferential power in the world won’t get that across.

XML gives us the ability to record bits and pieces of data in a valid manner. RDF then builds on the data, piecing the bits and pieces together into complete statements. Ontologies then take these statements and build machine-understandable inferential rules based around them. The result of all this working together is the wine scenario:

Information from a vineyard is recorded as XML, and the names of the wines are recorded as XML Schema datatype strings. The XML ensures that the names are valid, and the data is accessible with any XML-compliant parser. Another format could be used, but then if someone else wanted to access the data, they’d have to build parsers that can understand the proprietary format.

The RDF model then provides the means to incorporate those names into facts, such as “Chianti is a red wine”, using a serialization technique molded on to XML:

<rdf:Description rdf:ID="chianti">
  <wine:category rdf:datatype="http://www.w3.org/2001/XMLSchema#string">red</wine:category>
</rdf:Description>

We could build the model on which the facts are based directly into the XML vocabulary. But then, we’d have to make sure the model and the facts were consistent regardless of use. And since the model would be proprietary, other tools would also have to build in the ability to produce or consume facts based on it.

Finally, an ontology language, such as DAML+OIL or the W3C’s OWL, pieces together the separate statements and facts into domain-specific knowledge, by applying rules that allow machines to make inferences on the data, such as the fact that a cheese and nut dessert course is a part of a formal meal and is an alternative to a sweet dessert, and wines served during this course should be red:

<rdfs:Class rdf:ID="CHEESE-NUTS-DESSERT-COURSE">
  <daml:intersectionOf rdf:parseType="daml:collection">
    <daml:Restriction>
      <daml:onProperty rdf:resource="#FOOD"/>
      <daml:hasClass rdf:resource="#CHEESE-NUTS-DESSERT"/>
    </daml:Restriction>
    <rdfs:Class rdf:about="#MEAL-COURSE"/>
  </daml:intersectionOf>
  <rdfs:subClassOf rdf:resource="#DRINK-HAS-RED-COLOR-RESTRICTION"/>
</rdfs:Class>
<rdfs:Class rdf:ID="CHEESE-NUTS-DESSERT">
  <rdfs:subClassOf rdf:resource="#DESSERT"/>
  <daml:disjointWith rdf:resource="#SWEET-DESSERT"/>
</rdfs:Class>

This type of information can never be recorded in ‘straight’ RDF/XML because you’d have to have the ability to record the inferential rules, and RDF focuses on recording statements. Additionally, the information could never be discovered in straight XML because you have to have the ability to record not only the rules but the statements, too. You would literally have to build a model and then find a way to serialize that model in XML. Just like RDF. If you used XML, you’d have to define the ability to record facts, and then on top of that, the ability to record the necessary information to perform inferential queries — something more esoteric than “what is a white wine”.
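For a feel of what "inferential rules on top of statements" means, here is a deliberately tiny sketch of one such rule, the standard RDFS subclass rule: if something is of type C, and C is a subclass of D, then it is also of type D. The class names echo the DAML+OIL sample above; the `stilton-and-walnuts` instance, the `FOOD-ITEM` class, and the `infer_types` helper are made up for illustration, and a real OWL or DAML+OIL reasoner handles restrictions, intersections, and disjointness that this does not attempt.

```python
TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

statements = {
    # Taken from the ontology fragment above.
    ("CHEESE-NUTS-DESSERT", SUBCLASS_OF, "DESSERT"),
    # Hypothetical additions for the sake of the example.
    ("DESSERT", SUBCLASS_OF, "FOOD-ITEM"),
    ("stilton-and-walnuts", TYPE, "CHEESE-NUTS-DESSERT"),
}


def infer_types(graph):
    """Apply the subclass rule repeatedly until no new statements appear."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        snapshot = list(inferred)
        for subject, predicate, cls in snapshot:
            if predicate != TYPE:
                continue
            for sub, pred, parent in snapshot:
                if pred == SUBCLASS_OF and sub == cls:
                    new_statement = (subject, TYPE, parent)
                    if new_statement not in inferred:
                        inferred.add(new_statement)
                        changed = True
    return inferred


closure = infer_types(statements)
new_facts = sorted(closure - statements)
for fact in new_facts:
    print(fact)
```

Nobody ever stated that the stilton and walnuts plate is a dessert; the rule derived it from the statements that were recorded. The facts live in the statements, and the rule that extends them lives in the ontology layer.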

But by using XML as a data format, RDF as a statement/model methodology, and OWL to record the domain-specific rules, you can go to an application such as the Wine Agent, ask for recommendations for a cheese dessert wine within a certain region, and get the following answer:

“Pairs well with sweet red varieties. Full-bodied wines featuring strong flavors match especially well.”

The local knowledge base particularly recommends the following:

TAYLOR PORT

The recommended wines can be found below, along with some comparable selections: (with link to selection)

The frosting on this particular layer cake is that anyone associated in some way with the wine industry — producing or consuming — could use the wine ontology, based on RDF, persisted in XML for their own applications and functionality. Better yet, another industry, let’s say the chocolate industry, can use the same XML/RDF/ontology combination, and the same tools that work with each, as a way of recording their domain-specific data.

And you’ll be able to know exactly which champagne to buy to go with that dark bitter-sweet chocolate covered orange peel you bought for that special someone.