The elephant strikes again

Danny Ayers and Bill de hÓra got into a bit of back and forth on RDF, and Bill’s response reminded me again of the blind wise men and the elephant. Each person describes a different creature when asked to describe the whole from just the part they touched: one touching the leg assumes the creature is large and stumpy; one touching the tail, skinny and flexible; and so on.

Bill lists out several areas where attention paid by the SemWeb group could increase overall adoption of RDF. In turn:

Better communication, algorithms, tools, etc. to map RDF into relational databases: how can we do so and increase efficiency? This one has actually happened, but perhaps has not been marketed as well. I know that in the Java world, the Jena folks have put a great deal of thought and energy into efficient RDF storage optimized for SPARQL access (do we use Sparql now?). As Danny stated, tests have been made in systems with millions and millions of triples stored, with relatively quick access of same using Sparql: as quick as a relational query against joined relational tables.
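To make the "RDF in a relational database" idea concrete, here is a minimal sketch using Python's built-in sqlite3. It is not how Jena lays out its storage; the single three-column table, the index names, the `ex:` identifiers, and the helper names are all my own assumptions for illustration. It only shows that a two-pattern graph query turns into an ordinary self-join.

```python
import sqlite3

# Hypothetical sample data: (subject, predicate, object) statements.
TRIPLES = [
    ("ex:post1", "ex:author", "ex:alice"),
    ("ex:post1", "ex:author", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:post2", "ex:author", "ex:bob"),
]


def build_store(triples):
    """Load triples into an in-memory, single-table store (an assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    # Indexes on the access paths the join below uses.
    conn.execute("CREATE INDEX idx_sp ON triples (subject, predicate)")
    conn.execute("CREATE INDEX idx_po ON triples (predicate, object)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def author_names(conn, document):
    """Roughly the pattern: <document> ex:author ?a . ?a ex:name ?n"""
    rows = conn.execute(
        """
        SELECT n.object
        FROM triples AS a
        JOIN triples AS n ON n.subject = a.object
        WHERE a.subject = ?
          AND a.predicate = 'ex:author'
          AND n.predicate = 'ex:name'
        ORDER BY n.object
        """,
        (document,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `author_names(build_store(TRIPLES), "ex:post1")` walks from the post to its authors to their names with one join per extra pattern, which is why indexing matters so much once the table holds millions of rows.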

I think this is one that has to go back to Bill: exactly what language/API or data store solution did you use that resulted in such slow performance?

Bill’s issue with integration was, at least to me, a bit confusing. I don’t think one can say to companies: use RDF thus and thus and you’ll save so much money, because one can’t say the same about the relational model or the network model or even the old hierarchical model. However, give us a specific task, and we might be able to point out where RDF can be used effectively. Or perhaps we’ll say, no, a relational model is better for that purpose. We might even recommend hierarchical; no one model is perfect for every use.

Where Bill seems to be focused is on data interchange: moving data from tool to tool. Well, now that is one I am very familiar with, and it is one that RDF can manage as effectively as, or more effectively than, a specifically defined XML vocabulary.

Think of a hub-and-spoke system, whereby heterogeneous applications all access data from a common core, which acts as a buffer to prevent hard-coded dependencies between these applications. One can define an XML format to be used between each application and the buffer, but this means that each application has to produce XML in whatever format the buffer needs. Commercial applications won’t do this, so the company has to create a mapping, a layer between the application’s API and the system core, which adds to the overall complexity of the system.

The company developers have to do this not because the business model might differ to some degree between the core and the application, but because XML depends on physical structure, and this means that the mapping has to be just so before the data can even be processed in order to be merged. In other words: there’s a pure tech dependency on the structure of the XML that transcends even the business model.

RDF eliminates this. A statement is a statement is a statement. It removes positional dependencies, very similar to how the relational model removed such when it was adopted over the older network or hierarchical models. If all the spokes ‘spoke’ RDF (sorry, couldn’t resist), then all that the hub/core would need to do is map each to the other, and this doesn’t even require any additional coding if such business mapping is defined using OWL. All that would be required to add a new application that produces an RDF conversion of its data export/service response is the addition of rules to the core’s rules database using OWL.
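A rough sketch of that hub, in plain Python rather than a real RDF toolkit: each spoke hands over a set of statements, merging is just set union (so order and position don't matter), and the hub's "rules" are a table of equivalent properties standing in for what OWL's owl:equivalentProperty would declare. The vocabularies (`bill:`, `ship:`), the sample data, and the function names are invented for illustration, and rewriting one property to another is a big simplification of what an OWL reasoner actually does.

```python
# Hypothetical exports from two spokes, as (subject, predicate, object) statements.
BILLING = {
    ("cust:42", "bill:customerName", "Acme Ltd"),
    ("cust:42", "bill:balance", "120.00"),
}
SHIPPING = {
    ("cust:42", "ship:recipient", "Acme Ltd"),
    ("cust:42", "ship:city", "Providence"),
}

# Hub rules: (spoke property, canonical property). Adding a new spoke means
# adding rows here, not writing new parsing code.
EQUIVALENT = {("ship:recipient", "bill:customerName")}


def merge(*graphs):
    """Union the statements; no statement depends on its position in a file."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def apply_equivalences(graph, equivalences):
    """Rewrite each predicate to its canonical form where a rule exists."""
    canonical = dict(equivalences)
    return {(s, canonical.get(p, p), o) for s, p, o in graph}
```

Here `merge(BILLING, SHIPPING)` and `merge(SHIPPING, BILLING)` give the same result, and after `apply_equivalences` the two spokes' statements about the customer's name collapse into a single statement.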

You can’t get there with relational databases. I know, I worked with EDI (Electronic Data Interchange) for years. You also can’t get there with XML because there is no rules mechanism inherent in XML. Of course not: why over-complicate a nice, clean syntactic format? XML wasn’t meant to be a rules database. Relational wasn’t meant to be a rules database (but does very nicely for pre-defined, isolated business model use). RDF, on the other hand, eats rules for lunch.

Think of Scotty: Use the right tool for the job.

Moving on to Bill’s list, his big item is ORM (Object-Relational Mapping) and widgets, and it’s true, I’ve not necessarily worked with an object-relational mapping or direct form to RDF widgets. As such, I can’t respond to this one, other than to say if people have created these, I’d like to see them myself.

I think what’s a bigger issue, though, is that Bill demonstrates the wise men and the elephant conundrum: Bill has certain items he considers important, including one that I wouldn’t necessarily consider important. He sees ORM as bigger than syndication feeds, yet for all my tweaking of syndication feed folks, I consider feeds probably one of the most significant advances in technology in years. Why? Because it has had a major impact on relatively standard behavior: how we get access to current news.

That’s the deal: Bill is thinking purely as a developer working with specific problems, and wants to see how RDF can help him. If we continue to get caught up in specific tech-to-tech comparisons, RDF will continuously be placed into a position of loss: compared against specific uses of technology for which it really wasn’t designed, and, when it comes up short, which of course it will, discounted in such a way that it’s not considered for use where it would be effective.

What the RDF community needs to do is go beyond just the technology-to-technology mapping and look at the overall picture: where does the semantic web community want to go, and what piece of this best fits RDF?

Sheila Lennon writes about John Markoff’s Times article on common sense and Web 3.0 and gives her interpretation of what she sees as the semantic web:

Real world: When restaurants are online in realtime (yet to happen), my computer could display Providence restaurants serving cordon bleu tonight at what prices, ask me to choose one (around when?), then make a reservation, reserve a portion of chicken cordon bleu for me, and notify the restaurant’s computer if I’m hung up in traffic. It will not think about chicken cordon bleu. Its mouth will not water.

And — being my agent — it will not suggest I’ve had enough calories already today and should have salad instead.

Spot on. Sheila is looking at the elephant, not the trunk, tail, and so on.

My mistake with my first interest in RDF and the semantic web is I wanted a web of meaning. Perhaps something like this will happen in the future, but I doubt it because from my earlier attempts at getting others involved I found out something important: what I wanted was not what most people wanted. What most people want is what Sheila is describing: systems that work together seamlessly; that integrate immediately; that help us do something we couldn’t do before.

However, they don’t want these systems to do what we can do well, which is have an opinion or differentiate based on nebulous qualities such as taste and desire. We do this very well ourselves, thank you. What we need from computers is to help us sift through the raw stuff and that’s where the semantic web enters the picture.

Ultimately the semantic web consists of data stores in relational databases, hierarchical data structures, RDF, and even tags embedded into a page: anything that gets us closer to realtime data access (“What’s cooking tonight?”), ‘smart’ agents, and the restaurant’s cooking schedule integrating with our portable GPS devices (given a rule that a location in thus and thus zone is equivalent to so many minutes from the restaurant). No one data model will work for all of this, which is why we need to carve out that piece that best belongs to RDF, and stop trying to force it into being a relational model replacement or a microformat killer.

(Woman on camel comes upon man in the desert, crawling on his hands and knees, faintly crying out, “Water, water.” She grabs her bladder filled with pomegranate juice and jumps down, holding the opening to his lips. The man tastes the juice and pushes her away. “I asked for water,” he says.)

Which leads us back to the last issue of Bill’s I’m going to touch on and that is tags and other microformats. I have been one of those who scoffed at microformats, and I still do. I can see the value of tags, but whether you call them ‘keyword’ associations or categories, they’re a way of adding a description to an item, but not necessarily anything more complex. As such, they have value, but only to a point: in many social software tools such as Flickr, people use these as mnemonics more for themselves than others. Their value to a global whole is directly proportional to the amount of time a person puts into the effort, and most people don’t care that much.

Where tags work well is when a group of people get together and mutually decide on one tag to differentiate the association, such as the tags for a conference. People use the agreed-on tag because it’s simple (the tag has already been provided), and they can immediately see its usefulness (querying on the tag returns items associated with the event).

Thus the same requirements for metadata can be defined for microformats and RDF: we have to make defining such metadata easy. We have to demonstrate value.

Now, one can say microformats make things easier, but I have a real problem with shoving everything under the sun into one single HTML attribute: class. The class attribute is defined to hold values for stylesheet settings or ‘whatever other use devised by user agents’. What we’ve done, as end users, is shove both microformat AND Ajax information into these attributes, and frankly, I don’t know how much this is impacting on and will continue to impact on these ‘user agents’.

For instance, if I have eight different terms within a single class attribute, all properly separated from each other, does Firefox look at each of these, and then spend time reviewing the stylesheet settings to see if any match? If so, and this use of class is scattered all about a biggish page, and the CSS stylesheets are quite large, this strikes me as affecting page load times, perhaps significantly.

We are misusing the class attribute, and that’s my biggest pushback against microformats. The same goes for the use of the class attribute with Ajax.

Now, generating an RDF output from metadata associated with a page, and then serving this up when you tack ‘/rdf/’ (or ‘/meta/’) to the end of the page URL strikes me as a superior method of providing metadata to smart agents. The browsers don’t have to deal with microformat creators and RDF/XHTML mappers converting the HTML class attribute into a CLOB, and the resulting metadata doesn’t have to be limited to whatever layout elements are provided through valid uses of HTML. I’ve never had anyone give me a reason why they feel this is wrong.
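The URL convention itself is trivial to implement. A small sketch, assuming the site simply appends a path segment; the function name `metadata_url` and the choice to carry any query string along are my own, not anything a particular weblog tool does:

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_url(page_url, suffix="rdf"):
    """Return the page URL with '/<suffix>/' tacked onto the end of its path."""
    parts = urlsplit(page_url)
    # Make sure the path ends with exactly one slash before adding the suffix.
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path + suffix + "/", parts.query, parts.fragment)
    )
```

A smart agent that knows the convention can go from any page address to its metadata, for example from `http://example.com/2006/11/elephant/` to `http://example.com/2006/11/elephant/rdf/`, without parsing the HTML at all.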

(Declarative HTML for Ajax developers is another thing, and best left for another post.)

If the pages are dynamically generated, then the metadata page can be dynamically generated. If the pages are static, when they’re created, the metadata page can be created. And if someone comes along and says something about the page being created by hand, I’m going to come back with, “Oh yeah?”
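Generating the metadata page needn't be elaborate either. Here is a minimal sketch that turns a dictionary of page metadata into N-Triples, the simplest RDF serialization, using Dublin Core element names as properties. The function names and the idea that the metadata arrives as a flat dictionary of strings are assumptions on my part; a real tool would likely use an RDF library and emit RDF/XML or whatever format it prefers.

```python
# Dublin Core elements namespace; keys such as "title" and "creator" are appended.
DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(value):
    """Escape the characters that N-Triples requires escaping in a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def page_metadata_ntriples(page_url, metadata):
    """Emit one N-Triples statement per metadata key, in sorted key order."""
    lines = []
    for key in sorted(metadata):
        lines.append(
            '<%s> <%s%s> "%s" .'
            % (page_url, DC, key, escape_literal(metadata[key]))
        )
    return "\n".join(lines) + "\n"
```

The same function can run at request time for dynamic pages or at publish time for static ones; either way, the output is what gets served at the page's ‘/rdf/’ address.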

One last issue that Bill had in his list was with the syntax of RDF/XML. We get so hung up on this so frequently that I think it would be a good idea for the W3C to re-visit this one. After all, if the organization is revisiting HTML in the interest of moving forward, there is that much more reason to revisit RDF/XML.