November 12th, 2006

Danny Ayers and Bill de hÓra got into a bit of back and forth on RDF, and Bill's response reminded me again of the blind wise men and the elephant. Each person describes a different creature when asked to describe the whole from just the part they touched: the one touching the leg assumes the creature is large and stumpy; the one touching the tail, skinny and flexible; and so on.

Bill lists out several areas where attention paid by the SemWeb group could increase overall adoption of RDF. In turn:

Better communication, algorithms, tools, etc. to map RDF into relational databases. How can we do so and increase efficiency? This one has actually happened, but perhaps has not been as well marketed. I know that in the Java world, the Jena folks have put a great deal of thought and energy into efficient RDF storage optimized for SPARQL access (do we use Sparql now?). As Danny stated, tests have been made in systems with millions and millions of triples stored, and relatively quick access of same using SPARQL–as quick as a relational query against joined relational tables.

I think this is one that has to go back to Bill: exactly what language/API or data store solution did you use that resulted in such slow performance?
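For anyone who hasn't seen triples sitting in a relational store, here is a minimal sketch using Python's built-in sqlite3. It is not how Jena or any other toolkit lays out its data; the single `triples` table, the index choices, the prefixed names, and the sample data are all assumptions made for illustration. What it does show is why the comparison with "joined relational tables" is a fair one: each triple pattern in a SPARQL-style query turns into one self-join.

```python
import sqlite3

# Hypothetical layout: one three-column table, names and data made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
# Indexes covering the common lookup directions.
conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:post1", "dc:creator", "ex:shelley"),
        ("ex:post1", "dc:title", "Elephants"),
        ("ex:post2", "dc:creator", "ex:danny"),
        ("ex:shelley", "foaf:name", "Shelley"),
        ("ex:danny", "foaf:name", "Danny"),
    ],
)


def titles_by_author_name(name):
    """Rough relational equivalent of the graph pattern

        ?post dc:creator ?who .
        ?who  foaf:name  "name" .
        ?post dc:title   ?title .

    Each of the three patterns becomes one alias of the same table.
    """
    rows = conn.execute(
        """
        SELECT t3.o
        FROM triples t1
        JOIN triples t2 ON t2.s = t1.o
        JOIN triples t3 ON t3.s = t1.s
        WHERE t1.p = 'dc:creator'
          AND t2.p = 'foaf:name' AND t2.o = ?
          AND t3.p = 'dc:title'
        ORDER BY t3.o
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `titles_by_author_name("Shelley")` walks from the name to the person to the post to its title. Whether this performs well at millions of triples depends entirely on the store and its indexing, which is exactly the question for Bill.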

Bill's issues associated with integration were, at least to me, a bit confusing. I don't think one can say to companies: use RDF thus and thus and you'll save so much money, because one can't say the same about the relational model or the network model or even the old hierarchical model. However, give us a specific task, and we might be able to point out where RDF can be used effectively. Or perhaps we'll say, no, a relational model is better for that purpose. We might even recommend hierarchical–no one model is perfect for every use.

Where Bill seems to be focused is on data interchange: moving data from tool to tool. Well, now that is one I am very familiar with, and it is one that can be managed by RDF as effectively as, or more effectively than, a specifically defined XML vocabulary.

Think of a hub-and-spoke system, whereby heterogeneous applications all access data from a common core, which acts as a buffer to prevent hard coded dependencies between these applications. One can define an XML format to be used between each application and the buffer, but this means that each application has to produce XML in whatever format the buffer requires. Commercial applications won't do this, so this means the company has to create a mapping–a layer between the application's API and the system core, which adds to the overall complexity of the system.

The company developers have to do this not because the business model might differ in whatever degree between the core and the application, but because XML is dependent on physical structure (element order and nesting), and this means that the mapping has to be just so before the data can even be processed in order to be merged. In other words: there's a pure tech dependency on the structure of the XML that transcends even the business model.

RDF eliminates this. A statement is a statement is a statement. It removes positional dependencies, very similar to how the relational model removed such when it was adopted over the older network or hierarchical models. If all the spokes 'spoke' RDF (sorry, couldn't resist), then all that the hub/core would need to do is map each to the other, and this doesn't even require any additional coding if such business mapping is defined using OWL. All that would be required to add a new application that produces an RDF conversion of its data export/service response is the addition of rules to the core's rules database using OWL.
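A toy sketch of the idea, in plain Python with no RDF library: statements are (subject, predicate, object) tuples, merging two spokes is nothing more than set union, and the hub's mapping is itself data. Everything here is an assumption made for illustration, including the `bill:` and `ship:` vocabularies and the customer identifiers, and the single hand-rolled `owl:equivalentProperty` rule is a stand-in for what a real OWL reasoner would do, not an implementation of OWL.

```python
# Each spoke exports statements as (subject, predicate, object) tuples.
# All vocabularies and identifiers below are hypothetical.
billing = {
    ("cust:42", "bill:balance", "120.00"),
    ("cust:7", "bill:customerName", "Initech"),
}
shipping = {
    ("cust:42", "ship:recipient", "Acme Ltd"),
    ("cust:42", "ship:city", "Providence"),
}

# The hub's "rules database": the mapping is just more statements.
rules = {
    ("ship:recipient", "owl:equivalentProperty", "bill:customerName"),
}


def merge(*graphs):
    """Merging is set union: there is no order or nesting to reconcile."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def apply_equivalences(graph, rules):
    """Restate each triple under any predicate declared equivalent.

    A single pass over one rule type, applied in both directions.
    """
    pairs = set()
    for s, p, o in rules:
        if p == "owl:equivalentProperty":
            pairs.add((s, o))
            pairs.add((o, s))
    inferred = set(graph)
    for s, p, o in graph:
        for source, target in pairs:
            if p == source:
                inferred.add((s, target, o))
    return inferred


hub = apply_equivalences(merge(billing, shipping), rules)
```

After this runs, asking the hub for `bill:customerName` also finds the name that only the shipping application knew about. Adding a third spoke means adding rows to `rules`, not writing a new translation layer, provided that spoke can export statements in the first place.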

You can't get there with relational databases. I know, I worked with EDI (Electronic Data Interchange) for years. You also can't get there with XML because there is no rules mechanism inherent in XML. Of course not: why over-complicate a nice, clean syntactic format? XML wasn't meant to be a rules database. Relational wasn't meant to be a rules database (but does very nicely for pre-defined, isolated business model use). RDF, on the other hand, eats rules for lunch.

Think of Scotty: Use the right tool for the job.

Moving on to Bill's list, his big item is ORM (Object-Relational Mapping) and widgets, and it's true, I've not necessarily worked with an object-relational mapping or direct form to RDF widgets. As such, I can't respond to this one, other than to say if people have created these, I'd like to see them myself.

I think what's a bigger issue, though, is that Bill demonstrates the wise men and the elephant conundrum: Bill has certain items he considers important, including one that I wouldn't necessarily consider important. He sees ORM as bigger than syndication feeds, yet for all my tweaking of syndication feed folks, I consider feeds probably one of the most significant advances in technology in years. Why? Because it has had a major impact on relatively standard behavior: how we get access to current news.

That's the deal: Bill is thinking purely as a developer working with specific problems, and wants to see how RDF can help him. If we continue to get caught up in specific tech-to-tech comparisons, RDF will continuously be placed into a position of loss: being compared against specific uses of technology for which it really wasn't designed, and when it comes up short, which of course it will, being discounted in such a way that it's not considered for use where it would be effective.

What the RDF community needs to do is go beyond just the technology-to-technology mapping and look at the overall picture: where does the semantic web community want to go, and what piece of this best fits RDF?

Sheila Lennon writes about John Markoff's Times article on common sense and Web 3.0 and gives her interpretation of what she sees as the semantic web:

Real world: When restaurants are online in realtime (yet to happen), my computer could display Providence restaurants serving cordon bleu tonight at what prices, ask me to choose one (around when?), then make a reservation, reserve a portion of chicken cordon bleu for me, and notify the restaurant's computer if I'm hung up in traffic. It will not think about chicken cordon bleu. Its mouth will not water.

And — being my agent — it will not suggest I've had enough calories already today and should have salad instead.

Spot on. Sheila is looking at the elephant, not the trunk, tail, and so on.

My mistake with my first interest in RDF and the semantic web was that I wanted a web of meaning. Perhaps something like this will happen in the future, but I doubt it, because from my earlier attempts at getting others involved I found out something important: what I wanted was not what most people wanted. What most people want is what Sheila is describing: systems that work together seamlessly; that integrate immediately; that help us do something we couldn't do before.

However, they don't want these systems to do what we can do well, which is have an opinion or differentiate based on nebulous qualities such as taste and desire. We do this very well ourselves, thank you. What we need from computers is to help us sift through the raw stuff and that's where the semantic web enters the picture.

Ultimately the semantic web consists of data stores in relational databases, hierarchical data structures, RDF, and even tags embedded into a page–anything that gets us closer to realtime data access ("What's cooking tonight?"), 'smart' agents, and the restaurant's cooking schedule integrating with our portable GPS devices (and given a rule of locations in thus and thus zone is equivalent to so many minutes from restaurant). No one data model will work for all of this, which is why we need to carve out that piece that best belongs to RDF, and stop trying to force it into a relational model replacement, or microformat killer.

("Woman on camel comes upon man in the desert; crawling on his hands and knees, faintly crying out "Water, water." She grabs her bladder filled with pomegranate juice and jumps down, holding the opening to his lips. The man tastes the juice and pushes her away. "I asked for water", he says.)

Which leads us back to the last issue of Bill's I'm going to touch on and that is tags and other microformats. I have been one of those who scoffed at microformats, and I still do. I can see the value of tags, but whether you call them 'keyword' associations or categories, they're a way of adding a description to an item, but not necessarily anything more complex. As such, they have value, but only to a point: in many social software tools such as Flickr, people use these as mnemonics more for themselves than others. Their value to a global whole is directly proportional to the amount of time a person puts into the effort, and most people don't care that much.

Where tags work well is when a group of people get together and mutually decide on one tag to differentiate the association, such as the tags for a conference. People use the agreed on tag because it's simple (the tag has already been provided), and they can immediately see the usefulness of such (querying on the tag returns items associated with the event).

Thus the same requirements for metadata can be defined for microformats and RDF: we have to make the effort to make defining such easy. We have to demonstrate value.

Now, one can say microformats make things easier, but I have a real problem with shoving everything under the sun into one single HTML attribute: class. The class attribute is defined to hold values for stylesheet settings or 'whatever other use devised by user agents'. What we've done, as end users, is shove both microformat AND Ajax information into these attributes, and frankly, I don't know how much this is impacting on and will continue to impact on these 'user agents'.

For instance, if I have eight different terms within a single class attribute, all properly separated from each other, does Firefox look at each of these, and then spend time reviewing the stylesheet settings to see if any match? If so, and this use of class is scattered all about a biggish page, and the CSS stylesheets are quite large, this strikes me as impacting, perhaps significantly, against page load times.

We are misusing the class attribute, and that's my biggest pushback against microformats. The same goes for the use of the class attribute with Ajax.

Now, generating an RDF output from metadata associated with a page, and then serving this up when you tack '/rdf/' (or '/meta/') to the end of the page URL strikes me as a superior method of providing metadata to smart agents. The browsers don't have to deal with microformat creators and RDF/XHTML mappers converting the HTML class attribute into a CLOB, and the resulting metadata doesn't have to be limited to whatever layout elements are provided through valid uses of HTML. I've never had anyone give me a reason why they feel this is wrong.
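As a rough sketch of what that could look like, here are two small Python helpers: one derives the metadata URL from the page URL, and the other writes whatever metadata is associated with the page out as N-Triples. The function names, the example.com URL in the usage note, and the choice of N-Triples rather than RDF/XML are all my own assumptions for illustration; literals are written as plain strings with no datatype or language tag.

```python
def metadata_url(page_url, suffix="rdf"):
    """Tack '/rdf/' (or '/meta/') onto the end of the page URL."""
    return page_url.rstrip("/") + "/" + suffix + "/"


def escape_literal(text):
    """Escape backslash, quote, and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(page_url, metadata):
    """Emit one N-Triples line per (property URI, value) pair.

    The page URL is the subject of every statement; properties are
    sorted so the output is stable between runs.
    """
    lines = []
    for prop, value in sorted(metadata.items()):
        lines.append(
            '<%s> <%s> "%s" .' % (page_url, prop, escape_literal(value))
        )
    return "\n".join(lines) + "\n"
```

A page at a hypothetical `http://example.com/elephant/` would then expose its metadata at `metadata_url("http://example.com/elephant/")`, with the body produced by `to_ntriples` from, say, a dictionary keyed by Dublin Core property URIs such as `http://purl.org/dc/elements/1.1/title`. None of it touches the HTML class attribute.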

(Declarative HTML for Ajax developers is another thing, and best left for another post.)

If the pages are dynamically generated, then the metadata page can be dynamically generated. If the pages are static, when they're created, the metadata page can be created. And if someone comes along and says something about the page being created by hand, I'm going to come back with, "Oh yeah?"

One last issue that Bill had in his list was with the syntax of RDF/XML. We get so hung up on this so frequently that I think it would be a good idea for the W3C to re-visit this one. After all, if the organization is revisiting HTML in the interest of moving forward, there is that much more reason to revisit RDF/XML.

Comments
1
Scott Reynen - 1:12 pm 11/12/2006

I'm not clear on what you're talking about with "metaformats." Is that just another word for microformats, or are you including things like RDF-in-HTML under that umbrella or is this something new?

2
Bill de hOra - 1:48 pm 11/12/2006

"Bill demonstrates the wise men and the elephant conundrum: Bill has certain items he considers important, including one that I wouldn't necessarily consider important."

All this started with a claim (mine) that the semweb community don't listen to developers.

"I don't think one can say to companies: use RDF thus and thus and you'll save so much money, because one can't say the same about the relational model or the network model or even the old hierarchical model."

Tell that to Tim BL. He urges export of data into RDF; or did up to two years ago. I understand the reasoning is economic.

"RDF, on the other hand, eats rules for lunch."

Nope - RDF triples in this case are notation not denotation. All the rules stuff I've seen are fullblown semantic extensions (barely) of RDF. OWL itself isn't really RDF - there's a compromise hack whereby they shared the notion of a class, but you can't pump OWL into an RDF reasoner and have the OWL inferences made. I hear this conflation a lot from semwebbers; I wish they'd go and study AI/KR/FOL and stop muddying things. OWL as best I can tell doesn't need RDF at all.

"exactly what language/API or data store solution did you use that resulted in such slow performance?"

I put 10M triples into Jena/kowari/rdflib a few years ago. Too slow to be usable. That's 2M records for a domain based approach if you buy the tenfold increase argument. An RDBMS like MySQL will eat 2M records for breakfast. I ended up using a hybrid model - blobs and tags. This is something I am looking to revisit tho'.

"If all the spokes 'spoke' RDF (sorry, couldn't resist), then all that the hub/core would need to do is map each to the other, and this doesn't even require any additional coding if such business mapping is defined using OWL."

Well, all the hubs *don't* speak RDF. I've watched KR/GOFAI based technologies for most of my adult life. This is where they all fall down - assuming a shared interlingua, a shared semantics exists. It's *the* classic GOFAI problem - everything is fine for a lab scenario, but in the field the assumptions turn out to be nonsense and the technology turns out to be a letdown. In the field bad assumptions like that tend to kill projects, or make them very expensive. The successful integration projects I've worked on do not make that assumption, for *any* technology, because it's a fallacy - it's the most important fallacy of systems integration. They make the assumption that the world is fundamentally heterogeneous, messy, and non-interoperable and you will be doing work to get the data into shape. That's exactly my point about organic growth - show me how RDF can be put onto one spoke for starters.

I guess here's the problem. You're not telling me anything I don't know about RDF, or its capabilities. The wise men and elephant foil is witty, but it's only another way to say "you don't get it". Which gets to my criticism of the semweb community that started this off. What's to get?

3
Shelley - 2:06 pm 11/12/2006

Scott, no, just got into the subject and mixed the meta and micro.

Bill, I have to disagree on OWL not needing RDF. It may not need the syntax, but I believe it needs the underlying model.

You mentioned about putting 2M triples into Jena a couple of years ago. I believe the Jena team has completely redesigned the underlying infrastructure since then, and tests have been made with over 20M triples.

As for the hub-and-spoke, I guess one could also say: show me how straight XML could be used between different commercial applications that don't have an agreed on XML vocabulary, and then I'll come back with how one could be designed with RDF.

Actually, I wasn't saying "You don't get it" with the elephant analogy. People have different needs and want to see how RDF maps to those needs. If the RDF community spends all of its time trying to map to each and every use of tech there is, it will never make it out of the lab.

I don't care what people have said in the past, I don't see RDF replacing relational databases, or XML, or any other technology that really has its own place. I do think that RDF has a place of its own, and what we need to focus on, is what is this place.

Danny's original post on this in regard to the CEO of MySQL really wasn't being nasty about the MySQL CEO–he really did mean, hey! Work has been done in this area, why are you reinventing the wheel? He sees this particular use as the place for RDF, and I agree. This doesn't replace MySQL, but complements it.

As for the old KR/GOFAI, I really think that we need to look at RDF and even future semantic web efforts as distinct from the old AI efforts in the past. I think the goals are completely dissimilar.

As for the RDF folks not assuming the world is messy and the data heterogeneous, not a bit of it: we think it's jumbled as hell. I like to think, though, that it is this disorganization that makes us most excited.

4
Bill de hOra - 4:10 pm 11/12/2006

"Now, generating an RDF output from metadata associated with a page, and then serving this up when you tack '/rdf/' (or '/meta/') to the end of the page URL strikes me as a superior method of providing microdata to smart agent"

Agree it's a good idea.

One of the things I like about uF is it cuts down on transformation work. I do CMS/publication work and smart use of class/span/div means you can defer some of the usual transform+publish work into CSS.
Another thing about uF worth mentioning - non technical people that have to decide about what software to buy like them. They see XML, it's just another IT project. They see web page, that's a different proposition. uF are not optimised for engineering.

"we need to look at RDF and even future semantic web efforts as distinct from the old AI efforts in the past. I think the goals are completely dissimilar."

It's 2006/7, maybe I can get my head around that. But for a few years there, the semweb became an AI/KR project. I still see a lot of talk about behind the firewall ontology engineering and description logics, when basic field work like tagging and roundtripping and presentation seems to wallow.

"You mentioned about putting 2M triples into Jena a couple of years ago."

10M; 2M is roughly what I think the table size would be with a domain/entity model approach, ie an order of magnitude. As I've said elsewhere; I'm two years behind the project trunks, so the numbers could well be up since 2003/4.

I'm glad you agree about ORM/Widgets. Cos assuming you can run your backend on RDF machinery, you still have to schlep the data in and out of the application and presentation code the web runs on. RDF needs to be highly constrained for that to work (it needs a type mapping for forms; it needs an object/api mapping for biz logic).

"show me how straight XML could be used between different commercial applications that don't have an agreed on XML vocabulary, and then I'll come back with how one could be designed with RDF."

I could really do with seeing how it can be done in RDF without end to end agreement on RDF+semantics+toolsets. I know it can be done via pipelines+code (as I've done it in my work). This is why I abhor the XML syntax; if it was stable enough for use outside an RDF toolchain, grafting RDF onto existing systems would be much easier. My point above about presentation/application layers is a prime example of that. You can't integrate without syntax transformation :)

"As for the RDF folks not assuming the world is messy and the data heterogeneous, not a bit of it: we think it's jumbled as hell"

This is worth being precise over. I know the RDF model caters for messiness when it comes to modelling; that's not what I'm talking about. The world I'm talking about is technology - physically deployed systems, current technologies, past technologies, fads, frameworks and formats. Where you have to live as a developer.

5

[…] Speaking of meaning, I don't think that what we get from semantic web technologies is shared meaning. What we want from the next generation of the web or even this generation of it is action. We need it to handle more of the tedious details of life. Sheila Lennon points out where this morning's NY Times article got it wrong. It's not about meaning. It's about software agents who can do our bidding and remove some of the friction from managing our days. The semantic web provides the foundation of this because it allows automated agents to tell a dental office from a chiropractor from a new age guru and then read their appointment books. It relies on meaning in a limited sense only in order to complete defined tasks. As Shelley says, people aren't looking for meaning, at least not from RDF. […]
