Good Enough

Recovered from the Wayback Machine.

Mark Pilgrim does not believe in the Semantic Web. He believes that semantics is hard, and that the syntax for the Semantic Web is laughably complex. Mark wants to stay with the “…simple but relatively well-defined semantics of HTML.”

HTML is good enough for Mark, and I say that’s great, because no one wants to force the Semantic Web on Mark.

But HTML is not ‘good enough’ for me. HTML has pre-defined elements and I can’t add to these. HTML comes with a lot of baggage from the past, and I don’t want this. And HTML is primarily about presentation, and I’m not necessarily interested in this outside of my own web pages. Don’t mistake me: I’m not out to re-create the world, or provide tools that allow one to cut through the bullshit and drill directly to the truth. All I want is a way of defining data that is consistent, using a commonly occurring syntax with pre-existing tools that can parse that syntax.

I’ve worked with data since day one of my professional life. I wrote applications that traversed billions of lines of code from Peace Shield in order to populate a data dictionary. I was lead Information Repository modeler for Boeing Commercial. I helped the old Oracle Case tools people design their products. I’ve worked with PDES and POSC and other organizations, to find a way to define data so that it was interoperable between organizations without having to re-negotiate protocols. And I was looking for a magic interoperability protocol long before the web. It started with EDI, but EDI wasn’t good enough.

SGML didn’t work because, bluntly, we didn’t think about using it. HTML didn’t work because HTML was/is about web pages. XML didn’t work because there was no meta-data structure associated with the markup language. Even within RDF there are other serialization formats that aren’t ‘good enough’ for me. Mark points to Aaron Swartz’s RDF Primer that focuses on N3 notation. Aaron really doesn’t care for RDF/XML; N3 notation is ‘good enough’ for Aaron. But it’s not good enough for me.

RDF/XML, with its metadata structure (RDF) paired with a common syntax (XML), is a start on being ‘good enough’ for my needs.
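To make the ‘common syntax with pre-existing tools’ point concrete, here is a minimal sketch in Python, assuming a tiny RDF/XML document invented for illustration (the example.org URL and the property values are hypothetical, and `extract_triples` is a made-up helper, not part of any RDF library). It uses only the stock XML parser from the standard library to pull subject, predicate, and object statements out of the markup. It handles only flat `rdf:Description` elements with literal values, so treat it as an illustration rather than a real RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document: the resource URL and values are made up for illustration.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/posts/good-enough">
    <dc:title>Good Enough</dc:title>
    <dc:subject>Semantics</dc:subject>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for simple literal properties.

    Assumption: only flat rdf:Description elements with literal children
    are handled; everything else in RDF/XML is ignored.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; dropping the
            # braces joins the two parts into the predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples


TRIPLES = extract_triples(RDF_XML)
for triple in TRIPLES:
    print(triple)
```

The metadata structure comes from RDF, while the parsing is done by an ordinary XML tool that knows nothing about RDF.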

The point, though, is that for each of us there are technologies that aren’t ‘good enough’, and you spend your time finding ways to improve or expand or correct the technology until it is ‘good enough’. To Mark, this is improving how we use HTML, which is commendable. But to me, it’s finding ways to use RDF/XML and in the process explain RDF/XML so that others might also find some uses for it. I hope this is also seen as commendable.

Mark’s discussion about the Semantic Web and HTML is a response, in part, to Dare Obasanjo, who writes:

 

Given that the W3C thinks XML is the basis for RDF and the Semantic Web it seems the general direction going forward is to move towards replacing a WWW full of HTML documents to one full of XML documents.

If you are for the Semantic Web, you are for an XML Web not for an HTML one.

 

(I sometimes think that the W3C is its own worst enemy. So many noble goals, based on so many impracticable ideas. We keep telling them and telling them: webbies just want to have fun, but they keep pushing back with the search for truth, and a better way of life.)

Reading Dare’s comment, I can see why Mark feels that technologies such as RDF/XML are being pushed on him. I can see why he pushes back with:

 

RSS 0.91 is the simplest and most popular of all the RSS formats, it’s one of the simplest XML-based formats you’ll ever find, and 10% of the world’s RSS feeds are still invalid—mostly due to XML formatting rules (escaping ampersands, character encoding issues) that aren’t even RSS-specific. And you want to “move towards replacing a WWW full of HTML documents to one full of XML documents”? Are you sure? Because realistically, all you’ll manage to do is replace a morass of bloated, poorly written, invalid HTML documents with a morass of bloated, poorly written, invalid XML documents. And to tease any meaning at all out of these “semantic” documents, you’ll spend your days writing ultra-liberal parsers to parse invalid XML (or, God help you, invalid RDF/XML), and you’ll spend your nights and weekends decrying “the new generation of tag soup” on XML-DEV.

 

Dare’s comment, and the W3C’s esoteric ideals aside, isn’t that what the move towards XHTML is all about? Moving towards valid and well-written XML documents that are based on the HTML vocabulary? Isn’t that the whole point of technologies such as XHTML and CSS: to replace those …bloated, poorly written, invalid HTML documents? To realize the full potential that started with HTML, before we got sloppy?
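The XML formatting rules Mark mentions, such as escaping ampersands, are easy to see with any stock XML parser. What follows is a small sketch in Python, assuming a made-up title string; `is_well_formed` is a hypothetical helper written for this example. It only checks well-formedness, not validity against any RSS or XHTML schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if a standard XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


RAW_TITLE = "Semantics & Syntax"  # made-up title, for illustration only

# A bare ampersand is legal in sloppy HTML but not in XML.
BROKEN = f"<title>{RAW_TITLE}</title>"

# escape() rewrites &, < and > as entity references.
FIXED = f"<title>{escape(RAW_TITLE)}</title>"

print(is_well_formed(BROKEN))
print(is_well_formed(FIXED))
print(ET.fromstring(FIXED).text)
```

A strict parser rejects the first document outright instead of guessing at the author’s intent, which is the trade XHTML makes: less tolerance for sloppiness in exchange for documents that any XML tool can read.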

Innovation and improvements in technology don’t come about because technology is ‘good enough’. They come about because technology is full of holes and no matter what we do we’ll never plug all of them. But we’ll keep trying and we’ll keep improving and in the process, we’ll discover new and exciting technologies, and we’ll start the process all over again.