Arbitrary Vocabularies and Other Crufty Stuff

I went dumpster diving into the microformats IRC channel and found the following:

singpolyma – Hixie: that’s the whole point… if you don’t have a defined vocabulary, you end up with something useless like RDF or XML, etc
@tantek – exactly
Hixie – folks who have driven the design of XML and RDF had “write a generic parser” as their priority
@tantek – The key piece of wisdom here is that defined vocabularies are actually where you get *user* value in the real world of data generated/created by humans, and consumed eventually by humans.
Hixie – i’m not talking about this being a priority though — in the case of the guy i mentioned earlier, it was like or
Hixie – but it was still a reason he was displeased with microformats
@tantek – Hixie – ironically, people have written more than one generic parser for microformats, despite that not being a priority in the design
Hixie – url?
@tantek – mofo, optimus
@tantek –
@tantek – not exactly hard to find
@tantek – it’s ok that writing a generic parser is hard, because not many people have to write one
Hixie – optimus requires updating every time you want to use a new vocabulary, though, right
@tantek – OTOH it is NOT ok to make writing / marking up content hard, because nearly far more people (perhaps 100k x more) have to write / mark up content.
Hixie – yes, writing content should be easy, that’s clear
Hixie – ideally it should be even easier than it is with microformats 🙂
singpolyma – Of course you have to update every time there’s a new vocabulary… microformats are *exclusively* vocabularies
Hixie – there seems to be a lot of demand for a technology that’s as easy to write as microformats (or even easier), but which lets people write tools that consume arbitrary vocabularies much more easily than is possible with text/html / POSH / Microformats today
singpolyma – Hixie: isn’t that what RDFa and the other cruft is about?
Hixie – RDFa is a disaster insofar as “easy to write as microformats” goes
singpolyma – Not that I agree arbitrary vocabularies can be used for anything…
Hixie – and it’s not particularly great to parse either

Hixie – is it ok if html5 addresses some of the use cases that _are_ asking for those things, in a way that reuses the vocabularies developed by Microformats?

Well, no one is surprised to see such a discussion about RDFa in relation to HTML5. I don’t think anyone seriously believed that RDFa had a chance of being incorporated into HTML5. Most of us have resigned ourselves to no longer supporting the concept of “valid” markup going forward. Instead, we’ll continue to use bits of HTML5, and bits of XHTML 1.0, RDFa, and so on.

But I am surprised to read a data person write something like “if you don’t have a defined vocabulary, you end up with something useless like RDF or XML.” I’m surprised because, by that logic, one could add SQL to the list of useless things you end up with if you don’t have defined vocabularies, and I don’t think anyone disputes the usefulness of SQL or the relational data model, a model specifically designed to allow arbitrary vocabularies.
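To make the relational point concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names (books, sightings), their columns, and the helper functions are hypothetical, invented only for illustration. The point is that one generic engine creates, fills, queries, and even describes a vocabulary that nobody registered or approved in advance.

```python
import sqlite3

def define_and_query(conn, table, columns, rows):
    """Create a table for an arbitrary, made-up vocabulary, insert rows,
    and read them back through the same generic SQL interface."""
    # Identifiers are quoted but not validated: fine for a sketch,
    # not for untrusted input.
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return conn.execute(f'SELECT * FROM "{table}"').fetchall()

def describe(conn, table):
    """Generic introspection: discover a table's vocabulary at run time."""
    cur = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    return [d[0] for d in cur.description]

conn = sqlite3.connect(":memory:")

# Two unrelated, hypothetical vocabularies, invented on the spot.
books = define_and_query(conn, "books", ["title", "format"],
                         [("Sample Book", "ebook")])
birds = define_and_query(conn, "sightings", ["species", "location"],
                         [("heron", "river"), ("crow", "park")])

print(describe(conn, "sightings"))  # ['species', 'location']
print(birds)
```

Neither table needed a committee or a special parser: the same SELECT statement and the same cursor metadata work for any vocabulary a user cares to define.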

As for XML, my own experience formatting eBooks has shown how universally useful XML and XHTML can be: I am able to produce book pages from web pages with only some specialized formatting. And we don’t have to form committees and get buy-in every time we create a new use for XML or XHTML, just as we don’t have to get some standards organization to give an official okee dokee to another CMS database, such as the databases underlying Drupal or WordPress.
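The kind of generic processing described above can be sketched with Python’s standard xml.etree.ElementTree module. This is a minimal illustration, not the eBook workflow itself: the bk: namespace URI, its pagebreak and note elements, and the helper names are all hypothetical. One generic parser reads the page, a vocabulary-agnostic function pulls out whatever extension elements happen to be present, and a consumer that only understands XHTML still finds its headings without knowing anything about the extension.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# A small XHTML fragment carrying a made-up extension vocabulary.
# The "http://example.com/ns/book" namespace is hypothetical.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:bk="http://example.com/ns/book">
  <body>
    <h1>Chapter One</h1>
    <p>Some text.<bk:pagebreak number="12"/></p>
    <bk:note author="editor">Check this paragraph.</bk:note>
  </body>
</html>"""

def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag

def extension_elements(root):
    """Collect every element outside the XHTML namespace, whatever it is."""
    found = []
    for el in root.iter():
        ns, local = split_tag(el.tag)
        if ns != XHTML:
            found.append((ns, local, dict(el.attrib), (el.text or "").strip()))
    return found

def xhtml_headings(root):
    """A consumer that only knows XHTML: it finds h1 elements and
    never needs to understand the bk: elements around them."""
    return [h.text for h in root.iter(f"{{{XHTML}}}h1")]

root = ET.fromstring(DOC)
for item in extension_elements(root):
    print(item)
print(xhtml_headings(root))  # ['Chapter One']
```

Nothing here is specific to the bk: vocabulary; swap in any other namespace and the same functions work unchanged.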

And this openness applies to programming languages, too. There have been system-specific programming languages in the past, but the widely used programming languages are ones that can be used to create any number of arbitrary applications. PHP can be used for Drupal, yes, but it can also be used for Gallery, and eCommerce, and who knows what else—there’s no limiting its use.

Heck, HTML has been used to create web pages for weblogs, online stores, and gaming, all without having to define a new “vocabulary” of markup for each. Come to think of it, Drupal modules, WordPress plug-ins, widgets, and browser extensions are all based on some form of open infrastructure. So are REST and all of the other web service technologies.

In fact, one can go so far as to say that the entire computing infrastructure, including the internet, is based on open systems allowing arbitrary uses, whether the uses are a new vocabulary, or a new application, or both.

Unfortunately, too many people who really don’t know data are making too many decisions about how data will be represented in the web of the future. Luckily for us, browser developers have gotten into the habit of more or less ignoring anything unknown that’s inserted into a web page, especially one in XHTML. So the web will continue to be open and extensible. And we, the makers of the next generation of the web, can continue our innovations, uninhibited by those who want to fence our space in.