Data Shoeology

Recovered from the Wayback Machine.

I’ve been working with data since before I left college. Before the first standard release of SQL, which makes me feel really…seasoned. Applications came and went, but data is what really mattered, no matter how fancy the programming language or development paradigm.

Every once in a while someone will ask me what the difference is between the various data initiatives that get bandied about: relational/SQL, RDF/OWL, microformats, XML, CSV, OPML, syndication feeds, and so on. I’ve tried to respond intelligently on this subject over the years, but judging from how often the question comes back, I don’t think I’m succeeding.

I think what’s missing is the way I’m explaining the concepts. What I need to do is put each data initiative into a familiar context, something everyone can identify with. So, I’ve decided to use shoes as a metaphor for understanding data. Think of your foot as data: how, then, could you package it?

Before computers, all data was stored in hard-copy format. Think of a picture of a shoe: it looks good, but you can’t do much with it.

Then there are the hierarchical and network databases. They’re comparable to shoes in a very packed closet, where you have to move the entire contents of the closet first in order to get to them.

One of the most common data stores is the relational database. These are the workhorses of the data world. You won’t find many corporate data systems that don’t make use of relational databases. Weblogs, either. From a shoe perspective, a relational database is a work shoe: plain, sturdy, well designed and crafted, and surprisingly comfortable if one disregards the steel toe. Not just any work shoe, though. A relational database is a tie-up work shoe, where one has to lace up the front and pull, strongly, in order to attach the shoe to one’s foot.

Some forget to tie their shoe properly; it works loose, their foot falls out, and they trip and fall down. Others only single-tie the shoe. Sometimes the lace holds, but many times it doesn’t, and they end up just like the person who doesn’t practice safe tying: the shoe comes undone, the person trips and falls down, and 5 million credit card customers are suddenly at risk.

Though not as sturdy as a relational database, a plain old CSV (comma-separated values) file is also quite common. A CSV file is any text file where the individual pieces of data are separated from each other by commas, spaces, or whatever. In shoe parlance, a CSV file is equivalent to padding about in slippers: it’s simple, it’s easy, you feel remarkably free and unconstrained. Eventually, though, you’ll stub your toe or the slipper will wear thin, and you’ll begin to think that going about in slippers all the time is perhaps not as much fun as you originally thought.

Then there are the occasions when speed is needed; the thought of running in slippers across a jagged, rocky landscape leads one to realize that slippers don’t scale, which is appropriate because neither does CSV.

Enter some of the newer initiatives. First there were object-oriented data stores. The concept sounded futuristic, but it never took off as strongly as its proponents wished. I think comfort was the factor: think of an object-oriented database as a platform shoe made of titanium. It protects the foot, is stylish, will last forever, but you wouldn’t want to walk a marathon in it.

(Come to think of it, a blister is a rather OO looking image…)

Next, let’s jump into the markups. Yes, markup languages are a way of storing and transmitting data. When one considers that there are now billions of web pages, each with all sorts of data, one can see just how impressive a way of storing and transmitting data it is.

Of course, the first forms of markup had some trouble getting acceptance because the concepts were too complex, perhaps a little too rigorous. Mary Janes and Buster Browns–the earliest markups were Mary Janes and Buster Browns. Proper. Anal. Images of anklets with little embroidered violets, velvet jumpers, slicked down hair, bow ties and plaid, and above all, manners. Very proper, and just a wee bit scary.

Then there’s HTML. Oh my, HTML was the hippie of markup, the one that let it all hang out. HTML is a flip flop made of beach grass and old tire, with bright neon flowers painted on the uppers, the soles, and the dirty feet thrust into them. It was a revolution. You say you want a revolution, oh yeah.

Eventually, though, we found the lack of discipline associated with HTML sucked about as much as the communes with their religious James and Josephs squatting like toads amid bright, beautiful, and really stupid flowers.

Enter XML. The Birkenstock of markups.

Unlike the earliest markups, XML is relatively simple and fairly easy to understand, just like HTML. Unlike HTML, though, XML has discipline. XML seems like the best of all worlds, except for one thing: XML is syntax, not model. You can store anything in XML, and it can validate, but that doesn’t mean it’s a good, or universal, use of XML.

Birkenstocks with a tuxedo. Birkenstocks with an evening gown. Birkenstocks for tennis. Birkenstocks while jogging. Birkenstocks for mountain climbing. Skiing. Worn by the fire on a cold night. Eventually we couldn’t swing a dead cat without hitting Birkenstocks, and the same can be said of XML. It is, literally, everywhere. Instead of creating structure, chaos flowered into a thousand angle-bracketed blooms.

HTML was replaced by XHTML, which washed the foot and tamed the sandal into loafers with tassels, similar to what happened with Jerry Rubin when he turned 30:

By the end, everybody had a label – pig, liberal, radical, revolutionary… If you had everything but a gun, you were a radical but not a revolutionary.

Then there are the syndication feeds, such as RSS 2.0 and Atom. These aren’t, technically, permanent data stores; they’re more like the booties doctors wear over their shoes before surgery. The difference between the two is that in Atom, the right bootie is clearly labeled as distinct from the left; in RSS 2.0, half the fun is guessing.

VRML was a 3D modeling language that gained sudden popularity years ago and, just as suddenly, died away: disco shoes, only without the trail of glitter.

I like to think of OPML, with its reliance on exactly one element and several attributes, as equivalent to owning one dress and 453 pairs of shoes.

The use of microformats, where all sorts of metadata is loaded into the class attribute, is comparable to owning one pair of shoes and 453 attachments to accessorize them.

(“Should I wear the bow? Or the diamond clasp? I know, I’ll wear both!”)

If you’ve worked with web services, you’re familiar with SOAP and XML-RPC, which are very much like wearing a stiletto: they look good from afar but play hell on the infrastructure.

No, RDF does not go unscathed in my data shoeology either. With RDF, we have to define an understanding of shoe to foot to leg to body, and then also define contexts of use, and it all gets, frankly, a little complicated.

First there’s a definition of activity, such as tennis, and then there’s the association between activity and shoe, such as tennis shoe. But then we have to add assertions that not all tennis shoes are worn to play tennis, which means we have to add statements such as “tennis shoe that’s worn for walking” and “tennis shoe that’s worn for dancing”, and make sure we establish the context of the activity before making the link between the activity and the type of shoe. (And this is all just for a tennis shoe; I haven’t even gotten into defining an OWL ontology for ‘pumps’ or ‘boots’ yet.)

Once all this is accomplished, we must format the results as triples:

Tennis is an activity
Shoes are worn during an activity
Tennis shoes are a type of shoe

From which we can infer:

Tennis shoes are worn during Tennis

But then we have to reify the statement…

…at which point barefoot looks pretty good.