Categories
Specs

Data Shoeology

Recovered from the Wayback Machine.

I’ve been working with data since before I left college, before the first standard release of SQL, which makes me feel really…seasoned. Applications came and went, but data was what really mattered, no matter how fancy the programming language or development paradigm.

Every once in a while someone will ask me what the difference is among the various data initiatives that get bandied about: relational/SQL, RDF/OWL, microformats, XML, CSV, OPML, syndication feeds, and so on. I’ve tried to respond intelligently on this subject over the years, but judging from how often the question comes back, I don’t think I’m succeeding.

I think what’s missing is the way I’m explaining the concepts. What I need to do is put each data initiative into a familiar context–something everyone can identify with. So, I’ve decided to use shoes as a metaphor for understanding data. Think of your foot as data: how, then, could you package it?

Before computers, all data was stored in hard copy. Think of a picture of a shoe: it looks good but you can’t do much with it.

Then there are the hierarchical and network databases. They’re comparable to shoes in a very packed closet, where you have to move the entire contents of the closet, first, in order to get to them.

One of the most common data stores is the relational database. These are the workhorses of the data world. You won’t find many corporate data systems that don’t make use of relational databases. Or many weblogs, either. From a shoe perspective, a relational database is a work shoe: plain, sturdy, well designed and crafted, and surprisingly comfortable if one disregards the steel toe. Not just any work shoe, though. A relational database is a tie-up work shoe, where one has to lace up the front and pull, strongly, in order to attach the shoe to one’s foot.


Some forget to tie their shoe properly; it becomes loose, their foot falls out, and they trip and fall down. Others tie only a single knot; sometimes the lace holds, but many times it doesn’t, and they’re just like the person who doesn’t practice safe tying: the shoe comes undone, the person trips and falls down, and 5 million credit card customers are suddenly at risk.

Though not as sturdy as a relational database, a plain old CSV, or comma separated values, file is also quite common. A CSV file is any variety of text file in which the individual pieces of data are separated from each other by commas, spaces, or whatever. In shoe parlance, a CSV file is equivalent to padding about in slippers: it’s simple, it’s easy, you feel remarkably free and unconstrained. Eventually, though, you’ll stub your toe or the slipper will wear thin, and you’ll begin to think that going about in slippers all the time is perhaps not as fun as you originally thought.

Then there are the occasions when speed is needed; the thought of running in slippers across a jagged, rocky landscape leads one to realize that slippers don’t scale, which is appropriate because neither does CSV.
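The stubbed toe is easy to reproduce. A reader that simply splits each line on commas falls apart the moment a value itself contains a comma, while a quote-aware reader, such as Python's standard `csv` module, copes. This is a minimal sketch; the sample row is made up for illustration, and even the quote-aware reader only helps when the producer and consumer of the file agree on how quoting works.

```python
import csv
import io

# A made-up file in which one value contains the delimiter itself.
raw = 'id,name,city\n1,"Brown, Buster",St. Louis\n'


def naive_parse(text):
    """Split each line on commas, ignoring quoting entirely."""
    return [line.split(",") for line in text.splitlines()]


def csv_parse(text):
    """Let the standard csv module handle quoted fields."""
    return list(csv.reader(io.StringIO(text)))


naive_rows = naive_parse(raw)
proper_rows = csv_parse(raw)

print(naive_rows[1])   # four fields: the quoted name was split in two
print(proper_rows[1])  # three fields: the quoted name survives intact
```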

Enter some of the newer initiatives. First there were object-oriented data stores. The concept sounded futuristic, but it never took off as strongly as the proponents wished. I think comfort was the factor: think of an object-oriented database as a platform shoe made of titanium. It protects the foot, is stylish, will last forever, but you wouldn’t want to walk a marathon in it.

(Come to think of it, a blister is a rather OO looking image…)


Next, let’s jump into the markups. Yes, markup languages are a way of storing and transmitting data. When one considers that there are now billions of web pages, each with all sorts of data, one can see it’s an impressive way of doing so.

Of course, the first forms of markup had some trouble getting acceptance because the concepts were too complex, perhaps a little too rigorous. Mary Janes and Buster Browns–the earliest markups were Mary Janes and Buster Browns. Proper. Anal. Images of anklets with little embroidered violets, velvet jumpers, slicked down hair, bow ties and plaid, and above all, manners. Very proper, and just a wee bit scary.

Then there’s HTML. Oh my, HTML was the hippie of markup, the one that let it all hang out. HTML is a flip flop made of beach grass and old tire; bright neon flowers painted on the uppers, the soles, and the dirty feet thrust into them. It was a revolution. You say you want a revolution? Oh, yeah.

Eventually, though, we found the lack of discipline associated with HTML sucked about as much as the communes with their religious James and Josephs squatting like toads amid bright, beautiful, and really stupid flowers.

Enter XML. The Birkenstock of markups.

Unlike the earliest markups, XML is relatively simple and fairly easy to understand, just like HTML. Unlike HTML, though, XML has discipline. XML seems like the best of all worlds, except for one thing: XML is syntax, not model. You can store anything in XML, and it can validate, but that doesn’t mean it’s a good, or universal, use of XML.

Birkenstock with tuxedo. Birkenstock with evening gown. Birkenstock for tennis. Birkenstocks while jogging. Birkenstocks for mountain climbing. Skiing. Birkenstocks by the fire on a cold night–eventually we couldn’t swing a dead cat without hitting Birkenstocks, and the same can be said of XML. It is, literally, everywhere. Instead of creating structure, chaos flowered into a thousand angle bracketed blooms.

HTML was replaced by XHTML, which washed the foot and tamed the sandal into loafers with tassels–similar to what happened with Jerry Rubin when he turned 30:

By the end, everybody had a label – pig, liberal, radical, revolutionary… If you had everything but a gun, you were a radical but not a revolutionary.

Then there are the syndication feeds, such as RSS 2.0 and Atom. These aren’t, technically, permanent data stores; they’re more like the booties doctors wear over their shoes before surgery. The difference between the two is that in Atom the right bootie is labeled distinctly from the left; in RSS 2.0, half the fun is guessing.

VRML was a 3D modeling language that reached sudden popularity years ago and just as suddenly, died away–disco shoes, only without the trail of glitter.

I like to think of OPML, with its reliance on exactly one element and several attributes, as equivalent to owning one dress and 453 pairs of shoes.
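The one-dress joke works because, in a typical OPML subscription list, nearly everything of interest is an `outline` element, with folders and feeds told apart only by their attributes. This is a minimal sketch; the feed titles and example.com/example.org URLs are made up, and `text`, `type`, and `xmlUrl` are the attributes commonly used for feed subscriptions.

```python
import xml.etree.ElementTree as ET

# A small, made-up subscription list. Apart from the wrapper elements,
# every item is an <outline>; the data lives in its attributes.
OPML = """<?xml version="1.0"?>
<opml version="2.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Specs">
      <outline text="Example Feed" type="rss" xmlUrl="http://example.com/feed.xml"/>
      <outline text="Another Feed" type="rss" xmlUrl="http://example.org/index.xml"/>
    </outline>
  </body>
</opml>"""

root = ET.fromstring(OPML)


def feeds(tree):
    """Return (title, url) for every outline that carries an xmlUrl attribute."""
    return [
        (node.get("text"), node.get("xmlUrl"))
        for node in tree.iter("outline")
        if node.get("xmlUrl")
    ]


# One folder and two feeds, all expressed with the same element.
outline_count = sum(1 for _ in root.iter("outline"))
print(outline_count)
for title, url in feeds(root):
    print(title, url)
```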

The use of microformats, where all sorts of metadata is loaded into the class attribute, is comparable to owning one pair of shoes and 453 attachments to accessorize them.

(“Should I wear the bow? Or the diamond clasp? I know, I’ll wear both!”)
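To see what loading metadata into the class attribute looks like, here is an hCalendar-style fragment and a toy reader for it. This is a minimal sketch: the event is made up, only three property names (`summary`, `location`, `dtstart`) are handled, the `abbr`/`title` trick reflects common microformats practice for machine-readable dates, and the reader is nowhere near a conforming microformats parser.

```python
from html.parser import HTMLParser

# A made-up hCalendar-style snippet: the event data is carried
# entirely by class names on ordinary HTML elements.
SNIPPET = """
<div class="vevent">
  <span class="summary">Shoe Swap</span> at
  <span class="location">Community Hall</span> on
  <abbr class="dtstart" title="2006-03-10">March 10</abbr>
</div>
"""

# The property class names this toy reader looks for (a small subset).
PROPERTIES = {"summary", "location", "dtstart"}


class ClassCollector(HTMLParser):
    """Collect values for known class names. Deliberately naive: it assumes
    every start tag has a matching end tag and no nesting inside properties."""

    def __init__(self):
        super().__init__()
        self.stack = []   # class lists of the currently open elements
        self.found = {}   # property class name -> extracted value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self.stack.append(classes)
        # Machine-readable value tucked into abbr/@title.
        if tag == "abbr" and attrs.get("title"):
            for name in classes:
                if name in PROPERTIES:
                    self.found[name] = attrs["title"]

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        for name in self.stack[-1]:
            if name in PROPERTIES:
                # setdefault keeps an abbr title captured in handle_starttag.
                self.found.setdefault(name, text)


parser = ClassCollector()
parser.feed(SNIPPET)
parser.close()
print(parser.found)
```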

If you’ve worked with web services, you’re familiar with SOAP and XML-RPC, which are very much like wearing stilettos: they look good from afar but play hell on the infrastructure.

No, RDF does not go unscathed in my data shoeology. With RDF, we have to define an understanding of shoe to foot to leg to body, and then also define contexts of use, and it all gets, frankly, a little complicated.

First there’s a definition of activity, such as tennis, and then there’s the association between activity and shoe, such as tennis shoe, but then we have to add in assertions that not all tennis shoes are worn to play tennis, which means then that we have to add statements such as “tennis shoe that’s worn for walking” and “tennis shoe that’s worn for dancing”, and make sure, then, that we establish the context of the activity before making the link between the activity and type of shoe. (This all just for a tennis shoe–I haven’t even gotten into defining an OWL for ‘pumps’ or ‘boots’, yet.)

Once all this is accomplished, we must format the results into triple form:

Tennis is an activity
Shoes are worn during an activity
Tennis shoes are a type of shoe

From which we can infer:

Tennis shoes are worn during Tennis

But then we have to reify the statement…

…at which point barefoot looks pretty good.
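For the curious, the tennis shoe statements can be written as subject-predicate-object triples and run through a tiny forward-chaining rule engine. This is a minimal sketch: the predicate names (`type`, `wornDuring`, `subClassOf`, `usedFor`) and both rules are hypothetical and hand-written. They are not RDFS or OWL semantics. The `usedFor` triple stands in for the association between activity and shoe described above; without it, the engine gets only as far as "tennis shoes are worn during an activity."

```python
# Facts as (subject, predicate, object) triples.
facts = {
    ("Tennis", "type", "Activity"),
    ("Shoe", "wornDuring", "Activity"),
    ("TennisShoe", "subClassOf", "Shoe"),
    ("TennisShoe", "usedFor", "Tennis"),  # the activity-to-shoe association
}


def infer(triples):
    """Apply two hand-written rules until no new triples appear."""
    known = set(triples)
    while True:
        new = set()
        # Rule 1: a subclass inherits its parent's wornDuring statements.
        for (x, p, y) in known:
            if p == "subClassOf":
                for (s, q, z) in known:
                    if s == y and q == "wornDuring":
                        new.add((x, "wornDuring", z))
        # Rule 2: worn during a class of activity, and used for an
        # instance of that class, means worn during that instance.
        for (x, p, c) in known:
            if p == "wornDuring":
                for (a, q, c2) in known:
                    if q == "type" and c2 == c and (x, "usedFor", a) in known:
                        new.add((x, "wornDuring", a))
        if new <= known:
            return known
        known |= new


result = infer(facts)
print(("TennisShoe", "wornDuring", "Tennis") in result)
```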


Categories
Specs

Fire the W3C

Recovered from the Wayback Machine.

I have to disagree with Dare on his recent post about the troubles at the W3C.

I had to work, quite extensively at times, with the W3C working group related to RDF when I was writing Practical RDF. There were times when I thought I had walked into a lab and was chief rat. In particular, I was concerned about the R & D aspect of the work: where were the ‘practical’ people?

It was only later, as I saw RDF hold up under challenges, that I realized the model had to be mathematically vetted before practical use could be made of it. For better or worse, the only people willing to take on this kind of effort, and having the background, are the R & D, academic types of folks. They’re not easy to live with at times, but they have more background for this work than the average person.

I know that the W3C has had problems. I do think it needs to connect more with the user base. I agree with Molly that it desperately needs to be diversified. But what are the alternatives?

Dare mentions relying on de facto standards. Would that be like HTML? We’re only now starting to pull ourselves out of the nightmare of inconsistent HTML markup and elements such as BLINK, or worse, FONT.

Depending on proprietary standards such as RSS? But certain aspects of this syndication feed are imprecise, and this imprecision leads to confusion. All you need do is attach two enclosures to an item to see this for yourself, and that is only one of the more obvious problems. Look also at the fact that RSS has political overtones that will always cloud its use. Heck, the one organization ‘picked’ to help document it was fired by the person who picked them! Excuse me, but exactly how are the W3C efforts worse?

As for the microformats community, are we forgetting nofollow? Well, if not that, then let’s ask ourselves something: what problem does hAtom solve? Considering that the page is most likely generated dynamically from a data set, how is hAtom any better than just generating Atom from the same data?
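The point about generating both from the same data can be made concrete: once a post lives in a database, emitting an Atom entry takes no more work than emitting hAtom markup. This is a minimal sketch. The record is made up, the Atom entry omits elements a complete feed would need (such as author and link), and the hAtom class names (`hentry`, `entry-title`, `updated`, `entry-content`) are taken from the hAtom draft vocabulary and should be checked against it.

```python
import xml.etree.ElementTree as ET
from html import escape

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace

# One made-up record, as it might come out of a weblog's database.
post = {
    "id": "tag:example.com,2006:post-1",
    "title": "Fire the W3C",
    "updated": "2006-03-10T12:00:00Z",
    "content": "The W3C has work to do.",
}


def to_atom_entry(p):
    """Render the record as an Atom <entry> (author and links omitted)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    for field in ("id", "title", "updated", "content"):
        ET.SubElement(entry, f"{{{ATOM_NS}}}{field}").text = p[field]
    return ET.tostring(entry, encoding="unicode")


def to_hatom(p):
    """Render the same record as hAtom-style HTML; class names carry the meaning."""
    return (
        '<div class="hentry">'
        f'<h2 class="entry-title">{escape(p["title"])}</h2>'
        f'<abbr class="updated" title="{escape(p["updated"])}">{escape(p["updated"])}</abbr>'
        f'<div class="entry-content">{escape(p["content"])}</div>'
        "</div>"
    )


atom_xml = to_atom_entry(post)
hatom_html = to_hatom(post)
print(atom_xml)
print(hatom_html)
```

Either rendering is a few lines over the same record; the difference lies in what consumers can do with the result.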

Lately I’ve been really looking at microformats, and I can understand the utility of some, such as those for calendars and reviews, that focus on using specific markup to define business data. Others, though, look to me like an exercise in pushing data around just to do so. Have you ever played with dominoes? Where you line them up just right and then push them down? It’s cool a couple of times, but most people get bored and move on. Some (not all) of the microformat effort reminds me of dominoes.

More importantly, there is no real organization, independent of any specific company, driving microformats.

The W3C has work to do. But I’d rather have the W3C, than not.

Categories
Specs

The importance of standards

Recovered from the Wayback Machine.

Nat at O’Reilly Radar writes on the importance of standards in web page design, making me very happy. He wrote:

The point of the standards is not just to ensure that browsers can display the pages. The standards also ensure the pages form a platform that can be built upon; a hacked-together platform leads to brittle and fragile extensions.

That’s the problem with some of the Ajax libraries, such as Dojo: a belief that some of these standards aren’t all that important and can be disregarded. A page that uses standard CSS and XHTML can easily incorporate change, as well as integrate new functionality. The use of standard XHTML and CSS is never going to go out of style.

The one paragraph with which I disagree, somewhat, is:

Between Google and Yahoo!’s work on in-page widgets, the spreading effect of microformats, and the rise of the importance of accessibility, we’re finally getting rewards for standards-compliance.

The in-page widgets from Yahoo and Google are nifty, but I don’t see them as an important end result of standards compliance. I do agree, hugely, on the growing acknowledgement of the importance of accessibility, but microformats (no offense, Kevin) are largely unknown, and most are not based on any independent standards effort I’m aware of.

Aside from this one paragraph, which struck me as a bit buzzwordy, overall I can agree, strongly, with the gist of the post.

Categories
Specs XHTML/HTML

Ambiguous Specifications do not make Good Technology

Recovered from the Wayback Machine.

There is a belief that if it weren’t for the fact that the earliest versions of HTML were unstructured–full of proprietary idiosyncrasies and ill-formed markup indulged by too-loose browsers–the web wouldn’t have grown as fast as it did. Somehow, we’ve equated growth with bad and imprecise specifications rather than the more logical assumption that the growth was due to interest in an exciting new medium.

As such, we’ve carried forward into this new era in web development an almost mythical belief in bad specifications. If we wish to have growth, we think to ourselves, we mustn’t hinder the creative spirit of the users by providing overly rigorous specifications. Because of this belief, we’re still battling ill-formed, inaccessible web pages created by a legion of web page designers who picked up some pretty bad habits: namely the use of deprecated attributes and proprietary elements, as well as the use of HTML tables for everything. Well, everything that isn’t covered by the use of non-standard and proprietary JavaScript–use of which results in the annoying messages that one needs a specific browser, or worse, a specific operating system in order to see this Wonderful Thing. Go away until you’re properly equipped.

What we’re finding with web page designers today, whether they’re amateur or professional, is that it’s just as easy to learn how to do things the right way as the wrong. What’s important is to provide good, clear documentation, as well as good, clean examples. Contrary to some expectations, adherence to standards and precise specifications have not killed the spirit of creativity.

In the end, rather than aid the growth of the web, bad specifications slowed it down, as a new generation of web pages had to be created out of the ashes generated by burning the old.

Learning how to do things right has such rewards, too. It’s knowing that your page looks good in all operating systems and most browsers; that people can easily navigate your site; that there are a hundred new tools and toys you can use now because you’re using precise and structured markup. Being able to validate a page isn’t a matter of dumping a fairly useless sticker into a sidebar; it means being able to drop in a Google map, or add in-place editing, or automatically pull your calendar out of the page, or any number of wonderfully useful and fun innovations.

We still continue with this belief, though, that to standardize or embed precision into a specification is to stifle the creative juices of the consumer of the specification: whether they be developer, designer, or end-user. Why? What can possibly lead anyone to believe that you can create good technology out of a bad specification?

Some would point to RDF and say that this is a case of a very precise specification that has not led to quick adoption. However, it isn’t surprising that there aren’t billions of RDF/XML documents scattered here and there, and it has nothing to do with the precision of the specification. Some folk didn’t, and still don’t, like the look of the externalized syntax of RDF; others felt that semantics should arise from existing elements; and still others just don’t see the need for it, and won’t until you give them an application that demonstrates this directly for them.

Oh, there are some pieces of the RDF model we might do without, but precision is not one of them. I look at the precision of the specification of RDF with nothing but relief. I know that the work I do now with RDF follows a model that’s been carefully defined, intimately documented, and rigorously tested. I can trust the model, and know that the documents I create with RDF today will parse just as successfully as documents I’ll create five years from now; more importantly, I know without a doubt that they will mix with your data modeled in RDF.

That’s why I look with some confusion at the backlash against efforts to clarify the RSS 2.0 specification. There is no doubt–none whatsoever–that the RSS 2.0 specification, as currently written, is ambiguous; from what we’re hearing now, in comments and on email lists, it is being kept deliberately so. I don’t understand this. This would be no different from asking Microsoft not to follow standardized use of CSS in the new IE 7.x. Why on earth would anyone want this?

I am just a simple woman who works in technology. Perhaps one of you can explain it to me in such a way that I can understand.

I wrote on the ambiguity in RSS 2.0 with regard to enclosures here, and actually had to modify Molly Holzschlag’s weblog software (WordPress) because her posts with enclosures would cause tools such as Bloglines to break. These are two very popular tools; hence, the ambiguity in the RSS 2.0 specification does cause problems. This is a proven fact that no amount of marketing and cheerleading can obscure.
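The enclosure problem is easy to demonstrate. The RSS 2.0 specification doesn't say whether an item may carry more than one enclosure, so two perfectly reasonable readers of the same item can disagree. This is a minimal sketch; both reader functions are hypothetical and do not reflect how Bloglines, WordPress, or any other real tool is implemented, and the example.com URLs are made up.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 item with two enclosures.
ITEM = """<item>
  <title>Two podcasts in one post</title>
  <enclosure url="http://example.com/a.mp3" length="12345" type="audio/mpeg"/>
  <enclosure url="http://example.com/b.mp3" length="67890" type="audio/mpeg"/>
</item>"""


def reader_one_enclosure(item):
    """Assumes at most one enclosure per item: keeps the first, silently drops the rest."""
    enc = item.find("enclosure")
    return [] if enc is None else [enc.get("url")]


def reader_many_enclosures(item):
    """Assumes any number of enclosures per item: returns them all."""
    return [enc.get("url") for enc in item.findall("enclosure")]


item = ET.fromstring(ITEM)
print(reader_one_enclosure(item))    # only the first file
print(reader_many_enclosures(item))  # both files
```

Neither reader is wrong according to the specification as written, and that is exactly the problem.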

Throw as much money as you want at it; write the most glowing reviews; get prestigious names to extol its beauty and power; seek to crush a non-existent enemy if you must–it is still not ‘good technology’. It may have damn good marketing, and lots of dough invested in it, and even have widespread use–but it is not good technology.

I am puzzled as to how anyone, particularly those who work in technology, could say otherwise. I await enlightenment.

Categories
Specs Standards

We Interrupt This Commercial Break with a Word about RSS

Recovered from the Wayback Machine.

It had all the makings of a true Real Life Drama:

In an effort to defuse what could only be termed mutiny in the ranks, otherwise known as the ‘Atom Effect’, Dave Winer turns the copyright of the RSS 2.0 specification over to Harvard, attaching a Creative Commons License reflecting something about share and share alike. The nobility of the act stuns people–well, other than those who questioned how much of the specification he was entitled to claim as his copyright. Oh, and those people who kept insisting that Creative Commons licenses were not designed to cover something such as software or specifications.

Accepting the accolades as only what he was due, the Big Dog then anoints a committee of three to watch over our sleeping beauty, the little syndication feed that was. But these caretakers take little care, and run for the hills–whether of gold or sanity, only they can say. Poor little feed lies there, alone and vulnerable, while its bastard cousin, Atom, is fed care and attention and grows up to be a big, strapping specification that can bite through ambiguity and confusion, like Jaws bit through surfer girls.

It is then, when our precious little orangy bundle of joy is at its most alone, that even Bigger Dogs enter the picture: Apple and Microsoft, seeing the light (or, more likely, seeing a potential new profit stream), embrace RSS and, in the process, fracture, bruise, and even somewhat maim it. “The problem is,” the masses cry out, “the specification is too open, too ill-defined.”

Enter now, a new hero: Rogers Cadenhead. Stalwart defender of Popish dignity and bearer of thick, wavy, locks of silver. Big Dog taps Rogers on the shoulder with his sword and says to him, “You shall be my defender, the RSS Champion”.

–curtain closes for intermission, while scenery is changed–

Now enters the story a host of new players: 8 new keepers of the RSS flame to support our champion. Their task? They come not to destroy RSS 2.0 but to praise it. They seek to clear the confusion, to cut away the darkness that surrounds this neurotic little bundle of joy. Where before there were endless questions of interpretation, and breaking tools right and left, the Nine Champions of the Rin…urh, sorry, wrong movie, scratch that–the Nine Champions of the Specification will make it all better!

(Loosely translated for the prop department: They come to change little RSS 2.0’s diaper, because it had done a doo-doo and now stinks to high heaven.)

But hark! What’s this? What’s that rumble in the distance? Oh, no! It’s Big Dog, and he’s got his lawyer!

But the Lawyer brings no books or suits or habeas, or even corpses. He opens the door long enough to make a statement, and then moves on to other things that come ten by ten. The statement? Nothing has changed on RSS 2.0. Harvard still owns it, but the community may do what they will within the bounds of a Creative Commons license. Leading to, (now pay attention, this is going to go fast)…

A community, which now it seems, must absorb the Nine Champions of RSS 2.0, because they have been banished from the round table that was the RSS Advisory Board. A Board that is no more, created by a man who resigned from it, and who gave up any intellectual ownership of the specification, but still retains ownership of the specification, to wit, making decisions about who is or is not on a board that no longer exists for a technical specification given intellectual property rights by a University that had little or no involvement with the specification, under a license that has little or no applicability to specifications, mainly created for songsters and photogs and other artsy types AND which has little or no legal standing within the rules of the land because there are no rules of the land when great bodies of water separate most of it.

Have no fear, though, as our hero, the RSS Champion, can see his way clear through this confounding maze. “I will not waver,” he cries. “While my heart beats and I draw breath, I will not be swayed from my sacred duty. Nay! Though you may torture me with unclosed tags, and malformed dates, I will hold true to my task. To the end. To the bitter end!” His dedication shines so brightly, members of the advisory board are heard to murmur ” “.

His passion even moves former foes–those who had vowed to pull RSS 2.0 from its throne and install one of their own choosing (see reference re above: Atom, Effort of). They are so moved, they pull their validators from their leather sheaths and hold them high in support and salute, crying out, “Hey, cool.”

All are not in one accord, though, for Big Dog is angered, mightily angered. Why? To know this is to know one of the universe’s least interesting facts. All we need know is that Big Dog is angered at the Champion and the Nine defenders of RSS 2.0, and so he sends out of the darksome mists an imp to torment both our hero and his new allies.

See? You can’t make this stuff up. And a few years ago, the battles between these opposing forces would have received much attention and the thundering of the post and counter-post would have shaken even the political webloggers who might–might–take time out from verbally eviscerating each other to take notice.

But there was a party put on by a player, to celebrate a book authored by other players, with words about how to become players, sponsored by other groups hoping to become players, drinking wine pushed by a hopeful, attended by 500 or so close friends, each with a startup, a product, an agenda, or, at minimum, a weblog.

And no one cared anymore about old enemies and ancient battles, the hard work of our hero and his allies, other derring-do, and RSS 2.0.

The end.