Categories
Specs

Data Shoeology

Recovered from the Wayback Machine.

I’ve been working with data since before I left college. Before the first standard release of SQL, which makes me feel really…seasoned. Applications came and went, but data is what really mattered, no matter how fancy the programming language or development paradigm.

Every once in a while someone will ask me what’s the difference between the different data initiatives that get bandied about: relational/SQL, RDF/OWL, microformats, XML, CSV, OPML, syndication feeds, and so on. I’ve tried to respond intelligently on this subject over the years, but from the repetition of the question, I don’t think I’m succeeding.

I think what’s missing is how I’m explaining the concepts. What I need to do is put each data initiative into a familiar context–something everyone can identify with. So, I’ve decided to use shoes as a metaphor for understanding data. Think of your foot as data: how, then, could you package it?

Before computers, all data was stored in hard format. Think of a picture of a shoe: it looks good but you can’t do much with it.

Then there are the hierarchical and network databases. They’re comparable to shoes in a very packed closet, where you have to move the entire contents of the closet, first, in order to get to them.

One of the most common data stores is the relational database. These are the workhorses of the data world. You won’t find many corporate data systems that don’t make use of relational databases. Weblogs, either. From a shoe perspective, a relational database is a work shoe: plain, sturdy, well designed and crafted, and surprisingly comfortable if one disregards the steel toe. Not just any work shoe, though. A relational database is a tie-up work shoe, where one has to lace up the front and pull, strongly, in order to attach the shoe to one’s foot.

work boot

Some forget to properly tie their shoe, and it becomes loose, their foot falls out and they trip and fall down. Others only single tie the shoe and sometimes the lace doesn’t come undone, but many times it does and they’re just like the person who doesn’t practice safe tying: the shoe comes undone, the person trips and falls down, and 5 million credit card customers are suddenly at risk.

Though not as sturdy as a relational database, a plain old CSV, or comma separated values, file is also quite common. A CSV file is any variation of text file where the individual pieces of data are separated from each other using commas, spaces, or whatever. In shoe parlance, a CSV file is equivalent to padding about in slippers: it’s simple, it’s easy, you feel remarkably free and unconstrained. Eventually, though, you’ll stub your toe or the slipper will wear thin and you’ll begin to think this going about in a slipper all the time is perhaps not as fun as you originally thought.

Then there are the occasions when speed is needed; the thought of running in slippers across a jagged, rocky landscape leads one to realize that slippers don’t scale, which is appropriate because neither does CSV.
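To make the "commas, spaces, or whatever" point concrete, here is a minimal sketch using Python's standard csv module. The helper name parse_delimited and the sample shoe data are made up for illustration; the only thing that changes between the two calls is the delimiter.

```python
import csv
import io


def parse_delimited(text, delimiter=","):
    """Parse delimited text into a list of rows, each a list of strings.

    parse_delimited is a hypothetical helper for this sketch; the
    delimiter is whatever character the file happens to use.
    """
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The same (made-up) data, once separated by commas and once by tabs.
comma_rows = parse_delimited("shoe,size\nslipper,8\n")
tab_rows = parse_delimited("shoe\tsize\nslipper\t8\n", delimiter="\t")
```

Both calls produce the same rows of plain strings. Nothing in the file says that "8" is a number or that the first row is a header, which is part of why the slipper wears thin.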

Enter some of the newer initiatives. First there were object-oriented data stores. The concept sounded futuristic, but it never took off as strongly as the proponents wished. I think comfort was the factor: think of an object-oriented database as a platform shoe made of titanium. It protects the foot, is stylish, will last forever, but you wouldn’t want to walk a marathon in it.

(Come to think of it, a blister is a rather OO looking image…)

Mary Janes

Next, let’s jump into the markups. Yes, markup languages are a way of storing and transmitting data. When one considers that there are now billions of web pages, each with all sorts of data, one can see it’s an impressive way of storing and transmitting data.

Of course, the first forms of markup had some trouble getting acceptance because the concepts were too complex, perhaps a little too rigorous. Mary Janes and Buster Browns–the earliest markups were Mary Janes and Buster Browns. Proper. Anal. Images of anklets with little embroidered violets, velvet jumpers, slicked down hair, bow ties and plaid, and above all, manners. Very proper, and just a wee bit scary.

Then there’s HTML. Oh my, HTML was the hippie of markup, the one that let it all hang out. HTML is a flip flop made of beach grass and old tire; bright neon flowers painted on the uppers, the soles, and the dirty feet thrust into them. It was a revolution. You say you want a revolution, oh yeah.

Eventually, though, we found the lack of discipline associated with HTML sucked about as much as the communes with their religious James and Josephs squatting like toads amid bright, beautiful, and really stupid flowers.

Enter XML. The Birkenstock of markups.

Birkenstock shoe

Unlike the earliest markups, XML is relatively simple and fairly easy to understand, just like with HTML. Unlike HTML, though, XML has discipline. XML seems like the best of all worlds, except one thing: XML is syntax, but XML is not model. You can store anything in XML and it can validate but it doesn’t mean that it’s a good, or universal, use of XML.

Birkenstock with tuxedo. Birkenstock with evening gown. Birkenstock for tennis. Birkenstocks while jogging. Birkenstocks for mountain climbing. Skiing. Wearing by the fire on a cold night –eventually we couldn’t swing a dead cat without hitting Birkenstocks and the same can be said of XML. It is, literally, everywhere. Instead of creating structure, chaos flowered into a thousand angle bracketed blooms.

HTML was replaced by XHTML, which washed the foot and tamed the sandal into loafers with tassels–similar to what happened with Jerry Rubin when he turned 30:

By the end, everybody had a label – pig, liberal, radical, revolutionary… If you had everything but a gun, you were a radical but not a revolutionary.

There are the syndication feeds, such as RSS 2.0 and Atom. These aren’t, technically, permanent data stores; they’re more like the booties doctors wear over their shoes before surgery. The difference between the two is the right bootie is labeled from the left in Atom; in RSS 2.0, half the fun is guessing.

VRML was a 3D modeling language that reached sudden popularity years ago and just as suddenly, died away–disco shoes, only without the trail of glitter.

I like to think of OPML, with its reliance on exactly one element and several attributes, as equivalent to owning one dress and 453 pairs of shoes.

The use of microformats, where all sorts of metadata is loaded into the class attribute, is comparable to owning one pair of shoes and 453 attachments to accessorize them.

(“Should I wear the bow? Or the diamond clasp? I know, I’ll wear both!”)

If you’ve worked with web services you’re familiar with SOAP and XML-RPC, but they’re very much like wearing a stiletto: looks good from afar but plays hell on the infrastructure. No, RDF does not go unscathed in my data shoeology, but we have to define an understanding of shoe to foot to leg to body, and then also define contexts of use, and it all gets, frankly, a little complicated.

First there’s a definition of activity, such as tennis, and then there’s the association between activity and shoe, such as tennis shoe, but then we have to add in assertions that not all tennis shoes are worn to play tennis, which means then that we have to add statements such as “tennis shoe that’s worn for walking” and “tennis shoe that’s worn for dancing”, and make sure, then, that we establish the context of the activity before making the link between the activity and type of shoe. (This all just for a tennis shoe–I haven’t even gotten into defining an OWL for ‘pumps’ or ‘boots’, yet.)

Once all this is accomplished, we must format the results into a triple form:

Tennis is an activity
Shoes are worn during an activity
Tennis shoes are a type of shoe.

From which we can infer:

Tennis shoes are worn during Tennis

But then we have to reify the statement…

…at which point barefoot looks pretty good.
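For the curious, the triples and the inference above can be sketched in a few lines of Python. This is only an illustration, not RDF or OWL proper: the predicate names (is_a, worn_during, subtype_of, associated_with) are invented for the sketch, and the fourth triple, which ties the tennis shoe to tennis, is an assumption taken from the earlier "association between activity and shoe" step, since the three statements alone don't name it.

```python
# Each statement is a (subject, predicate, object) triple.
# All predicate names here are hypothetical, chosen for this sketch.
triples = {
    ("Tennis", "is_a", "Activity"),
    ("Shoe", "worn_during", "Activity"),
    ("TennisShoe", "subtype_of", "Shoe"),
    # Assumed extra statement: the activity/shoe association.
    ("TennisShoe", "associated_with", "Tennis"),
}


def infer_worn_during(triples):
    """Infer (shoe, worn_during, activity) statements.

    Rule: if X is a subtype of S, S is worn during class C, A is a C,
    and X is associated with A, then X is worn during A.
    """
    inferred = set()
    for sub, p1, sup in triples:
        if p1 != "subtype_of":
            continue
        for s2, p2, cls in triples:
            if p2 != "worn_during" or s2 != sup:
                continue
            for act, p3, c3 in triples:
                if (p3 == "is_a" and c3 == cls
                        and (sub, "associated_with", act) in triples):
                    inferred.add((sub, "worn_during", act))
    return inferred


inferred = infer_worn_during(triples)
```

With these four statements the rule yields exactly one new triple, the tennis-shoe-worn-during-tennis statement. Reifying it is left to the barefoot.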

firewalking

Categories
Places

Fall

Though it has been raining steadily and the humidity high, the temperatures have fallen and I can now leave the windows open for long periods of the day. I just watched a school bus go by, with its reminder that school started this week. However, Fall doesn’t really start until after our long Labor Day weekend, in this city marked by the dual Japanese Festival at the Gardens and the Air Show. I think I’ll go to the candlelight walk at the Gardens this year, but will pass on the Sumo demonstrations.

The color in the trees has not started yet as our state is a late turner. We’ve had drought this year, but I have a feeling we’ll have a decent Fall season for all of that.

Categories
Photography Technology

Camp

Recovered from the Wayback Machine.

Saturday I slipped out for a couple of hours in the afternoon to go to the first day of the new MetroLink extension. I went later in the day and missed most of the crowds. The LinkFest associated with the opening was pretty quiet, and aside from having to walk a gauntlet of candidates, there wasn’t much going on.

Inside the new St. Louis MetroLink

Curved tunnel

Reflecting at the Station waiting for a train

The Train! The Train!

One of the candidates was a very impressive woman named Barbara Fraser who waved a brochure to get my attention when I started to walk past her. She was strong, confident, and engaging. Updated: I originally listed the wrong name and party affiliation, but she is the right person: Barbara Fraser was the candidate, she is Democrat and will get my vote.

The new MetroLink is surprisingly beautiful and graceful. A strong hint of curves all throughout the line, with a lot of raised platforms and tunnels–most with some simple light-based sculpture. I was only able to snap a few photos from Saturday; I’ll try to get more, later, that really do the line justice. Some photos are posted to the side. They’re not FOO Camp or BarCamp, or WorldCamp, or even all that campy–but I was able to sleep in my own bed last night.

Speaking of *camps, I have my own variation of a Chumby. I’ll post a photo later in the week, in addition to responding to a couple of other posts from Christine at christie.net including Open Data Standards Redux.

As for the number of women at FOO camp increasing to a whopping 16.97%, all I have to say is: better, yes. Good. Well done. Much better than other conferences I’ve read about this week. (Couldn’t get much worse than the other conferences I read about this week.)

HOWEVER, not good enough. Next year O’Reilly, you can do better. I know you can.

I can see that Jeneane has been out disrespectin’ the fastigium of weblogging again. Along with Jeneane, there were responses in the MacLeod/Godin interview that puzzled me. For instance:

2. QUESTION: As a cartoonist, I find myself quite surprised that very few of the more prominent bloggers out there are in the “Arts”. It seems we have lots of business thinkers, technologists, entrepreneurs, consultants etc, but why do we have so surprisingly few filmmakers, playwrights, novelists, musicians, painters etc at the top of the pyramid? I have a few theories myself as to why this is, but may I ask what may be your take on it?

ANSWER: They’re coming, for sure. Postsecret is one of the three most popular blogs in the world. I think mainstream artists are rarely the first to embrace a new medium (silkscreening, for example, took a long time to get its Andy Warhol), but they’re coming. It’s going to be a new generation of artists that embrace the nature of the medium, and they’re just getting started.

I don’t know that I would classify Postsecret as art, as I believe Hugh MacLeod was referencing the term. To me, the site is more of a visual aid for Catholic Priests. Regardless, Godin’s response was off on so many levels. Leaving aside the term, “mainstream artist”, and stating as fact that silkscreening needed Andy Warhol to become ‘big’ and this is proof that old artists can’t somehow embrace new techniques or media…well, no, these two pretty much have me stopped.

However, at the moment I just can’t work up the energy to do more than twist an eyebrow up, like Mr. Spock. Twitch. Twitch. Twitchtwitchtwitch… Makes me glad Jeneane is there to beat such hyperbole into the ground. We take turns at this, she and I; that’s why we’re known as the Tag Sisters.

(Tag…get it? You know, like weblogging and ta…oh never mind.)

I was also reminded today to remind you all that I no longer have a gmail account (nor a Flickr account), and if you want to reach me by email, use shelleyp@burningbird.net.

Categories
Weblogging

Friday Links

Recovered from the Wayback Machine.

I have a Monday book deadline, so must behave and focus on work.

Though I had my own way to respond to the whole river of news fooflah, Ralph at There is No Cat had the most important take on this issue: ‘river of news’ is a way of focusing attention, not the proper solution to the problem of mobile accessibility:

The web has been all aflutter with news of Dave Winer’s latest, greatest invention, the ability to view web sites on mobile devices, which he calls “A River of News”. Neat metaphor, but the approach he takes, which amounts to little more than scraping poorly authored web sites and stripping out most of the crufty presentational HTML, is wrong-headed, a gnarly hack. As Danny Ayers points out in the comments to a post by Doc Searls, there are reams of documentation on best practices for authoring web sites to allow them to display on a wide variety of devices. Winer’s approach removes all branding from the sites in question, something that is absolutely unnecessary to display a site on mobile devices. It also requires visiting a different address than the normal address for a web site, which also harms the brand of the site in question.

One can provide a separate stylesheet for mobile access, which is the appropriate approach and one I need to implement myself.

Still, I like my logo (even if it is a croc and not a gator), I love the color, and I love the Big Dog feed. I plan on maintaining it, and adding to it. Suggestions on who belongs on the YellowGatr Big Dog feed are welcome.

Denise Howell, writing for ZD Net, is more favorable to the concept of ‘river of news’, but she brings up some issues of copyright, especially when sites republish the entire content. For instance, YellowGatr publishes whatever is published in the feed: summary or full. But then, most Planet-based online aggregators do. Of course, most Planet-based sites are opt-in, while the feeds on mine are dragged in.

I’ve switched to summary in YellowGatr: Big Dogs, which means if the feed provides a summary, it’s published instead of the full content.

I think Denise’s issue on copyright is interesting and important, and one we’ve discussed in the past as it relates to full feeds. Unfortunately, every time we’ve attempted to have these discussions, the “news must be free” folks come along with cries of “Evil!” and the debate is usually shut down before we start. They’ll be happy to see my Big Dog feed. (Remind me to add BoingBoing to the list).

Where Denise and I differ is she thinks the idea behind ‘rivers of news’ is good, while I find it terribly flawed.

What isn’t flawed is Amazon’s EC2 service, which is a fascinating concept: mobile instances of processing that can be created and placed into a centralized computing cloud. As soon as the program opens up for more programmers, I’m going to give it a try–probably paired with S3 data access. I’m a curious tech, and I like to play with new toys such as this, even if I am somewhat ambivalent about the concept.

Why ambivalent? On the one hand, this is a way to expand one’s processing capability without breaking the bank, which could open the door to some rather innovative efforts. On the other, though, I find all this shifting of hosting process, data, and identity to centralized locations such as Amazon, Yahoo, Microsoft, and Google to be worrisome. Especially since companies such as these aren’t completely transparent as to their motives for such actions. Most of the services offered are either free, or heavily discounted, so fees from users are not an important component of the business model.

I’m concerned these centralized sites will become the black holes of the internet: sucking in more and more of the web until they may become too critically important for the web to operate without them. I’m also concerned about becoming a rat under observation as I push my bits of data through the Big Tunnel.

Regardless, I’m still going to try out Amazon’s EC2. Street cred, you know.

Speaking of street cred, Ajaxian posted a note about the upcoming Ajax Experience conference. In Boston, of course. Last one was in San Francisco. Of course.

The sessions sound interesting, but I was extremely disappointed to see that of 50 sessions, only 1, one, was led by a woman. I had thought, hoped, that perhaps a newer generation of technology would attract a more diversified following and would demonstrate a break from the patterns established in the ‘old’ technologies–especially since I know for a fact that there are women involved in these technologies. This, though, how sad.

Finally, an interesting NY Times article on a new Walker Evans show in New York, which also brings in the question of improvising on an original photographer’s work. I’ll have more to say on this, but later; I have to get back to work.

One more.

In addition to feeding my squid addiction, Pharyngula also comes up with some interesting ways of managing comments, including a new 3 comment rule. I agree, and insist that you all write at least three comments to all of my posts.

I must admit, though, I was rather taken with the way of ultimately managing the more obtuse among the threads. Anyone have a whiskey bottle handy?

Categories
Technology

Noirland

Recovered from the Wayback Machine.

The first Office 2.0 Conference is organized by IT|Redux, and brings together vendors, investors, industry analysts, and journalists. The goal for the event is to collectively build the foundation for Office 2.0, investigate technical challenges, and showcase practical applications. Most importantly, it will be an opportunity for like-minded people to meet and network with an elite group of visionaries and industry leaders.

What is like-minded for the Office 2.0 conference planners? Here’s a hint: Michael Noir would be an excellent keynote speaker.