Weblogging data model: Hello Mr. Christian

Time for a break from Linux for Poets, which is becoming quite fun…

Sam has started a wiki and a weblog entry looking for the basic data elements of what he calls “a well-formed log entry”, and by log, I would assume an online journal/weblog. Instead of drilling down into the physical, he wants to keep the discussion on the business, something I can get behind.

Sam writes that the essential characteristics of a log entry are authentic voice of person, reverse chronological order, and on the web. From this he derives required attributes for a log entry of permalink, creationDate, author, and content.

I come to the same conclusion though I don’t necessarily agree with the essential characteristics. After all, we’ve discussed what is meant by ‘authentic’, but I do agree with at least identifying a specific voice, one that’s guaranteed to represent one entity, regardless of the authenticity of the entity. So I agree with:

author

Sam also mentions reverse chronological order, and this is something else I don’t think should be assumed. After all, just because it’s the standard doesn’t mean that everyone supports multiple entries displayed in reverse chron. However, I think that the date of a specific item is important, and then people can pick and choose how they want things displayed based on this date. More importantly, the date sets the context for the entry. After all, discussing the election of George Bush can have different meanings based on the year of the discussion. So, I agree on date:

date

Sam also talks about permalink, which is in some ways a physical manifestation of nothing more than a unique address of a resource on the web. Additionally, we all move – we will always move. The days when someone says, “You must not deal 404’s” are gone with the dodo bird. People move, domains change, life morphs, we all go on. So my preference would be to call it unique location at any instance of time, or unique location for short, rather than permalink:

unique location

In fact, the date and author become validation of the unique location – the unique location gives us one specific entry, and the date and author combined give us the same specific entry. By this approach, we have a better understanding of what we mean by ‘author’, which could be an individual, a company, or a fictitious character, as long as it, combined with the date, can give us the one entry.
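
Purely as a thinking aid, and not as a proposal for how anyone should build this, the rule in the paragraph above can be sketched in a few lines of Python. Everything named here (`Entry`, `Log`, the field names, the lookup methods) is my own assumption for illustration; none of it comes from Sam's wiki. The only point is that the unique location and the author plus date pair are two routes to the same single entry.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Entry:
    """Hypothetical log entry with the four attributes discussed so far."""
    author: str           # one specific voice: a person, a company, a fictitious character
    date: datetime        # sets the context for the entry
    unique_location: str  # unique at any instance of time; may change when people move
    content: str


class Log:
    """Hypothetical collection that keeps both identifiers in step."""

    def __init__(self):
        self._by_location = {}     # unique_location -> Entry
        self._by_author_date = {}  # (author, date) -> Entry

    def add(self, entry):
        key = (entry.author, entry.date)
        if entry.unique_location in self._by_location:
            raise ValueError("unique location already identifies another entry")
        if key in self._by_author_date:
            raise ValueError("author and date already identify another entry")
        self._by_location[entry.unique_location] = entry
        self._by_author_date[key] = entry

    def find_by_location(self, unique_location):
        return self._by_location[unique_location]

    def find_by_author_date(self, author, date):
        return self._by_author_date[(author, date)]
```

Looking an entry up by its location, or by its author and date, should hand back the very same entry. If either lookup could return two entries, then the 'author' in question isn't really a single voice in the sense used above.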

Finally, Sam and I are in agreement on content, but don’t get all huffy (Doc) that we’re calling your beautiful prose ‘content’ – this is just a way of getting a handle on something. After all, if we were only here to put an empty file out on a web server, and put our name to it, we wouldn’t have to worry much about popularity.

However, I would break content down into categories, all of which roll up into the higher level ‘content’ – something that’s very doable within the standard data modeling languages such as IDEF1X, ER, and so on. My categories would be:

content (category) – one or more of the following:

grouping of related items (a collection of children)
content directly
some variation of the content
another like item
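
For anyone who reads code more easily than category lists, here is one loose way to picture the roll-up. It is a sketch only, not a proposed implementation, and every name in it is my own invention for illustration. In particular, treating 'some variation of the content' as a reworked copy that points at its original, and 'another like item' as a pointer to another entry's unique location, are assumptions about what those categories might mean.

```python
from dataclasses import dataclass, field
from typing import List


class Content:
    """The higher-level 'content' that every category rolls up into."""


@dataclass
class DirectContent(Content):
    body: str  # content directly


@dataclass
class Variation(Content):
    original: Content  # assumed: the content this is a variation of
    body: str


@dataclass
class LikeItem(Content):
    unique_location: str  # assumed: points at another entry


@dataclass
class Grouping(Content):
    children: List[Content] = field(default_factory=list)  # a collection of children


def flatten(content):
    """Return the non-grouping pieces of content, walking into groupings."""
    if isinstance(content, Grouping):
        pieces = []
        for child in content.children:
            pieces.extend(flatten(child))
        return pieces
    return [content]
```

Whatever category a piece falls into, it is still 'content' to anything looking at the entry from the outside, which is all the roll-up is meant to say.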

If I can dig up a freebie IDEF1X tool that will allow me to publish this as a conceptual data model online, I’ll post one. But for now, this is my first take – hand drawn so it’s rude and crude.

So, my first shot – now you tell me where I am right and where I am wrong. Note, though, that I agree completely with Sam – no implementation details, let’s keep it high level, business domain data model only for now. That way everyone can join in, not just the techs.

Or in other words – you do boo boo and do tech voo doo and birdie reach down and slap your fine, fine hand with whisper thin but ouchy and terribly hot flames.

Now, back to poetic technology.

It’s unfortunate that the wiki mentioned above has quickly broken down into physical implementation issues: content must be well formed (that’s physical), it must be HTML (that’s physical), and it must have an associated MIME type, which looks physical to me. All of this precludes any discussion of content that isn’t some form of markup.

I don’t agree with the physical implementation, because it doesn’t account for the child/parent relationship that things like threadsML, threaded comments, syndication feeds, etc. need. However, I wish we had given the high level at least a day of discussion before drilling down into implementation issues.