Connecting RDF Technology

Portable data

Recovered from the Wayback Machine.

In addition to being on a panel at SxSW next year, I’m also giving a full day tutorial on RDF at XML 2005 on November 18th. Which also happens to be my birthday.

This is not going to be a passive exercise. I won’t be putting up slide after Powerpoint slide. There will be no hand waving and promises of Big Things to come. We’ll hit the ground running at the start of the session with a scenario that takes us from understanding the basic structures of the model (demonstrated via modeling tools); to using various tools to build an underlying data structure and application to meet specific needs; to consuming, querying, and re-using the data in various applications.

Those attending will have no time to read or respond to their weblog entries; no time to start a backchannel, because I have every intention of keeping attendees too busy and hopefully interested to be distracted. I’m assuming that the only reason why a person would stay the extra day after the conference is because they’re truly interested. Well, I aim to misbehave.

Oh, wait–wrong event. I aim to provide.

The session is going to focus on incorporating RDF into our everyday activities, as I am heavily incorporating RDF into my weblog use. We’ll be exploring how one doesn’t have to use every last aspect of RDF in order to gain advantage from its use. In particular, I plan on exploring the use of RDF as an almost ideal portable data structure that doesn’t require a more formal database in order to operate (though we’ll look at how the two can coincide).

In the last several months, I’ve been experimenting with RDF stored in MySQL, as compared to RDF stored in files. When one considers that all applications eventually hit the file system, including databases, there is something to be said for using direct file-based storage for small, discrete models that may or may not be cached in memory for quick access. About the only time I really need the power of a centralized data store with RDF is querying across models–and heck, I have Piggy-Bank on my Windows machine for that. More, I can easily and relatively quickly load all my little individual data stores into the database if I so decide.

This is the true power of RDF over relational: relational doesn’t work well with isolated, discrete objects, while RDF does. It is a truly portable database. Anyone can drop the data in at their sites without worrying about having to create or manage a database. As for portability: how easily can you copy files?

Of course, since the data stored in RDF is meant to be exposed, then anyone can come along and grab the data and store it, using Piggy-Bank or other means. Combine it with their data, query the hell out of it, and use it as they will. As I can do the same with their RDF-based data.
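To make the "little files now, database later" idea concrete, here is a minimal sketch, not the code I actually run. It assumes each small model is saved as a simplified N-Triples file (URIs and plain literals only), and it uses SQLite from the Python standard library as a stand-in for the centralized store. The names `parse_ntriples_line`, `load_models`, `titles_across_models`, and `demo`, along with the example.org URIs, are made up for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path

DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def parse_ntriples_line(line):
    """Split one simplified N-Triples line into (subject, predicate, object).

    Assumption: only <uri> terms and plain "literal" objects, no escapes,
    language tags, datatypes, or blank nodes.
    """
    line = line.strip()
    if not line or line.startswith("#") or not line.endswith("."):
        return None
    body = line[:-1].strip()  # drop the statement-terminating dot
    subject, predicate, obj = body.split(" ", 2)
    return subject.strip("<>"), predicate.strip("<>"), obj.strip('<>"')


def load_models(conn, directory):
    """Copy every *.nt file in a directory into one table, tagged by model name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples (model TEXT, s TEXT, p TEXT, o TEXT)"
    )
    for path in sorted(Path(directory).glob("*.nt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            triple = parse_ntriples_line(line)
            if triple:
                conn.execute(
                    "INSERT INTO triples VALUES (?, ?, ?, ?)", (path.stem, *triple)
                )
    conn.commit()


def titles_across_models(conn):
    """Query across all loaded models at once: every dc:title, with its source file."""
    rows = conn.execute(
        "SELECT model, o FROM triples WHERE p = ? ORDER BY model", (DC_TITLE,)
    )
    return rows.fetchall()


def demo():
    """Write two tiny file-based models, load them, and query across both."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "post1.nt").write_text(
            f'<http://example.org/post/1> <{DC_TITLE}> "Portable data" .\n',
            encoding="utf-8",
        )
        Path(tmp, "post2.nt").write_text(
            f'<http://example.org/post/2> <{DC_TITLE}> "Link stripper" .\n',
            encoding="utf-8",
        )
        conn = sqlite3.connect(":memory:")
        load_models(conn, tmp)
        return titles_across_models(conn)
```

The files stay the source of truth and can be copied anywhere; the table is just a disposable index that gets rebuilt whenever a cross-model query is needed.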

But to return to the requisite hand waving and star-eyed pronouncements: my use of RDF isn’t Web 1.0 or Web 2.0; Semantic Web or semantic web. This is just the Web, stu…stupendous persons who are reading this.

Now, someone give me a million dollars so I can continue creating small stuff, usefully joined.


Link stripper

Stripper plugin update: Rather than strip out the hypertext links, I’m going to create the RDF data entries whenever the post is saved, but only remove the links temporarily, and only when the page is displayed. This way the links are maintained within the text, which should reassure at least one person who I know might be interested in using this plugin.

It’s the only way to be able to maintain the link order and numbering even after the post is edited. I still want to put the data in the RDF file, as the SeeAlso functionality–which includes links to resources not necessarily directly linked within the document–also adds data to the RDF file, as do other plugins I’m creating (for photos, post info, and so on).

This functionality will end up being three plugins: one to create the RDF entries when the post is saved; one to output the entries to the syndication feeds; and one to add the references list at the bottom of a post when the post is published. This is in addition to the SeeAlso metadata extension, which allows a person to add references outside of normal linking.
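The save-time and display-time split might look something like the following. This is a rough sketch in Python rather than the plugin code itself, and it makes some loud assumptions: links are plain `<a href="...">` elements with double-quoted attributes, a regular expression is good enough to find them (a real plugin would want a proper HTML parser), and the extracted entries are kept as simple tuples instead of being written to an RDF file. The function names are hypothetical.

```python
import re

# Assumption: simple, well-formed anchors with a double-quoted href.
LINK_RE = re.compile(
    r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
)


def extract_links(post_html):
    """Save time: collect (url, link text) pairs in document order.

    The stored post is not modified, so the links stay in the text.
    """
    return [(m.group(1), m.group(2)) for m in LINK_RE.finditer(post_html)]


def render_without_links(post_html):
    """Display time: swap each link for its text plus a numbered marker."""
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        return f"{match.group(2)} [{counter}]"

    return LINK_RE.sub(replace, post_html)


def references_list(links):
    """Display time: build the numbered references list for the end of the post."""
    return "\n".join(
        f"[{number}] {url}" for number, (url, _text) in enumerate(links, start=1)
    )
```

Because the numbering is recomputed from the stored post each time, editing the post later keeps the markers and the references list in step with each other.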

Yes, this is going to be better.

Diversity RDF Technology

Women and Web 2.0

I have many things to do and, it seems, little time to do them. This is compounded by some frustration in wanting to get a little of the Fall photography in while I have the opportunity, but this endless summer refuses to end.

The only tasks I have left now are writing and the preparation for the RDF tutorial at XML 2005, and they’re slow going. In addition to my commitments, I also want to get some writing here to the weblog before I take what could be a longish break. I want to finish my bottoms up RDF tutorial, and yes, finally, my part 2 and 3 of Parable of the Languages, and a few other odds and ends that aren’t tech related.

I am also planning on writing a detailed response to Tim O’Reilly’s Web 2.0 writing, but was dismayed when I looked at the speaker list for the O’Reilly Web 2.0 Conference. For a topic as diverse as Web 2.0, what statement about this wondrous new world is being made when only 7 out of 106 speakers are women? Is there room for hope among the hype of Web 2.0?

We can only wish that during the parties and schmoozing, those attending will look through their glasses of bubbly and notice that something seems to be missing.

Connecting RDF

Put up or shut up

Recently, irritated by what seems to be an endless round of pushback against RDF, I made a put up or shut up statement in a post. Well, whether my irritation is justified or not, telling people that they have to put code down in order to have an opinion was not only wrong, it’s bad technology.

In these situations, a more appropriate response is to listen to the critics, find those areas of common concern among them and address these concerns. One way to do so is to change the technology; another is to provide additional documentation and clarification of either the technology or the specifications on which the technology is based. At this point, some critics may still remain averse to the technology and have legitimate reasons for being so. We can, then, either take on another round of fix and/or document; or we can decide that the effort to meet every concern is just not an effective use of our time. This is what makes good technology.

This week, James Robertson responded to a Robert Scoble ‘deal’, where the latter said he would switch weblogging tools if the tool provided a certain kind of support for OPML. Leaving aside whether Scoble will actually change tools based on this response, Robertson had some legitimate concerns about OPML and expressed them:

Ye gods, it’s time someone came out and said something. OPML is a really, really crappy format. Really crappy. I had massive headaches implementing OPML support for import/export in BottomFeeder. Why? Because there’s no real specification […] I had to add tons of hacks to the OPML support in order to support the export formats of various tools. The problem? Everyone implemented it a little differently, because the spec is incredibly unspecific – about just about everything.

I’ve looked at the OPML specification multiple times, and frankly, I’ve never had the least interest in trying to implement anything on it–not because I can’t, but because I see no purpose in it. The specification makes no sense; it reads like a mystery novel more than a technical document. Why would anyone possibly want to implement anything so vague? Coding just to code has never seemed useful to me; coding just because it would make Big Dog happy seems even less useful.

Regardless, OPML has its fans and more power to them. But when they start talking about implementation, others are going to bring up issues with OPML–if for no other reason than they’re like me, and think that there really is a true OPML specification somewhere, and we just haven’t been able to find it in Google yet.

Scoble’s response to Robertson was, frankly, asinine; especially for someone who purports to be writing a technical weblog, and prides himself on the ‘geek’ circles he inhabits.

James, here’s the deal. I really don’t care about specs. I’m a user here. When users say they want something the correct answer isn’t to call what they are asking for “crappy” but it is to either say “here’s what you’re asking for” or it’s to say “here’s what you’re asking for and I made it even better.” Or, I guess an OK response would be “I can’t do that, sorry.”

But if you say the format is crappy that makes me wonder if you have something better up your sleeve. So, I’m gonna call you on it. Do you?

This is classic put up or shut up. Robertson wasn’t telling Scoble what to do; he was using the post as an opportunity to make a statement of interest to him, about the underlying specification. If it had been played right, the OPML folks could have had some valuable insight into concerns about the specification, as well as perceived weaknesses. But no, it became something else–a satellite discussion that revolved around a few Big Dogs and, aside from ensuring they have their weekly quota of links, hasn’t led to any positive advancement in technology.

Of course, I’m linking to the Big Dogs myself at this time, but it’s not because of OPML, as much as it is about “put up or shut up” as a way of shutting down discussions on technology. I did so with the one post I wrote and that was wrong on my part. Wrong, wrong, wrong. No matter how I package it up, I screwed the pooch with my response.

But to return to this whole OPML discussion, it seemed to me that what is happening is that Dave Winer really doesn’t want a clear specification. If he has one, he loses some leverage. Right now, to do anything with OPML you have to go through Winer. You can implement what you think is the spec, but there’s no guarantee that it will be ‘valid’ unless you get a Winer stamp of approval. And even then, there’s no guarantee that you won’t lose that stamp of approval six months to a year down the line.

Any technology that is dependent on a specific person is bad technology. This is true whether you’re looking to use the technology or implement it.

In the meantime, several people have written about the OPML specification and this ‘put up or shut up’ doorstop, and their posts make good reading:


The reason that developer’s just can’t get their OPML to work with Dave’s application is because the specification sucks. There is simply no way for anyone to tell if the OPML file generated by their application is really compliant with what Dave’s editor implements, or only just happens to never tickle a bug or an ambinguity which wasn’t specified.

Blogging Roller has an excellent take:

I think Scoble and Winer are right, it’s about the users. When you create a data format or netwok protocol specification, your users are the developers who have to implement the spec. In the case of blog tech specs, the users think the specs suck.

Roger Benningfield had two posts about the discussion: one on implementing OPML in JournURL and one responding to Dave Winer’s OPML guideline. Scoble, if you’re linking those who have met your demands, you need to link Roger’s post and weblogging tool, too.

Elliot Back takes a closer look at OPML from an XML implementation point of view.

In response to ‘put up or shut up’, James Kew wrote:

Like James, I’m not convinced that OPML is the magic bullet that Robert wants it to be. But I do firmly believe that shouting down critics with “do better or shut up!” is unhelpful, unproductive, and just plain rude: macho posturing at its worst.


Speaking of which, why are people so insistent on having the attitude that you can’t criticize something unless you can do better? Knowing that something won’t work is more valuable than coming up with the idea that doesn’t work – they’ve already done more than the person that came up with the original idea just by showing why it won’t work. Besides which, there’s a different set of skills required to do something than there is to evaluate it. How many wine drinkers know how to make a better wine than the one their drinking? How many have actually done it? Would you suggest they just drink whatever wine is put in front of them because they can’t do better themselves?

A succinct Fanklinmint:

Scoble, that’s not very nice.

Rogers Cadenhead seems particularly put off by my and Jason Levine’s criticism in Scoble’s comments. (Levine not, I am assuming, being the same Jason who specifically told me to shut up in said comments.)

Let’s just accept as a given that you’re right (especially Shelley Powers and Jason Levine). OPML is utter crap. We could do so much better.

You could do so much better.

Create a better, better-specified format for the tasks supported by OPML. If the format’s as bad as you say, you shouldn’t have any trouble at all topping it.

But if you don’t have the time or the need to do that, then please have the decency to turn your critical gaze away from OPML. This format needs an RSS-style flamewar like the Gulf Coast needs tropical storms.

Jason responded with:

You should know better than to say something as meaningless as “if you can’t create something better, then don’t comment on the issue.” It’s a straw-man argument, created to be a distraction; of course reviewers don’t have to be implementors, they just have to know how to review — critically, with reason and logic, and with an understanding of the space in which their reviews exist. In this case, both Shelley and myself have been at this long enough to know how dangerous having crappy specs is — if any interest is generated in them, apps pop up that end up unable to generate compatible files, unable to interchange data, and leading to an enormous mess for the very users who helped popularize the features that the spec advertised. It’s no good for anyone at all. But the fact that I don’t have the time or energy to create a new spec is all you see, and the point you attack, leaving untouched the fact that OPML is still a terrible spec; if people want it to actually work, then what’s the harm in Dave (or anyone!) actually putting it together into a spec that’s usable by implementors?

I was going to respond in some depth until I read Jason’s comments; then I basically just wrote “ditto”.

(Speaking of RSS: I’ve played around with implementing applications that produce and consume RSS 2.0, RSS 1.0, and Atom. Of course, the RSS 1.0 is pretty easy for me as I have an API that can speak the model so I ‘cheat’ and use it rather than parse out the markup. But I found the Atom specification harder to implement out of the box than RSS 2.0. Why is that? Because the Atom specification is so precise that I can’t just slop anything in. RSS 2.0 is much easier to hack, but I’m left wondering how many tools support multiple enclosures and how many tools do not. I avoided the dilemma by going with RSS 1.1.)

I don’t want to channel Mark Pilgrim and spend a lot of my time pointing out the obvious–other than, if you haven’t been out to Mark Pilgrim’s site, Dive Into Mark, he’s got an interesting Red Cross donation page. He also has a t-shirt for sale I wouldn’t mind adding to my collection; to wear during those times when I waste my time commenting in Scoble’s ‘mudpit’.

(Mudpit. Now that’s a way to set the tone of discussions. Taking a page from Pilgrim’s biblical approach, you reap what you sow, Scoble.)

And since I have no interest in ‘putting up’ code for OPML (how many monkeys typing on a TiBook randomly can…), I guess I’ll have to shut up–Burningbird style.

picture showing kitten running in terror with words to the effect that God kills kittens when you use OPML


Ignore the fact that it’s working

Uche Ogbuji is another voice raised in the “RDF is too hard, make it more simple” crew that seems to have reached a crisis all at the same time. Perhaps it’s the moon. Maybe it’s the water.

Uche wrote:

I get the feeling that in trying to achieve the ontological purity needed for the Semantic Web, it’s starting to leave the desperate hacker behind. I used to be confident I could instruct people on almost all of RDF’s core model in an hour. I’m no longer so confident, and the reality is that any technology that takes longer than that to encompass is doomed to failure on the Web.

Well damn, there goes my use of MySQL. PHP, too. I’m also working with REST and SOAP. Then there’s syndication feeds–if anyone thinks you can talk about ‘syndication feed’ in less than an hour, you don’t know the people associated with RSS, RDF/RSS, or Atom.

Uche also mentions microformats, but as he’s found, these are anchored to whatever structure is used within a web page, and that’s not encompassing enough for all metadata needs. He then goes on to say that he’ll stick with RDF for now, hoping to be able to do what he needs to do without the more esoteric elements getting in the way.

Getting in the way. Hmmm. Well, let’s see:

Mozilla/Firefox has been quietly using RDF for much of its underlying menu structure and other uses for six years or so now.

RDF/RSS, known as RSS 1.0, has been providing syndication feeds for years.

FOAF is used to drive out networking in various environments.

Isn’t there a music site that outputs its data in RDF? I know the government is heavily into it, but that’s not necessarily a recommendation.

As for my own work, I update the metadata in my photographs in Photoshop, which is used to provide information such as name and description to Flickr when I upload the pictures. When I embed a photo in this page, I create a data store of RDF information for the photos either by accessing this data directly from the photo, or getting it from web service calls from Flickr. This includes translating the EXIF data into RDF/XML format. I then make all of this accessible just by attaching /rdf/ to any post. This is used to drive out Tinfoil Project, and the photo page. In the photo page, I also reference the Google Maps API to use the geotagging included in RDF to pinpoint on a map where the photo was taken. I also have uses to manage my syndication feed, as well as providing references and pointers to other externally associated web pages.

And these are just the beginning of the uses of RDF I’m incorporating into my pages. Best of all, the data that I generate has been picked up by others–I know because I was asked to clean up my use of dates, which I did. Which means then you can use the data however you want.
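For anyone wondering what translating photo metadata into RDF/XML amounts to, here is a small sketch using only the Python standard library. It is an illustration under assumptions, not the code behind these pages: the metadata is handed in as a plain dictionary (reading it out of the image file or from Flickr is left out), the photo URI is an example.org placeholder, and the choice of Dublin Core, the W3C EXIF vocabulary, and the WGS84 geo vocabulary, along with the specific property names, is my own pick for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
EXIF = "http://www.w3.org/2003/12/exif/ns#"          # assumed vocabulary choice
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"     # assumed vocabulary choice

# Ask ElementTree to use readable prefixes instead of ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("dc", DC), ("exif", EXIF), ("geo", GEO)):
    ET.register_namespace(prefix, uri)


def photo_to_rdfxml(photo_uri, metadata):
    """Serialize one photo's metadata as a single rdf:Description.

    metadata maps (namespace, property name) pairs to literal values.
    Only plain properties are used: no reification, no containers.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": photo_uri}
    )
    for (namespace, name), value in metadata.items():
        ET.SubElement(description, f"{{{namespace}}}{name}").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values, for illustration only.
sample = {
    (DC, "title"): "Fall color",
    (DC, "description"): "Maples along the trail",
    (EXIF, "exposureTime"): "1/250",
    (GEO, "lat"): 38.6,
    (GEO, "long"): -90.2,
}
rdf_xml = photo_to_rdfxml("http://example.org/photos/fall-color.jpg", sample)
```

A page that wants to put the photo on a map only has to read the two geo properties back out; nothing in the document requires knowing more of RDF than subject, property, and value.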

Every technology has its controversial elements, its more esoteric side. Most technology has aspects that many of the people using it aren’t even aware of. RDF is no different, and one can get by using RDF without even once having to become proficient with reification, or use a container. I know this. I have proved this. I have created several applications, have tried to give away code, have written about it time and again and what…not a damn thing. But then, I’m not one of the heads of RDF.

(What makes a person a ‘head’ in RDF? I could define ‘head’ at this moment, but this is a PG 13 weblog post.)

What’s even more frustrating is that when I focused on the more practical aspects of RDF before the specification was even on the street, I did not receive universal approbation from the RDF community for the fact that my coverage of these more esoteric elements was light. Or that I covered this implementation but not that, and so on. Now, these same people are calling out for a ‘kinder, simpler’ RDF.

*bang bang bang* If a technologist falls over in the forest, does she make a sound?

I am giving a talk called “Pushing Triples: An Introduction to Street RDF” at XML 2005, but I’ve about had it with talking.

I was once challenged to put code down to prove a point. So here is my response: put your code on the table, gentlemen. Put your code down. I have.