Old Skool

Recovered from the Wayback Machine. 

Lifted my head long enough from Adding Ajax to see a fooflah about Flickr’s newest announcement.

Flickr had said a long time ago that there would come a time when you wouldn’t be able to have a login separate from a Yahoo account. Today the group announced that it’s no longer viable to maintain separate login systems and folks will have until March 15th to create a Yahoo identification and port their account to it. I must admit to some amazement about the anger this has generated. It was a given this was going to happen. It makes no sense to have two completely different sign-on systems.

Ken from Digital Common Sense writes:

I don’t like this. I have multiple Yahoo IDs. They are disposable, in part, because Yahoo is disposable. My loyalty to Yahoo is non-existent. Their email sucks. The IM client is a bloated pig given years of creeping featurism with continual incorporation of crap the doesn’t work and users don’t want. In short, I’m not a Yahooligan.

I can understand that Ken doesn’t like Yahoo, but Yahoo did buy Flickr. In fact, chances are if they hadn’t, Flickr would have fallen under the weight of the demands on the system. It’s not many companies that have the built-in infrastructure to handle the access that sites like Flickr, or Yahoo for that matter, demand. True, there are bigger photo sites, but the larger ones are focused around the photos themselves, which are a static, easy to serve and maintain commodity. Flickr is a community site with enormous CPU, complex data storage, and bandwidth needs.

If Flickr was asking something that wasn’t reasonable, I could understand the push-back, but not wanting to maintain separate sign-on and identity systems makes perfect sense–I wondered at them keeping these separate for so long.

Still, if folks aren’t comfortable with a Yahoo ID, they should consider dropping this account. There are other social photo systems, such as Zooomr, though I agree with Anil Dash in that …using these sort of opportunities to promote a competing business… is not cool. Predictable, but not cool.

Other concerns are that you can now have only 3000 contacts, and no more than 75 tags per photo. Wow, what a hardship. One person has 19,000 contacts, and as soon as he mentioned this on the thread, those folks who were among his contacts asked to be taken off. There’s a new thing in social systems called the contact junkie, who craves contacts as others would crave the next heroin fix. I suppose that Flickr making these folks go cold turkey is cruel, but I can’t see a system being maintained just for the less than 1/10 of one percent of connection addicts.

It’s interesting how some people seem to think there’s an ulterior motive for all of this, because, according to these folks, putting limits on data structures is never necessary for enhancing the performance or robustness of the system. Before you ask, no, none of the people making statements like this have a clue about how systems are built.

There are legitimate concerns about this move, not the least of which is that it is difficult to find a meaningful Yahoo identifier. Luckily mine, P2PSmoke, is one I’ve had since P2P was the hot thing; way before all this social software stuff. When folks talk about ‘old skool’, I have a meaningful Yahoo account–can’t get more old ‘skool’ than that.

(Question: why can’t cool people spell words correctly? This trend to add ‘cuteness’ to words should die a sudden and irreversible death.)

The point is moot, though, because if these forms of social environment are meant to equalize participants, then separating the ‘old skool’ identities from the Yahoo identities for the newer folks is just another way of creating a false sense of elitism. With this switch, some of this is being swept away, and I wonder how much of this fooflah is because of this very thing?

It’s fascinating to read the threads–all those folks feeling betrayed because, according to one person in one of the threads:

Stewart, why are the oldskool members being treated like second class citizens on this issue, we are the community that made Yahoo buy you guys. Nothing like alienating your core supporters and cheerleaders. We are the bloggers, podcasters, videobloggers, and photographers that made the community. Your alienating the most vocal people on the internet. It’s going to be a shit storm of bad press for yahoo and Flickr tomorrow from the blogosphere, I promise you that.

All I can say is: When the frog farts in a pond in the forest, the cat in the city doesn’t smell it.

As I said earlier, there are legitimate concerns about this move: what are the Terms of Service differences between having a Flickr account as compared to one for Yahoo? People have had problems with Yahoo sign-ons, and other technologies, and the merge doesn’t sound like it’s well crafted: what kind of support is available to help folks with this move? What additional constraints will this move have on folks, other than having to have a separate account? Can people use the same email addresses? Not to mention that it is really tough to find a unique user name with Yahoo: how about a Flickr-specific namespace for identities?

In a way, Flickr’s sign-on merge into Yahoo may actually have a reverse effect, because Flickr’s customers tend not to be as geeky as Yahoo customers (yes, I know they’re the same, bear with me); this move might actually lead a trend into improving the overall Yahoo customer service interface. Or not, and people will quit Yahoo and Flickr both.

I have my old, old Yahoo account, which I’m now using with a Flickr free account specifically for development purposes. Flickr still has one of the better open web services, made better with the new ‘machine’ tags concept. I don’t post photos much anymore, and certainly not at a ‘social’ site, so perhaps my lack of concern doesn’t reflect the concerns of others. I am sympathetic to those who are concerned about issues of privacy, or who have had problems with Yahoo’s technology, or with the photo merge. I have no sympathy, though, for those who seem to be more concerned about losing their ‘old skool’ status, or worse, using this as an opportunity to shill for another company within the threads set up for the discussions.

Bottom line, though, is that I’ve never known Flickr to pull their punches, and if they say this is going to happen, this is going to happen. That’s one of the things I’ve always admired about Flickr: lack of smarmy marketing. What you see, is what you get.

SmugMug is offering 50% off for Flickr jumpees. There you go, Thomas Hawk.

Diversity Technology

A Matter of Language

When all things are equal, inequality reflects failure.

Virginia DeBolt responded to my earlier writing about technology education being broken with a post about Educating Women in Technology. She references two innovative programs: New Horizons, at Mills College in Oakland, California, which teaches computer technology to those with a non-technical background; and the University of Colorado’s Bachelor of Innovation degree.

Though I agree with Virginia that both programs are an excellent step in the right direction, they don’t address the fundamental issues that lead to what I consider crippled and ineffective computer science university programs. Take, for instance, the New Horizons project: it provides a way for people with a liberal arts bachelor’s degree to get a Masters degree in computer science. It’s open to both men and women, but unlike traditional computer science courses, there are many more women than men.

Much of the emphasis of the program is providing a less intimidating environment. Borrowing Virginia’s quote from a San Francisco Guardian article on the program:

Introductory CS classes at most universities “act like weeder courses,” scaring away all but the most confident students, [Mills computer science associate professor Ellen] Spertus says. Typically, up to half the students fail or drop out of introductory CS classes at other institutions. Spertus says this phenomenon hits women hardest because they may have less computer experience as well as less confidence…Spertus finds that many students going into her program suffer from low self-esteem — especially female students. She says they’ll be earning A’s in the program’s classes but will be convinced they’re not doing well and somehow “don’t belong.” Her teaching style, simultaneously rigorous and nurturing, helps change their opinion, she hopes.

I agree with the sense of ‘not belonging’ that many women experience in traditional computer science programs, but I disagree with Ellen Spertus that lack of confidence is a major deterrent to women in computer science. Women make up half, or more, of the students in several different extremely rigorous and/or competitive fields at many universities, including mathematics, medicine, law, most of the sciences, business, and others. Unless we think that computer science only attracts the less confident, we should consider that there are other factors in play. These factors may lead to a growing lack of confidence, or may be perceived to be based on lack of confidence, but I would say that this is more an effect than a cause.

The New Horizons program is successful in that many of the cultural issues associated with the field are eliminated, primarily because most of the students are women. Mills College is a college for women, and though this program is open to men, I would bet that most men would find it uncomfortable to get a degree, even a Master’s, from a college that is predominately a women’s college. As such, the program stays dominated by women, and that’s one factor that’s significantly different from other comp-sci programs. More importantly, the program also provides a very effective environment for women and men with families, jobs, and other non-academic priorities–something that wouldn’t be tolerated in most computer science programs. Actually, it wouldn’t be tolerated in most academic programs, which are, more or less, geared to the mindset of a 19-year-old male from an affluent family.

(I also don’t know if I agree with the statement about computer science being a lucrative field, with globalization’s massive impact on this field.)

New Horizons is effective, but this approach is more of a bandage than a solution to a problem. We can’t continue the ‘separate but equal’ routine of dealing with the problem of astonishing lack of diversity in the computer field. Leaving aside culture as the only determiner–because after all if such is the sole criterion for women in college, then wouldn’t this also impact on women in law and women in medicine?–those components of computer science I consider especially broken have less to do with how the environment is managed, and more to do with the subject itself.

Computer Science suffers from an early and inappropriate association with engineering, another field that tends to be massively male dominated. In fact the two fields, computer science and the different flavors of engineering, are always the departments in any college that have the fewest women students. Because of this early association, there’s a strong engineering bias built into the field of computer science: a bias that doesn’t necessarily make it a ‘better’ field, numerous books on the subject aside.

We assume this engineering connection makes the field of computer technology better. Why? Because the people in the field most successful are those more capable of adapting to the odd and pervasive cultural and linguistic biases inherent in engineering. Since the most successful people in the field are the ones most likely able to establish a pattern of what are ‘good’ or ‘bad’ computer science practices, an engineering bias (evidence of membership also demonstrating a gender and a cultural bias) has been interwoven into the field in such a way that it’s almost impossible to be able to view the practical application of computer technology separate and apart from engineering practices.

It is an inappropriate blending of fields; a coercion of the natural growth of computer technology. It’s like visiting a relative and wearing his or her clothes: they might seem to fit, but you’re never completely comfortable because you know the clothes are borrowed.

A good example of the engineering influence in computer science is the linguistic bias inherent in programming languages. Grace Hopper was the first to promote the concept of an English-like syntax when creating computer programs. Her work ultimately led to COBOL, which has been the butt of jokes and criticism since. One such criticism is that COBOL is excessively verbose. This is interesting when you compare it with the newer generation of languages popping up in the field just at a time when not only have the numbers of women not been increasing–our numbers have been shrinking. In particular, if you follow a sequence from Perl to Python to Ruby, there’s one obvious trend: the language is losing its verbosity. Ruby is so stripped down to the barest minimum needed to support the programming constructs that you could almost write a complete weblogging tool in 20 lines or less.

This lack of verbosity makes for shorter programs, and less time to write such programs. However, the language is also incredibly cryptic.
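To make the trade-off concrete, here is a small sketch of the same task written twice: once compressed, once spelled out. It is in Python purely for illustration, and the function names and sample titles are hypothetical. Both versions count how often each word appears in a list of post titles.

```python
from collections import Counter


def word_counts_terse(titles):
    # Compressed style: one expression, little to type, more to decode.
    return Counter(w.lower() for t in titles for w in t.split())


def word_counts_verbose(titles):
    # Spelled-out style: each step is named and visible.
    counts = {}
    for title in titles:
        for word in title.split():
            key = word.lower()
            counts[key] = counts.get(key, 0) + 1
    return counts


titles = ["Old Skool", "A Matter of Language", "Old Mills"]
terse = word_counts_terse(titles)
verbose = word_counts_verbose(titles)
```

The two functions return the same counts; what differs is how much the reader has to unpack in their head versus read on the page.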

Compare PHP or Java, which though not as verbose as COBOL are still ‘chatty’ compared to Python and Ruby. I’ve worked with a huge number of programming languages, over 23, yet I have found myself increasingly ‘alienated’, if that’s the word, from the languages in use today. In fact, one of my biggest criticisms of Prototype, the Ajax/JavaScript library, is its use of Ruby constructs, and functions such as $(var), to access an element in the page.

Programming constructs such as this may strip away the ‘fat’ that English or other linguistic components add to other language variations, but at what price? I wrote someone once that when I first saw a ‘larger’ Ruby application (larger being relative), my first thought was: this is a language written by men for men.

A better way of saying this, though, is that this is a language that favors a certain mental bias; one that’s pervasive in engineering and that heavily influences computer science, both in an educational sense and in practice. It is a bias that favors a more mathematical, or perhaps spatially holistic would be a better term, view of an application over a more verbose, verbal view of the same.

Spatial over verbal: where have we heard that before?

We’ve all heard the results of controversial studies that report cognitive differences between women and men in two main areas: women have greater language skills, while men have more spatial acuity. Of course, many of these studies are flawed, with samplings too small to really understand what constitutes a ‘significant’ difference. It’s also difficult to strip out the environment; to deny that boys are more encouraged to indulge in solitary pastimes such as taking apart the toaster or working on the car; while girls are encouraged to spend time, even hobby time, with their friends.

Regardless of whether there really is a gender bias when it comes to language and spatial reasoning, programming languages–from COBOL to C, from BASIC to C++, Java and PHP to Python and Ruby–do reflect a cognitive bias: either exhibiting a bias towards the verbal or a bias towards the spatial; a bias that can impact on how well a person uses the language, or more importantly, how comfortable they are with the language.

A better explanation of my initial perception of Ruby would be that it’s a language that’s biased towards those who favor the spatial over the verbal, and I’m most comfortable working with a language designed for those who favor the verbal over the spatial. Not to say I can’t learn Ruby or Python, and even grow to appreciate and like both. However, it’s like putting on my cousin’s pants: they might fit, but I’m never going to be as comfortable in them as my cousin.

The Wikipedia article associated with computer programming has an interesting remark:

Another ongoing debate is the extent to which the programming language used in writing programs affects the form that the final program takes. This debate is analogous to that surrounding the Sapir Whorf hypothesis in linguistics.

The quote has to do with linguistic determinism, whereby the language we use determines how we think. There’s disagreement on this, and studies supporting and studies refuting, but it is a fascinating subject. Made more so by extending it to the computer languages we use, and how they impact on the overall structure of a program. Again, are programs such as Agile arising because of the fact that our practice of technology is skewed to a specific bias, not to mention personality?

Perhaps we’ll find that object-oriented development is really an outgrowth of a bias toward the spatial over the verbal, and that we’ve managed to create an entire field that consists of one gigantic human filter. We don’t know, because we’ve never thought to challenge the disparity in the computer science field based on the development of the subject, not just the environment.

That’s why I say the computer science field is broken, and rather than focus purely on environment or culture, we need to examine the myriad ways in which it is broken, recognize each, and find solutions: we can’t depend on providing ‘warm nurturing environments’ as being the end all, be all solution for every problem.

For instance, if the computer science programs were split up in universities, with computer technology incorporated into other fields such as philosophy, library science, psychology, math and so on, we might find that each field ends up with its own programming languages–like a suit of clothes custom made for fit and comfort, compared to buying off the rack or worse, borrowing from our cousin, who has the worst taste. The Bachelor of Innovation somewhat reflects this, but again that’s seen more as an interdisciplinary field than as a recognition that computer technology is a part of our lives, is a tool, and that how we teach it should reflect this.


Cheap Eats at the Semantic Web Cafe

Recovered (comments and all) from the Wayback Machine.

It’s a rare event when several seemingly disparate items of interest all come together to form a compelling, coalescent whole. This event happened for me the past few weeks; an experience formed of discussions about digital identity and laws of same, LID, Technorati Tags, new and old syndication formats, Google’s nofollow, and the divide between tech and user. Especially the divide between tech and user.

I’ve written about digital identity and LID and nofollow recently, so I want to focus on Technorati Tags in this writing, and then, later, bring in the other technologies’ relationship to same. Besides, for someone who is interested in lowercase semantic web, how can my ear not be all aquiver when I hear about a new way of ‘adding meaning’ to what can be a meaningless web at times?

Tag, you’re it

If you’re unfamiliar with Technorati Tags, it’s a new implementation of an existing concept previously enabled by other sites such as del.icio.us and flickr. With Technorati tags, webloggers can annotate their entries to add keyword associations to their work, forming a quasi-classification on the hoof, so to speak.

When you update your weblog, and ping Technorati (or some other service that results in Technorati’s web bot consuming your post), the link to your post is then added to the other most recent additions to the other entries that share the same tag. Not only that, but items at delicious and flickr are also shown in the page, as this entry labeled Folksonomy demonstrates.
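The ping-and-group behavior described above can be sketched roughly as follows. This is a guess at the general shape, not Technorati’s actual implementation: the `TagIndex` class, its method names, and the example URLs are all hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical flat index: tag name -> list of post URLs, newest first."""

    def __init__(self):
        self._posts_by_tag = defaultdict(list)

    def ping(self, post_url, tags):
        # Called when a weblog announces an update; each tag is treated
        # as an opaque string in one shared namespace.
        for tag in tags:
            posts = self._posts_by_tag[tag]
            if post_url in posts:
                posts.remove(post_url)
            posts.insert(0, post_url)

    def recent(self, tag, limit=10):
        # Return the most recently pinged posts carrying this tag.
        return self._posts_by_tag.get(tag, [])[:limit]


index = TagIndex()
index.ping("http://example.com/mills", ["Missouri", "Old_Mills"])
index.ping("http://example.org/fault", ["Missouri"])
missouri_page = index.recent("Missouri")
```

Nothing in the sketch checks what a tag means or who applied it; a post lands on a tag’s page simply because its author typed the same string as everyone else.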

From reading other webloggers, the main excitement behind Technorati Tags is its ability to socialize a classification. David Weinberger wrote the following when the concept was first rolled out:

This is exciting to me not only because it’s useful but because it marks a needed advance in how we get value from tags. Thanks to del.icio.us and then flickr in particular, hundreds of thousands of people have been introduced to bottom-up tagging: Just slap a tag on something and now its value becomes social, not individual.

Cory Doctorow shared in this enthusiasm, writing:

Technorati Tags are keywords that map to category names, keywords, and other cues in blog posts. When you bring up a Technorati Tag for “computers,” you get all relevant blog posts that Technorati knows about, presented on a page with relevant links and relevant Flickr images. Technorati Tags blend three different Internet services and three services’ worth of tags to tease meaning out of the ether. Brilliant.

Ross Mayfield writes

But below all that global heady stuff, what tags do really well is aid social discovery.

Simon Waldman jumped in with:

Smart. Smart. Smart. If a little rough round the edges.

And Suw Charman enters the lists with:

All in all, this is an interesting way of using emergent tagsonomies to pull together diverse datastreams in one place. As it happens, I’ve had a number of different conversations recently with friends about such things, and this is a useful first step along the way to creating a single entry point for a variety of sources.

It might seem at first exposure that the enthusiasm for Technorati Tags is a little difficult to understand. After all, we’ve been able to classify our writings for a long time in our weblogs; as for searching on specific topics, we’ve had considerable experience using keyword searches in Google and Yahoo. However, the interest in Technorati Tags seems to be focused on its value as a social grouping rather than as a way of categorization. Waldman referenced the term “self-organizing web”, to describe the concept.

For instance, if I were using Technorati Tags in this post, I would add whatever tags I felt represented the content of this writing, such as Folksonomy, Digital_Identity, Tags, and Old_Mills. Of course, when checking Old_Mills, I find that this is fresh meat from a Technorati perspective, as there are no previously annotated weblog listings using this tag. This leads me to believe that perhaps there’s a different tag I want to use. After all, if I’m going to go through the bother of using a Technorati Tag, I would rather use one that puts me into an active social classification than one that doesn’t. So I try Missouri instead, because after all, the photos of old mills in this writing are in Missouri. I see a gratifying number of entries for this tag, providing positive feedback of my choice.

This process of refining exactly which tags to use demonstrates what we’re told is the true power of Technorati Tags–not that we, as individuals, can categorize our writing any way we want; but that people will seek out existing tags that represent their material, and thereby begin a grassroots taxonomy–or folksonomy, to use what is becoming a popular term.

Returning to my ‘socialized choice’, among the other entries tagged “Missouri” are pointers to a Metafilter discussion on the recent ruling about the KKK being allowed into the highway cleanup program, and an interesting story in reference to the New Madrid fault, both stories I’ve written about and which, if I had tagged them previously, would also show in the list. This does demonstrate the positive grouping effect of these tags.

Still, there are other entries that look more like ads than entries related to Missouri, including ones for mobile DJs. This demonstrates one of the negative aspects of Technorati Tags: their vulnerability to spammers. Another vulnerability that has been quickly pointed out is that the material can be seen as inappropriate to the topic or even offensive when placed next to the other material that’s published in the same category.

Bad tag. Bad.

Rebecca Blood was one of the first to make note of inappropriate material within the content tagged with “MLK” for Martin Luther King day.

Now, that photo is perfectly appropriate on Flickr as part of an individual’s collection, and as documentation of Sunday’s rally. It’s perfectly appropriate as an illustration for ‘protests’, or even ‘Israel’ and ‘Palestine’, even though it surely will offend some people wherever it appears. But it is not appropriate to illustrate a category tagged ‘MLK’. I personally was offended–these sentiments reflect the polar opposite to those espoused by Dr. King. More to the point, such an illustration is inappropriate–that poster has as much to do with Dr. King as would a picture of a banana peel.

Foe Romeo also noticed this, especially when looking at the Teen tag and finding links to a pornography weblog, and suggests that Technorati has taken on new roles as both editor and moderator with the introduction of Tags. In her comments, Kevin Marks responds to her concerns with:

We have confirmed with Flickr that pictures flagged with offensive are not included in external feeds, so the advice to Rebecca to visit Flickr to warn about the picture was correct; we also removed the german porn spam blog you noticed from our database.

We are still feeling our way here, and adding community moderation is one possibility.

But another commenter, Beerzie Yoink (who links to an interesting website, btw) wrote:

I’m not a technical genius, but quite frankly don’t see how they are going to manage this. Won’t tags used by spammers, pornographers, racists, and other jerks will be hard to separate from legitimate posts? It will be interesting to see how this plays out.

(em. mine)

Within a day or so of Tags being released, questions have been asked about separating out ‘good’ material from ‘bad’, and finding ways of altering Technorati so as to eliminate offensive material. Of course, as Julian Bond points out, there’s a mighty big chasm between here and there when it comes to this type of change:

We seem to be playing out the same old, same old pattern once more that’s been done a million times before in online communities. The Politically Correct Police (PCP) are making lots of noise about how “This isn’t right and SOMETHING SHOULD BE DONE”. The Anti-PCP come along, who love a good flame war, and are finding ways to wind them up. The poor developers get backed into a corner and end up coming up with a series of nasty hacks to sanitise what was once a nicely elegant, simple and minimalist solution. What makes me laugh in all this are the ludicrous solutions put forward by the PCP who clearly have never been anywhere code.

One of the challenges with self-forming community efforts is that each member brings with him or her different interpretations of why the group has formed, and what its purpose is. What’s particularly fascinating is that the same people who extol the ease with which the group can form are also the people who then pick through the members, saying which ones can stay, and which ones have to go.

While some of those who have questioned the overall goodness of Technorati tags have focused on the correctness of the content, others focused on the quality of the overall effort. In other words, can cheap semantics scale?

Get yer semantics here! Red hot semantics! Get ’em while they last

I took the title for this post from Tim Bray’s discussion about Technorati tags, where he wrote:

I’ve spent a lot of time thinking about metadata and have written on the subject; the most important conclusion was: There is no cheap metadata. I haven’t seen anything to make me change my mind.

Having said that, and granting the proposition that The Simplest Thing That Could Possibly Work usually wins, I still have to say that the Technorati Tags all being in a single flat namespace does seem a little, well, brittle.
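One way to read ‘brittle’ here (my interpretation, not necessarily Tim’s) is that a single flat namespace both merges unrelated meanings under one tag and splits one meaning across several tags. A minimal sketch, with made-up posts and tags:

```python
from collections import defaultdict

# Hypothetical posts: (url, tag, what the author meant by the tag).
posts = [
    ("http://example.com/a", "mills", "water-powered grist mills"),
    ("http://example.com/b", "mills", "Mills College"),
    ("http://example.com/c", "old_mills", "water-powered grist mills"),
]

by_tag = defaultdict(list)
for url, tag, meaning in posts:
    by_tag[tag].append((url, meaning))

# One tag, two unrelated meanings: the flat namespace cannot tell them apart.
meanings_under_mills = {meaning for _, meaning in by_tag["mills"]}

# One meaning, two tags: related posts are split across separate pages.
tags_for_grist_mills = {
    tag for _, tag, meaning in posts if meaning == "water-powered grist mills"
}
```

The ‘meaning’ column exists only in the author’s head; the index sees just the tag string, which is why neither problem can be detected from the data alone.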

Liz Lawley also wrote on her concerns about the long-term viability of tags and folksonomies, specifically, whether group consensus leads to valid, or best, results:

On the one hand, as a librarian, I understand completely the value of controlled vocabularies and taxonomies. I don’t want to have to look in six different places for information on a given topic—I want some level of confidence that the things I want are grouped together. On the other hand, I don’t share the optimism that so many of my colleagues in this field seem to have that the collective “wisdom of crowds” will always yield accurate and useful descriptors. Describing things well is hard, and often context-specific.

Bang on the money except that I would extend this further to read, “…describing this well in such a way as to be meaningful to a great proportion of the populace…” All of us can describe things easily understood by ourselves or our immediate social groups.

Both Liz and Tim reference a post by Clay Shirky where he writes that though folksonomies (the concept to which Technorati Tags has been linked) may not have the quality of well-designed vocabularies, they’ll still persist and ultimately triumph, primarily because these efforts minimize cost and maximize user participation.

This is something the ‘well-designed metadata’ crowd has never understood — just because it’s better to have well-designed metadata along one axis does not mean that it is better along all axes, and the axis of cost, in particular, will trump any other advantage as it grows larger. And the cost of tagging large systems rigorously is crippling, so fantasies of using controlled metadata in environments like Flickr are really fantasies of users suddenly deciding to become disciples of information architecture.

Any comparison of the advantages of folksonomies vs. other, more rigorous forms of categorization that doesn’t consider the cost to create, maintain, use and enforce the added rigor will miss the actual factors affecting the spread of folksonomies. Where the internet is concerned, betting against ease of use, conceptual simplicity, and maximal user participation, has always been a bad idea.

Yet it’s interesting that those who support the concept behind folksonomies tend not to use it as effectively as they could, as pind’s dot com discovered when looking at the tags used by Liz and Clay. What’s needed, he then writes, is technology that helps him, and the rest of us, do a better job of classification. But then that takes us back to Julian’s statement about taking minimalistic solutions such as Technorati Tags and telling developers to ‘make them better’–make them so that they perform as well as controlled vocabularies, but without requiring any effort, expertise, or discipline on the part of the users of such technologies.

The consensus among all those who wrote on Technorati Tags seems to be that even if folksonomies are not as sophisticated as we would wish, may not scale well, or lack the quality that controlled vocabularies have, they’re still based on typically simple solutions; easily applied by the user, controlled by the user, and therefore better than not having anything when it comes to trying to build this semantic web of ours. Or as Clay wrote:

The advantage of folksonomies isn’t that they’re better than controlled vocabularies, it’s that they’re better than nothing, because controlled vocabularies are not extensible to the majority of cases where tagging is needed. Building, maintaining, and enforcing a controlled vocabulary is, relative to folksonomies, enormously expensive, both in the development time, and in the cost to the user, especially the amateur user, in using the system.

I grant that tags (Technorati, Flickr, and other) and the other tools of folksonomies are better than having nothing at all; but is there a possibility that they are also worse than having nothing at all?

Bad habits are hard to break

Recently I, and others, wrote about a new single sign-on digital identity system called Light-Weight Digital Identity (LID). What caught our attention wasn’t necessarily that LID was the best digital identity system proposed–there are a lot of unanswered questions inherent with the current implementation–but that it was the first that actually delivered code into the hands of the user that empowered us to control our own identities.

When I wrote on LID, I was asked in several emails what I thought of the Identity Commons’ effort with XRI (eXtensible Resource Identifiers) and XDI (XRI Data Interchange)–universal identification and data exchange protocol specifications, respectively; particularly since I am such an adherent of RDF, and both are dependent on URIs (Uniform Resource Identifiers) to identify objects of interest, and the implementations of the two could be made interchangeable through existing technologies. I answered that I was ‘briefly’ familiar with them, the briefly based on the fact that both are still primarily in specification stage and there is no implementation that I can put my hands on. I could agree that many of the issues about digital identity and problems associated with it have been addressed by the documentation for XRI/XDI–but where’s the goodies?

In other words, XRI/XDI may be the more robust solution, but there’s nothing that I can work with (pre-alpha SourceForge projects notwithstanding); whereas LID, perhaps not as robust, does provide something I can not only use immediately, but can use without any form of centralized architecture being in place to support it.

Or as was noted in the mailing list for the Identity Commons efforts, sometimes the … “simplest thing that could possibly work” is very attractive indeed.

While I was being questioned about XRI/XDI, several people had emailed Kim Cameron to ask his opinion of it. Kim has become somewhat of a leader in the digital identity community through his interest, and not least because of a set of ‘laws’ he started defining for digital identity implementations.

Rather than address it directly, Kim released a sixth law of digital identities that read as follows:

The Law of Human Integration

The universal identity system MUST define the human user to be a component of the distributed system, integrated through unambiguous human-machine communications mechanisms offering protection against identity attacks.

This law references one of the difficulties inherent with the efforts behind much of the digital identity movement, in that most of the solutions are focused on organizations protecting themselves from abuse and fraud, rather than on individuals being able to safely and easily use whatever solution is provided. This would seem to support LID. However, Kim also provided a scenario earlier in his lead up to his sixth law that plays more subtly on this issue:

To take a very simple example, suppose you have a browser with an address bar showing you the DNS name of the site you are visiting. And suppose there is a “lock icon” which appears when a “secure connection” is in place. What is to prevent a piece of code running on your machine from overwriting the DNS name and throwing up a fake lock icon – so you are convinced you are visiting one secure site when you are actually visiting another insecure one? And so on.

Of course our usual immediate reaction to this type of problem is to find the most expedient single thing we can do to fix it. In the example just given, the response might be to write a new “safe address bar”. And who am I to criticise this, except that in the end, the proliferation of address bars makes things worse. By inventing one, we have unintentionally made possible the new exploit of getting people to install an address bar with evil intent built right into it. Further, who now can tell which address bar is evil and which one is not?

The point I am trying to make is that the new distributed identity system needs to be something other than an “expedient compensation”, something beyond a tactical riposte in the fight for security. And since the identity system has to work on all platforms, it must be safe on all platforms. The properties that lead to its safety can’t be obscurantist or derive from the fact that the underlying platform or software still has a small adoption.

In other words, the expedient solution may not be the best overall solution.

Whether LID can be seen as an ‘expedient solution’ or not, if LID had implementations in PHP or Python that would be simple to install and use, and there was more clarity on the license, it would have fired enough grassroots support to make it a contender for the de facto digital identity implementation, thus making it that much more difficult for other, perhaps more ‘robust’ solutions to find entry into the community at a later time.

This also applies to the concept of meta-data. If people become used to receiving value, even if it is only limited value, from folksonomies based on very little effort on their part, they’re going to become reluctant when other more robust solutions are provided if these latter require more effort on their part. Especially if these more robust or effective solutions take time to be accessible ‘to the masses’ because the creators of same are *enclosured behind walls built of scholarly interest, with no practical means of entry for the likes of you and me.

Clay expands on his general theme of the suckiness of ontologies, as compared to folksonomies because the former forces a future prediction of structure while the latter allows for dynamic growth; the former is based on a graph, with predefined nodes, each requiring a progenitor, while the latter is based on sets, and the only barrier to entry is forming a decision to belong.

Ontology is a good way to organize objects, in other words, but it is a terrible way to organize ideas, and in the period between the invention of the printing press and the invention of the symlink, we were forced to optimize for the storage and retrieval of objects, not ideas. Now, though, we can scrap of the stupid hack of modeling our worldview on the dictates of shelf space. One day the concept of creativity can be a subset of a larger category, and the next day it can become a slice that cuts across several categories. In hierarchy land, this is a crisis; in tag land, it’s an operation so simple it hardly merits comment.

The move here is from graph theory (arrange everything in a tree graph, so that graph traversal becomes the organizing principle) to set theory (sets have members, and the overlap or non-overlap of those memberships becomes the organizing principle.) This is analogous to the change in how we handle digital data. The file system started out as a tree graph. Then we added symlinks (aliases, shortcuts), which said “You can organize things differently than you store them, and you can provide more than one mode of access.”
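The set-based model in that quote can be sketched in a few lines of Python. This is only an illustration: the item names and tags are invented, and nothing here reflects how Technorati or Flickr actually store tags.

```python
# Hypothetical items and tags, invented purely for illustration.
tagged = {
    "essay.txt": {"creativity", "writing"},
    "sketch.png": {"creativity", "art"},
    "budget.xls": {"finance"},
}


def members(tag):
    """Return the set of items carrying a given tag."""
    return {item for item, tags in tagged.items() if tag in tags}


# Overlap between memberships is the organizing principle: nothing has to
# be moved for 'creativity' to cut across 'writing' and 'art'.
creative_writing = members("creativity") & members("writing")
creative_anything = members("creativity")
```

In a tree, the same change would mean relocating nodes. Here it is one intersection, which is also why a very broad tag ends up returning nearly everything.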

Yet, as we’ve already started to see with Technorati Tags, as with other implementations such as tags and Flickr, low barrier to entry usually doesn’t scale well. Something like the Missouri Tag may have few enough entries to make finding the meaningful data easy, but something like Weblog results in so many members as to make it difficult to differentiate from the populace as a whole. The same applies to social networks, where people collect so many ‘friends’ as to make being a ‘friend’ of the person inherently meaningless.

So then we start exploring ways and means to make these simple systems and folksonomies more effective. In the case of Google, the developers create algorithms that try to add meaning to the results returned on a search by basing the results on number of links and popularity of a site, with an assumption that popularity equates to authority. In the case of Flickr, social behavior is incorporated into the tags, and members can label photos as ‘offensive’, in which case the photo is excluded from external feeds. However, without having a clear, not to mention shared, idea of what ‘offensive’ means, the results will always be suspect. After all, some would say that photos of a woman’s bare breasts or a man’s penis are offensive; others would say any photo of President Bush is offensive.

All of these solutions and the tricks to make them work better are based on the fact that the rich context of the data is not captured along with the data, and therefore there is only so much good we can wring out of these ‘cheap’ semantic web solutions before they’re wrung dry and spit out like overchewed tobacco cud. Or before they’re gamed by people such as the comment spammers, and then we, the blades of grass within the grassroots efforts, have to add more effort to our input in order to ‘refine’ (read that ‘fix’) the results, as witness the recent release of Google’s nofollow attribute.

(One could say that Peter Kaminski was prescient when he remarked on January 15th about annotating links in a similar manner to Technorati tags, so that Google could also participate in the new, more meaningful web.)

It is the structure, the future prediction, careful classification, and directed graph nature that Clay disdainfully rejects that allows us to capture the rich nuances of data that will persist longer than the quick transitory interests that meet efforts such as Technorati Tags. One only has to compare the Technorati Tag for Terrorism with the Weapons of Mass Destruction, Terrorist, and Terrorist Type ontologies, and associated instance database to see where the discipline to apply more robust metadata concepts can result in much more controlled, and specific, result sets. And since the data is defined in a universally understood model, RDF, you don’t even have to use the ontology creator’s own search tool (try who, what, where for the three values, in that order)–you could use my much more crude, but quickly hacked together Query-o-Matic, based on existing technologies.

Louis Rosenfeld discusses the strength of searches among controlled data sources as compared to that of folksonomies:

Lately, you can’t surf information architecture blogs for five minutes without stumbling on a discussion of folksonomies (there; it happened again!). As sites like Flickr and successfully utilize informal tags developed by communities of users, it’s easy to say that the social networkers have figured out what the librarians haven’t: a way to make metadata work in widely distributed and heretofore disconnected content collections.

Easy, but wrong: folksonomies are clearly compelling, supporting a serendipitous form of browsing that can be quite useful. But they don’t support searching and other types of browsing nearly as well as tags from controlled vocabularies applied by professionals. Folksonomies aren’t likely to organically arrive at preferred terms for concepts, or even evolve synonymous clusters. They’re highly unlikely to develop beyond flat lists and accrue the broader and narrower term relationships that we see in thesauri.

Returning to Kim Cameron’s sixth law, which states there must be an unambiguous and non-corruptible interface between the user and the technology, we could also apply this to metadata: the costs to support controlled vocabularies/ontologies and uncontrolled vocabularies/folksonomies are the same. At some point a human has to intervene with the technology to refine and validate the result. With ontologies, the intervention occurs before the data is captured; with folksonomies, the intervention occurs with each search.

I put my money on the ‘refine and validate just once’ solution.

Isgood but…is good?

Though Rosenfeld and most others I’ve listed here support folksonomy efforts, some with caveats, others unreservedly, as just one of a variety of technologies that help people find what they need, I tend to be of the camp that believes focusing on easy solutions will make it more difficult to get acceptance for ‘better’ solutions that may require a little more effort. This puts me in the exact **opposite camp of Clay Shirky.

Clay believes that ultimately ontologies will fall to folksonomies, as the latter gain rapid acceptance because of their low cost and ease of use; I believe that ultimately interest in folksonomies will go the way of most memes, in that they’re fun to play with, but eventually we want something that won’t splinter, crack, and stumble the very first day it’s released.

What we don’t need are more cheap solutions, and ultimately, I find that Technorati Tags are a ‘cheap’ solution, though a compelling one, and useful for generating conversation if for no other reason. And I don’t want to denigrate Technorati’s efforts with this, because I feel in the end Technorati is going to play a major role in our semantic efforts. Still, no matter how many tricks you play with something like tags, you can only pull out as much ‘meaning’ as you put into them.

What we need, instead, is a way of making richer solutions more accessible to people, and in that, I do agree with Clay–lower the barrier of participation. In the email list for the Identity Commons effort, the members talked about how the URL which serves as identifier within LID is also a URI, which forms the basis for XRIs, and how the group should look at ways of achieving synergy with this new effort. Rather than being disdainful, they sought to turn LID into an opportunity.

This type of attitude is what we need more of–how can we make the richer, more robust solutions available to folks like you and me? In some ways, FOAF, the ontology used to identify ourselves and who we know, is an example of this because it’s very accessible to ‘regular folk’; yet it’s also based on a robust and highly interchangeable data model, which means it could be easily merged with other data that shares the same identity.

One hell of a ride

Clay states that whether we’re supportive of folksonomies or not, they’re going to happen–we are in a kayak floating along a river of change:

It doesn’t matter whether we “accept” folksonomies, because we’re not going to be given that choice. The mass amateurization of publishing means the mass amateurization of cataloging is a forced move. I think Liz’s examination of the ways that folksonomies are inferior to other cataloging methods is vital, not because we’ll get to choose whether folksonomies spread, but because we might be able to affect how they spread, by identifying ways of improving them as we go.

To put this metaphorically, we are not driving a car, with gas, brakes, reverse and a lot of choice as to route. We are steering a kayak, pushed rapidly and monotonically down a route determined by the environment. We have a (very small) degree of control over our course in this particular stretch of river, and that control does not extend to being able to reverse, stop, or even significantly alter the direction we’re moving in.

I consider the difference between the ‘web’ and the ‘semantic web’ to be one based on ‘meaning’ alone, not on toys and attachments. If my opinion holds true, is the transformation of the web to the semantic web equivalent to a ride in a kayak? Pulled along by forces with little control over direction and speed?

I will concede to Clay the challenging, swift nature of the transport, but argue that only a fool would put themselves into a narrow sliver of wood, hide, or plastic on a raging river without training, trusting to fate to ensure we don’t end up smashed, bloodied, and drowned. And it’s equally foolish to believe that we can, somehow, with the right use of technology, exponentially derive complex meaning out of what is, essentially, flat data.

I agree with Clay that the semantic web is going to be built ‘by the people’, but it won’t be built on chaos. In other words, 100 monkeys typing long enough will NOT write Shakespeare; nor will 100 million people randomly forming associations create the semantic web.

* No, ‘enclosured’ is not a real word, but it should be, because it adds more description of the effect than ‘enclosed’.

** Of ontologies, Clay writes …don’t get me started, the suckiness of ontology is going to be my ETech talk this year…, which is probably one reason my own proposal, which is diametrically opposite to Clay’s talk, was not accepted. Well that and I mentioned the ‘p’ word.

Archived, with comments, at the Wayback Machine

RDF Semantics

And it jiggles, too

I’ve been playing with mash-ups lately for the book, and at one point had to slap myself in the face to get me to Stop! Stop! Not another service!

straup at Flickr’s announcement of “machine tags” is significant, because, as he demonstrated, it really is the same as RDF, except without the scary name (and we’ll shoot the first person who mentions reification). Of course, now I’m looking at my mash-up examples for the book and thinking, like jello, there’s always room for more.

Speaking of integrating services and data, I still like RDF as XML. I can do things with it, such as load it into the browser XML Parser and manipulate the data using DOM methods. Unfortunately, I have to copy the RDF file, such as Dan Brickley’s FOAF file, to my home directory before using Ajax–it’s not packaged correctly for cross-domain browser access. It wouldn’t be difficult for any RDF/XML source to be packaged as end points for cross-domain access. Leaving aside issues of trust.
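To make the "cram it into a parser and use the DOM" idea concrete, here is a minimal sketch. It uses Python's standard `xml.dom.minidom` as a stand-in for the browser's XML parser, since both expose namespace-aware DOM methods. The sample document is a made-up FOAF fragment, not Dan Brickley's actual file, and `foaf_names` is a hypothetical helper name.

```python
from xml.dom.minidom import parseString

FOAF_NS = "http://xmlns.com/foaf/0.1/"

# A tiny, invented FOAF document standing in for a real RDF/XML file.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Example Person</foaf:name>
    <foaf:knows>
      <foaf:Person><foaf:name>Example Friend</foaf:name></foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def foaf_names(rdf_xml):
    """Collect the text of every foaf:name element, in document order."""
    doc = parseString(rdf_xml)
    nodes = doc.getElementsByTagNameNS(FOAF_NS, "name")
    return [node.firstChild.data for node in nodes]


names = foaf_names(SAMPLE)
```

Note that this treats RDF/XML as plain XML. It works when you know how the file is laid out, but the same triples can be serialized in more than one XML shape, so it is a convenience rather than a general RDF parser.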

Danny, who points to a nice semantic/scripting challenge (but…iPod?), asked about RDF Turtle notation and Ajax, and sure you could use Turtle in XMLHttpRequest (XHR) requests, or as endpoints and dynamic scripting. All you have to do is either return it as text for XHR, or as a valid parameter in an endpoint (wrapped in a function call, and used with dynamic scripting). What we need is a transformation between Turtle and JSON, so we can return Turtle formatted as JSON (and we have it). But I like RDF/XML because I can just cram it into the browser’s parser and use the DOM. Either XML or JSON works for me.
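The "wrapped in a function call" step might look something like the following on the server side. This is a sketch under assumptions: it starts from triples that have already been parsed (it does not read Turtle itself), the `s`/`p`/`o` key names are an arbitrary choice rather than any standard, and `triples_to_jsonp` and `handleTriples` are hypothetical names.

```python
import json
import re


def triples_to_jsonp(triples, callback):
    """Serialize (subject, predicate, object) triples as a JSONP payload."""
    # Only accept a plain identifier as the callback name, so a caller
    # cannot smuggle arbitrary script into the response.
    if not re.fullmatch(r"[A-Za-z_$][A-Za-z0-9_$]*", callback):
        raise ValueError("unsafe callback name")
    body = json.dumps([{"s": s, "p": p, "o": o} for s, p, o in triples])
    return f"{callback}({body});"


payload = triples_to_jsonp(
    [("http://example.org/me", "http://xmlns.com/foaf/0.1/name", "Example Person")],
    "handleTriples",
)
```

A page would then pull this in with a dynamically created script element, and the browser would call `handleTriples` with the array of triples.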

The Flickr API’s “machine tags” work, too, basically flattening triples and squeezing them skinny thangs into a JSON response; the API provides an endpoint, too, so that you can call it from the browser. If you’re curious, like me, as to who would use the dc: namespace at Flickr, click the dc: button in the example page I linked earlier, and you’ll see the most recent cases. From the pictures, it looks pretty much like everyone.
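A machine tag has the shape namespace:predicate=value, so pulling one apart is a small exercise. The sketch below is a rough approximation: the regular expression is my own loose reading of that shape, not Flickr's official validation rule, and `parse_machine_tag` is a hypothetical helper.

```python
import re

# namespace:predicate=value. This pattern is an approximation for
# illustration, not Flickr's own validation rule.
MACHINE_TAG = re.compile(
    r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$"
)


def parse_machine_tag(tag):
    """Split a machine tag into (namespace, predicate, value), or None."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None  # an ordinary tag, not a machine tag
    return match.groups()


parsed = parse_machine_tag("dc:subject=photography")
plain = parse_machine_tag("sunset")
```

The three parts line up loosely with a triple's predicate (namespace plus predicate name) and object, with the photo itself as the implied subject.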

Let me say that in the crowded field of photo services, Flickr just got all pretty and sparkly, and is still *Queen of the game.

Sparkly…sparqly…say…that gives me an idea

Yup. There’s always room for more.

Update: Try out the end button, which pulls in the dc:subject from my RSS 1.0 file. Click on an option, and it searches for all the matching photos in Flickr. Of course, I’m the only one who has used dc:subject with Flickr…still.

Quick note: The example application I linked works on most browsers, but this is just a quick hack, for fun. It hasn’t been heavily tested other than me playing around, nor have I optimized the code. I haven’t tried it on IE 6.x or IE7 yet, me having ‘fun’ being the operative concept in this paragraph.

Bonus points: Kingsley Idehen: SPARQL, Tagging, Ajax,…

*What, you thought I was going to say King? Don’t know me well, do you?


Honest Cruft

When I went looking for a FOAF file to copy for my playing around with Ajax, RDF, Flickr, and so on, I immediately thought of Dan Brickley’s FOAF file, and once I had copied it locally, I just plugged it into my application, without validating the RDF/XML first. I did so with confidence because I knew that, if there was one FOAF file guaranteed to be cruft-free, it was Dan’s.

There’s more to ‘trust’ on the internet than is covered by OpenID: a person can create cruft and still be honest. What we need, as the number of services and data endpoints expands, is a way of attaching trust to the quality of a service–not to mention trust as to whether the service can be hacked and we’d be at risk using their data in our Ajax applications.

I have a great deal of trust with Flickr, but even when I was working on the book, one of their services went out, just for a few minutes, just as I was testing something. Still, I knew it would come back. Why? It was Flickr. The entire site would most likely be taken down before the API would be stripped–or Stewart Butterfield would be fired before he’d let it be stripped.

This is a measure of trust associated with how long a service will be available. If a service is pretty stable, such as Google Maps, or Flickr, or others of that sort, we can integrate such more heavily into our work. However, if the company is a startup, in trouble financially, well then, we better keep any integration at a surface level, ready to cut loose at any moment.

There’s issues associated with whether a service was meant for internal or external access. The recent JSON endpoint service, the Tagometer, wasn’t necessarily meant for completely open-ended use. I’m sure the organization won’t yank it, but…I’d only moderately integrate it into my applications, and keep a replacement handy.

How about ads? Payment? Google has always kept the door open for adding ads to Maps, but the company has said it would provide several days notice. Still: if our mashups, widgets, what have you become dependent on Google Maps, what happens when the ad drops?

We’ve focused so much on people and trust, that we’ve forgotten how much we’re putting our applications, our widgets, our web sites, and even our businesses at risk because of the services and data we’re tying into. What we need is an OpenID for data services: can this data be trusted, is this data trustworthy, is this data coming from the correct spot–hey, is this company going belly up? Does it have dangerous elements? Perhaps what we need is a trust scale we can apply to a service to determine how much we want to depend on such. ProgrammableWeb has a rating system, but let’s face it, that’s more a rank on the ‘coolness’ factor, than the stability, trust, and general warm and fuzziness.

Then there is the issue of our service requests: how about a ‘signature’ we can attach to our requests? Hi, this is Shelley passing through. No worries, I’m not a spammer. Looks like this one has been asked at the OpenID forum. It would be nice to have an API key that I could use with all services. More importantly, though, I’d like to establish a level of trust, so when I hammer the service, hopefully those who are monitoring the service see it’s only me, and I wouldn’t hurt a fly.
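One way such a ‘signature’ could work, purely as a thought experiment, is a keyed hash over the request details. The sketch below assumes the caller and the service already share a secret, which is exactly the part a cross-service identity scheme would have to solve. The function names and the choice of fields to sign are hypothetical and not taken from OpenID or any real service's API.

```python
import hashlib
import hmac


def sign_request(secret, method, url, timestamp):
    """Return a hex HMAC-SHA256 signature over the request details."""
    message = f"{method}\n{url}\n{timestamp}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret, method, url, timestamp, signature):
    """Check a signature using a constant-time comparison."""
    expected = sign_request(secret, method, url, timestamp)
    return hmac.compare_digest(expected, signature)


# Invented values for illustration only.
secret = b"shared-secret"
signature = sign_request(secret, "GET", "http://example.org/api/photos", "1170000000")
```

This only proves a request came from someone holding the secret. It says nothing about whether that someone is well behaved, which is the harder half of the trust question.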