Technology Weblogging

Bad Webloggers. Bad.

Recovered from the Wayback Machine.

As you can see, I’m still getting pingbacks, even after removing the link to the pingback server from my page header. The most likely reason is that somewhere in the WordPress code my site is responding affirmatively to an XML-RPC request, and the pingback is then sent. I’ve since moved the xmlrpc.php file elsewhere, though this means I can’t remotely post for now. But I rarely do anyway.

The pingbacks are from a post that Jonathon Delacour wrote on the recent trackback and nofollow issues, over at Writable Web, the new weblog he’s writing in conjunction with Marius Coomans. In this writing, Jonathon provides a nicely done comparison of pingbacks and trackbacks and how the two have become somewhat synonymous in most webloggers’ minds, primarily because of trackback autodiscovery. He also covers the new nofollow attribute, the automatic addition of which in weblog tools such as TypePad has led the spammers this last week to basically hit webloggers across the nose with a rolled up newspaper, going “Bad, webloggers. Bad.”

In the meantime, here’s a surefire method of preventing comment spam:

Open up robots.txt, or create one, and add the following two lines:

User-agent: *
Disallow: /

It could take a couple of months, but eventually you’ll find you’ll have no more comment spam. Of course, you’ll have no Google or other search engine pagerank, either. But why bleed pagerank out of the weblogs slowly with nofollow, when we can do it quickly with robots.txt?
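For anyone tempted to take the advice literally, here is a minimal sketch of how to confirm what those two lines tell a crawler, using Python's standard urllib.robotparser. The crawler names and the example.com URL are placeholders of mine, not anything from the post, and the check only describes crawlers that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the post, parsed directly instead of fetched from a site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder crawler names and URL, for illustration only.
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_blocked(agent, "https://example.com/archives/post.html"))
```

Every agent falls under the wildcard entry, so every path on the site is off limits, which is the whole joke.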

Seriously, bite the bullet, cut the cord, and be comment spam free. Isn’t this what everyone wants?

RDF Semantics

Accidental smarts

Recovered from the Wayback Machine.

Responding to the recent discussions about folksonomies and tags, AKMA was forced to confess that he is tags challenged:

“…I should pause to say that I’m not a natural for “tags.” I’ve hardly ever used tags. I didn’t begin tagging my pictures for flickr for ages; even now I’m liable to tag pretty cursorily (no, I don’t mean “with a computer pointing device”). I don’t use categories in my own Moveable Type posts, although the Seabury site that used to be (and may someday live again) integrated categories into its architectural rationale. And once I started thinking about tags, I felt chagrined; the folksonomized Web that David envisioned, that Kevin and Stewart and all had begun to implement, presents such a tremendous opportunity — but here I was, too lazy to tag. I had worked on my to care about valid mark-up, and I emphasized this aspect of the Seabury site. But I just wasn’t sure I had the determination to add Technorati tags to my posts. You’re too polite to complain, but I get long-winded — how would I tag my monologues without repeating most of the words? I was going to be a stick between the spokes of the organic semantic Web, when my friends were building and turning the wheels.”

Knowing AKMA for a few years now, lazy is not a word I would have used to describe him. Dave Winer responded, comparing categories/tags to the old PIMs, writing:

Users got all excited about them too, and set them up imagining how great it was going to be to finally have an orderly life. They happily entered appointments, until they spaced out or got lazy and didn’t enter one. All it takes is one for the excitement to turn to guilt…The category stuff works the same way. At first I delighted in the ease of routing stuff to categories. Eventually I would only route to one or two categories, and then I stopped altogether. Not because it wasn’t easy enough, but because the guilt had taken over.

Knowing Dave for a few years now, feeling guilty is not a phrase I would have associated with him.

All joshing aside, among those that responded to both writings, Dan Bricklin wrote that “Instead of making you feel bad for ‘only’ doing 99%, a well designed system makes you feel good for doing 1%” and proposed “…another design criteria for a type of successful system: Guiltlessness.” Ross Mayfield had an interesting take, saying that guilt is good:

Perhaps a system isn’t social if it only has first order commons dilemmas (governing the resource) and doesn’t support management of the second order (governing each other). When a group explicitly forms around a tag, guilt may come into play (for example, shame on you people for not posting really ugly and fairly pointless parking lot photos!), and that’s not necessarily a bad thing.

Though both Dan Bricklin and Ross Mayfield had excellent responses to AKMA’s and Dave’s writing, I kept returning to AKMA’s statement, “I’m not a natural for ‘tags.’” I don’t think that AKMA is lazy when it comes to tags and categories, as much as he doesn’t see the magic that will bring all the pieces together, and he later expanded on his earlier writing, saying something to that effect. In that post he agreed with the limitations of library systems, based on controlled vocabularies, and also agreed that a bottom up folks-based approach might be better, but, as he wrote, “…we haven’t turned up the device that’ll kick that engine into gear, not yet.”

It can cook your pop tarts without burning the edges

If you’ve ever watched the movie “Twister”, one of the better lines in the movie was delivered by Bill Paxton’s girlfriend, the reproductive therapist Melissa Reeves. When faced with the odd barrel shaped device with all sorts of gizmos on it, surrounded by beaming, happy strange people, she responded with, “Wow, it is great.” After staring at it a moment longer, she gets a pained expression on her face and asks, “What is it?”

Give that woman a weblog, she’s discovered the secret of meme!

Seriously, with the mixed bag that is weblogging you have people who see a new innovation and go, “finally!” while others look at the same thing and go: “Will it make the comment spam go away. I don’t want to hear about it unless it makes the comment spam go away. Where is the ‘Kill all spammers’ switch?” The rest fall somewhere in-between.

When I read AKMA’s statement, my first thought was, what do we want this engine to do? In other words, what does each of us expect to get out of tags and folksonomies?

Returning to David Weinberger’s after dinner speech, again, another item that caught my attention was how David perceived tags. Tags, to him, were more than a way of routing around the Dewey Decimal System–they were a way for him to keep up with as many writings as possible on specific subjects. So, for instance, David subscribes to the taxonomy tag at delicious, and with this, he’s able to see what new items pop up under this designation.

I thought this was a very compelling reason for tags–enough so that I subscribed to feeds for several tags in delicious. Through these I was able to find several new resources, including some referenced in this writing. And I found them again, and again, and again, and…

Déjà vu all over again

Tags by themselves aren’t useful for anything; it’s how they’re used in support of other services that makes them more interesting. For instance, delicious (or more properly del.icio.us) provides a bookmarking service that happens to use tags. If you’re interested in a site or a specific page, you add it to your bookmarks, annotating it with various tags. Your bookmarks are public, so anyone interested in those tags is also, then, notified about the site. A fascinating study of distributed interest, and seemingly a great way of discovering gems hidden in the shadows of the online giants, since any link has its moment in the sun, so to speak.

However, the very nature of the concept of ‘shared bookmarks’ means that the more successful a writing, the lower the signal-to-noise ratio you get. For instance, in the folksonomy category, a new, popular piece can effectively wipe out any other entry from the ‘top page’ because so many people add the site to their bookmarks. And if you subscribe to the feed, you can be treated to link after link to the same resource, added by different people. Additionally, if you’re like me and subscribed to many similar topics, you’re treated to the same link showing up in other feeds. After a while, you don’t think the Guardian’s new flickr article is all that particularly interesting.
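The repetition is easy to picture in code. What follows is a rough sketch, not anything delicious itself offers: the Bookmark record, the user names, and the URLs are all made up for illustration. It collapses a merged stream of tag feeds so each link shows up once, with a count of how many people saved it.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Bookmark(NamedTuple):
    """One feed entry: who saved which URL under which tag (hypothetical shape)."""
    user: str
    url: str
    tag: str


def collapse_duplicates(entries: Iterable[Bookmark]) -> List[Tuple[str, int]]:
    """Return each URL once, in first-seen order, with its count of distinct savers."""
    savers: Dict[str, Set[str]] = {}
    for entry in entries:
        savers.setdefault(entry.url, set()).add(entry.user)
    return [(url, len(users)) for url, users in savers.items()]


if __name__ == "__main__":
    feed = [
        Bookmark("ann", "https://example.com/flickr-article", "folksonomy"),
        Bookmark("bob", "https://example.com/flickr-article", "folksonomy"),
        Bookmark("ann", "https://example.com/flickr-article", "tags"),
        Bookmark("cat", "https://example.com/older-rdf-piece", "rdf"),
    ]
    for url, count in collapse_duplicates(feed):
        print(count, url)
```

The count keeps the popularity signal that makes the feed interesting while dropping the repeats that make it tiresome.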

In addition, there is nothing in delicious to indicate whether a URL is fresh or not, as I found when a link to a five year old article appeared under the RDF tag. This could be considered more of a perk than a problem– especially if it provides visibility to older works that may no longer be on today’s radar but are still valid. However, if you’re expecting fresh content, older links could clash with your expectations and may even decrease the value of the feed.

Another issue with delicious really applies to all of the tag-based services: agreement as to which tag gets attached to what item. Who decides what is the ‘right’ use of a tag, in a system with few gatekeepers?

Fleas on a Dog’s Back

Delicious, and Furl, are bookmarking services, used by people to publicly share their reading lists. The lists are organized through tags, and this is what connects these systems with other tag-based systems, such as Flickr and Technorati. These latter, though, are primarily focused on us tagging our own work; I can add ‘folksonomy’ as a tag to this post and it would show up in Technorati tags under Folksonomy. In addition, if I used folksonomy with one of my existing orchid photos in Flickr, it would also show up.

Though primarily of interest to individuals, there’s nothing inherent in either Flickr or Technorati tags that prevents others from adding tags to those we use. For instance, consider how tags are used in Flickr. Most people who use Flickr are more interested in a way of providing access to photos for friends and family, and tag the photos accordingly. Their interest isn’t in how the public perceives a tag, but how they perceive it. But since Flickr provides the ability for others to tag photos, you can, as Stewart Butterfield said in an excellent interview by Richard Koman, “…upload a photo and go to sleep, and in the morning there are tags all over it.”

I don’t have a large network at flickr, but since I’ve added “folksonomy” as a tag for my one orchid photo, others have since added “tag”, “tags”, and “SillyTag”. Now, I added “folksonomy” as a tag for the photo because it shows plant roots overlying the images of the plants themselves, and symbolizes the ‘grassroots’ nature of folksonomies. However, others added tag and tags probably because these are terms related to folksonomies, and most likely because they were indulging in a fit of playful mischief. We can safely assume this is the impetus behind “SillyTag”.

Now I can delete these other tags, but I won’t because I know the context of their usage, and in this context, they are meaningful to me, and to those others who share the history in regards to this example. Of course, to someone who doesn’t know me, or the others, all of these tags probably seem very puzzling. More, if this person searched on “folksonomy” in Flickr or Technorati Tags, they are going to be confused about this orchid photo showing up in the midst of conference snapshots and diagrams written on whiteboards.

One term, one understanding, and many different but legitimate reasons for attaching the tag, hence a valid folksonomy example; but to someone coming in without the proper context, I’ve just decreased the signal to noise ratio of this tag.

Still, many are willing to accept the seeming chaos of expectations about tags, in favor of their dynamic and open capabilities. Though aware of the challenges associated with tags, in his in-depth essay on folksonomies and tags, Adam Mathes prefers to focus on the positive benefits of getting users involved in defining metadata:

A folksonomy represents simultaneously some of the best and worst in the organization of information. Its uncontrolled nature is fundamentally chaotic, suffers from problems of imprecision and ambiguity that well developed controlled vocabularies and name authorities effectively ameliorate. Conversely, systems employing free-form tagging that are encouraging users to organize information in their own ways are supremely responsive to user needs and vocabularies, and involve the users of information actively in the organizational system. Overall, transforming the creation of explicit metadata for resources from an isolated, professional activity into a shared, communicative activity by users is an important development that should be explored and considered for future systems development.

Following on this empowerment theme, Nick W from ThreadWatch writes, “Simply put, tags are important because they allow your users to generate content and classify that content in their own unique way.”

I did it m-y-y-y-y-y w-a-a-a-a-ay-!

In the original discussions related to tags, Allan Jenkins wrote about the issue of tags and weblogger discipline:

I keep running up against two issues.

First, since tags are self-applied by tens of thousands of Flickr users and other bloggers, I suspect we are bound to end up with common categories too large to be useful (Parties, Dogs, NewYork) and, because no one need agree to any one taxonomy, a plethora of tags that refer to the same thing (insulinpump, insulin_pump, insulininjectiondevice).

Second (but related) is how we bloggers can discipline ourselves to apply tags judiciously; moreover, how will and should tags affect how we design blogs. For example, Technorati already interprets Typepad categories as tags. Does that mean Typepad bloggers should drastically expand their category lists? It would seem to be a good tagging idea, but it would also render “categories” fairly worthless.
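Jenkins’s insulin pump example can be made concrete with a small sketch. The normalization rule here (lowercase, then drop underscores, hyphens, and spaces) is my own assumption about what a tag service might reasonably do, not a description of how Flickr or Technorati treat tags.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize_tag(tag: str) -> str:
    """Lowercase a tag and drop separators (underscores, hyphens, whitespace)."""
    return re.sub(r"[\s_\-]+", "", tag.lower())


def group_variants(tags: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw tags under their normalized form, keeping input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tag in tags:
        groups[normalize_tag(tag)].append(tag)
    return dict(groups)


if __name__ == "__main__":
    # The three tags from the quote above.
    print(group_variants(["insulinpump", "insulin_pump", "insulininjectiondevice"]))
```

Spelling variants such as insulinpump and insulin_pump fold together, but insulininjectiondevice stays in a group of its own. String cleanup handles punctuation; it does nothing for tags that are actually different words for the same thing.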

Rather than muck around with my categories, most of which would definitely generate ‘noise’ in tagging, I tried something new: I introduced a tags-based system as a way of grouping discussion about a topic and doing away with trackback. I proposed this approach for two reasons; the first is that trackback is now being badly spammed, and shopping for alternatives seems like a feasible activity, especially considering that we never really used trackback that accurately in the first place; the second is that tags can loosely join several separate resources around a specific topic, and to me that’s the original intent of trackback.

Instead of trackbacks, I said, we’ll create ‘tagbacks’ and then use Technorati, and other tags services, as a way of tracking related information about the post/discussion.

To demonstrate, in the post covering this new concept I created a unique tag called bbintroducingtagback, based on the title of the post, but also covering the basic concept of the discussion surrounding the post: it was about me introducing a new tags-based discussion thread tracking system called tagback. I then pointed a couple of photos at the topic, by using the same tag in Flickr, as well as some additional related material by creating bookmarks in delicious and Furl. Others picked up on the concept, adding new entries to delicious, as well as using the Technorati tag.

Though there is interest in this idea, others were concerned about the use of tags in this way. For instance, if we were to create individual tags for individual posts, we would eventually run into tag name collisions. Even if I were to ‘namespace’ my tags, as I have by placing a ‘bb’ in front of the name, it’s still very conceivable that BoingBoing, another ‘bb’ weblog, could define a tag equivalent to one I’ve created. So intermixed with post after post about technology and hiking, would be the odd post on copyright and sex.
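To make the collision worry concrete, here is a minimal sketch of how a per-post tag of this kind could be generated. The function name, the cleanup rule, and the post titles are assumptions for illustration; the only things taken from the discussion are the ‘bb’ prefix and the resulting bbintroducingtagback tag.

```python
import re


def tagback_tag(prefix: str, title: str) -> str:
    """Build a per-post tag: prefix plus the title, lowercased, letters and digits only."""
    return prefix.lower() + re.sub(r"[^a-z0-9]", "", title.lower())


if __name__ == "__main__":
    # Assumed post title; yields the tag used in the tagback experiment.
    print(tagback_tag("bb", "Introducing Tagback"))

    # Two different weblogs that both abbreviate to 'bb' and pick the same
    # (hypothetical) title end up with the identical tag.
    mine = tagback_tag("bb", "Fair Use")
    theirs = tagback_tag("bb", "Fair Use")
    print(mine == theirs)
```

A two-letter prefix is a very small namespace. Anything short enough to type comfortably is short enough for someone else to pick as well.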

However, the biggest concern expressed on the use of tags in this way is that this may violate the concepts behind tags. In comments, Hans Gerwitz wrote:

This feels like an abuse of tagging, in that you are programmatically generating tags that are far too specific to contribute to the ecosystem.

If I browse the tagonomy for trackbacks on del.icio.us I find that blog, spam, and even politics are related. The bbintroducingtagback tag is not likely to ever bubble up to relevant status, though.

Moreover, these “manufactured tags” are never going to be stumbled upon by someone else tagging their own content; they will never contribute to the organic self-organizing soup of tagspace.

So, if these tags don’t play with the other tags, what purpose are they serving?

Simon Willison wrote:

Unfortunately, the very nature of tags is that they are designed to be shared rather than globally unique, which seems to make the concepts incompatible.

In Clay Shirky’s first post on folksonomies, he addressed concerns about synonym control and precision, writing:

Lack of precision is a problem, though a function of user behavior, not the tags themselves. del.icio.us allows both hierarchical tags, of the weapon/lance form, as well as compounds, as with SocialSoftware. So the issue isn’t one of software but of user behavior. As David pointed out, users are becoming savvier about 2+ word searches, and I expect folksonomies to begin using tags as container categories or compounds with increasing frequency.

In response to Clay’s writing, Thomas Vander Wal, the originator of the term folksonomy, wrote:

The narrow-folksonomy, where one or few users supply the tags for information, such as Flickr, does not supply power tags as easily. One or few people tagging one relatively narrowly distributed item makes normalizing more difficult to employ an tool that aggregates terms. This situation seems to require a tool up front that prompts the individuals creating the tags to add other, possibly, related tags to enhance the findability of the item. This could be a tool that pops up as the user is entering their tags that asks, “I see you entered mac do you want to add fruit, computer, artist, raincoat, macintosh, apple, friend, designer, hamburger, cosmetics, retail, daddy tag(s)?”

Since this time Flickr has added the ability for friends and family (and possibly contacts) to add tags, which gives Flickr a broader folksonomy. But, the focus point is still one object that is being tagged, where as del.icio.us has many people tagging one object. The broad-folksonomy is where much of the social benefit can be derived as synonyms and cross-discipline and cross-cultural vocabularies can be discovered. Flickr has an advantage in providing the individual the means to tag objects, which makes it easier for the object to get found.

According to both Shirky and Vander Wal, then, a compound tag consisting of terms, such as ‘bb introducing tag back’, is not only acceptable, it’s to be encouraged because it adds to the ability to find the item so tagged. And since others can use other terms to tag the item, it’s part of a broader folksonomy that can then be traced back to the item through a query that combines these tags; or by using the specific tag, bbintroducingtagback, as an alias to a tag query, such as tag+tagback.
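A combined query and an alias can both be sketched over a simple tag index. Everything here is hypothetical: the index shape, the function names, and the post identifiers are mine, and the sketch assumes that a+b means "items carrying both tags", which is how I read tag+tagback.

```python
from typing import Dict, Set

# Hypothetical index shape: tag -> identifiers of the items carrying that tag.
TagIndex = Dict[str, Set[str]]


def query(index: TagIndex, expression: str) -> Set[str]:
    """Evaluate 'a+b+c' as the set of items tagged with every listed tag."""
    tags = [tag for tag in expression.split("+") if tag]
    if not tags:
        return set()
    result = set(index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result


def resolve(index: TagIndex, aliases: Dict[str, str], name: str) -> Set[str]:
    """Run the aliased query for name if one is registered, else query name itself."""
    return query(index, aliases.get(name, name))


if __name__ == "__main__":
    index = {
        "tag": {"post-1", "post-2"},
        "tagback": {"post-1", "post-3"},
    }
    aliases = {"bbintroducingtagback": "tag+tagback"}
    print(resolve(index, aliases, "bbintroducingtagback"))
```

Under this reading the specific tag never has to be typed by anyone else. It is shorthand for an intersection of the ordinary tags that other people already use.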

But this does highlight a problem with folksonomies and tags, and one that may be leading to AKMA’s, and others’, wariness of their usage: no one knows exactly what the rules are for these objects and their aggregations. We’re making all this stuff up as we go along. Or, as the wise man said, “She who gets there first, wins”.

Path Cutters

I once wrote on an ingenious experiment in social-driven architecture, when the architects of a new building planted grass but did not put in sidewalks. Over time, paths were cut into the grass, and these paths were eventually cemented over. The premise behind the effort was that the people would determine the best, and most effective way to approach the building.

However, you don’t see this approach used elsewhere, and it isn’t just because builders are concerned about liability and access to the buildings while the paths are being trampled; it’s also because these paths may not be optimum for all people. In fact, the paths may be optimum only for a certain segment of the population. For instance, men will more likely create more of an impression in the grass than women because of their heavier weight and stronger shoes; women may not attempt to tread in anything approaching unmarked grass because women’s shoes tend to be high heeled or less sturdy than men’s. Older people will also more likely follow even a hint of a trail over non-trail because it’s just plain easier, which means younger people will also dominate in the trail cutting. Finally, society as a whole frowns on marking paths into unmarked landscape, and the ones who are most likely going to cut the path are either early comers, who have no choice but to walk on unmarked grass; or people who don’t care, either about society, or about the appearance of the landscape.

In the end, you have paths marked by young guys who don’t give a shit.

What could be said of the paths could also be said for the use of tags and folksonomies. Either people will search out and follow existing tag usage, or they’ll go their own way; if their way has enough appeal, they then become the path cutter. The aggregations that result in tags, then, may not arise from a true representation of the people forming these aggregations. In other words, rather than represent a collective intelligence, folksonomies may reflect the tag equivalent of young guys, who don’t give a shit.

About the dominance of path cutters

One of the first actual conflicts related to tags was Rebecca Blood’s issue with the use of the MLK tag with a possibly offensive photo. This represented a conflict in culture between the person who tagged the photo and Rebecca. Addressing issues of folksonomies and culture, Danah Boyd wrote:

What makes the tagging phenomenon utterly fascinating is that there is a collective action component to it. We love to see how people will come to common consensus on relevant terms. But part of what makes it valuable is that, right now, most of the people tagging things have some form of shared cultural understandings. The “in the know” groups using these services are very homogenous and often have shared values and thus offers valuable related links. This helps explain why Rebecca Blood is concerned about the MLK tags – they signify a lack of shared common ground. In tagging, quality is not just about ‘accuracy’, but about what cultural assumptions dominate.

Design questions then emerge. How do we deal with conflicting cultural norms as more people are engaged in the act of tagging? How useful are tags across cultures? Do we only gain value from collective-action tagging amongst groups of shared values? If so, how do we implement that? And what are the social consequences for explicitly delimiting culture online?

Since the use of tags is so new and folksonomies so limited, does this seem like a minor problem? A favorite example of controlled taxonomies, the Dewey Decimal System, is infamous for its Christian dominated classification system, which both AKMA and David Weinberger discussed. There are 88 numbers reserved for Christian topics, while Jews and Moslems get 1 number, each. The only reason the current system hasn’t failed by now is that topics can be added as ‘decimal numbers’ within the system.

Leaving aside the offensiveness of a system that is so biased against non-Christian faiths, the DDS is an inherently imprecise and misleading system. It’s somewhat like the current pagerank system within Google, with an implication that more numbers implies a greater authority.

And leaving aside cultural differences, how do folksonomies scale in a multi-language environment? One of the most popular Technorati Tag pages is the one for Weblog. If you access the page, the first thing you notice is that many of the entries are in Chinese. Providing support for different languages then becomes an issue with folksonomies that are intended to go beyond one country’s borders, or beyond a single language.

Even if tag systems follow Wikipedia’s use of different language domains, there are issues within the different languages that may make the formation of folksonomies from simple tags difficult or even impossible. Peter van Dijck wrote:

This post is about folksonomies (tagging), and how it might be really hard in Japanese. This is mostly speculation at this point, please comment or email me if you speak Japanese.

On the Sigia-L list, Fiona Bradley writes: “I don’t know Cantonese, but I have just started to learn Japanese and it’s not necessarily that the definitions of emotions are different, just that they are a lot more complex than in English once you factor in politeness levels and directness. And then there’s all the complications that arise from having many Kanji to choose from and many readings for each. If you’re just assigning a single word to a photo for instance, with no other words to define context, that may make the system quite difficult to search.”

ButtUgly expanded on this:

There are three cases of “language collision” on tags (I’m using English and Finnish as an example only here).

1. The tag is different in English and in Finnish. For example “fishing” and “kalastus”. This should pose no problem, as the folksonomies grow on each of the tags independently.
2. The tag is the same in English and in language Finnish, but the meaning of the tag is different. In this case, the dominant mass of the users will “hijack” the tag.
3. The tag is the same in both languages, but the web pages will be in different languages. This is the case with things like trade marks (Apple, Macintosh, Nokia), or when people like to tag Finnish pages with English tags (like me: I use the word “blog” to mark any significant articles about blogs, regardless of the language). This reduces the usefulness of tags for people who do not understand Finnish.

There is also an additional tagging problem with languages such as Finnish: the same word can be conjugated and written in multiple ways, depending on the context. It is somewhat the same as the problem of using different words for the same concept, but it does make the number of potential strings increase three-fourfold.

The discussion has been centered around the cultural bias in tags. However, the very concept of folksonomies–spontaneous aggregations of keywords–is itself based on bias, formed from a specific culture, which tends to be male, western, with English as a native language.

Scary stuff, when you consider that people such as Jeff Jarvis have become interested in tagging people:

It’s time to tag people.

This comes out of David Galbraith’s one-line bio and out of arguments I’ve made over time that the real future of classifieds is a generation beyond Craig and Monster: It’s a distributed world where resumes and jobs (or men seeking women and women seeking men) live anywhere and they are found and matched by some specialized successor to Google that uses tags (e.g., work status, education, location, languages…. or smoker, nonsmoker, single, divorced, great personality). In that world, in essence, people, ads, and content are all tagged.

Finding the love of your life through tagging. I would rather gnaw my own leg off than live in this world. And I think we can safely assume this isn’t the folksonomy engine that AKMA is seeking.

If we reject the idea of folksonomies bringing us closer to potential mates, the patterns reflected in the popular aggregation of words are seen as the next step to bringing us closer to artificial intelligence, a concept I call accidental smarts.

Accidental Smarts

I liked what Joshua Porter had to say on folksonomies and tags:

Tagging, by itself, does not a folksonomy make. It is possible, as Clay Shirky has pointed out, to tag things without creating a folksonomy. Tagging is simply an explicit activity that people can do to add metadata to content. It is common. Information architects tag things. I tag things on my computer. Every web page consists of dozens of tags (albeit with little meaning). In general, creating metadata includes a lot of tagging. Tagging as an activity is neither unique nor special.

Because we can aggregate tags, however, we can build a taxonomy out of them. More specifically, we can build a taxonomy out of the patterns we see in how people use tags. It is this act of aggregation, and not the act of tagging, that give folksonomies their power. Without aggregation, tags are just tags, with no meaning beyond the local meaning that each user gives to their own set.

It is the aggregation of tags that gives folksonomies their power. Yet aggregations of tags are based on certain understandings of language, and there is a great deal of imprecision with language, even after discounting culture and focusing primarily on English.
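Porter’s distinction between tagging and aggregating can be shown in a few lines. This is a sketch under my own assumptions: the (user, item, tag) record and the sample data are invented, and counting tags that land on the same item is only one of many patterns an aggregator might look for.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record: (user, item, tag).
Tagging = Tuple[str, str, str]


def related_tags(taggings: Iterable[Tagging]) -> Counter:
    """Count how often two tags land on the same item, across all users."""
    tags_by_item: Dict[str, Set[str]] = defaultdict(set)
    for _user, item, tag in taggings:
        tags_by_item[item].add(tag)
    pairs: Counter = Counter()
    for tags in tags_by_item.values():
        # Sort so each pair is counted under one consistent key.
        for first, second in combinations(sorted(tags), 2):
            pairs[(first, second)] += 1
    return pairs


if __name__ == "__main__":
    sample = [
        ("u1", "post-1", "trackback"),
        ("u2", "post-1", "spam"),
        ("u3", "post-2", "trackback"),
        ("u3", "post-2", "spam"),
        ("u4", "post-3", "orchid"),
    ]
    print(related_tags(sample).most_common())
```

No single user in the sample said that trackback and spam belong together. The association only appears once everyone’s tags are pooled, and a tag used on one item by one person contributes nothing to it.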

A recent Slashdot article discussed the work of two scientists who are looking at Google results as a way of communicating with machines. According to the New Scientist article on their work:

Computers can learn the meaning of words simply by plugging into Google. The finding could bring forward the day that true artificial intelligence is developed.

The problems associated with natural language processing of the English language have to do with certain types of homonyms, words that sound and are spelled alike but have different meanings, and heteronyms, words that are spelled the same but sound different and have different meanings. For instance, “wind” could mean “wind the clock” or it could mean “a strong wind was blowing”; a “rider” could be associated with a horse, or with an insurance policy. The only way to differentiate these words is the context, and computers don’t handle context well.

The two scientists, Paul Vitanyi and Rudi Cilibrasi, have tapped into Google’s database, analyzed search results, and created what they call the Normalized Google Distance, or NGD. This is the factor that measures the logical distance two words have to each other. The more closely associated, the smaller the number, all based on searching for the pair of words in Google.

Within this system, “hat” has a greater number of matches with “head” than with “banana”, so the context of this search supplies information that a hat usually has more to do with a head than a banana. Combine all of these searches and eventually, it is supposed, a computer could search its way to smarts.
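The distance itself is simple arithmetic on hit counts. As published by Cilibrasi and Vitanyi, it takes the gap between the larger of the two single-term log counts and the log of the joint count, and divides it by the gap between the log of the index size and the smaller single-term log count. The sketch below follows that definition; every count in it, including the index size, is a made-up number for illustration and not a real search result.

```python
import math


def ngd(count_x: float, count_y: float, count_xy: float, index_size: float) -> float:
    """Normalized Google Distance from page counts; smaller means more closely related."""
    log_x, log_y, log_xy = math.log(count_x), math.log(count_y), math.log(count_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(index_size) - min(log_x, log_y))


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the hat/head/banana example.
    INDEX_SIZE = 8_000_000_000
    HAT, HEAD, BANANA = 50_000_000, 400_000_000, 30_000_000
    HAT_AND_HEAD, HAT_AND_BANANA = 20_000_000, 500_000

    print("hat/head  ", ngd(HAT, HEAD, HAT_AND_HEAD, INDEX_SIZE))
    print("hat/banana", ngd(HAT, BANANA, HAT_AND_BANANA, INDEX_SIZE))
```

With these invented counts the hat/head distance comes out smaller than the hat/banana distance, which is the sense in which the search results "know" that hats go on heads.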

Of course, there is more to language than associations of words returned from a Google search. For instance, if a computer is researching the context of a word, “bush”, it is more likely to assume that a bush is a human than a plant: there are 4.5 million matches for “bush” and “tree”, but 9.8 million matches for “bush” and “leader”. Not only that, but the “bush” is a “bad” (over 10 million) person, at that.

It is this attempt to extract semantics out of incidental associations, to get more meaning out of a system than we put into it, that is the basis for accidental smarts.

In addition to their own challenges with homonyms, folksonomies have an additional problem, and that’s synonyms: different words with the same or similar meanings. An example could be ‘cat’ and ‘kitty’. However, Clay Shirky rejected this as an issue early on, writing:

Synonym control is not as wonderful as is often supposed, because synonyms often aren’t. Even closely related terms like movies, films, flicks, and cinema cannot be trivially collapsed into a single word without loss of meaning, and of social context. (You’d rather have a Drain-O® colonic than spend an evening with people who care about cinema.) So the question of controlled vocabularies has a lot to do with the value gained vs. lost in such a collapse. I am predicting that, as with the earlier arc of knowledge management, the question of meaningful markup is going to move away from canonical and a priori to contextual and a posteriori value.

Interesting reading; however, this is nothing more than verbal sleight of hand. Clay is breezily questioning the concept of synonyms rather than directly facing the fact that synonyms are one of the many issues facing folksonomies, rejecting any concerns with promises of the gold ring: the value gained.

The same issues that will affect the precision of the Google searches will eventually affect the precision of folksonomy searches. Yet we think we can look at how people use tags, and from there map human thinking. In a new paper, Jakob Lodwick wrote:

According to Scientific American, in 1966 Ben-Ami Lipetz concluded that:

…breakthroughs in information retrieval would come when researchers gained a deeper understanding of how humans process information and then endowed machines with analogous capabilities.

Well, Ben was right, as you’ll soon see for yourself. By looking at how we tag photos on Flickr, we can understand how humans process information. Once we understand that, we can understand how to model it with computers, thereby creating better information retrieval systems.

What Ben was unable to predict all those years ago was that we will not only develop better information retrieval systems, but also model our own brains on the lowest levels, and eventually create artificial intelligence.

One only has to glance at the use of ‘tag’ in Technorati itself to see the impact of our different interpretations, and contexts, of this simple, single word. In delicious, a ‘tag’ is usually associated with folksonomies, but not always; in Flickr, it’s almost always associated with graffiti, but not always. And in weblogs, through automatic translation of categories into tags, it’s associated with a recipe using asparagus. Perhaps before we teach computers how to think using folksonomies, we might want to take a closer look at how we think with folksonomies.

tag=the end

If there were an award for longest weblog post, I think this one might be a contender. And that’s after I finally deleted four sections because I could no longer manage the post within the weblog tool. A little bit of work and I have the start of a book.

My first reaction to tags and folksonomies was, “Oh what silly new thing have they come up with now.” Associating keywords with bookmarks in a publicly shared venue and hoping to extract the meaning of the universe from the cross-section of terms does stretch one’s credulity to the max.

However, when I was thinking about what I could use to replace trackbacks after I pulled support for them, the concept of tags was the first thing that came to mind: not at a macro level, with global significance, but at a small level, an intimate use of the concept.

Of course, not everyone agrees with my use of tags and folksonomy for something so specific and mundane, as demonstrated by some of the entries associated with the delicious bbintoducingtrackback tag (one playful, one less so). But even these entries demonstrate the benefits of the approach, as these imprecise uses of delicious tags reflect more on the credibility of the entries than on my idea or me, which is a perk of tagback over trackback.

I said in the previous posting on tags and folksonomies that a million monkeys randomly typing were not going to write Shakespeare, and a hundred million people randomly assigning tags to objects were not going to create the semantic web. I still believe this: the semantic web will never arise spontaneously from random acts on random data. But I think that tags and folksonomies can be useful all the same, if we stop jumping up and down about what they’ll do in the future and focus on making them work now.

Now if I can only convince AKMA to use tagbacks…


All tagged out

I released my latest way-too-large opus on tagging and folksonomies and will most likely take a break from the concept for a while. (Cheap Eats at the Semantic Web Cafe was the first.)

While I don’t share the wild enthusiasm that all we need are tags and folksonomies to fuel the semantic web, I did find tags very handy as a replacement for trackback, and plan on using tagback from now on. The nice thing about it is that I can use tagbacks right now, with existing technology, and I don’t have to convince anyone to support anything.

I’m off for a while on a road trip, exploring inner and outer highways and byways, but when I return I’ll finally post the other updated chapters for Practical RDF. Long live the evil ontologies and taxonomies.

Semantics Social Media Weblogging

Introducing Tagback

Recovered from the Wayback Machine (includes comments).

The purpose of Trackback initially was to ping the readers of another’s post about something they may want to know about. Of course, we immediately started using it as a referrer link (“Hi, I linked to you!”).

So, we’re dropping trackback and we need something in its place. I provided the how-tos to add Bloglines citations and Technorati links in the previous post, and these will provide you a listing of who has linked to the article directly. But that’s the limitation: these solutions are dependent on a link. How can we point a person’s readers to another post or article, without linking to the post directly?

Easy: Tagback.

For each post, I create a tag consisting of the words of my individual post’s title, stripped of white space and dashes, and preceded by ‘bb’ to differentiate my posts from other people’s posts. I also include a link to the Technorati tags page for this tag, which forms my ‘tagback’. You can see the tagback for this post at the end.

Now, you can either use the tag with a photo in flickr, or you can use it in delicious to annotate any bookmark: your post, another person’s post, an article, a reference to a specification, whatever.

Since Technorati scarfs up delicious tags and flickr tags, all of these items will eventually appear in my Tagback page, along with weblog posts where people have linked to the tag directly in the post. And if Technorati excludes googlebots and other bots in the tags pages, thereby denying any pagerank to the tag pages, there is no incentive for spammers to spam this page.

As long as Technorati denies pagerank for the individual tag pages. Hint. Hint.

Now, regardless of what weblogging tool you use, including Blogger, WordPress, Movable Type, Typepad, ExpressionEngine, whatever, you can participate in discussions, and without having to install any code. Just use whatever tags or function calls you use in your weblogging tool to get the title, and create your own version of a tagback. Or you can manually create a tag for each post you’re interested in designating as a ‘to be discussed’ item, and leave it off from those posts you don’t want to create a tagback page for.

So, you guys were right – tags are handy. I could get the hang of this folksonomy stuff.

I did have to update the code to strip out dashes, and just create a one-word tag. I don’t like it, but flickr can’t deal with dashes, delicious seems to want to use spaces, and Technorati seems not to care. Since there is no standardized word delimiter across all of these systems, I just stripped out anything that isn’t an alphanumeric character.
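
For anyone who wants to roll their own, the tag construction described in this post can be sketched roughly as follows. This is an illustration in Python, not the code running on this site: the function names are made up, the lowercasing is an assumption on my part, and the Technorati tag URL pattern is assumed rather than taken from any documentation.

```python
import re


def make_tagback(title, prefix="bb"):
    """Build a one-word tag from a post title.

    Everything that isn't an ASCII letter or digit is stripped, which
    covers white space and dashes. Lowercasing is an assumption; the
    post doesn't say how case is handled.
    """
    return prefix + re.sub(r"[^A-Za-z0-9]", "", title).lower()


def technorati_tag_url(tag):
    """Return a link to the tag's page (URL pattern is assumed)."""
    return "http://technorati.com/tag/" + tag


if __name__ == "__main__":
    tag = make_tagback("Introducing Tagback")
    print(tag)
    print(technorati_tag_url(tag))
```

Swap the ‘bb’ prefix for something of your own so your tags don’t collide with mine. Note that this sketch also drops accented and other non-ASCII characters, which may or may not be what you want for your titles.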