Boys with Toys

Along with others, I also read Clay Shirky’s Power Laws, Weblogs, and Inequality. However, unlike most others, my reaction to Clay’s newest gem was to go, “What a load of hooie”.

Don’t get me wrong, I think Clay’s sharp as a tack and smart as a whip. (Do I need any other weapon-like metaphors to make my point?) He’s a great speaker, and knows his technology, and loves what he does, and I respect that. But he has one failing with regard to his views on social gatherings: he’s an elitist. He believes there will always be an ‘elite’ grouping within any society, something I don’t necessarily discount; however, in his writing and actions, he also tends to promote the mistaken belief that social groupings must follow fixed statistical patterns that support a static elite, and that we must all behave as the statistics dictate. And I say, what a load of hooie.

Clay references Pareto’s work in wealth distribution, showing that 20% of the people control 80% of the wealth. He writes:

Power law distributions, the shape that has spawned a number of catch-phrases like the 80/20 Rule and the Winner-Take-All Society, are finally being understood clearly enough to be useful. For much of the last century, investigators have been finding power law distributions in human systems. The economist Vilfredo Pareto observed that wealth follows a “predictable imbalance”, with 20% of the population holding 80% of the wealth. The linguist George Zipf observed that word frequency falls in a power law pattern, with a small number of high-frequency words (I, of, the), a moderate number of common words (book, cat cup), and a huge number of low-frequency words (peripatetic, hypognathous). Jacob Nielsen observed power-law distributions in website page views, and so on.

Clay equates these versions of the Pareto curve with weblogging popularity, as a measure of weblogging elitism. In his first figure, which I copied under Fair Use and replicated here, he shows a curve that plots the number of incoming links as a function of popularity. This is, Clay assures us, a demonstration of weblogging popularity as mapped to a power law distribution.

[Figure 1: Clay’s plot of incoming links per weblog, ordered by popularity]

(Clay pulled the figures from NZ Bear’s old blogging ecosystem work, an effort that is still alive and well as the Blogging Ecosystem.)

At first glance, Clay’s diagram does demonstrate the traditional curve that marks both Pareto and power law distributions. However, Clay pulled his data from a tainted source, and then compounded the error with an extrapolation that hasn’t been borne out in observed behavior.

First, the tainted data. NZ originally polled his ‘starting’ weblogs based on his own viewing patterns, which tend to reflect his warblogger interests. This created a bias towards warblogging weblogs. As NZ wrote at the time:

So, after a few days of screwing around with lots of different tools, I found a way to do it. The methodology, in a nutshell, is this: I started with a fairly large list of about 175 blogs; mostly, I stole from Instapundit and Vodkapundit’s lists, since they are pretty comprehensive, especially when taken together. Then, I built a process to do the following:

– Download the front page of each blog to my local machine
– Scan through each page and extract every link (URL) found in the HTML of the page
– For each of the original list of blogs, scan through the total link list and count how many links go to that blog
– Sort the list of blogs in descending order of their number of inbound links, and include the number in parentheses next to the blog link
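
The steps NZ describes can be sketched in a few lines of Python. This is a minimal sketch, not NZ’s actual code: the names `LinkExtractor`, `fetch`, `host`, and `rank_by_inbound_links` are made up for illustration, matching links to blogs by host name is an assumption, and so is the choice not to count a blog’s links to itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def fetch(url):
    """Download a blog's front page (hypothetical helper, no retries)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def host(url):
    """Reduce a URL to its host name, ignoring a leading 'www.'."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc


def rank_by_inbound_links(blogs, fetch_page=fetch):
    """Return (blog, inbound link count) pairs, most linked first."""
    by_host = {host(blog): blog for blog in blogs}
    counts = {blog: 0 for blog in blogs}
    for blog in blogs:
        parser = LinkExtractor(blog)
        parser.feed(fetch_page(blog))
        for link in parser.links:
            target = by_host.get(host(link))
            # Assumption: a blog's links to itself are not counted.
            if target is not None and target != blog:
                counts[target] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

Printing each pair as `f"{blog} ({count})"` gives the "number in parentheses next to the blog link" layout NZ mentions. Note that the ranking can only ever contain the blogs in the starting list, which is exactly why the choice of that list matters so much.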

NZ’s work was never based on the random sampling necessary in order to make a sound statistical measurement. Tainted data leads to a tainted statistic.

However, even if NZ’s earliest work had been based on random sampling, Clay’s extrapolation about ‘links’ forming a power law distribution is not borne out by an examination of the existing Blogging Ecosystem, which shows that the power law distribution tends to favor tools and mainstream media links over weblogs. Of the top ten link earners, only two, Scripting News and Boing Boing, belong to webloggers. The rest belong to Movable Type, Blogger, CNN, Google, and so on.

If we were to start with untainted data and then filter it to exclude anything other than weblogs, the results would not be as static as Clay’s hypothesis supposes. He wrote:

However, though the inequality is mostly fair now, the system is still young. Once a power law distribution exists, it can take on a certain amount of homeostasis, the tendency of a system to retain its form even against external pressures. Is the weblog world such a system? Are there people who are as talented or deserving as the current stars, but who are not getting anything like the traffic? Doubtless. Will this problem get worse in the future? Yes.

Using my own behavior as a guideline, perfectly acceptable if I view myself as a statistical subject, I started out linking primarily to the more well-known webloggers. However, over time, I found other weblogs and webloggers who I tended to read more and more and appreciate more than the so-called elite webloggers. Most of these people I met in my comments, and in comments on other weblogs. As I added more of these people to my blogroll, and linked to them in my postings, I tended to link to the elite bloggers less and less because I found that I just didn’t read them as much. In other words, as my experience level increased in weblogging, my reliance on linking to a set group of elite bloggers decreased.

If you look at my blogroll now, you only find a few of what can be termed ‘elite bloggers’ (if elitism is a measure of incoming links as measured in the Blogging Ecosystem and Technorati and elsewhere). My blogroll reflects what is an unmistakable human trait — my tastes have changed, my interests have matured, some people have quit, while others have gone in directions I’m not interested in pursuing.

What Clay doesn’t factor into the equation is that, unlike the subject of Pareto’s work, which was a closed system with finite resources, weblogging is not a closed system, and links are neither finite nor fixed.

Even without all these statistical games, Clay’s observations are just not borne out by practice. Quoting his conclusion:

At the head will be webloggers who join the mainstream media (a phrase which seems to mean “media we’ve gotten used to.”) The transformation here is simple – as a blogger’s audience grows large, more people read her work than she can possibly read, she can’t link to everyone who wants her attention, and she can’t answer all her incoming email or follow-up to the comments on her site. The result of these pressures is that she becomes a broadcast outlet, distributing material without participating in conversations about it.

Meanwhile, the long tail of weblogs with few readers will become conversational. In a world where most bloggers get below average traffic, audience size can’t be the only metric for success. LiveJournal had this figured out years ago, by assuming that people would be writing for their friends, rather than some impersonal audience. Publishing an essay and having 3 random people read it is a recipe for disappointment, but publishing an account of your Saturday night and having your 3 closest friends read it feels like a conversation, especially if they follow up with their own accounts. LiveJournal has an edge on most other blogging platforms because it can keep far better track of friend and group relationships, but the rise of general blog tools like Trackback may enable this conversational mode for most blogs.

In between blogs-as-mainstream-media and blogs-as-dinner-conversation will be Blogging Classic, blogs published by one or a few people, for a moderately-sized audience, with whom the authors have a relatively engaged relationship. Because of the continuing growth of the weblog world, more blogs in the future will follow this pattern than today. However, these blogs will be in the minority for both traffic (dwarfed by the mainstream media blogs) and overall number of blogs (outnumbered by the conversational blogs.)

What a load of hooie. Or as Dave Winer says, rightfully, and more diplomatically, Clay doesn’t understand weblogs.

What Clay doesn’t take into account is that many of the so-called A-List, or head bloggers, the ones that primarily link and comment, have always been the type of blogger who primarily links and comments. This isn’t so much a measure of their popularity as it is how they started blogging and how they continue it. There are just as many webloggers who have far fewer incoming links but are still the “link and comment” type of weblogger.

This type of weblogging is a matter of preference, not time or popularity.

Clay also mentions that the ‘long tail’ of webloggers, those with the fewest links, will always be the ‘conversational’ bloggers. By this, I’m assuming that Clay means those webloggers who talk about their lives, their interests, and the events in their lives, and who get into cross-blog and comment-style conversations.

What a load of hooie. I can’t count the number of times I read Dave Winer talk about what he had for dinner, or about his illness, quitting smoking, and his father’s illness. There’s been many a time I’ve gotten into cross-blog and comment debates with Dave, and others who are currently in the ‘Pareto head’.

In fact, about the only popular bloggers who never get into cross-blog or comment conversations are Andrew Sullivan and Wil Wheaton. To be honest, no real loss.

Looking at the top 100 weblogs in Blogging Ecosystem or Technorati, if you filter out the tools and the major publications, the vast majority of people in the top slots are conversationalists.

A person not having comments does not mean they don’t get into conversations. Many a so-called non-conversational and popular weblogger has spoken up in comments, mine and others’, more than once. I’ve even had a cross-blog conversation with the Great Pundit, Glenn Reynolds himself, a couple of times. Mark Pilgrim, Dave Winer, Anil Dash, Chris Pirillo, VodkaPundit, Chris Locke, Jon Udell, John Robb, Jason Kottke: these are ‘popular’ webloggers (as measured by incoming links in the systems that measure these sorts of things) and you couldn’t shut any of these people up if you tried, because they want to be part of the conversation.

Most of the webloggers with the highest incoming number of links thrive on conversation. It’s our drug of choice.

As for this “At the head will be webloggers who join the mainstream media…” This reminds me so much of the parable of the elephant and the six blind men. If you only read Glenn Reynolds, your view of weblogging is that webloggers link and comment and then get good jobs as journalists. If you only read Dave Winer, your view of webloggers is that they link and comment, write an occasional longer essay, and get a job at a prestigious university. If you read Doc Searls mainly, your view of webloggers is that they’re professional journalists who link a lot, but also write a lot and tend to lose things a lot (which is unfortunate).

If you only read Boing Boing, your view of webloggers is that they link and comment and write science fiction, which they publish online for free access. If you only read Jon Udell, all webloggers are technical.

But where does Mark Pilgrim fit into this? Mark’s an all-over-the-board blogger and he’s a ‘popular’ blogger according to the statistics. How about Big Pink Cookie? Christine is about as conversational and personal and connected with her audience as you can get, and she’s popular. What about Anil Dash? Boy, can’t beat Anil for getting in and mixing it up with his audience. Anyone forget the time when Anil took on Little Green Footballs? How about Steven Den Beste? How about Michele from Small Victory? Or Davezilla for that matter, who’s one of the most eclectic people in weblogging?

Do any of these people — do any of us — fit into the statistical cookie cutters that Clay used in his effort to bake us into his weblogger cookies?

(Speaking of which, since this is a weblog: if you were a cookie, what type of cookie would you be?)

Two years from now, if I were to write this again, chances are that I’d be using the names of weblogs that don’t even exist now. Why? Because two years ago, many of the weblogs I just named didn’t exist.

Clay’s extrapolations based on statistical observations about webloggers are not validated by the empirical behavior of the webloggers. Or, in layman’s terms: We blow Clay’s hypothesis all to hell and gone. Clay has too much invested in his beliefs in static social patterns to open his eyes and look at what we’re doing. And that’s okay because we’re too busy doing what we’re doing to be all that concerned.

Archived with comments at Wayback Machine