Categories
Social Media

Update on the issue of links

Well, after my experiment of providing full feeds, I have found that IceRocket, Bloglines, BlogDigger, and Feedster are all picking up my links. Technorati has picked them up haphazardly, and I’m not sure what BlogPulse looks for. Ditto with Clusty.

I had a couple of nice comments from folks at Feedster, and also a note from Blake Rhodes at IceRocket. If these services take away one thing, I hope it’s the importance of letting people know how to make sure they’re included in aggregation counts (as well as how to have their links picked up). A list is only as good as the data that feeds it. It would also be nice if they provided access to the data for our own interpretation. Even summary data would be helpful.

Personally, if I can get dynamic link counts from these services, I may try my hand at randomly collecting static link counts and trying out my ‘popularity to influence’ ratio, just for grins and giggles.

Most importantly, I’ve heard from some of you about how happy you are that I’m providing full feeds. I hope that folks still continue to visit the site; otherwise, I won’t know who is reading any particular post. I especially hope that folks leave comments now and again. But I want my writing to be read, and if full feeds help, then I’m for it. The full feeds stay.

Additionally, if you’re interested in knowing when I add photos to Flickr, you can access my photo syndication feed here. There’s also an RSS 2.0 syndication feed, and I’m assuming the Flickr folks are updating the Atom feed to the newly released 1.0 specification.

Well, that is until I release Eve 1.0. Then I’ll be using Eve 1.0 all the way.

update

Koan Brenner has been having an interesting time with Technorati and how links are accessed and valued. I think the introduction of tags into the discussion has clouded the issue, because as far as I know, tags have nothing to do with how links are accessed, stored, valued, or used in any ranking algorithm.

If they are used, then yes, Technorati has some flaws in its reasoning.

Bluntly, to folks who run these services: time to come out and tell us how outbound links are accessed and stored, and what factors could prevent them from being recorded. More, it’s time to think about full disclosure on ranking schemes. Dropping hints and tidbits in this post or another is just going to create that much more animosity.

If you are degrading links based on time, or other factors influenced by the tech people use, it’s critical that this information be disseminated. You’re basically penalizing people for not using technology in a way that you assume it should be used, and that’s a sucky way of determining ‘popularity’, ‘influence’, and, especially, ‘authority’.

As for disclosure of techniques and spamming — I’m not sure this is the same issue with weblog-related lists as it is with Google. It’s an issue, I’m just not sure it’s the same issue. This one could definitely use some more discussion.

Categories
Social Media

Bang bang bang

From the department of you’ve-got-to-be-kidding, I give you RSS 3.0:

Welcome to the RSS Version 3 Homepage. This site strives to create expanded and complete standards for syndication of online content – more specifically, it aims to recompose the RSS Version 2.0 standard due to underdocumentation and lack of concern towards modern necessities. Our goals are to provide at least one complete standard for common use under the Attribution/Share Alike Common License.

Brought to you by Slashdot–the organization that has conclusively proven that there is no tire so old or bald that it can’t be ridden on one more time.

Best response, from David, in a post at Danny’s:

*bang* *bang* *bang* (head against desk)

A close second, at Slashdot:

Basically, it’s all a bunch of pointless dick-waving.

Categories
Weblogging

Links not wanted

Feedster released its own version of a link ranking system, Feedster 500. It matches previous lists, but also has a number of surprises.

Unlike other lists, or even link aggregators, Feedster has been very forthcoming about how it derives its list and, more importantly, how it finds the incoming links it uses as the key component of its list: it finds them in syndication feeds. This explains why there are some unexpected results in this list. First, blogrolls are left out of the calculation, as they are not part of syndication feeds, or at least, not traditionally part of syndication feeds. Second, and this is the kicker, if you publish a syndication feed that doesn’t provide full content, then your links are not being picked up by the service and used in its calculations.

My links weren’t picked up. In fact, when working with my Linkers tool, and the more sophisticated Talkdigger, I have found that none of my links to other sites are being picked up by any of the services. And when I went looking for how the services work, none of the tools, other than Feedster, publish their process for finding links and/or other searchable material.

This is frustrating because even if I don’t care about lists and ranks, I do care about letting people know that I’ve written something about their posts. Since I don’t support trackback anymore, the only way another weblogger will know I’ve made comments on their work is if they read my weblog regularly, someone else tells them about my post, I put a link into their comments, or they see my URL show up in their referrer logs. And with the abuse of referrers, these logs are less than useful nowadays, or even unavailable for some webloggers.

Besides, I don’t want just the weblogger to know I’ve written about their posts–I want others to know, too.

Now I know how Feedster works, and that if I want links to show up in that service, I have to provide full content. I don’t want to do this, and I’ve never wanted to do this, but either I decide to blow off inter-weblog communication, or I provide full feeds. The question then becomes: what about the other services?

Supposedly, Technorati uses the syndication feed if it provides full content; otherwise, it grabs the main page and scrapes the data. Because it accesses only the front page, if I use the -more- link to split a larger post into a beginning excerpt with a link to the individual page, the links in the split-off part of the post are not included. If I then want to have my links picked up from a post, I either have to make sure they show in the very first part of the post, or not use the -more- capability.
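
To make the -more- problem concrete, here is a small Python sketch of a scraper that sees only the front-page excerpt. It assumes a WordPress-style `<!--more-->` marker and uses a deliberately crude regular expression for `href` values; the name `links_seen` is hypothetical, and nothing here describes how Technorati actually works.

```python
import re

# Assumed WordPress-style split marker; other tools may use something else.
MORE_MARKER = "<!--more-->"
# Crude href matcher, good enough for a demonstration, not for real HTML.
HREF = re.compile(r'href="([^"]+)"')


def links_seen(post_html, scraper_sees_full_post):
    """Return the links a scraper would find in one post."""
    if scraper_sees_full_post:
        visible = post_html
    else:
        # A front-page scraper only gets the excerpt before the marker.
        visible = post_html.split(MORE_MARKER, 1)[0]
    return HREF.findall(visible)
```

With a post that links once before the marker and once after, the front-page view reports only the first link, which is why links have to sit in the excerpt to be counted.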

Even when I don’t use the -more- capability, my links are not showing up in Technorati. Nor in IceRocket, nor in Bloglines, nor in any of the other services as far as I can see. I’m beginning to suspect that most services now use only the syndication feeds, which means I’ll have to use full content for them, also. As a test, I’ve set my site to provide full feeds for now, and I’m linking to several sites in and at the end of this post to see which service, if any, picks up the links.

Other factors that could influence whether the feed is picked up include my repeating the permanent link to a post in the title and at the bottom of the post; publishing links to webloggers’ URLs in my comments (which could trigger spam filters); not pinging weblogs.com or blo.gs; perhaps even the fact that I support only one feed type (RDF/RSS). Without knowing how each of the services processes links, your guess is as good as mine.

Frustrated as I am with the services, I also know how difficult it is to separate ‘good’ data from ‘bad’ when collecting it from a site; to determine which links come from the outside (a commenter’s URL) versus from the site author; and to distinguish a static link (blogroll) from a dynamic one (one included in a page). I can respect the challenge involved even as I am critical of the results.

What would I do if I were creating a service like this?

First, I wouldn’t scrape weblogs off of the global services, such as weblogs.com. These are mined by spammers so badly now as to make them useless. What I would do is provide a ping service that a person could trigger manually, or through their tool if it provides this facility.

I would access the syndication feed, and if full content is provided, I would process it for data and URLs. Otherwise, I would access the individual entry URLs directly to pick up links. By doing this, I’ll also be accessing URLs in comments and anything in the sidebars, which is why most services don’t want to access the individual entries; but I’d rather be more liberal than not when it comes to gathering data.
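
A minimal Python sketch of that first step, using only the standard library. It assumes that "full content" shows up as a `content:encoded` element (the RSS content module) and treats any item without one as excerpt-only, queuing the entry's own link for a direct fetch. The names `process_feed` and `LinkExtractor` are hypothetical, and the actual fetching of queued pages is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace of the RSS content module that carries full post bodies.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def local_name(tag):
    # Strip any "{namespace}" prefix so RSS 1.0 (RDF) and RSS 2.0 items look alike.
    return tag.rsplit("}", 1)[-1]


def process_feed(feed_xml):
    """Return (links found in full-content items, entry URLs to fetch directly)."""
    root = ET.fromstring(feed_xml)
    found_links, to_fetch = [], []
    for item in root.iter():
        if local_name(item.tag) != "item":
            continue
        content = item.find(CONTENT_ENCODED)
        if content is not None and content.text:
            parser = LinkExtractor()
            parser.feed(content.text)
            found_links.extend(parser.links)
        else:
            # Excerpt-only item: fall back to the entry's own page.
            for child in item:
                if local_name(child.tag) == "link" and child.text:
                    to_fetch.append(child.text.strip())
    return found_links, to_fetch
```

Fetching the queued entry pages would pull in comment and sidebar links too, which is the liberal trade-off described above.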

I would also like to send a bot once a day to access the main page, just to catch any updates that weren’t reflected in the feed, and to access the blogroll and other more static data.

At this point, we have a lot of data. Pulling blogrolls and other static links out of content isn’t that hard if you have the storage to maintain history and can check whether a link provided today was also provided yesterday. About the only time I would refresh this in the database is if the link changed in some way (it was there one day, not the next), or if the content in which it occurred changed (and this could require a way of annotating the context of a link, which could be pricey in storage and computation).
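
One way to sketch the history comparison in Python, assuming each daily crawl of a site is reduced to a set of outbound URLs. The three-day threshold for calling a link "static" is an arbitrary assumption, and `classify_links` is a hypothetical name.

```python
def classify_links(snapshots, min_days=3):
    """Split a site's outbound links into static and dynamic sets.

    snapshots: list of sets of outbound URLs, oldest first, one per daily crawl.
    A link present in each of the last `min_days` snapshots is treated as
    static (blogroll-like); everything else seen today is dynamic.
    """
    today = snapshots[-1]
    recent = snapshots[-min_days:]
    # With too little history, nothing can be called static yet.
    static = set.intersection(*recent) if len(recent) >= min_days else set()
    yesterday = snapshots[-2] if len(snapshots) > 1 else set()
    return {
        "static": static,
        "dynamic": today - static,
        # Day-over-day changes: the only cases that need a database refresh.
        "added": today - yesterday,
        "removed": yesterday - today,
    }
```

A sidebar that rotates links daily would look dynamic under this rule, which is one reason the threshold would need tuning against real data.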

One interesting approach is to remove duplicate links when aggregating for lists, but to refresh the item in the most-recently-updated queue if it shows up in fresh content at the site being scanned. With this, you don’t need to keep much context, and if a person is interested in finding out who is talking about a specific post, these top-level links won’t show.
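
A sketch of that idea, assuming each sighting arrives as a (source site, target URL) pair plus a flag saying whether it appeared in freshly published content. `LinkIndex` is a hypothetical name. Each pair counts once toward a list, while only fresh sightings move the pair to the front of the recency queue, so a blogroll link never resurfaces as "new".

```python
from collections import OrderedDict


class LinkIndex:
    def __init__(self):
        # Each (source_site, target_url) pair counts once for list-making.
        self.pairs = set()
        # Recency queue of pairs seen in fresh content; most recent last.
        self.recent = OrderedDict()

    def record(self, source_site, target_url, fresh=True):
        key = (source_site, target_url)
        self.pairs.add(key)
        if fresh:
            self.recent[key] = None
            self.recent.move_to_end(key)  # refresh position on repeat sightings

    def inbound_count(self, target_url):
        return sum(1 for _, target in self.pairs if target == target_url)

    def most_recent(self, n):
        # Newest first.
        return list(reversed(self.recent))[:n]
```

For example, if site A links to a post, then site B, then site A again in a new entry, the post's count stays at 2 while A moves back to the top of the recency queue.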

As for links in comments, here is where the vulnerability to spam enters, but an algorithm that finds and discards multiply repeated URLs could help to eliminate these. Looking for domains that have been determined to be spamming is another approach. Sometimes, though, we have to accept that some crap gets through. I’d rather let a little crap through than discard ‘good’ stuff just because I feel I’m in some kind of war with the spammers.
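
Both filters fit in a few lines of Python. This is a sketch under stated assumptions: `max_repeats=3` is an arbitrary, deliberately lenient cutoff in keeping with letting a little crap through, the blocklist is assumed to come from elsewhere, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def on_blocked_domain(host, blocked_domains):
    # Match the host itself or any parent domain ("www.spam.example" -> "spam.example").
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocked_domains for i in range(len(parts)))


def filter_comment_urls(comment_urls, blocked_domains, max_repeats=3):
    """Return the unique comment URLs worth keeping, in first-seen order.

    A URL is dropped if it was posted more than `max_repeats` times or sits
    on a domain already judged to be spamming.
    """
    counts = Counter(comment_urls)
    kept = []
    for url in dict.fromkeys(comment_urls):  # de-duplicates, preserving order
        host = urlsplit(url).hostname or ""
        if counts[url] > max_repeats or on_blocked_domain(host, blocked_domains):
            continue
        kept.append(url)
    return kept
```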

It could help to annotate links for blogrolls and links for comment URLs and so on. Not that abysmal ‘nofollow’, but with something meaningful, like ‘commenter URL’ or ‘blogroll link’ or something of that nature. We do something like this with tags, and though I don’t care much for tags in weblog posts, I don’t agree with Bloglines’ Mark Fletcher that tags generally suck, especially when it comes to effective uses of microformatting to annotate links.
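
The HTML `rel` attribute is the natural place for such annotations. Below is a Python sketch of a crawler reading them. The values `blogroll` and `commenter` are hypothetical and not part of any published microformat, and `AnnotatedLinkParser` is likewise a made-up name; unannotated links are assumed to be in-post (dynamic) links.

```python
from html.parser import HTMLParser


class AnnotatedLinkParser(HTMLParser):
    """Collects (href, kind) pairs, where kind comes from hypothetical rel values."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, or be missing entirely.
        rel = set((attrs.get("rel") or "").lower().split())
        if "commenter" in rel:
            kind = "commenter"
        elif "blogroll" in rel:
            kind = "blogroll"
        else:
            kind = "post"  # assumption: unannotated links come from the author's text
        self.links.append((href, kind))
```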

(Speaking of which, what kind of a post is: I was going to blog something about how tags are bad, evil horrible bad, and highlight the failure of existing search technology, but I couldn’t muster the energy. High level message: tags suck and are unnecessary except in cases where no other textual data exists (like photos, audio or video). Discuss amongst yourselves.. How’s this: Bloglines is indulging in evil censorship of my communication because it doesn’t pick up the links from my posts. Discuss among yourselves.)

Unfortunately, microformats generally require some technical expertise on the part of the person using them, and to base any kind of measurement on this is irresponsible.

Once I have data that is reasonably clean and fresh, if I were to create a list, I would base it on popularity versus influence, and I would differentiate these by the number of blogroll links to a site, as compared to the number of dynamic links. To me, a person with a large number of dynamic links compared to static, blogroll-like links would be more influential (hi Karl) than one with a fairly even ratio between the two. I wouldn’t mind seeing this ratio in a list rather than the counts; we could then find who is influential within groups, even if the groups are smaller. Regardless, I would also provide the raw data to others, and let them derive their own lists if they want.
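
The ratio itself is simple arithmetic. This Python sketch assumes per-site counts of dynamic (in-post) and static (blogroll) inbound links are already available. Adding 1 to the denominator is an assumption of mine so that sites nobody blogrolls don't divide by zero; the function names are hypothetical.

```python
def influence_ratio(dynamic_links, static_links):
    # +1 keeps the ratio defined for sites with no blogroll links at all.
    return dynamic_links / (static_links + 1)


def rank_by_influence(counts):
    """counts: {site: (dynamic_links, static_links)}; highest ratio first."""
    return sorted(counts, key=lambda site: influence_ratio(*counts[site]), reverse=True)
```

A site with 90 in-post links and 9 blogroll links scores 9.0, while one with 100 of each scores just under 1.0, so the smaller but more frequently discussed site ranks higher.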

Why give away precious data? Because by keeping the source of the data and algorithms open, I establish credibility. In addition, flaws will be found and smart people will provide suggestions for improvement. Most importantly, I give those who would be critical of any of my processes nothing to hook on to: the algorithms are public and mutable; the data is available to all. I have, in effect, Teflon-coated myself with Open Source. I agree with Mary Hodder a hundred percent on the advantages of openness when it comes to data-gathering techniques and processing, and providing access to raw data, but not just for ranking.

As for a business model, well, knowing the algorithms and having access to the data is one thing; being able to use them effectively, consistently, and in a manner that scales is the bread and butter of this type of technology. Google never would have been Google if it had been slow.

Additional links:

Joseph Duemer is teaching a class in weblogging today. Welcome to weblogging, Joe’s colleagues. Just as an FYI, I’m on the Feedster 500 list, which makes me a weblogging princess. If I were in the top 100, I would be queen. If I were in the top 10, well, I would be a lot wealthier than I am now.

Someone who is in the top 100 is the Knitty Blog. Now, this site ably demonstrates the nature of influence over popularity: it’s not that it’s linked statically by a lot of sites, but that it is referenced in a large number of posts. That, to me, is influence.

Dare Obasanjo just uploaded 50 photos from his recent trip home to Nigeria. What I want to know, Dare, is why you took so many photos of billboards?

Fulton Chain carries the best b-link bar there is, with links to stories that cover a range of topics, such as a praying mantis eating a hummingbird, and how to build your own homemade flamethrower. Then there’s the Ode to Rednecks. Come on down and visit me in the Ozarks. Hear?

And that’s about enough about linking.

Categories
Insects Photography

Monarchs

The rains finally came this last weekend. They blew in strongly on Saturday and took out the power for half the city, but I don’t think anyone minded.

I did lose my internet for several hours on Sunday. When I called in, I finally got through to a lovely woman with a charming Kentucky accent who told me that the reason I didn’t have service is that the power box for the cable was hit by lightning; the only reason the cable was still working was that a cable company worker was down at the station with a power generator in the back of his truck, keeping the cable going. The internet, however, required much more power.

With the rains has come cooler weather, and I’ve been able to get out for walks. However, with gas prices being the way they are, the walks are close to town. When did someone find the secret of alchemy and turn gold into gasoline?

I don’t mind walking close to home, though. There’s a gentle feel to the air — a softness we’ve been missing all summer. It’s almost as if we’re having a second Spring. During Monday’s walk at Powder, under a canopy of dripping green leaves, I came upon a half dozen bucks; to see one antlered deer is uncommon, and to see several at once was an unexpected treat.

And today I found the monarch butterflies. After all these years of carefully planned trips to Shaw and other places, without any success, I finally found my monarchs where I least expected them. Purely by accident: I had a couple of hours to kill before picking up my roommate at work and decided to go to Busch Conservation Area to take pictures of geese. When I arrived, the fields around the main lake were full of a delicate, pink flower (milkweed), freshly bloomed from all the rain, and busy among the flowers were hundreds of monarch butterflies.


I grabbed my camera and raced from flower to flower taking pictures, sometimes stopping just to let the butterflies and bees fly around me, close enough to almost feel the movement of their wings. No one else was about, though I could hear creatures in the grasses and in the water of the lake next to the field. It was worth the summer, every dead and dry and hot bit of it. All of it was worth those few hours with the butterflies.

Needless to say, I have a lot of photos. Be forewarned.


What was particularly funny was the interaction between the butterflies and the bees. The butterflies would usually have their wings folded up. As a bee approached, they would suddenly open their wings, *thwack*! And there would go the bee.


Came home and watched two wonderful movies: Strictly Ballroom and IQ. Strictly Ballroom is an Australian film about ballroom dancing, and would seem to be the usual boy-and-girl-against-all-odds movie, but it has some wonderfully campy moments. And I love Spanish guitar, not to mention the dancing.

What I liked in particular about Strictly Ballroom was the ending, which I won’t give away, other than to say that the dancing is all that matters.

And IQ, well, it’s sweet and gentle, and isn’t it a wonderful time to be alive? Wahoo.

butterfly5

(The above is a swallowtail butterfly; it wouldn’t stop moving, and kept fluttering its upper wings. A really graceful and beautiful creature.)


Categories
Connecting

Foobar

This is a real red letter day. It’s a day when I come out in defense of a Tim O’Reilly event, rather than the opposite. I’m sure it will be appreciated about as much as my criticism, which is to say not. Regardless, it is the fair thing to do.

The event is Foo Camp, and there are some folks unhappy because they weren’t invited. Among these are Russell Beattie, Marc Canter, and Om Malik. Surprises me a bit because these guys are already part of the ‘insiders’, the people who are connected, those at the top. Is it that they want to be more in, more connected, and even higher?

In the past I’ve been concerned about invite-only events such as these, because women, strangely enough, usually don’t get invited. And though the numbers at this year’s camp are pretty weak, there are women attending. Could do better on the representation, but if O’Reilly is really only concerned about marketing to men, that’s the company’s decision. Besides, looking at the women invited, quality more than makes up for quantity.

I didn’t get an invite, but wasn’t expecting one. Was invited once, and had to decline–didn’t have the money to make my way over to the coast. Even if I did have an invite and did have the money to go, I wouldn’t. Something like this has no appeal to me, and if the only power of the event is for it to be known that you were at the event, then this doesn’t have much appeal for me either.

Two hundred and fifty people roughing it in tents, sharing showers, involved in a saturation campaign of connecting with as many movers and shakers in the tech community as possible? Not my thing. A quiet dinner meeting up with folks and having a chance to talk, now that sounds fine. Time to meet with folks and talk over an idea sounds good; a frenetic run from event to event, tossing frisbees along the way does not.

Oh, it does concern me that I’m out here in St. Louis, cut off from ‘action’ so to speak, and adrift without the networking that seems so necessary to my biz. However, being cut-off also means that I have a clear perspective on much of the noise coming from the coast and much of it is noise, make no mistake. In the last five years most of the jumping up and down that’s occurred has been about concepts with no technical feasibility; technologies that are five years old but new again; and concepts that seem really great, but which we soon tire of like a kid with a Christmas toy.

There are the winners that slip in, and it would be nice to meet up with those who create the works that are solid, and you know will last. But I don’t really have to travel to California, and sleep on the ground with 250 people who are virtually strangers, while standing in line at the toilet in order to experience their creativity. I’d rather get to know the people through their work, when I can go to the bathroom anytime I want. As for the boosts to career and being part of the insiders, well, if my words and ideas and code here and elsewhere can’t sell me, then nothing I’ll say in person will really make a difference.

But enough about me and my less than geeky attitude: I was particularly impressed with Tim O’Reilly’s discussion in Om Malik’s comments about how the choices of who to invite are made, especially the reasons for the 4th cut:

Fourth cut: Key people from important O’Reilly business partners, with whom we’re trying to build a deeper relationship, and for whom an invite to the “it” event will help seal the deal. (Sorry, but we are a business, and the event does have a business purpose, to increase our connections with people who will benefit our business.)

Foo Camp is to benefit O’Reilly the business, and as such, O’Reilly the business should have a right to invite the people it wants. Upfront, and honest, and I can respect that.

The real issue, though, and the main reason for much of the hurt feelings, is that Foo Camp is seen as the ‘it’ event, to use Tim’s rather eloquent words. Why is Foo Camp the ‘it’ event? Because Tim O’Reilly is a damn good marketer, that’s why. Want to have a session with the movers and shakers in the industry? Don’t have a meeting and let people invite themselves — no one will show up. No, you invite the folks, imbue the event with an ever so delicate scent of exclusivity, and the best will beat at your door begging to be allowed in. Brilliant. Mark Twain would approve.

Bottom line, though, and pushing aside much of the myth, FooCamp is nothing more than a fun and active party with some pretty smart people, not unlike many others that happen over the year. We make it exclusive by wanting to go. Stop wanting to go, and it’s no longer exclusive; it’s no longer the ‘it’ event, it’s just ‘an’ event.

There are a lot of good people going to FooCamp who I would love to have a long chat with sometime, and maybe I will in the future. But I’d like to meet them one or two at a time, not crammed in amidst all that good old American summer camp goodness.

(I will miss the beer, though. Haven’t been to a good kegger in the longest time.)

Most importantly, if the purpose of going is to network, then you have to ask what the value of our online connectivity is if we feel we have to meet people in person in order to be successful. I mean, the people who are selling the whole “online experience” thing are the same ones who are running around from conference to conference, meeting to meeting. Either this is all new, in which case the old style of networking doesn’t matter; or the people who are networking about how this is all new are propagating a lie.

I’d like to think this is new, and it doesn’t matter how many ‘it’ conferences you go to, as long as you got the goods. So, to Tim and friends, have a lot of fun, take pictures, and write lots of reports. And to those who are doing the BarCamp thing, I hope you have fun, too. As for me, well, I’m thinking of creating Atom 2.0 and seeing if I can get on Slashdot.

Better yet: Eve 1.0, the syndication feed developed exclusively for women. Cool. And I didn’t even have to stand in line for the bathroom to think of it.