Categories
Insects Photography

Monarchs

The rains finally came this last weekend. They blew in strongly on Saturday and took out the power for half the city, but I don’t think anyone minded.

I did lose my internet for several hours on Sunday. When I called in, I finally got through to a lovely woman with a charming Kentucky accent who told me that the reason I didn’t have service is that the power box for the cable was hit by lightning; the only reason the cable was still working was that a cable company worker was down at the station with a power generator in the back of his truck, keeping the cable going. The internet, however, required much more power.

With the rains has come cooler weather, and I’ve been able to get out for walks. However, with gas prices being the way they are, the walks are close to town. When did someone find the secret of alchemy and turn gold into gasoline?

I don’t mind walking close to home, though. There’s a gentle feel to the air — a softness we’ve been missing all summer. It’s almost as if we’re having a second Spring. During Monday’s walk at Powder, under a canopy of dripping green leaves, I came upon a half dozen bucks; to see one antlered deer is uncommon, and to see several at once was an unexpected treat.

And today I found the monarch butterflies. After all these years with trips carefully planned to Shaw and other places, without any success, I finally find my monarchs where I least expected them. Purely by accident — I had a couple of hours to kill before picking up my roommate at work and decided to go to Busch Conservation Area to take pictures of geese. When I arrived, the fields around the main lake were full of a delicate, pink flower (milkweed), freshly bloomed from all the rain, and busy among the flowers were hundreds of monarch butterflies.

monarch8

I grabbed my camera and raced from flower to flower taking pictures, sometimes stopping just to let the butterflies and bees fly around me, close enough to almost feel the movement of their wings. No one else was about, though I could hear creatures in the grasses and in the water of the lake next to the field. It was worth the summer, every dead and dry and hot bit of it. All of it was worth those few hours with the butterflies.

Needless to say, I have a lot of photos. Be forewarned.

monarch23

What was particularly funny was the interaction between the butterflies and the bees. The butterflies would usually have their wings folded up. As a bee approached, they would suddenly open their wings, *thwack*! And there would go the bee.

monarch21

Came home and watched two wonderful movies: Strictly Ballroom and IQ. Strictly Ballroom is an Australian film about ballroom dancing, and would seem to be the usual boy and girl against all odds movie, but it has some wonderfully campy movements. And I love Spanish guitar, not to mention the dancing.

What I liked in particular with Strictly Ballroom was the ending, which I won’t give away, other than to say that the dancing is all that matters.

And IQ, well, it’s sweet and gentle, and isn’t it a wonderful time to be alive? Wahoo.

butterfly5

(The above is a swallowtail butterfly; it wouldn’t stop moving, and kept fluttering its upper wings. Really graceful and beautiful creature.)

monarch9

monarch6

Categories
Connecting

Foobar

This is a real red letter day. It’s a day when I come out in defense of a Tim O’Reilly event, rather than the opposite. I’m sure it will be appreciated about as much as my criticism, which is to say not. Regardless, it is the fair thing to do.

The event is Foo Camp, and there’s some folk unhappy because they weren’t invited. Among these are Russell Beattie, Marc Canter, and Om Malik. Surprises me a bit because these guys are already part of the ‘insiders’, the people who are connected, those at the top. Is it that they want to be more in, more connected, and even higher?

In the past I’ve been concerned about invite-only events such as these, because women, strangely enough, usually don’t get invited. And though the numbers at this year’s camp are pretty weak, there are women attending. Could do better on the representation, but if O’Reilly is really only concerned about marketing to men, that’s the company’s decision. Besides, looking at the women invited, quality more than makes up for quantity.

I didn’t get an invite, but wasn’t expecting one. Was invited once, and had to decline–didn’t have the money to make my way over to the coast. Even if I did have an invite and did have the money to go, I wouldn’t. Something like this has no appeal to me, and if the only power of the event is for it to be known that you were at the event, then this doesn’t have much appeal for me either.

Two hundred and fifty people roughing it in tents, sharing showers, involved in a saturation campaign of connecting with as many movers and shakers in the tech community as possible? Not my thing. A quiet dinner meeting up with folks and having a chance to talk, now that sounds fine. Time to meet with folks and talk over an idea sounds good; a frenetic run from event to event, tossing frisbees along the way does not.

Oh, it does concern me that I’m out here in St. Louis, cut off from ‘action’ so to speak, and adrift without the networking that seems so necessary to my biz. However, being cut-off also means that I have a clear perspective on much of the noise coming from the coast and much of it is noise, make no mistake. In the last five years most of the jumping up and down that’s occurred has been about concepts with no technical feasibility; technologies that are five years old but new again; and concepts that seem really great, but which we soon tire of like a kid with a Christmas toy.

There are the winners that slip in, and it would be nice to meet up with those who create the works that are solid, and you know will last. But I don’t really have to travel to California, and sleep on the ground with 250 people who are virtually strangers, while standing in line at the toilet in order to experience their creativity. I’d rather get to know the people through their work, when I can go to the bathroom anytime I want. As for the boosts to career and being part of the insiders, well, if my words and ideas and code here and elsewhere can’t sell me then nothing I’ll say in person will really make a difference.

But enough about me and my less than geeky attitude: I was particularly impressed with Tim O’Reilly’s discussion in Om Malik’s comments about how the choices of who to invite are made, especially the reasons for the 4th cut:

Fourth cut: Key people from important O’Reilly business partners, with whom we’re trying to build a deeper relationship, and for whom an invite to the “it” event will help seal the deal. (Sorry, but we are a business, and the event does have a business purpose, to increase our connections with people who will benefit our business.)

Foo Camp is to benefit O’Reilly the business, and as such, O’Reilly the business should have a right to invite the people it wants. Upfront, and honest, and I can respect that.

The real issue, though, and the main reason for much of the hurt feelings, is that Foo Camp is seen as the ‘it’ event, to use Tim’s rather eloquent words. Why is Foo Camp the ‘it’ event? Because Tim O’Reilly is a damn good marketer, that’s why. Want to have a session with the movers and shakers in the industry? Don’t have a meeting and let people invite themselves — no one will show up. No, you invite the folks, imbue the event with an ever so delicate scent of exclusivity, and the best will beat at your door begging to be allowed in. Brilliant. Mark Twain would approve.

Bottom line, though, and pushing aside much of the myth, FooCamp is nothing more than a fun and active party with some pretty smart people, not unlike many others that happen over the year. We make it exclusive by wanting to go. Stop wanting to go, and it’s no longer exclusive; it’s no longer the ‘it’ event, it’s just ‘an’ event.

There’s a lot of good people going to FooCamp who I would love to have a long chat with sometime, and maybe I will in the future. But I’d like to meet them one or two at a time, not crammed in amidst all that good old American summer camp goodness.

(I will miss the beer, though. Haven’t been to a good kegger in the longest time. )

Most importantly, if the purpose to go is to network, then you have to ask what the value of our online connectivity is if we feel we have to meet people in person in order to be successful. I mean, the people who are selling the whole “online experience” thing are the same ones who are running around from conference to conference, meeting to meeting. Either this is all new, in which case the old style of networking doesn’t matter; or the people who are networking about how this is all new are propagating a lie.

I’d like to think this is new, and it doesn’t matter how many ‘it’ conferences you go to, as long as you got the goods. So, to Tim and friends, have a lot of fun, take pictures, and write lots of reports. And to those who are doing the BarCamp thing, I hope you have fun, too. As for me, well, I’m thinking of creating Atom 2.0 and seeing if I can get on Slashdot.

Better yet: Eve 1.0, the syndication feed developed exclusively for women. Cool. And I didn’t even have to stand in line for the bathroom to think of it.

Categories
Connecting

The ABCs of frank online talk

A: “I want to have a frank discussion.”

 

B: “I’m game.”

C: “Me, too.”

D: “That’s what’s great about this environment–the honesty and openness.”

E: “Whatever you want to talk about, I’m cool.”

F: “Yo!”

 

A: “Well, the software I’m using is pretty good, but the license says I can’t help a friend install it.”

B: “Isn’t that just like the Internet? Everyone wants everything for free.”

A: “I didn’t say I wanted the software for free. I said…”

C: “You know, you’ve always been critical of Z. You’re so sad.”

A: “I didn’t say anything about…”

D: “Yeah, let’s see you write this kind of software if you’re so good.”

A: “I just made a….”

E: “You don’t know what you’re talking about.”

A: “Well, actually, I…”

F: “Bitch.”

A: “OK! Never mind! Let me try again.”

 

A: “I’ve noticed that the ORG weblog technology company led by S has 25 engineers, but that none are women.”

B: “You know, I don’t approve of quotas.”

A: “I didn’t say the word quo…”

C: “S does so much for all of us and asks nothing in return.”

A: “I know that S has done m…”

D: “Unsubscribed!”

A: “Wow, that was…”

E: “You know, you don’t have to get all hysterical about this.”

A: “I am NOT hyster…”

E: “Bitch.”

A: “Forget it! Never mind! There has got to be something we can have a frank talk about.”

A: “I know, I’ll talk about technology. No one is going to get emotional about technology.”

 

A: “I’ve decided not to support U and V, and only support P at my site.”

B: “Wow, talk about a political rant.”

A: “Political rant ?!?”

C: “You know, you think you’re so smart. The only reason you’re not using V is because you’re jealous.”

A: “Jealous? Of a technology?”

D: *silence, still unsubscribed*

E: “You’re such a liar, too. I feel sorry for you. Ugh.”

A: “Whaa..”

F: “Bitch.”

 

A: “What is the deal, here? I thought you all agreed we could have a frank, open discussion?”

B: “I’ve known S for years, and there’s not a sweeter person.”

C: “Agreed. And W is a real leader in the industry, as is Z.”

D: *silence, still unsubscribed*

E: “Yeah, how can you turn on your own like that?”

F: “Yeah, bitch.”

 

A: “What you’re all saying, then, is I can be frank and honest, as long as whatever I say doesn’t directly, or indirectly, reference a friend, or someone sweet, or a leader in the industry, or someone who is a part of our group?”

B: “Not a bit, you can talk about anything you want. Just not Z.”

C: “No way. This is a free country, say anything you want. But you should respect W.”

D: “I’ve decided to re-subscribe to you. I think it’s important that we listen to those who we may not agree with. But what has S ever done to you? Did I happen to mention how sweet S is?”

E: “You know, you’re starting to sound shrill. Have you thought about professional help?”

F: “Yeah, stop being a bitch.”

A: *sigh*

A: *another sigh*

A: “Well, who is somebody who isn’t a friend with any of you?”

B: “You.”

C: “You.”

D: “You.”

E: “You.”

F: “Bill Gates.”

Based on actual, frank discussions…somewhere….

Categories
Semantics Web

The business of algorithms

Recovered from the Wayback Machine.

Algorithms are big business. Recently I’ve seen several jobs where the company wants someone who is “…good with algorithms”. Microsoft is competing with Google is competing with Yahoo to hire the best algorithm wranglers (which evidently, according to the article, does not mean women). IBM is releasing its unstructured data architecture (UIMA), including its concept-based search algorithms, into open source by year end. Even within weblogging the debate, and the race, is on to find the best algorithms to mine us, otherwise known as the higher income people without lives.

Suddenly, the hip and cool kids on the block can “do” algorithms.

With all this interest, though, comes a lot of confusion and misunderstanding, starting with, but not limited to, the very concept of algorithm–a concept which is now taking on such mystical properties that those who can “do” algorithms are being vested with an almost god-like prescience. It is time, and past time, to put the brakes on the hyperbole surrounding algorithms.

Starting with the basics: what is an algorithm.

What is an algorithm

An algorithm is nothing more than the description of the steps necessary in order to reach a goal. The goal may be something as simple as baking a cake, or as complex as mapping the gene sequence of humans; however, the concept of algorithm doesn’t change with each goal–only the steps.

You have three apples, and someone asks you for one, how do you know how many apples you have left? Seriously, this is not a joke–simple addition, multiplication, division, and subtraction are algorithms, and each is represented by specific equations that introduce specialized operators. To better see this, sometimes you need to remember what it was like to learn math, and then program this knowledge into a computer.

For instance, add two numbers: 14 and 17. No, not by memorization, but by working out the steps. First you line up the rightmost digits, or the ones column, and perform addition on the numbers in this column: 4 and 7. The act of addition is taking one number, breaking it down into units, and then adding these units to another number: 4 + 1 is 5; 5 + 1 is 6; 6 + 1 is 7, and so on. You know this; you remember your first exposure to a calculating device–your fingers. So add 4 and 7, just like you did when you were younger.

Start with 4, turn down your left thumb, that’s 5. Turn down your left index finger, that’s 6. Turn down your left middle finger, that’s 7. Turn down your left ring finger, that’s 8. Turn down your pinky, and that’s 9. Then switch to the other hand, and continue. Turn down your right thumb, that’s 10. Finally, turn down your right index finger and that makes 11. Stop at this point, because you’ve turned down 7 digits. Of course, we could have started with 7 and added 4, but chances are when you were younger you started with the number on the left and added the number on the right (though this may change based on culture and language).

So now you have 11. That was an amazing accomplishment. Do it enough times, and you remember the result and you don’t have to turn down fingers when you’re asked to change some figures during, say, a board meeting.

But now you have a problem: you have a value in the ones column, but you also have a value in the tens column. So what you do is ‘carry’ that number over to the tens column, and add it to the other numbers that were already there–leading to addition on three numbers: 1 and 1 and 1. Luckily, the numbers are small or we’d probably have to start removing our shoes.
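The finger counting and the carry can be written down as a short Python sketch. The function names `count_up` and `add_by_columns` are made up for illustration, and the code assumes non-negative whole numbers; it is one way to formalize the steps, not the only one.

```python
def count_up(start, steps):
    """Add by counting: one 'finger' turned down per step."""
    total = start
    for _ in range(steps):
        total += 1
    return total


def add_by_columns(a, b):
    """Add two non-negative integers column by column, carrying when needed."""
    # Reverse the digits so index 0 is the ones column, index 1 the tens, etc.
    digits_a = [int(ch) for ch in reversed(str(a))]
    digits_b = [int(ch) for ch in reversed(str(b))]
    result = []
    carry = 0
    for i in range(max(len(digits_a), len(digits_b))):
        da = digits_a[i] if i < len(digits_a) else 0
        db = digits_b[i] if i < len(digits_b) else 0
        # Count the second digit onto the first, then count on the carry.
        column = count_up(count_up(da, db), carry)
        result.append(column % 10)   # the digit that stays in this column
        carry = column // 10         # the part carried to the next column
    if carry:
        result.append(carry)
    return int("".join(str(d) for d in reversed(result)))
```

Calling `add_by_columns(14, 17)` walks through exactly the two columns described: 4 and 7 in the ones column, then 1 and 1 and the carried 1 in the tens column.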

Combine all these steps into a sequence of actions, and you have a very complicated, multi-step algorithm. Extend these same basic steps, and you can also do subtraction, multiplication, and division. In fact, once you managed the algorithm for addition, you had the basic skills necessary in order to work with any algorithm. It is only a few short steps from addition to something like Newton’s Method. The only barrier to taking these few short steps is interest and intimidation: interest, because not everyone really wants to learn how to do Newton’s Method; and intimidation because after a while, it’s a lot easier to define new equations and new operators to represent higher-level algorithms, and our first exposure to these sends many of us running for the door.
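To show how short those few steps are, here is a rough Python sketch of Newton’s Method applied to one familiar problem, finding a square root. Applying the method to x² − n = 0 reduces each step to averaging the current guess with n divided by that guess. The function name `newton_sqrt`, the tolerance, and the step limit are all assumptions chosen for the example.

```python
def newton_sqrt(n, tolerance=1e-12, max_steps=50):
    """Approximate the square root of n (n >= 0) with Newton's Method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0.0
    guess = n if n >= 1 else 1.0  # any positive starting guess will do
    for _ in range(max_steps):
        # Newton step for f(x) = x*x - n: x - f(x)/f'(x) = (x + n/x) / 2
        better = (guess + n / guess) / 2
        if abs(better - guess) <= tolerance * better:
            return better
        guess = better
    return guess
```

Every operation inside the loop is addition, division, or comparison: nothing that wasn’t already in hand after learning to add 14 and 17.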

Aside from the intimidating equations, that’s all an algorithm really is: a formalization of the steps necessary in order to reach a goal. So when I’m asked on an interview if I can “do” algorithms, in my mind I hear: can you add 17 and 14 without having to take off your shoes? I can then reply without hesitation that yes, I can.

I am now ready to work at Google.

Well, not quite.

Pattern Matching, Hypothesis, and Proofs

Any of us can follow an algorithm if we’re interested and patient enough. But it takes something more to be able to derive the algorithm in the first place.

In his book, “Vision”, David Marr writes about the steps he and his fellow researchers took to discover a computational theory of vision. The first was to create a representational framework, a hierarchical framework of vision, from the simplest edge detection to more complex visual processes. They then searched for existing algorithms that matched the observed and representational behavior:

These ideas suggest that in order to detect intensity changes efficiently, one should search for a filter that has two salient characteristics. First and foremost, it should be a differential operator, taking either a first or second spatial derivative of the image. Second, it should be capable of being tuned to act at any desired scale, so that large filters can be used to detect blurry edges, and small ones to detect sharply focused fine detail in the image.

Marr and his co-researcher Hildreth eventually came up with the Laplacian of Gaussian, also known as the Marr Filter, or Marr-Hildreth filter.

Marr and Hildreth were able to derive their filter, their algorithm, because of training in math and neuroscience, as well as new research in the fields of artificial intelligence and vision. The training provided the tools they needed: a catalog of existing algorithms, as well as the necessary protocols; the research then provided the necessary new data.

If you use Photoshop’s Unsharp Mask, you can judge for yourself the success of Marr’s efforts. The point is that Marr and Hildreth established a goal, observed behavior, and then went shopping to find which algorithms came closest to matching the observed behavior. In this case it was a combination of algorithms: applying a Gaussian filter to blur the image and remove structure, and the Laplacian to detect the ‘edges’ or differences of intensity that remain.
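The blur-then-differentiate combination can be sketched in a few lines of Python. This is a deliberately simplified, one-dimensional illustration, not Marr and Hildreth’s implementation: it works on a single row of intensity values rather than an image, uses the discrete second difference [1, -2, 1] as the one-dimensional stand-in for the Laplacian, and all function names and the threshold are assumptions.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights out to three standard deviations."""
    radius = int(3 * sigma)
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(signal, kernel):
    """Same-length convolution; samples past the ends repeat the edge value."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp to the signal
            acc += weight * signal[j]
        out.append(acc)
    return out


def find_edges(signal, sigma=1.0, threshold=1e-6):
    """Return indexes i where the filtered signal changes sign between i and i+1."""
    blurred = convolve(signal, gaussian_kernel(sigma))    # Gaussian: remove fine structure
    laplacian = convolve(blurred, [1.0, -2.0, 1.0])       # second difference
    edges = []
    for i in range(len(laplacian) - 1):
        left, right = laplacian[i], laplacian[i + 1]
        # A zero crossing marks an intensity change; the threshold skips noise.
        if left * right < 0 and abs(left - right) > threshold:
            edges.append(i)
    return edges
```

Raising `sigma` is the tuning Marr describes: a wider Gaussian responds to blurry edges, a narrower one to fine detail.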

So now if I had the opportunity to work with the late Dr. Marr, and he asked me if I can ‘do’ algorithms, in my mind I hear: “Ohmigod, what am I doing here. Do you think anyone will notice if I slip out?”

Okay, but we’re not trying to invent artificial intelligence here

Now that we’ve gone from adding two apples to programming human sight, we’ll focus on algorithms located somewhere in between.

Though I wouldn’t have the background to work directly with Dr. Marr, this isn’t to say I can’t work with algorithms. Anytime I write code I make use of, or even create algorithms. When I work with data, either in RDF format or as relational data, I am using algorithms. As Dr. Marr was an ‘expert’ in computer vision and neuroscience, I’m, equally, an expert in my field of interest.

Most of us work with algorithms, though we may not be aware of the fact. When we follow a recipe for Beef Wellington, the instructions for putting together a model airplane, or a complex knitting pattern for a baby blanket, we’re following algorithms. And if we create a new recipe, knitting pattern, or computational theory of human behavior, we’re creating new algorithms–usually derived from existing ones, if possible. (It’s easier to work with existing, proven algorithms than to have to go through formal proofs with new ones.)

In other words: there is no ‘algorithm’ gene that some people have, and others don’t. One doesn’t need a PhD to work with algorithms; the ability to work with algorithms is pretty much universal.

Cool. So where was I? Oh yes, the business of algorithms.

Blogorithms

I had to do it before someone else did it. It was only a matter of time.

Now that weblogging has established its credibility (i.e. can be used to make money) and there are millions of us (“over 14 million served daily”), the interest in creating algorithms to make use of all the rich, seductive unstructured data we generate is very strong. Understandably so.

However, unlike previous research projects such as Dr. Marr’s, current weblogging effort seems to focus on the algorithms rather than the goal. Because of this, we’re measuring every last bit about ourselves, but not coming up with anything useful. By focusing on the tools rather than the end point we’re mixing search with popularity, marketing with discovery, and then we’re throwing in a little structured data–just to make things interesting.

For instance, looking at mixing search with popularity:

The Technorati 100, Blogdex, and Daypop Top 40 are all representatives of the same general type of algorithm: notification of update, extract out the links, increment the count for any matched link, with the top n number of link holders placed on the list, ordered by number of links. Though the data is treated differently–Technorati persists the number of links per page, while Daypop and Blogdex are only interested in reflecting ‘fresh’ links–the concept, and hence, the algorithm is the same: when a link to a specific page is encountered, add a link to the source to a list, and increment a counter. Then adjust the list accordingly.
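The general shape of that algorithm fits in a few lines of Python. This is a sketch of the concept only, not the code behind any of the services named above; the function name `top_linked`, the input format, and the tie-breaking rule are all assumptions.

```python
from collections import Counter


def top_linked(posts, n=3):
    """Rank link targets by how many distinct sources link to them.

    posts: iterable of (source_url, [linked_url, ...]) pairs, one per update.
    Returns up to n tuples of (target, link_count, sorted_sources).
    """
    counts = Counter()
    sources = {}
    for source, links in posts:
        for target in set(links):           # count each source once per target
            counts[target] += 1             # increment the counter
            sources.setdefault(target, []).append(source)  # remember who linked
    # Order by count, highest first; break ties alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(target, count, sorted(sources[target]))
            for target, count in ranked[:n]]
```

Keeping only ‘fresh’ links would mean filtering `posts` by date before counting; the counting itself stays the same.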

None of this activity has anything to do with search. Each tool may also grab data for a search component, but the algorithms for popularity are not specific to search. Where the two get confused is when popularity is used as a factor of search.

Adjusting search results based on popularity is to combine two different algorithms, but there is no rhyme or reason for doing so. The fact that one page is more ‘popular’ than another does not make it a better authority. It’s not the same as something like Google’s PageRank, because PageRank is not a measure of popularity.

Ian Rogers wrote a very nice writeup on the original PageRank algorithm, breaking down the formula into the various steps. Summarizing, PageRank is based on incoming links, but it is the value of the links that helps push up the PageRank, and this value is dependent on how many outgoing links a page has.

This sounds like popularity but it isn’t. The whole purpose of the PageRank algorithm is to approximate a random surfer, and the probability that they would end up at the page after randomly clicking through so many pages. According to Sergey Brin and Lawrence Page’s original paper:

PageRank can be thought of as a model of user behavior. We assume there is a “random surfer” who is given a web page at random and keeps clicking on links, never hitting “back” but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the “random surfer” will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking.
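A small Python sketch makes the difference from simple link counting visible. It repeatedly applies the formula from the original paper, PR(A) = (1 − d) + d × (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)), where the T pages link to A and C(T) is the number of links going out of T. This is the original, published form only, not whatever Google runs today. The function name, the input format, and the fixed iteration count are assumptions, d = 0.85 is the value the paper says is usually used, and pages with no outgoing links are not given any special handling.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the original PageRank formula over a small link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping every page to its rank.
    """
    outgoing = {page: set(targets) for page, targets in links.items()}
    pages = set(outgoing)
    for targets in outgoing.values():
        pages.update(targets)

    rank = {page: 1.0 for page in pages}       # start every page at 1
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each linking page passes on its rank divided by its outgoing links.
            incoming = sum(rank[source] / len(targets)
                           for source, targets in outgoing.items()
                           if page in targets)
            new_rank[page] = (1 - d) + d * incoming
        rank = new_rank
    return rank
```

In a three-page graph where A links to B and C, B links to C, and C links to A, page A ends up ranked above B even though each has exactly one incoming link: A’s single link comes from C, which passes along its whole rank, while B gets only half of A’s.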

The PageRank in use today is not the same one just described; the one in use today is said to feature over 100 different variables. It’s not surprising that the computation has changed, but we have to suppose that the reasoning remains: its purpose is not to calculate popularity, but to capture the behavior of the random surfer. The only problem is, webloggers are not random surfers.

Weblogs combine the threaded chat behavior of a bulletin board system with the separate domains of more traditional web pages. As such, the genre creates a threaded web linking behavior that must play havoc with the traditional search engine paradigms. We don’t link just because of interest or as reference; we also link based on personal attachments, likes/dislikes, as part of self or other promotion, and a host of other reasons, none of which had anything to do with the traditional web linking of long ago.

danah boyd discussed this in a recent post, after reviewing 400 weblogs for linking patterns. She wrote:

Linking Patterns:

The Top 100 tend to link to mainstream media, companies or websites (like Wikipedia, IMDB) more than to other blogs (Boing Boing is an exception).

Blogs on blogging services rarely link to blogs in the posts (even when they are talking about other friends who are in their blogroll or friends’ list). It looks like there’s a gender split in tool use; Mena said that LJ is like 75% female, while Typepad and Moveable Type have far fewer women.

Bloggers often talk about other people without linking to their blog (as though the audience would know the blog based on the person). For example, a blogger might talk about Halley Suitt’s presence or comments at Blogher but never link to her. This is much rarer in the Top 100 who tend to link to people when they reference them.

Content type is correlated with link structure (personal blogs contain few links, politics blogs contain lots of links). There’s a gender split in content type.

When bloggers link to another blog, it is more likely to be same gender.

As danah mentioned, 400 weblogs is too few to extrapolate any global behavior, but we’ve seen one or more of these ourselves–in particular not linking to someone but giving the person’s name; or not even directly mentioning a name (a behavior that is becoming more common).

Whether there are gender differences in linking has been the subject of much debate. danah found in her examination of 400 weblogs that gender linked to like gender more often than not. If this is true, and women account for about 50% of the weblogs, then we should expect more weblogs by women in the popularity lists. That we don’t shows that we need to continue our observation before we can begin to derive algorithms related to weblogging and popularity, and weblogging and search.

As for marketing and discovery, and the thin vein of structured data (microformats, syndication feeds, FOAF, et al) that runs through all of this unstructured mess–this will be good for a follow on topic someday.

Onward

Mary Hodder had originally listed out several different metrics for consideration when it comes to developing an algorithm and asked if the approach she proposed was the right one:

So this is my first post think about making an open source algorithm. And I’m wondering, is this a useful approach? I think it could be worthwhile, done right, and I put it out there to the blogging community to determine what is best here. As I said, after seeing what people who want to work with smaller topic communities are doing, it may be in blogger’s interest to think about how this might be done so that is it more in keeping with the desires and views of the blogosphere.

The steps in the approach–reviewing all the different metrics, looking at representative data, searching for repetitive behaviors–are all good, and, equally, all for nought unless the purpose for the effort is clearly understood. This is worth repeating: an algorithm is nothing more than a formalization of the steps necessary to meet a goal. And replacing Technorati 100 because it ‘sucks’ is not a particularly good goal. So my suggestion to Mary would be to establish the end points for this effort, first.

And now rumors abound that Technorati is being sold, possibly to a major search company. Well, that’s one way to get rid of the Technorati 100–convert it to the Yahoo 100. Semi-serious joking aside, if we know one thing by now about all of this, it’s that the unstructured data that weblogging sits on is this year’s hottest commodity–second only in value to the algorithms used to mine it.

Categories
Diversity

Sigh Friday

But I don’t knit.

I agree with Lauren. Sigh.

Other good takes on the interview:

Home Cooked: “What these comments consistently fail to do is explain why women’s activities have been forced to take place in the realm of the domestic, and how their talk has been dismissed as trivial compared with the seriousness of the (male-dominated) public sphere.”

ejchange: And for the record, there are women who do both knitting and politics. hell, there are women who do all three, which, all things considered, is not an easy feat.

Alembic: “By looking at this interview in its full text and context, couldn’t one also make the case that Mena is telling knitters who blog that what they are doing is blogging and that they are just much part of that technological revolution as are the “men” whose voices drown them out in the media … but not in the middle of their own lives, where speaking matters and technology is just a tool.”

Mena Trott’s own post on the article.