Categories
Technology

UDDI and Discovery

Questions:

How do you compare UDDI to other methods of discovering networked resources (which may or may not be web services)?

What’s the difference between a global UDDI registry and…
– Google: controlled by a single organization
– dmoz.org: open, and replicated by other search engines
– DNS: governed by ICANN, but organizations can apply to be registrars
– others?

Do the above services have the same weakness you attribute to a UDDI global registry?

In some ways, we’re talking apples, oranges, cherries, and perhaps some peaches. They’re all fruit, but the similarity ends at that point.

UDDI is a centralized discovery service managed by a consortium of organizations, the content of which may or may not be striped across several different servers. Information is added to the repository through submissions from those with services to provide.

Google is a discovery service that is also centralized under one authority, but it uses many different methods to discover information, including automated agents (bots), subscriptions to other services (such as dmoz), and manual intervention.

Google, though, has an interesting twist to its discovery mechanism: a set of algorithms that constantly evaluate, merge, and massage its raw data to provide additional measurements, ensuring higher degrees of accuracy and recency. The discovery of data is never the same twice running within a collection period.

The dmoz directory is a great open source effort to categorize information intelligently. In other words, the data is manually added to the directory and categorized. This makes the directory extremely efficient when it comes to human interpretation of data. You might say that with dmoz, the “bots” are human. Get the world involved, and you have a high level of intelligent categorization of data. The only problem, though, is that human interpretation of data is, at times, just as unreliable as a mechanical interpretation.

However, dmoz is probably the closest to UDDI of the network discovery services you’ve listed primarily because of this human intervention.

Finally, DNS. DNS does one thing, and as pissy as people are about it, it does that one thing reasonably well. The web has grown to huge proportions in part because it has something like DNS to handle naming and location of resources.

In some ways, DNS is closest to what I consider an iron-free cloud, if you look at it from an interpretation point of view (not necessarily implementation). You have all these records distributed across all these authoritative servers, providing a definitive location of a resource. Then you have these other servers that do little more than query and cache those locations, making access to the resources quicker and the whole framework more scalable.

In some ways, I think UDDI is like DNS, also. You can have UDDI records distributed across different servers to make service lookup more efficient and to make the whole process more scalable.

This same approach also happens with Circle, Chord, and Freenet if you think about it (the whole store and forward, query and cache at closer servers or peers, so that the strain of the queries isn’t channeled to a few machines).
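If you boil the pattern down, it’s the same few lines whether the “records” are DNS entries, UDDI service descriptions, or keys in a Chord ring. Here’s a minimal Python sketch of that query-and-cache idea; the names and the stand-in “authoritative” table are made up for illustration, not any real protocol’s API:

    # A toy query-and-cache lookup: check the nearby cache first, and only
    # forward to the authoritative source (server or peer) on a miss.
    cache = {}

    def lookup(name, ask_authority):
        if name in cache:
            return cache[name]          # answered locally; no strain on the authority
        answer = ask_authority(name)    # forward the query to the definitive source
        cache[name] = answer            # keep a copy close by for the next asker
        return answer

    # Stand-in for the authoritative records (illustrative name and address only).
    records = {"weblog.example.com": "192.0.2.10"}

    print(lookup("weblog.example.com", records.get))  # forwarded, then cached
    print(lookup("weblog.example.com", records.get))  # served straight from the cache

A real cache would also expire entries after a while, but the shape of the thing is the same.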

UDDI is like DNS for another reason: controlling organization and potential political problems. ICANN hasn’t had the best rep managing the whole DNS/registrar situation. In particular, you should ask some of the Aussie ISPs what they think of the whole thing. They’ve had trouble with ICANN in the past.

All of the services share one common limitation: they all have hardcoded entry points, and all have some organization as a controller. I don’t care how altruistic the motives, there is a controlling body. There’s iron in all the approaches. All of them.

Categories
Technology

Blogger Pro—important

---CRITICAL---

Blogger Pro users — if you have posted today, it’s imperative that you check your weblog page, particularly if you use the Blogger template tag to identify weblog posting author.

Per Ev:

There was a major glitch in publishing this evening. If you published and your name doesn’t look right, publish again.

Categories
Web

Dot Com Bust Redux

I’m assuming the only reason the RealNames failure is getting air time is that the former CEO has published its business dealings with Microsoft.

I glanced through Keith Teare’s papers at his personal web site, and just can’t see the fuss.

Microsoft chose to terminate the relationship with RealNames. Given the nebulous nature of the product, the overall opinion against such centralized technology in today’s market, and the business proposal, I don’t see how anyone could be surprised by this decision.

RealNames owed Microsoft $25.5m on May 2nd. They didn’t have it. They issued a counter-proposal. Microsoft wasn’t interested. RealNames bites the dust.

Teare believes that Microsoft isn’t demonstrating vision in its current direction, and is seeking solutions that it can control. Maybe so, but consider the proposed future direction for RealNames: centralized, proprietary, flat-architecture Keyword technology in partnership with a company such as Verisign.

I have a hard time identifying with one proprietary, centralized, patent-holding company fighting back at another proprietary, centralized, patent-holding company.

However, I do have sympathy for the 75 people in Redwood City that lost their jobs.

Categories
Burningbird Technology

Space? What space?

I was playing around with my server earlier, trying out some fun and interesting sounding new techie toys. Unfortunately, the new techie toys required ImageMagick.

Those of you with a Unix background are probably going “Oh, No!” about now. I knew I was pushing the bubble with this one, but you only live once.

Damn the server! Full install ahead!

----

Anyway, we’re almost back to normal. I’ve managed to save the server, and was able to repair the Apache installation. It was also nice hearing from the system kernel, all those “panic!” emails.

If you tried to post comments earlier during some interesting moments of turmoil and they aren’t showing up — Sorry! If you have a minute and wouldn’t mind reposting, I would be grateful!

The great thing about Unix servers is that you can do anything. The bad thing about Unix servers is that you can do anything.

Categories
Technology

Making peace with Google

I can’t wait until I get up in the morning and pop on to my machine so I can download 50+ spam emails. One of the funnest games of the day is to try and find “real” email among all of the junk. When I find one, I holler out “email whack!”

As you can tell, I am being facetious. I don’t know of anyone who likes spam, or wants to spend time on it, or wants to waste email bandwidth on it.

So why do we all like the crazy hits we get from Google?

Dave Winer pointed out a posting from Jon Udell discussing a posting from Dave Sims at O’Reilly. In it, Dave Sims wrote:

Google’s being weakened by its reliance on webloggers and their crosslinks

If Google wants to evolve into a functional resource for all users, it will have to work itself off this current path, or it will open up an opportunity for The Next Great Search Engine.

Jon responds with:

In the long run, the problem is not with Google, but with a world that hasn’t yet caught up with the web. I’m certain that in 10 years, US Senators and Inspectors General will leave web footprints commensurate with their power and influence. I hope that future web will, however, continue to even the odds and level the playing field.

Sorry, Jon. I’m with Dave Sims on this one. Weblogs are weakening Google.

When I ported the Burningbird to Movable Type and moved to the new location, I also created a robots.txt file that disallowed any web bot other than the blogdex or Daypop bots. And the Googlebot, being a well-behaved critter, has honored this (as have several other bots; my referrer log is getting sparkly clean).
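If you’re curious, that file amounts to only a few lines. Something along these lines does the job, though the exact user-agent tokens the blogdex and Daypop bots send are the part you’d want to verify for yourself; the ones below are placeholders:

    User-agent: blogdex
    Disallow:

    User-agent: daypop
    Disallow:

    User-agent: *
    Disallow: /

An empty Disallow means “crawl anything,” and the catch-all record at the bottom shuts out every bot that doesn’t match one of the named entries.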

In the meantime, I’ve left my old site as is, bot-beaten poor little thing that it is. As a result, in the referrer log I’ve found the following searches:

rufus wainright shrek
devonshire tea graphics
missouri point system drivers license
bill gates popular science
entrenched in hatred
richard ashcroft money to burn
shelley bird
pictures of terrorists burning american flags
south carolina state patrol fishing
pictures of women in afghanistan
we start fire billy joel
fairy tale blue bird
beautiful outlook pictures
fighting fishies
high blood pressure burning
hacking statistics in Australia
lord of the rings pictures and drawings sting sword
add morpheus node

…and on and on

And all of these Google searches happened in three days time. Three days.

Comparing usage estimates, Google was effectively chewing up over 30% of my web site CPU and bandwidth on searches that were, on average, accurate 3% of the time.

My regular web sites (Dynamic Earth, YASD, P2P Smoke, and Burningbird Network) have on average seven times the traffic of my weblog, with half the Google traffic and an accuracy of over 98%. This figure means that Google searches resulting in hits to the regular web sites are finding resources matching their searches. People may still continue looking at other sites, but the topic of the search is being met by the topic covered in the page.

Weblogs — might as well call us Google Viruses.

This isn’t to say that Google and weblogs can’t work together, but it isn’t up to Google to make this happen. Google is a web bot and an algorithm; we’re supposed to be the ones with the brains.

Weblogs that focus on one specific topic are ideal candidates for Google scanning. For instance, zem is a weblog focusing on topics of cryptography, security, and copyrights. Because he consistently stays on topic, he’s increasing his accuracy ratio — people are going to find data on the page that meets their search.

Victor, who’s as interested in Google as I am, is trying to work with Google by creating a new weblog that focuses purely on web development resources, Macromedia products, and browser development. It’s early days yet, but as time goes by and more people discover Victor’s weblog, he should increase his Google page rank, resulting in an increase of the number and accuracy of his Google hits.

So what’s a weblogger who just wants to have fun to do? Well, if you don’t mind the crazy searches and the waste of your bandwidth and CPU, don’t do anything. Let all those little bots just crawl all over your weblog’s butt. Google’s bandwidth and accuracy are Google’s problem (time for smarter algorithms, perhaps).

However:

– if you’re saving up to add some nice graphics or MP3 files to your weblog and your bandwidth is restricted, as it is on most servers, or

– if you’re getting tired of crawling through the bizarre Google searches, or

– if you’re getting tired of not being able to put “xxx” on your weblog page,

then you might want to consider providing a few helpful aids to Google.

Google Helpful Aids

1. Create a robots.txt file and restrict Googlebot’s crawling to specific areas of your weblog web site, not including your weblog page or archives (see the sketch after this list).

2. If possible, create individual archive pages for each post. Otherwise, for all posts that deserve to stand alone, copy the generated HTML into a separate file.

3. For your weblog posts that you think will make a great resource, and that stay on topic and don’t meander all over the place, copy or hard link them (if you’re using Unix) to a directory that allows bots to crawl.

4. Avoid the use of ‘xxx’ in any shape or form in any of your Googlized pages.
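To make the first aid concrete, here’s a rough sketch of a robots.txt that keeps Googlebot off the weblog page and archives while leaving the rest of the site open to it. The directory names are only placeholders for whatever your site actually uses:

    User-agent: Googlebot
    Disallow: /weblog/
    Disallow: /archives/

Anything not listed under a Disallow stays crawlable, so a separate directory of stand-alone copies, say /stories/, holding the posts from the third aid (on Unix, something like ln archives/some_post.html stories/some_post.html, with both filenames purely hypothetical) is what the bot will find and index.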

Over time, we’ll add to these aids.

Now, if only I can figure out what to do with all these XML and RDF aggregators that are now crawling all over my server….