P2P and relying on HTTP

The Don Box discussion about HTTP was a good read with valid points.

From a P2P perspective rather than a web services one, we need to guarantee certain capabilities in P2P services that we take for granted in more traditional client/server environments. These include the following:

Transaction reliability: the old two-phase commit of database technology appears again, but this time in a more challenging guise.

Transaction auditing: a variation of the two-phase commit, except that auditing is, in some ways, more of the business aspect of the technology.

Transaction security: we need to ensure that no one can snoop at the transaction contents, or otherwise violate the transaction playing field.

Transaction trust: not the same thing as security. Transaction trust means ensuring that the P2P service we’re accessing is the correct one and the valid one, and that the service meets some business trust criteria (the last of these falling outside the technology realm).

Service or peer discovery: still probably one of the more complicated issues about P2P. How do we find services? How do we find P2P circles? How do we market our services?

Peer rediscovery: this is where the iron hits the cloud in all P2P applications I know of. You start a communication with another peer, but that peer goes offline. How do you take up the conversation again without the use of some centralized resource? The same question applies to services.

Bi-directional communication: this is Don’s reference to HTTP’s asymmetric nature. Peers share communication; otherwise, you’re only talking about the traditional web services model.

The file transfer nature of Napster or Freenet, and the IM nature of Jabber, don’t necessarily call on all of these aspects of P2P applications, so they haven’t necessarily pushed the P2P bubble to the max. However, when we start talking about P2P services (a variation of web services, one could say), we know we’re going to be stretching both our technology capabilities and our trust of the same.

Fun!

UDDI Questions

Andy sent some questions on UDDI that I’m going to attempt to answer. If you agree, disagree, or have additions, please drop a comment.

Questions:

How do you compare UDDI to other methods of discovering networked resources (may or may not be web services)?

What’s the difference between a global UDDI registry and…
– google: controlled by a single organization
– dmoz.org: open, and replicated by other search engines
– DNS: governed by ICANN, but organizations can apply to be registrars
– others?

Do the above services have the same weakness you attribute to a UDDI global registry?

In some ways, we’re talking apples, oranges, cherries, and perhaps some peaches. They’re all fruit, but the similarity ends at that point.

UDDI is a centralized discovery service managed by a consortium of organizations, the content of which may or may not be striped across several different servers. Information is added to the repository by submission from those with services to provide.

Google is a discovery service that is also centralized under one authority, but uses many different methods to discover information including automated agents (bots), subscription to other services (such as dmoz) and manual intervention.

Google, though, has an interesting twist to its discovery mechanism: it has a set of algorithms that are constantly evaluating, merging, and massaging its raw data in order to provide additional measurements, ensuring higher degrees of accuracy and recency. The discovery of data is never the same two times running within a collection period.

The dmoz directory is a great open source effort to categorize information intelligently. In other words, the data is manually added to the directory and categorized. This makes the directory extremely efficient when it comes to human interpretation of data. You might say that with dmoz, the “bots” are human. Get the world involved and you have a high level of intelligent categorization of data. The only problem, though, is that human interpretation of data is at times just as unreliable as mechanical interpretation.

However, dmoz is probably the closest to UDDI of the network discovery services you’ve listed, primarily because of this human intervention.

Finally, DNS. DNS does one thing and as pissy as people are about it, it does the one thing reasonably well. The web has grown to huge proportions with something like DNS to handle naming and location of resources.

In some ways, DNS is closest to what I consider an iron-free cloud if you look at it from an interpretation point of view (not necessarily implementation). You have all these records distributed across all these authoritative servers providing a definitive location of a resource. Then you have these other servers that basically do nothing more than query and cache these locations to make access to these resources quicker and the whole framework more scalable.
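The query-and-cache split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not of how any real DNS software works: the class names, the example hostname and address, and the fixed time-to-live are all assumptions made up for the sketch.

```python
import time


class AuthoritativeServer:
    """Holds the definitive records for the names it is responsible for."""

    def __init__(self, records):
        self.records = dict(records)
        self.queries = 0  # counts how often this server is actually asked

    def lookup(self, name):
        self.queries += 1
        return self.records.get(name)


class CachingResolver:
    """Answers from its cache when it can, asks the authority when it can't."""

    def __init__(self, authority, ttl_seconds=300, clock=time.monotonic):
        self.authority = authority
        self.ttl = ttl_seconds      # assumed fixed lifetime for every entry
        self.clock = clock
        self.cache = {}             # name -> (address, expiry time)

    def resolve(self, name):
        hit = self.cache.get(name)
        if hit is not None and hit[1] > self.clock():
            return hit[0]           # still fresh: no trip to the authority
        address = self.authority.lookup(name)
        if address is not None:
            self.cache[name] = (address, self.clock() + self.ttl)
        return address


authority = AuthoritativeServer({"example.org": "192.0.2.10"})
resolver = CachingResolver(authority)
first = resolver.resolve("example.org")   # goes to the authority
second = resolver.resolve("example.org")  # served from the cache
```

The second lookup never reaches the authoritative server, which is the whole point: the more resolvers sit between the askers and the authority, the less strain on the machine holding the definitive record.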

In some ways I think UDDI is like DNS, also. You can have UDDI records distributed across different servers to make service lookup more efficient, and to make the whole process more scalable.

This same approach also happens with Circle, Chord, and Freenet if you think about it (the whole store and forward, query and cache at closer servers or peers so that the strain of the queries isn’t channeled to a few machines).

UDDI is like DNS for another reason: controlling organization and potential political problems. ICANN hasn’t had the best rep managing the whole DNS/registrar situation. In particular, you should ask some of the Aussie ISPs what they think of the whole thing. They’ve had trouble with ICANN in the past.

All of the services share one common limitation: they all have hard-coded entry points, and all have some organization as controller. I don’t care how altruistic the motives, there is a controlling body. There’s iron in all the approaches. All of them.

 

Visual C++ helper function

I popped over to bumr for a minute and came face to face with this Visual C++ code. Whoa! Work!

And yes, as noted in the comments, _bstr_t and _variant_t are darn handy. Almost make VC++ palatable at times. The problem with Microsoft’s Visual products isn’t that they aren’t powerful. The problem is you have to really dig to find the nifty helper functions to make your life easier.*

Users shouldn’t have to dig for information about how to use a product. This is equivalent to “if you have to ask directions, you can’t afford to use it” in attitude. Arrogant.

*Another problem is that going Microsoft’s way usually implies total buy-in to the MS way of doing things; I still own my soul, thank you very much.

 

P2P Networks

I checked out Circle as well as Chord as P2P networks. These are excellent efforts and should be of note to anyone who is interested in P2P systems. As with KaZaA, much of the P2P cloud is transient and located on the peers themselves. The folks at Userland should look at how this can be done with Radio 8.0 if they want a true, distributed backend to the product.

I have a feeling the cloud part isn’t the issue — it will be the Radio backend and this assumption of one controlling application per weblog. At least, that’s what I found when I started peeking around a bit. Perhaps folks more knowledgeable about Radio will have a better idea.

Back to the P2P systems: aside from a key entry point (and all of these systems need this and there’s a reason why) the P2P clouds are without iron. Aside from the key entry point.

Why is the entry point needed? Because each P2P circle is too small (yes it is) to make it efficient to send a bot out into the open Internet, knocking at IPs looking for a specific node of any one of these systems. All P2P systems are too small for this to be effective: Napster, Gnutella, and so on. Think about it: how many nodes are online now in the Internet? I wouldn’t even try to guess the number, but I imagine millions and millions. Now you have a P2P network with about 200,000 nodes. Needle in haystack. Right?

Well, not necessarily. Depending upon the dispersion level of the nodes of the P2P network, it might not be that difficult to find an entry node into the network. So with a bot and a handshake protocol implemented at each node you could have a golden gateway — an entry point totally without iron.
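To put rough numbers on the haystack, here is a back-of-envelope sketch in Python. The 200,000 figure is the one used above; the count of hosts online is a pure guess picked for the arithmetic, and the model assumes the P2P nodes are spread evenly and that the bot knocks on addresses at random, neither of which is likely to hold exactly.

```python
def expected_probes(total_hosts, network_nodes):
    """Average number of random knocks before one lands on a member node."""
    return total_hosts / network_nodes


def chance_of_hit(total_hosts, network_nodes, probes):
    """Probability that at least one of `probes` random knocks finds a member."""
    miss = 1 - network_nodes / total_hosts
    return 1 - miss ** probes


ASSUMED_HOSTS_ONLINE = 100_000_000  # a guess for illustration, not a measurement
P2P_NODES = 200_000                 # the network size mentioned in the post

average = expected_probes(ASSUMED_HOSTS_ONLINE, P2P_NODES)
odds_after_1000 = chance_of_hit(ASSUMED_HOSTS_ONLINE, P2P_NODES, 1000)
```

Under those assumptions a bot would need about 500 knocks on average, and a thousand knocks would find a member node roughly 86 percent of the time. Change the guess for hosts online and the answer scales with it, but the needle is not as lost as it first sounds.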

However, the problem with this approach is you then have to have a bot for every system you want to join: Groove, Gnutella, Circle, and so on. What a pain.

Wouldn’t it be better to have all these systems provide a common form of identification as well as a common handshake and direction protocol and then have one type of bot that’s smart enough to tap on the door of the nearest P2P system and say “I’m looking for so and so”? And wouldn’t it be better to have each system learn about the others when contacted, such as when a bot returns to a node with a connection into Circle, it also happens to have information about the nearest golden gateway node to Gnutella? And would it be such a resource burden to have the node check every once in a while to make sure its neighboring nodes are still online? So that when our bot of discovery comes calling, it’s given up-to-date information?
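A minimal sketch of what such a shared handshake and direction protocol might look like, in Python. None of these systems exposes anything like this today as far as the post is concerned; the `Node` class, its method names, and the addresses are hypothetical, and the "ping" is reduced to a flag so the example runs without a network.

```python
class Node:
    """One peer that can identify itself and give directions to other networks."""

    def __init__(self, address, network):
        self.address = address
        self.network = network
        self.online = True
        self.gateways = {}  # network name -> Node believed to be an entry point

    def handshake(self):
        """Common identification step: say who we are, or nothing if offline."""
        if not self.online:
            return None
        return {"address": self.address, "network": self.network}

    def learn(self, other):
        """Remember a node from another network as a gateway into it."""
        if other.network != self.network:
            self.gateways[other.network] = other

    def refresh(self):
        """The occasional ping: forget gateways that no longer answer."""
        self.gateways = {
            net: node for net, node in self.gateways.items()
            if node.handshake() is not None
        }

    def directions(self, wanted_network):
        """Address of a known entry point into wanted_network, if any."""
        if wanted_network == self.network:
            return self.address
        gateway = self.gateways.get(wanted_network)
        return gateway.address if gateway else None


def find_entry(first_contact, wanted_network):
    """The single bot: knock on the nearest door and ask for directions."""
    if first_contact.handshake() is None:
        return None
    return first_contact.directions(wanted_network)


circle_node = Node("10.0.0.5", "circle")
gnutella_node = Node("10.0.0.9", "gnutella")
circle_node.learn(gnutella_node)
found = find_entry(circle_node, "gnutella")

gnutella_node.online = False
circle_node.refresh()
found_after_refresh = find_entry(circle_node, "gnutella")
```

The refresh step is where the cost of a ping comes in: without it, the bot would be handed directions to a gateway that has since gone offline.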

What’s the cost of a ping?

You know, I have so many bots crawling my servers that I’m amazed they’re still standing at times. But none of them work together. If they did, and if they were smarter, and if our sites had something a bit smarter than just open ports, firewalls, or web servers, then maybe we could do without DNS and centralized repositories of information such as UDDI.

Just some more grand ideas to throw out and see if people think I’m full of little green beans again.

Kazaa Aluminum Core

In reference to the last posting, Julian mentioned that perhaps Kazaa and its supernodes have more of an aluminum core because the cloud that supports the Kazaa P2P network is still malleable: the Supernodes that provide the cloud services are fluid and can change as well as go offline with little or no impact to the system.

I imagine, without going into the architecture of the system, that more than one Supernode is assigned to any particular subnet, with the others acting as backups, most likely pinging the primary Supernode to see if it’s still in operation. If the primary goes out of operation, the backup Supernode(s) take over and a signal is sent to the P2P nodes to get services from this IP address rather than that one. The original Supernode machine may even detect a shutdown and send a signal to the secondaries to take over.

Or perhaps the Supernode IPs are chained, and the software on each P2P node checks the first IP and, if no response occurs, automatically goes to the second within the Supernode list and continues on until an active Supernode is found. This would take very little time, and would, for the most part, be transparent to the users.
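The chained-list guess is simple enough to sketch in Python. This is speculation about speculation: nothing here comes from KaZaA's code. The function name and the `is_alive` probe are hypothetical stand-ins for whatever check a real client would make, and the addresses are documentation-range examples.

```python
def first_active_supernode(supernode_ips, is_alive):
    """Walk the chained list in order and return the first IP that answers.

    `is_alive` is any callable taking an IP and returning True or False;
    a real client would presumably open a connection with a short timeout.
    """
    for ip in supernode_ips:
        if is_alive(ip):
            return ip
    return None  # the whole chain is down


supernodes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
offline = {"192.0.2.1"}  # pretend the primary has gone away

chosen = first_active_supernode(supernodes, lambda ip: ip not in offline)
```

With the primary marked offline, the node quietly lands on the second address in the list, which is the "transparent to the users" part.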

Again without access to any of the code, and even any architecture documentation (which means there’s some guesswork here) the algorithm behind the Supernode selection list looks for nodes that have the bandwidth, persistent connectivity, and CPU to act as Supernodes with little impact to the computer’s original use. The member nodes of each KaZaA sub-net — call it a circle — would perform searches against the circle’s Supernode, which is, in turn, connected to a group of Supernodes from other circles so that if the information sought in the first circle can’t be found, it will most likely be found in the next Supernode and so on. This is highly scalable.
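As a rough picture of that search pattern, here is a Python sketch in which a query goes to the circle's own Supernode first and is then passed along to connected Supernodes. It is a guess at the shape of the thing, not KaZaA's algorithm: the `Supernode` class, the hop limit, and the file and address names are all invented for the example.

```python
from collections import deque


class Supernode:
    """Indexes what the member nodes of one circle are sharing."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # filename -> set of member addresses holding it
        self.neighbors = []  # Supernodes of other circles we are connected to

    def register(self, member_address, filenames):
        for filename in filenames:
            self.index.setdefault(filename, set()).add(member_address)


def search(start, filename, max_hops=3):
    """Ask our own Supernode first, then widen out one circle at a time."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        holders = node.index.get(filename)
        if holders:
            return node.name, sorted(holders)
        if hops < max_hops:  # assumed limit so a query can't wander forever
            for neighbor in node.neighbors:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
    return None


a, b, c = Supernode("a"), Supernode("b"), Supernode("c")
a.neighbors = [b]
b.neighbors = [a, c]
c.register("10.1.1.1", ["song.mp3"])

result = search(a, "song.mp3")
```

Here the file isn't in circle a or b, so the query reaches c on the second hop and comes back with the member address that holds it.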

So far so good: little or no iron in the core, because no one entity, including KaZaA or the owners behind KaZaA, can control the existence and termination of the Supernodes. Even though KaZaA is yet another file sharing service rather than a services brokering system, the mechanics would seem to meet our definition of a P2P network. Right?

Wrong.

What happens when a new node wants to enter the KaZaA network? What happens if KaZaA — the corporate body — is forced offline, as it was January 31st because of legal issues? How long will the KaZaA P2P network survive?

In my estimation a P2P network with no entry point will cease to be a viable entity within 1-2 weeks unless the P2P node owners make a determined effort to keep the network running by designating something to be an entry point. Something with a known IP address. Connectivity to the P2P circle is the primary responsibility of a P2P cloud. KaZaA’s connectivity is based on a hard-coded IP. However small it is, this is still a kernel of iron.

We need a way for our machines to find not just one but many P2P circles of interest using approaches that have worked effectively for other software services in the past:

We need a way to have these P2P circles learn about each other whenever they accidentally bump up against each other — just as webloggers find each other when their weblogging circles bump up against each other because a member of two circles points out a weblog of interest from one circle to the other.

We need these circles to perform an indelible handshake and exchange of signatures that becomes part of the makeup of each circle touched so that one entire P2P circle can disappear, but still be recreated because its “genetic” makeup is stored in one, two, many other circles. All it would take to restart the original circle is two nodes expressing an interest.
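One way to picture the signature exchange is the following Python sketch. It is a toy under stated assumptions: the `Circle` class, the idea that a signature is just a name plus a protocol label, and the two-node threshold taken from the paragraph above are all illustrative, and nothing here addresses how such signatures would be made tamper-proof.

```python
class Circle:
    """A P2P circle that carries copies of the signatures of circles it has met."""

    def __init__(self, name, protocol, members=()):
        self.name = name
        self.protocol = protocol
        self.members = list(members)
        self.known = {}  # circle name -> signature picked up on contact

    def signature(self):
        """The circle's 'genetic' makeup: enough to recreate it elsewhere."""
        return {"name": self.name, "protocol": self.protocol}

    def bump(self, other):
        """Two circles touch: each keeps a copy of the other's signature."""
        self.known[other.name] = other.signature()
        other.known[self.name] = self.signature()

    def revive(self, name, interested_nodes):
        """Recreate a vanished circle once at least two nodes want it back."""
        stored = self.known.get(name)
        if stored is None or len(interested_nodes) < 2:
            return None
        return Circle(stored["name"], stored["protocol"], interested_nodes)


weblogs = Circle("weblogs", "hypothetical-proto/1", ["n1", "n2", "n3"])
music = Circle("music", "hypothetical-proto/2", ["m1"])
weblogs.bump(music)

del music  # the original circle disappears entirely
reborn = weblogs.revive("music", ["n1", "n2"])
```

The revived circle starts with only the two interested nodes as members, but it carries the same name and protocol as the one that vanished, which is all the paragraph above asks for.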

We need a way to propagate the participation information or software or both to support the circles, one that can persist regardless of whether the original source of said software or information is still operating, just as software viruses have been propagated for years. Ask yourselves this: has the fact that the originator of a virus has gone offline had any impact on the spread of said virus? We’ve been harmed by the technology for years; time to use the concepts for good.

We need a way to discover new services using intelligent searches that are communicated to our applications using a standard syntax and meta-language, through the means of a standard communication protocol, collected with intelligent agents, as Google and other search engines have been using for years. What needs to change is to have the agents find the first participating circle within the internet and ask for directions to points of interest from there.

Standard communication protocol, meta-language, syntax. Viral methods of software and information propagation. Circles of interest with their own DNA that can be communicated with other circles when they bump in the night, so to speak. Internet traversing agents that only have to be made slightly smarter by being given the ability to ask for directions.

Web of discovery. Doesn’t the thought of all this excite you?