P2P Networks

I checked out Circle as well as Chord as P2P networks. These are excellent efforts and should be of note to anyone interested in P2P systems. As with KaZaA, much of the P2P cloud is transient and located on the peers themselves. The folks at Userland should look at how this can be done with Radio 8.0 if they want a true, distributed backend to the product.

I have a feeling the cloud part isn’t the issue — it will be the Radio backend and this assumption of one controlling application per weblog. At least, that’s what I found when I started peeking around a bit. Perhaps folks more knowledgeable about Radio will have a better idea.

Back to the P2P systems: aside from a key entry point (and all of these systems need one, and there’s a reason why), the P2P clouds are without iron. Aside from the key entry point, that is.

Why is the entry point needed? Because each P2P circle is too small (yes, it is) to make it efficient to send a bot out into the open Internet, knocking at IPs looking for a specific node of any one of these systems. All P2P systems are too small for this to be effective: Napster, Gnutella, and so on. Think about it — how many nodes are online on the Internet right now? I wouldn’t even try to guess the number, but I imagine millions and millions. Now you have a P2P network with about 200,000 nodes. Needle in a haystack. Right?

Well, not necessarily. Depending upon the dispersion level of the nodes of the P2P network, it might not be that difficult to find an entry node into the network. So with a bot and a handshake protocol implemented at each node you could have a golden gateway — an entry point totally without iron.
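Just to show the shape of the idea, here is a minimal sketch of such a bot in Python. The port number, the handshake strings, and the assumption that every node answers a handshake at all are inventions on my part, not any existing protocol:

```python
# A rough sketch of the "bot of discovery" idea: try candidate addresses,
# offer a handshake, and stop at the first node that answers like a peer.
# The port number and the handshake strings are invented for illustration.
import socket

HANDSHAKE = b"HELLO P2P?\n"
EXPECTED = b"HELLO PEER"

def find_gateway(candidate_ips, port=6346, timeout=1.0):
    """Return the first address that completes the handshake, or None."""
    for ip in candidate_ips:
        try:
            with socket.create_connection((ip, port), timeout=timeout) as sock:
                sock.sendall(HANDSHAKE)
                reply = sock.recv(64)
                if reply.startswith(EXPECTED):
                    return ip          # a golden gateway, no iron required
        except OSError:
            continue                   # nothing listening here; keep knocking
    return None
```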

However, the problem with this approach is you then have to have a bot for every system you want to join: Groove, Gnutella, Circle, and so on. What a pain.

Wouldn’t it be better to have all these systems provide a common form of identification, as well as a common handshake and direction protocol, and then have one type of bot that’s smart enough to tap on the door of the nearest P2P system and say “I’m looking for so and so”? And wouldn’t it be better to have each system learn about the others when contacted, so that when a bot returns to a node with a connection into Circle, it also happens to have information about the nearest golden gateway node to Gnutella? And would it be such a resource burden to have the node check every once in a while to make sure its neighboring nodes are still online, so that when our bot of discovery comes calling, it’s given up-to-date information?
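To make that concrete, here is a rough sketch, purely hypothetical, of what a node speaking such a common handshake and direction protocol might look like. The message types, the field names, and the JSON framing are all inventions for illustration; no such shared protocol exists today:

```python
# Sketch of the "common handshake and direction protocol" idea: a node keeps
# a small directory of gateways it has learned about (its own network plus
# any others it has bumped into), answers direction requests, and remembers
# when it last heard from each neighbor so the directory stays current.
import json
import time

class DirectionNode:
    def __init__(self, network_name, address):
        self.identity = {"network": network_name, "address": address}
        self.known_gateways = {network_name: [address]}   # network -> addresses
        self.neighbors = {}                               # address -> last seen

    def handle(self, raw_message):
        msg = json.loads(raw_message)
        if msg["type"] == "looking-for":
            # "I'm looking for so and so": point the bot at the nearest
            # gateway we know about for that network, if any.
            return json.dumps({
                "type": "directions",
                "network": msg["network"],
                "gateways": self.known_gateways.get(msg["network"], []),
            })
        if msg["type"] == "hello":
            # Learn about the caller's own network while we're at it.
            self.known_gateways.setdefault(msg["network"], []).append(msg["address"])
            self.neighbors[msg["address"]] = time.time()
            return json.dumps({"type": "hello-ack", **self.identity})
        return json.dumps({"type": "unknown"})

    def stale_neighbors(self, max_age=300.0):
        """Neighbors we haven't heard from lately and should ping again."""
        now = time.time()
        return [addr for addr, seen in self.neighbors.items() if now - seen > max_age]
```

The directory of gateways and the neighbor timestamps are cheap to keep, which is really all the next question comes down to.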

What’s the cost of a ping?

You know, I have so many bots crawling my servers that I’m amazed they’re still standing at times. But none of them work together. If they did, and if they were smarter, and if our sites had something a bit smarter than just open ports, firewalls, or web servers — then maybe we could do without DNS and centralized repositories of information such as UDDI.

Just some more grand ideas to throw out and see if people think I’m full of little green beans again.

Kazaa Aluminum Core

In reference to the last posting, Julian mentioned that perhaps KaZaA and its Supernodes have more of an aluminum core, because the cloud that supports the KaZaA P2P network is still malleable — the Supernodes that provide the cloud services are fluid and can change, as well as go offline, with little or no impact on the system.

I imagine, without going into the architecture of the system, that more than one Supernode is assigned to any particular subnet, with the others acting as backups, most likely pinging the primary Supernode to see if it’s still in operation. If it goes out of operation, the backup Supernode(s) take over and a signal is sent to the P2P nodes to get services from this IP address rather than that one. The original Supernode machine may even detect a shutdown and send a signal to the secondaries to take over.
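Here is that guess in miniature, more pseudocode than anything real; is_alive and notify_peers are stand-ins for whatever pinging and signalling KaZaA actually uses, which I have no visibility into:

```python
# Backup-Supernode guess: check the primary, and if it has gone quiet,
# promote the first responsive backup and tell the member nodes about it.
def elect_supernode(primary, backups, is_alive, notify_peers):
    """Return the address member nodes should use for cloud services."""
    if is_alive(primary):
        return primary
    for candidate in backups:            # first responsive backup wins
        if is_alive(candidate):
            notify_peers(candidate)      # "get services from this IP, not that one"
            return candidate
    return None                          # no Supernode reachable at all
```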

Or perhaps the Supernode IPs are chained, and the software on each P2P node checks the first IP and, if no response occurs, automatically goes to the second in the Supernode list, continuing on until an active Supernode is found. This would take very little time and would, for the most part, be transparent to the users.
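The chained-list guess, seen from the client’s side, is even simpler. Again, responds is a stand-in for whatever probe the real software performs:

```python
# Chained-list guess: walk the Supernode list in order and settle on the
# first one that answers. The list contents and the probe are assumptions.
def first_active_supernode(supernode_list, responds):
    """responds(addr) -> bool; returns the first reachable Supernode or None."""
    for addr in supernode_list:
        if responds(addr):
            return addr        # transparent to the user: just a short pause
    return None                # fall back to asking for a fresh list
```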

Again, without access to any of the code, or even any architecture documentation (which means there’s some guesswork here), the algorithm behind the Supernode selection list looks for nodes that have the bandwidth, persistent connectivity, and CPU to act as Supernodes with little impact on the computer’s original use. The member nodes of each KaZaA sub-net — call it a circle — would perform searches against the circle’s Supernode, which is, in turn, connected to a group of Supernodes from other circles, so that if the information sought can’t be found in the first circle, it will most likely be found in the next Supernode, and so on. This is highly scalable.
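If I had to sketch that guess, it would look something like this. The thresholds, the field names, and the lookup call are all invented; KaZaA’s real selection algorithm isn’t public:

```python
# A guess at the selection criteria: keep only nodes with enough bandwidth,
# enough uptime, and enough spare CPU to take on Supernode duty.
def eligible_supernodes(nodes, min_bandwidth_kbps=128, min_uptime_hours=4, max_cpu_load=0.5):
    chosen = []
    for node in nodes:
        if (node["bandwidth_kbps"] >= min_bandwidth_kbps
                and node["uptime_hours"] >= min_uptime_hours
                and node["cpu_load"] <= max_cpu_load):
            chosen.append(node["address"])
    return chosen

# The search path described above: ask the circle's own Supernode first,
# then let the query spill over to the Supernodes of neighboring circles.
def search(query, supernode, neighbor_supernodes, lookup):
    """lookup(supernode, query) -> list of results (possibly empty)."""
    results = lookup(supernode, query)
    for neighbor in neighbor_supernodes:
        if results:
            break
        results = lookup(neighbor, query)
    return results
```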

So far so good — little or no iron in the core, because no one entity, including KaZaA or the owners behind KaZaA, can control the existence and termination of the Supernodes. Even though KaZaA is yet another file sharing service rather than a services brokering system, the mechanics would seem to meet our definition of a P2P network. Right?

Wrong.

What happens when a new node wants to enter the KaZaA network? What happens if KaZaA — the corporate body — is forced offline, as it was on January 31st because of legal issues? How long will the KaZaA P2P network survive?

In my estimation, a P2P network with no entry point will cease to be a viable entity within one to two weeks unless the P2P node owners make a determined effort to keep the network running by designating something to be an entry point. Something with a known IP address. Connectivity to the P2P circle is the primary responsibility of a P2P cloud. KaZaA’s connectivity is based on a hard-coded IP. However small it is, this is still a kernel of iron.

We need a way for our machines to find not just one but many P2P circles of interest using approaches that have worked effectively for other software services in the past:

We need a way to have these P2P circles learn about each other whenever they accidentally bump up against each other — just as webloggers find each other when their weblogging circles bump up against each other because a member of two circles points out a weblog of interest from one circle to the other.

We need these circles to perform an indelible handshake and exchange of signatures that becomes part of the makeup of each circle touched, so that one entire P2P circle can disappear but still be recreated, because its “genetic” makeup is stored in one, two, many other circles. All it would take to restart the original circle is two nodes expressing an interest.

We need a way to propagate the participation information or software or both, to support circles that can persist regardless of whether the original source of said software or information is still operating, just as software viruses have been propagated for years. Ask yourselves this — has the fact that the originator of a virus has gone offline impacted the spread of said virus? We’ve been harmed by the technology for years; it’s time to use the concepts for good.

We need a way to discover new services using intelligent searches that are communicated to our applications using a standard syntax and meta-language, by means of a standard communication protocol, and collected with intelligent agents, as Google and other search engines have been using for years. What needs to change is to have the agents find the first participating circle within the Internet and ask for directions to points of interest from there.

Standard communication protocol, meta-language, syntax. Viral methods of software and information propagation. Circles of interest with their own DNA that can be communicated to other circles when they bump in the night, so to speak. Internet-traversing agents that only have to be made slightly smarter — given the ability to ask for directions.
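Here is a small sketch of what that circle DNA might look like in practice. The record fields, the hashing, and the handshake are illustrative assumptions, not a description of any system that exists:

```python
# Sketch of the circle "DNA" idea: when two circles bump into each other,
# each stores a small, signed description of the other -- enough to recreate
# it later if every node in that circle goes dark.
import hashlib
import json

def circle_dna(name, protocol, entry_nodes, services):
    record = {
        "name": name,
        "protocol": protocol,          # how to speak to this circle
        "entry_nodes": entry_nodes,    # last known golden gateways
        "services": services,          # what the circle offers
    }
    digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    return {"record": record, "signature": digest}

class Circle:
    def __init__(self, dna):
        self.dna = dna
        self.foreign_dna = {}          # name -> DNA of circles we've touched

    def handshake(self, other):
        """The 'indelible handshake': each circle keeps a copy of the other's DNA."""
        self.foreign_dna[other.dna["record"]["name"]] = other.dna
        other.foreign_dna[self.dna["record"]["name"]] = self.dna

    def reseed(self, name):
        """Two interested nodes could restart a vanished circle from its stored DNA."""
        return self.foreign_dna.get(name)
```

Store enough copies of that record in enough circles, and no single takedown erases the knowledge of how to rebuild the original.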

Web of discovery. Doesn’t the thought of all this excite you?

Web Services Working Group

I’m extremely pleased to see the formation of a new working group at the W3C. It will be responsible for making some sense of the chaos that is Web Services. I may not be an adherent of the “Standards at all costs” club, but we do need standardization in three specific areas:

  • Protocols
  • Syntax
  • Semantics

Syntax is XML, Semantics is RDF, and a public domain Protocol will, hopefully, come quickly and efficiently out of this new group.

XML as a syntax is safely within the public domain. However, there is too much implied ownership of the protocols (openly available or not), and there’s too little understanding of the importance of the semantics.

We don’t need centralized enforcement agencies such as UDDI, or centralized security paradigms such as Passport, if we have a standardized syntax, semantics, and protocol. Give me these, and I can build you an interface to anything, from anything.
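A toy example of what I mean, with the service description reduced to RDF-style triples. The vocabulary terms and the weather service are made up; the point is only that a shared syntax and shared semantics are enough to find a service without a central registry:

```python
# If services describe themselves with a shared syntax and shared semantics
# (RDF-style triples), any client can discover and call them without a
# central registry like UDDI. The URIs below are invented for the example.
SERVICE_DESCRIPTION = [
    # (subject, predicate, object) -- the semantics
    ("urn:example:weather", "rdf:type",       "ex:Service"),
    ("urn:example:weather", "ex:endpoint",    "http://example.com/weather"),
    ("urn:example:weather", "ex:inputFormat", "application/xml"),
    ("urn:example:weather", "ex:operation",   "getForecast"),
]

def find_services(triples, wanted_operation):
    """Pick out endpoints for a given operation from any pile of triples."""
    subjects = {s for s, p, o in triples if p == "ex:operation" and o == wanted_operation}
    return [o for s, p, o in triples if s in subjects and p == "ex:endpoint"]

print(find_services(SERVICE_DESCRIPTION, "getForecast"))
# ['http://example.com/weather']
```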

My strongest hope is that the W3C moves swiftly — they’re late in the protocol game.

A Common Interface

When people say something I want to respond to, I respond to it. And other people are, hopefully, responding to me if I say something interesting. When I respond to what others write, it is a compliment. It means that what was said definitely got my interest, regardless of whether I agree with what was said or not. When people respond to me, I take it as a compliment, even when they call me nasty things. (Go ahead! Call me a bitch! I live for this!)

Having carefully said all this, I find I do want to respond to something Dave said on Scripting News. I have to respond — to hold it in will cause me an injury.

I was a developer before the Web was even a twinkle in Berners-Lee’s eyes. I love to program, and have worked — worked mind you — with 18 different programming languages, including C, C++, Java, Perl, Snobol (any of you recognize this one?), Smalltalk, Ada, Pascal, Modula II, FORTRAN, LISP, and so on. And I still love to program, though I spend most of my time designing technology architectures and writing now.

When the web came along, it was love at first byte. I thought that this was great stuff — a universal front end to any application. I was so sold that I focused as much of my professional life on the web as I could, and still pay the bills.

I wrote books and articles on CGI and DHTML and JavaScript and XML and CSS and ASP and a host of other web technologies. Even today I find I am as fascinated by the web as I was waaaaaaaaaay back in the beginning. I’ve never seen that the web is low-tech. If anything, I find myself being stretched more by the web than by traditional programming.

In all this time, I just don’t remember there ever being a battle between C developers (I’m assuming by this Dave meant people who don’t want to use the web as an environment for their applications) and web developers. Not all applications fit the web, and not all companies have chosen the web for their environment — but that’s not developers, that’s just business. Most companies today use applications from both environments, something that will probably continue to be the norm into the future. (We don’t want to use Word over the Internet as a service, no matter what Microsoft says. Same for PhotoShop.)

There are discussions — constantly — between server-side folks and the designers. I know that I’ve had a lively chat or two with the WSP people, who are, primarily, web designers. But most developers I know of, such as myself, are thrilled to play with the new technologies the web has provided. There might be a few who don’t want to play web, but most of us are as happy (or more) working with web development as we are with traditional development.

The whole thing is really about services, isn’t it? Providing services to people who need them. Most computer-based functionality is nothing more than services wrapped in a front end — doesn’t matter if the front end is a VB application or a web page. All that matters is that the services are prompt, efficient, secure, accurate, and effective. If some people prefer to create the front end in VB and put both service and front end on one machine, that’s cool. If they prefer a web page, that’s cool. Where’s the battle? Apples and oranges.

As for Netscape and Microsoft and the W3C not having a vision for the future of the web, oh they most certainly do and did. Microsoft’s whole vision is .NET and owning the internet. In fact, the company’s vision scares me most of the time. Netscape also had strong designs on the web before they became the underdog. As for the W3C, we wouldn’t have the web without this organization’s efforts. I may preach chaos, but I practice chaos on top of a specific development platform, and I have that platform thanks to the W3C.

The key is that there are a lot of groups and people who have their own visions for the future of the web. If we continue to work towards a common interface, then we can each practice our own vision and our own chaos behind that interface. But we must have this interface, and I’d rather it be provided by an organization that doesn’t profit than by one that does. The interface cannot be owned by any one company, any one organization, or any one person.

Full Peer

Dave’s looking for a definition for a full peer. I’ve never heard of the term “full peer”, and the qualification about being connected 24 hours doesn’t necessarily fit within a P2P (peer-to-peer) environment.

In P2P, a peer both provides and consumes services. A group of peers can then provide and consume services to and from each other without dependence on any one server. With this understanding, there’s an assumption that this consumption and distribution occurs when the peer is connected.
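In miniature, that definition looks something like this; the service names and the direct call between peers are placeholders, not any particular product’s API:

```python
# A peer is just something that can both answer requests and make them.
class Peer:
    def __init__(self, name):
        self.name = name
        self.services = {}                     # service name -> handler

    def provide(self, service_name, handler):
        self.services[service_name] = handler

    def serve(self, service_name, payload):
        handler = self.services.get(service_name)
        return handler(payload) if handler else None

    def consume(self, other_peer, service_name, payload):
        # No server in the middle: one peer calls the other directly.
        return other_peer.serve(service_name, payload)

# Usage, purely for illustration:
# alice, bob = Peer("alice"), Peer("bob")
# bob.provide("echo", lambda payload: payload)
# alice.consume(bob, "echo", "hello")   # -> "hello"
```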

Within some P2P-enabled applications, the communication may be cached or queued when the peer is not connected. I know this is the way Groove works.

Within Freenet, any one of the nodes within the network can consume or supply files. But if a peer is not connected, it’s not part of the network; it isn’t a participant, and files are consumed and supplied through other participants. Either you’re a peer, or you’re not. Again, the assumption of 24-hour access is not a factor.

Some systems support a hybrid cloud whereby service requests are cached at a remote location (usually hidden from the peer), waiting for the other peer to connect. When the other peer connects, the communication is concluded. The results of the service call can then be communicated back to the originating peer, or cached itself if the originating peer is offline.

In a true P2P system, any one of the peers within the network could act as a cloud (intermediary) for other peers. Within a hybrid system, such as Groove, the system itself might provide these types of intermediary services.
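Continuing the little Peer sketch from above, the intermediary role might look like this; online is a stand-in for however presence is actually detected, and in a true P2P system this class could live on any ordinary peer:

```python
# Hybrid-cloud behaviour: if the target peer is offline, an intermediary
# (which could itself be just another peer) holds the request and delivers
# it when the target reconnects. Names here are illustrative assumptions.
from collections import defaultdict

class Intermediary:
    def __init__(self):
        self.pending = defaultdict(list)       # peer name -> queued requests

    def relay(self, target_peer, request, online):
        if online(target_peer.name):
            return target_peer.serve(request["service"], request["payload"])
        self.pending[target_peer.name].append(request)   # hold it for later
        return None                            # the caller gets its answer later

    def on_reconnect(self, target_peer):
        """Deliver everything that piled up while the peer was away."""
        return [target_peer.serve(r["service"], r["payload"])
                for r in self.pending.pop(target_peer.name, [])]
```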

As for firewall issues, most P2P tools can work from within firewalls, or be made to work within firewalls.

Dave, an interesting definition – but I don’t necessarily see it within a truly distributed system. What’s your context for the term? That would help.