Aswath Weblog: Serverless Gets You

December 12, 2005

Serverless Gets You

In a recent entry Andy tells us about a problem that I have suspected for a long time. Somebody left an “offline” message for Andy, but it took a couple of days for Andy to receive it. He suspects that the problem started to happen only recently. My opinion is that the problem is architectural and has been pointed out previously.

It is reasonable to conclude that a designated Skype client acts as an answering machine and collects the message. Then, to ensure reliability, it may store copies of the message in multiple clients. This way, when Andy comes online, he can retrieve a copy. This may look like a reasonable alternative to a central voice mail server. Except there is a non-zero probability (for some reason engineers want to call this “finite” probability; isn’t probability finite anyway?) that none of the clients where the message is stored is available when Andy signs on. (Think of a big holiday or weekend in Andy’s case.) Actually, the first client may be disconnected from the network while the person is leaving the message, without that person realizing it. It is all in the race conditions.

The fundamental problem is in trying to replace a server (that can be made reliable, or at least whose failure is discernible) by a number of unreliable clients, however large the number may be. You see, the law of large numbers is in play if these clients are independent, and under certain conditions they may not be independent.
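To put rough numbers on this, here is a minimal sketch in Python. It is my own illustration, not a description of how Skype actually stores messages: the function names, the per-client offline probability, and the "common event" model (one shared cause, such as a holiday, that takes every storing client offline at once) are all assumptions.

```python
def p_all_offline_independent(p_offline: float, replicas: int) -> float:
    """Probability that every storing client is offline, assuming the
    clients fail independently of one another (an assumption)."""
    return p_offline ** replicas


def p_all_offline_correlated(p_offline: float, replicas: int,
                             p_common: float) -> float:
    """Same probability under a hypothetical common-cause model: with
    probability p_common a shared event takes all clients offline together;
    otherwise they behave independently."""
    return p_common + (1.0 - p_common) * p_offline ** replicas


if __name__ == "__main__":
    # Illustrative values only: each client offline half the time, 10 copies.
    print(p_all_offline_independent(0.5, 10))       # roughly 0.001
    print(p_all_offline_correlated(0.5, 10, 0.05))  # a little over 0.05
```

Under independence, adding copies drives the probability down quickly. Once a shared cause is in the picture, the probability can never drop below the chance of that shared event, no matter how many clients hold a copy.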

It is interesting that Andy has encountered this problem. Peerio, a system developed by one of his own clients, may also have the same problem. Essentially they also have to do the same thing to earn the “serverless” moniker. I say this with a little trepidation: whenever I write about them, a representative from Popular Telephony comes swinging at me personally (read the comment by prostuda) without arguing the point.

A better alternative is for the client of the originator to collect the message and send it out as an email.

Posted by aswath at December 12, 2005 09:21 AM


Comments

Just to clean the engineers' image :)
(Though I am not one...)

The term 'finite' in the expression 'finite probability' is not in opposition to 'infinite', but to 'infinitesimal'. It wouldn't make sense to talk about infinite probability, but it is often the case that the likelihood of an event is evaluated via limits. Once the value of a variable approximates the limit, the probability of an event is shown to be zero.

Posted by: David Orban at December 12, 2005 11:56 AM

Aswath,

I won't come swinging, but a huge difference is that there are no supernodes with Peerio, and that Peerio is an enterprise offering initially, and it's not likely that an enterprise will all go offline.

Also, the way Peerio distributes content versus the way Skype does are likely different, otherwise they would be infringing on Popular Telephony's patents.

Remember, Skype is a consumer communications platform first.

Posted by: Andy Abramson at December 12, 2005 02:28 PM

Doesn't it just suggest that Skype has made their space/bandwidth/reliability tradeoff badly? I mean there's LOADS of free space on the average laptop, and spreading the message out a layer or two and directing it to a few hundred buddies with one or two degrees of separation from the recipient can't be that difficult. To me this is just a matter of making the P2P network do its own "O&A": identify isolated or overloaded nodes and send traffic elsewhere. Still a PhD distributed computing problem, but I think they can afford a few bright folk now, don't you?

Posted by: Martin Geddes at December 12, 2005 06:19 PM

Andy:

You say Peerio does not have supernodes and I say every node is a supernode. :-)

As for all nodes going offline, it all depends on the deployment scenario. If they are standalone devices, yes, one can assume they will not all go offline. But what if they are softclients? Nonetheless, the point is that there is a non-zero probability that all the storage nodes have failed. This probability may be amplified if the operating assumption changes, even if only temporarily. I am not sure that one can devise an operating policy to handle all situations, as Martin seems to suggest.

Posted by: Aswath at December 13, 2005 01:39 AM