Two days ago there was a post on the OnSip blog describing a technique for scaling stateful SIP Proxy servers. I am interested in this topic and have developed an untested design of my own, so I wanted to study the post in detail. But alas, the post is no longer accessible, even though the tweets and retweets mentioning it are still available. I have only a vague recollection of its main points:
In the absence of the post's details, I thought I would document my own thoughts on this topic and solicit your feedback.
Inferred Skype Architecture
Though the full details of Skype's architecture are not known, there is a general understanding of how it operates and scales. There is a central “name server” with which Skype clients authenticate at login. The end-point then approaches a set of supernodes (one by one, from a list provided by the name server), looking for one that has the capacity to act as a proxy for the client. It is not known how Skype ensures that the name server scales and is fault tolerant. But with its P2P architecture, Skype can add or remove supernodes at any time, and the supernodes collectively maintain routing information among themselves. This allows Skype to scale. Even though this architecture allows Skype to recover from failed supernodes, I think all the clients currently connected to a failed supernode must go through the login procedure again to be assigned a new supernode.
Monkey Infers, Monkey Does
So my great idea is to replicate the inferred Skype architecture. As a first step, we need a name server, but one that does more than Skype's. Our name server will contain all the users' information, such as userids and credentials. It will also contain information about all the deployed Proxy servers: their IP addresses and the set of clients associated with each. Additionally, it will store, for each userid, information about the registered clients and their associated Proxy servers. To ensure scalability and fault tolerance, the name server will be a Cassandra cluster running on commodity servers.
The Proxy servers are networked using Chord, a distributed hash table (DHT) lookup protocol. These Proxy servers will also use Cassandra to store active session information.
When a client authenticates itself with the Name Server, the Name Server will provide the client with a set of Proxy servers. The client can then register with one of them. If none of these Proxy servers is available, the client will request a new set from the Name Server.
When a user initiates a session from a client, the associated Proxy server will query the Name Server for the list of the called user's clients and their associated Proxy servers. The originating Proxy server will then use Chord to reach the Proxy servers associated with the called user.
The clients associated with a failed Proxy server will notice the failure. When that happens, each client will try to connect to another Proxy server from the list provided by the Name Server.
It is clear that the proposed system is highly scalable. A failed Proxy server can be functionally replaced. Furthermore, using the session information stored in Cassandra, the Proxy server that takes over will either bring each active session back under control or forcefully close it.
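To make the Name Server's contents concrete, here is a minimal Python sketch of its three kinds of records: users with credentials, Proxy servers with their IP address and client set, and each user's registered clients with their Proxy servers. Plain dictionaries stand in for the Cassandra tables, and all class and method names are hypothetical. Storing credentials as salted PBKDF2 hashes is my assumption; the design only says that credentials are stored.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field


@dataclass
class ProxyRecord:
    ip: str
    clients: set[str] = field(default_factory=set)  # clients registered at this proxy


@dataclass
class UserRecord:
    salt: bytes
    pw_hash: bytes
    # client id -> proxy id, for every client this user has registered
    registrations: dict[str, str] = field(default_factory=dict)


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class NameServer:
    """In-memory stand-in for the Cassandra-backed Name Server (hypothetical)."""

    def __init__(self) -> None:
        self.users: dict[str, UserRecord] = {}
        self.proxies: dict[str, ProxyRecord] = {}

    def add_user(self, userid: str, password: str) -> None:
        salt = os.urandom(16)
        self.users[userid] = UserRecord(salt, hash_password(password, salt))

    def authenticate(self, userid: str, password: str) -> bool:
        rec = self.users.get(userid)
        return rec is not None and hmac.compare_digest(
            rec.pw_hash, hash_password(password, rec.salt)
        )

    def add_proxy(self, proxy_id: str, ip: str) -> None:
        self.proxies[proxy_id] = ProxyRecord(ip)

    def register(self, userid: str, client_id: str, proxy_id: str) -> None:
        """Record that client_id of userid is now served by proxy_id."""
        regs = self.users[userid].registrations
        old = regs.get(client_id)
        if old is not None:
            self.proxies[old].clients.discard(client_id)  # client moved proxies
        regs[client_id] = proxy_id
        self.proxies[proxy_id].clients.add(client_id)

    def lookup(self, userid: str) -> dict[str, str]:
        """Return {client_id: proxy_ip} for every registered client of userid."""
        regs = self.users[userid].registrations
        return {c: self.proxies[p].ip for c, p in regs.items()}
```

Keeping both directions (proxy to clients and user to clients) is deliberate: the first is what is needed when a proxy fails, the second when a call is placed.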
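For readers unfamiliar with Chord, the Python sketch below shows its core idea: Proxy server addresses and lookup keys (here, userids) are hashed onto the same identifier ring, each key belongs to the first node clockwise from it, and a lookup hops along finger tables toward that node in roughly log N steps. This is a simplified, static ring assumed for illustration; node joins, stabilization and failure handling, which real Chord provides, are left out, and the class and function names are hypothetical.

```python
import hashlib
from bisect import bisect_left

M = 32  # identifier bits; published Chord uses SHA-1's full 160 bits


def chord_id(name: str) -> int:
    """Hash a node address or key onto the 2**M identifier ring (SHA-1, truncated)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)


def between(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past zero (a == b means whole ring but a)


class ChordRing:
    """Static Chord ring: finger tables are built from global knowledge,
    so node join, stabilization and failure handling are omitted."""

    def __init__(self, addresses: list[str]) -> None:
        self.addr = {chord_id(a): a for a in addresses}
        if len(self.addr) != len(addresses):
            raise ValueError("duplicate address or identifier collision")
        self.ids = sorted(self.addr)
        # finger[n][i] = successor(n + 2**i); finger[n][0] is n's immediate successor
        self.finger = {
            n: [self.successor((n + 2 ** i) % 2 ** M) for i in range(M)] for n in self.ids
        }

    def successor(self, k: int) -> int:
        """First node id >= k, wrapping around the ring."""
        return self.ids[bisect_left(self.ids, k) % len(self.ids)]

    def lookup(self, start: str, key: str) -> tuple[str, int]:
        """Route from node `start` toward the node responsible for `key`.
        Returns (responsible node's address, number of forwarding hops)."""
        k, n, hops = chord_id(key), chord_id(start), 0
        while True:
            succ = self.finger[n][0]
            if k == succ or between(k, n, succ):  # k in (n, succ]: succ owns the key
                return self.addr[succ], hops
            # forward to the closest finger preceding k (succ itself always qualifies)
            n = next(f for f in reversed(self.finger[n]) if between(f, n, k))
            hops += 1
```

Every Proxy server reaches the same responsible node for a given key, which is what lets the proxies share routing state without a central directory.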
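The registration step can be sketched as follows. The two callables are hypothetical hooks: `get_proxy_list` stands for asking the Name Server for a set of Proxy servers, and `try_register` stands for a registration attempt (for example a SIP REGISTER) that returns False when the proxy is unavailable. The retry limit `max_rounds` is an assumption of mine.

```python
from typing import Callable


class NoProxyAvailable(Exception):
    pass


def register_client(
    get_proxy_list: Callable[[], list[str]],
    try_register: Callable[[str], bool],
    max_rounds: int = 3,
) -> str:
    """Try each Proxy server from the Name Server's list in order; if every one
    refuses, ask the Name Server for a fresh list, up to max_rounds lists."""
    for _ in range(max_rounds):
        for proxy in get_proxy_list():
            if try_register(proxy):
                return proxy
    raise NoProxyAvailable(f"no proxy accepted registration after {max_rounds} lists")
```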
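A minimal sketch of the originating Proxy server's role in session setup, assuming the Name Server returns a map from each of the called user's clients to its Proxy server. Both callables are hypothetical placeholders, one for the Name Server query and one for Chord-based delivery; grouping clients by proxy, so that each terminating proxy receives a single request and forks it to its local clients, is my design assumption.

```python
from collections import defaultdict
from typing import Callable


def setup_session(
    caller: str,
    callee: str,
    name_server_lookup: Callable[[str], dict[str, str]],
    route_via_chord: Callable[[str, dict], None],
) -> list[str]:
    """Originating-proxy logic (hypothetical).

    name_server_lookup(userid) -> {client_id: proxy_id}
    route_via_chord(proxy_id, message) delivers message to that proxy.
    Returns the sorted list of terminating proxies that were contacted.
    """
    by_proxy: dict[str, list[str]] = defaultdict(list)
    for client, proxy in name_server_lookup(callee).items():
        by_proxy[proxy].append(client)
    for proxy, clients in sorted(by_proxy.items()):
        # one request per terminating proxy; that proxy forks it to its local clients
        message = {"type": "INVITE", "from": caller, "to": callee, "clients": sorted(clients)}
        route_via_chord(proxy, message)
    return sorted(by_proxy)
```

An empty result means the called user has no registered clients, in which case the originating proxy would report the user as unavailable.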
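The design does not say how a client notices that its Proxy server has failed. One common approach, assumed here, is a heartbeat with a timeout: the client expects a periodic reply from its proxy and presumes the proxy dead after `timeout` seconds of silence. The class below is a hypothetical Python sketch; the injectable clock only exists to make it easy to test.

```python
import time
from typing import Callable, Optional


class ProxyMonitor:
    """Client-side failure detector and failover (hypothetical)."""

    def __init__(self, proxies: list[str], timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.candidates = list(proxies)  # list provided by the Name Server
        self.timeout = timeout
        self.clock = clock
        self.current: Optional[str] = None
        self.last_seen = 0.0

    def connect(self, try_register: Callable[[str], bool]) -> Optional[str]:
        """Register with the first reachable proxy other than the current (failed) one.
        Returns None if none accepts; the client should then ask the Name Server again."""
        failed = self.current
        for proxy in self.candidates:
            if proxy != failed and try_register(proxy):
                self.current, self.last_seen = proxy, self.clock()
                return proxy
        self.current = None
        return None

    def heartbeat_ack(self) -> None:
        """Call whenever the current proxy answers a keepalive."""
        self.last_seen = self.clock()

    def proxy_failed(self) -> bool:
        return self.current is not None and self.clock() - self.last_seen > self.timeout
```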
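Here is one way the takeover could work, sketched in Python with a dictionary standing in for the Cassandra session table. The rule for choosing between adopting and closing a session (adopt only if every party has re-registered with some live proxy) is my assumption, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    proxy: str                # proxy currently responsible for the session
    parties: tuple[str, str]  # client ids of the two ends


def recover_sessions(store: dict[str, Session], failed: str, replacement: str,
                     registered: set[str]) -> tuple[list[str], list[str]]:
    """Replacement proxy scans the shared session store for the failed proxy's sessions.
    Sessions whose parties are all registered again are adopted; the rest are closed."""
    adopted, closed = [], []
    for sid, s in sorted(store.items()):  # sorted() copies, so deleting below is safe
        if s.proxy != failed:
            continue
        if all(p in registered for p in s.parties):
            s.proxy = replacement  # bring the session back under control
            adopted.append(sid)
        else:
            del store[sid]  # forcefully close (a real proxy would also send BYE)
            closed.append(sid)
    return adopted, closed
```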
Recently a new VoIP-related product called Obi110 was launched. I have not personally evaluated this product, but Phoneboy and Tom Keating have reviewed it. Based on their reviews, I am disappointed that the industry has once again failed to move the needle and has instead opted to remain wedded to a service provider model.
Based on these two reviews, Obi110 looks like an ATA that can interface to the PSTN, the company's own VoIP service and two additional SIP providers. It can also switch a call from one interface to another. In these respects it is very much like PhoneGnome and Ooma. But unlike PhoneGnome, it doesn't use the PSTN phone number as the id for its service, which means users of this device have to give their Obi ID to their contacts. On the other hand, like Ooma, it allows a call originated at one box to be switched to a faraway box and terminated on that box's local PSTN line. Ooma faced considerable pushback because people were concerned about potential misuse. Obi110 addresses this concern by restricting this capability to a handful of pre-configured phone numbers, called the “Circle of Trust”. Otherwise, the functionality and business model simply replicate good old wireline POTS.
There are so many consumer pain points that could be alleviated with proper consumer technology. For example, we have all had the frustrating experience of being put on hold when calling a call center. Shai Berger describes a service called virtual hold, and apparently three approaches to it are being deployed. In particular, the approach pursued by Lucyphone is getting good press but also raises some potential privacy concerns. Of course, the same approach deployed as a customer premises solution would alleviate those privacy concerns. Obi110 could have added this capability.
Fonolo made its debut by attempting to eliminate the irritating IVR experience with a service they call “deep dialing”. Here is another example of a real consumer need being met by an intermediary service. Consider an alternative approach: websites publish the key sequence needed to reach each leaf of their IVR tree; a visiting consumer passes the appropriate one to the Obi110 box (after all, the browser and the box are on the same LAN); and the box dials out the IVR sequence at the appropriate time. This way, the two end points eliminate the pain point without involving a third party.
We are all used to the benefits of SMS on our mobile phones, and Google Voice has demonstrated that it could be offered for landline numbers as well. But there is no indication that the incumbents are even considering such a service. So a third party could step in and offer a form of SMS service, if only the appropriate equipment were in the home. Since Obi110 is connected to the internet, it could receive text messages sent by the service provider. If the company also provided cordless phones, as Ooma does, the base station could deliver the text messages to the handsets using DECT technology.
Like these, one could add many other useful services and capabilities for consumers. Of course, the industry has consistently failed to offer any of them during the past 10 years. Many in the industry talk the “Intelligence at the End” talk, but their walk is decidedly “bellheaded”. I hope this changes in the near future.
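To illustrate the box side of this idea, here is a small Python sketch that turns a published key sequence into timed DTMF tones. The sequence format is hypothetical: digits, `*`, `#` and A to D are tones, and each comma is a fixed pause, borrowing the comma-as-pause convention common in phone dialers. The two-second pause length (`PAUSE_SECONDS`) is also an assumption.

```python
PAUSE_SECONDS = 2.0  # assumed length of one ',' pause
DTMF = set("0123456789*#ABCD")


def parse_ivr_path(path: str) -> list[tuple[float, str]]:
    """Turn a hypothetical published IVR path such as '1,,3,#' into
    (seconds to wait before the tone, DTMF digit) pairs for the box to play."""
    events: list[tuple[float, str]] = []
    wait = 0.0
    for ch in path.upper():
        if ch == ",":
            wait += PAUSE_SECONDS
        elif ch in DTMF:
            events.append((wait, ch))
            wait = 0.0
        elif not ch.isspace():
            raise ValueError(f"unexpected character {ch!r} in IVR path")
    return events


print(parse_ivr_path("1,,3,#"))  # [(0.0, '1'), (4.0, '3'), (2.0, '#')]
```

Fixed pauses are a simplification; a more robust box could listen for the end of each IVR prompt before sending the next digit.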
Last week a new iPhone app called Viber was introduced. In this initial version, the app can establish voice calls between two Viber users over a 3G data or Wi-Fi connection. These calls are free of “in application” charges. But the app is very basic in its feature set: Missed Call Notification is the only feature supported in the first release. Initial reviews have been positive, with reviewers pointing out the simple setup process, the claimed frugal use of battery and the high voice quality. In the first three days, there were 1M downloads. So it is worthwhile to gain a better understanding of this app. But please realize that I have not used Viber. This is not a review of the user experience, but an analysis of its architecture.
A Brief Description of Viber
Viber is like many other VoIP services, with two very important twists. First, it uses the iPhone's phone number as the id. Second, it uses the Contacts stored in the iPhone as the buddy list. Once the app is installed, it copies the local Contact list and appends a special icon to all the contacts who are also Viber users. The iconized list is updated whenever a person in the Contact list joins the service. From then on, this new Contact list is used for calls: the app uses the Viber service to call other Viber users and, for everyone else, hands the call over to the native phone application.
1. Viber reminds me of PhoneGnome, which also uses the landline number to derive the SIP URI it assigns to the device. Indeed, PhoneGnome has patented this process. It is not clear to me whether Viber infringes on this patent. Apple's FaceTime also uses the phone number as the id for its service.
2. This app is developed by the same team that developed and operates iMesh, a P2P network. But Viber itself is not strictly P2P. The devices talk to dedicated “proxy servers” (it is not clear whether the signaling protocol is proprietary or SIP). But, as I observed a long time ago in the context of Skype, Viber is using iMesh technology to scale its infrastructure. Essentially, they have integrated the iMesh supernode with the required “proxy server” function. This way they can easily scale up and down as the number of active users varies.
3. If the app is running in the background or has been closed, the service will use the Apple Push Notification Service (APNS) to alert the user to an incoming call. Once the user opts to take the call, the app will be launched and the call will be answered. This has multiple benefits. First, it helps extend battery life. Second, it reduces data consumption. Finally, it reduces the load on the service's infrastructure, because only active devices maintain a signaling connection.
4. This use of APNS may introduce call setup delay, since the app may have to be launched and connected to the server before the call can be answered. I do not know how long the delay is, or whether some callers may abandon the call attempt prematurely. We will have to wait a while to hear anecdotal data.
5. An iPhone with Viber installed could potentially be involved in two calls, one on Viber and another on the native app. It is not clear how, and by whom, the feature interaction is handled. For example, suppose a user is on a Viber call and a call comes in on the native side. Will the phone ring, or will the user get a call waiting tone? How does the user answer the call? Is the UI different? PhoneGnome handles this beautifully. Again, we will have to wait to collect some empirical data.
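The contact matching and call routing described above can be sketched as below. How Viber actually normalizes and matches numbers is not known to me; this Python sketch assumes E.164-style normalization with a default country code of 1 and a server-side set of registered numbers, and every name in it is hypothetical.

```python
import re

DEFAULT_COUNTRY_CODE = "1"  # assumption: numbers without '+' are North American


def to_e164(number: str) -> str:
    """Crude normalization to '+<country><number>' (illustration only)."""
    digits = re.sub(r"\D", "", number)
    if number.strip().startswith("+"):
        return "+" + digits
    if len(digits) == 10:  # national number: prepend the default country code
        return "+" + DEFAULT_COUNTRY_CODE + digits
    return "+" + digits


def iconize(contacts: dict[str, str], registered: set[str]) -> dict[str, bool]:
    """Mark each contact True if its number belongs to a registered user."""
    return {name: to_e164(num) in registered for name, num in contacts.items()}


def route_call(name: str, marked: dict[str, bool]) -> str:
    """Use the VoIP service for marked contacts, the native dialer for the rest."""
    return "viber" if marked[name] else "native"
```

Re-running `iconize` whenever the registered set changes corresponds to the list being updated as contacts join the service.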
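Items 3 and 4 can be illustrated with a server-side sketch. None of this is Viber's actual code: `send_push` is a placeholder for a request to the Apple Push Notification Service, `send_on_connection` a placeholder for signaling over an open connection, and the class name is hypothetical. The `pending` table is where the setup delay discussed in item 4 arises, because the call waits there until the launched app connects.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CallServer:
    """Hypothetical handling of an incoming call for push-woken clients."""
    send_push: Callable[[str, str], None]             # placeholder for an APNS request
    send_on_connection: Callable[[str, dict], None]   # placeholder for live signaling
    connected: set[str] = field(default_factory=set)  # devices with an open connection
    pending: dict[str, dict] = field(default_factory=dict)  # calls awaiting app launch

    def incoming_call(self, caller: str, callee: str) -> str:
        invite = {"type": "INVITE", "from": caller}
        if callee in self.connected:
            self.send_on_connection(callee, invite)  # app is active: signal directly
            return "signaled"
        self.pending[callee] = invite  # app inactive: park the call and wake the user
        self.send_push(callee, f"Incoming call from {caller}")
        return "pushed"

    def device_connected(self, device: str) -> None:
        """Called when the launched app opens its signaling connection."""
        self.connected.add(device)
        invite = self.pending.pop(device, None)
        if invite is not None:
            self.send_on_connection(device, invite)  # deliver the parked call
```

A real service would also time out parked calls so that the caller is not left waiting indefinitely.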
Copyright © 2003-2009 Moca Educational Products.