Aswath Weblog: On selecting a signaling protocol to use in a WebRTC-enabled app

February 17, 2014

This is cross-posted from the EnThinnai Blog. Please post your comments at the original location. Thanks.

In the post that prompted this one, Tsahi discusses the alternative signaling protocols one can use in a WebRTC-enabled app. Here I approach the issue from a different angle, in the hope of shedding additional light and helping you reach the choice that is appropriate for you.

Before we dig in, we should recognize that there are two independent decisions to make: 1) how the signaling messages will be carried, and 2) which signaling protocol to use. Many variables affect the optimal answer for a given scenario, so it is best to discuss them in general terms and let you decide case by case.

First let us consider the transport mechanism.

Pure HTTP: Since the app will be accessed from a browser, an easy choice is to use HTTP as the transport. It works well when the browser initiates a signaling procedure and the server responds.
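As an illustration of this browser-initiated pattern, here is a minimal TypeScript sketch of sending an SDP offer over plain HTTP and reading the answer from the response. The JSON message shape and the endpoint are assumptions for illustration, not part of any standard. `FetchLike` has the same call shape as the browser's `fetch`, so in a browser you should be able to pass `fetch` itself, while a fake can be passed for testing.

```typescript
// Just the parts of a fetch() Response that this sketch reads.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Same call shape as the browser's fetch(url, init), narrowed to what is used here.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<ResponseLike>;

// Browser-initiated procedure: POST the local SDP offer and read the answer
// from the HTTP response. The {type, sdp} JSON shape is an assumption.
async function sendOffer(fetchFn: FetchLike, url: string, offerSdp: string): Promise<string> {
  const res = await fetchFn(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ type: "offer", sdp: offerSdp }),
  });
  if (!res.ok) throw new Error(`signaling request failed: HTTP ${res.status}`);
  const reply = (await res.json()) as { type?: string; sdp?: string };
  if (reply.type !== "answer" || typeof reply.sdp !== "string") {
    throw new Error("unexpected signaling reply");
  }
  return reply.sdp;
}
```

Note that nothing in this pattern lets the server speak first: it can only answer a request the browser has already made.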

HTTP w Long Polling/Comet: But there are times when the server needs to initiate a procedure asynchronously: for example, to notify one user of another’s action, such as putting the mic or speaker on mute, or to notify a user of an incoming call request. Since the server cannot initiate an HTTP session on its own, an alternative is to use long polling or Comet. This may increase the load on the server due to excessive polling, or may introduce latency, with its undesirable effect on the UX.
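To make the long-polling pattern concrete, here is a minimal TypeScript sketch, not taken from any particular app. It assumes a hypothetical server endpoint that holds each request open until it has events newer than a cursor, or times out with an empty batch; the `poll` function stands in for that HTTP request so the loop can be exercised without a network. The event shape and sequence numbering are assumptions.

```typescript
// A server-to-browser signaling event (hypothetical shape).
interface ServerEvent {
  seq: number; // increasing sequence number assigned by the server
  kind: string; // e.g. "incoming-call" or "peer-muted"
}

// Stand-in for one long-poll HTTP request: the (hypothetical) server holds it
// open until it has events with seq > cursor, or returns [] on timeout.
type PollFn = (cursor: number) => Promise<ServerEvent[]>;

// Repeatedly issue long-poll requests, delivering each event once.
// Returns the last cursor seen, so a caller could resume later.
async function runLongPoll(
  poll: PollFn,
  onEvent: (ev: ServerEvent) => void,
  keepGoing: () => boolean,
  startCursor = 0,
): Promise<number> {
  let cursor = startCursor;
  while (keepGoing()) {
    const batch = await poll(cursor); // one browser-initiated request per iteration
    for (const ev of batch) {
      if (ev.seq <= cursor) continue; // skip events already delivered (e.g. re-sent after a retry)
      onEvent(ev);
      cursor = ev.seq;
    }
    // An empty batch just means the request timed out; loop and ask again.
  }
  return cursor;
}
```

Every server-initiated event costs a fresh browser request, and an event that arrives between two requests waits for the next one; that is where the extra server load and the latency come from.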

HTTP w Push Notification: Alternatively, the server can use the Push Notification service offered by both Chrome and Firefox to push a notification; upon receiving it, the browser can initiate an HTTP session to continue the procedure. This addresses the server load, but not the latency issue, especially for “in-session” procedures. Worse, the latency now depends on a third-party service.

WebSocket: This is where WebSocket has its advantages. A WebSocket connection starts as an HTTP session, which is then upgraded to a persistent TCP connection on which the server, too, can send a message at any time. The most recent versions of almost all browsers support WebSocket, and there are very efficient server implementations. So it addresses both issues: server load and latency.
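A minimal TypeScript sketch of signaling over a single persistent connection follows. `SocketLike` is a hypothetical interface that mirrors just the two members of the browser's `WebSocket` used here (`send` and `onmessage`); in a browser you would wire it to a real `WebSocket`, possibly through a thin adapter to satisfy the type checker. The message names (`offer`, `incoming-call`) are illustrative only.

```typescript
// Hypothetical minimal view of a WebSocket: just send() and onmessage,
// whose event carries the received text frame in `data`.
interface SocketLike {
  send(data: string): void;
  onmessage: ((ev: { data: string }) => void) | null;
}

// Every signaling message carries a `type`; the other fields are app-defined.
interface SignalMessage {
  type: string;
  [field: string]: unknown;
}

class SignalingClient {
  private readonly socket: SocketLike;
  private readonly handlers = new Map<string, (msg: SignalMessage) => void>();

  constructor(socket: SocketLike) {
    this.socket = socket;
    // Server-initiated messages (e.g. an incoming call request) arrive on the
    // same persistent connection, so no polling loop is needed.
    socket.onmessage = (ev) => {
      const msg = JSON.parse(ev.data) as SignalMessage;
      this.handlers.get(msg.type)?.(msg); // messages with no handler are ignored
    };
  }

  // Register the handler for one message type (replacing any previous one).
  on(type: string, handler: (msg: SignalMessage) => void): void {
    this.handlers.set(type, handler);
  }

  // Browser-initiated messages go out over the same connection.
  send(msg: SignalMessage): void {
    this.socket.send(JSON.stringify(msg));
  }
}
```

The point to notice is the `onmessage` handler: an incoming call request from the server is just another frame on the already-open connection, so neither polling nor a third-party push service is involved.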

WebSocket w Push Notification: If maintaining a WebSocket connection during idle periods (just so as to be informed of an incoming session request) is undesirable, one can use Push Notification during idle periods and use WebSocket only during active sessions.
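The policy described above amounts to a small state machine. The TypeScript sketch below models it as a pure transition function; the phase, event and action names are hypothetical, and actually opening or closing the socket and registering for push notifications are left to the caller.

```typescript
// No WebSocket while idle; a push notification (or the user dialing) opens
// one, and it is closed again when the session ends.
type Phase = "idle" | "connecting" | "active";

type SessionEvent =
  | { kind: "push-received" } // e.g. an incoming session request arrived via push
  | { kind: "user-dials" } // the local user starts a session
  | { kind: "socket-open" }
  | { kind: "session-ended" };

// What the caller should do as a result of the transition.
type Action = "open-socket" | "close-socket" | "none";

function nextPhase(phase: Phase, ev: SessionEvent): [Phase, Action] {
  switch (phase) {
    case "idle":
      // Either side can start a session; both need a fresh socket.
      if (ev.kind === "push-received" || ev.kind === "user-dials") {
        return ["connecting", "open-socket"];
      }
      return ["idle", "none"];
    case "connecting":
      if (ev.kind === "socket-open") return ["active", "none"];
      if (ev.kind === "session-ended") return ["idle", "close-socket"];
      return ["connecting", "none"];
    case "active":
      // In-session signaling (mute, hold, ...) flows over the socket; when the
      // session ends we fall back to relying on push notifications.
      if (ev.kind === "session-ended") return ["idle", "close-socket"];
      return ["active", "none"];
  }
}
```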

Data Channel w X: The final choice is to keep the server out of an active session altogether and let the browsers handle the signaling procedures directly between themselves over a WebRTC Data Channel. This approach does not, however, address how to handle notifications during idle periods; hence the “X”, one of the other mechanisms above, is still needed for that.
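Here is one way the “Data Channel w X” split might look, as a TypeScript sketch under stated assumptions: `Channel` is a hypothetical interface mirroring the `send()` method and `readyState` property that a WebRTC `RTCDataChannel` exposes, and the server path (the “X”) is modelled with the same interface. Messages go peer to peer while the data channel is open and fall back to the server otherwise, which is exactly where the idle-period gap shows up.

```typescript
// Anything that can carry a signaling message: the server path ("X") or a
// peer-to-peer data channel, which reports readyState "open" once usable.
interface Channel {
  readonly readyState: string;
  send(data: string): void;
}

class SignalingRouter {
  private peer: Channel | null = null;
  private readonly server: Channel;

  constructor(server: Channel) {
    this.server = server;
  }

  // Called once the session's data channel exists.
  attachPeerChannel(ch: Channel): void {
    this.peer = ch;
  }

  // Session over: later notifications must go through the server again.
  detachPeerChannel(): void {
    this.peer = null;
  }

  // In-session messages go browser-to-browser when the data channel is open;
  // otherwise (session setup, idle periods) they use the server path.
  send(msg: object): "peer" | "server" {
    const data = JSON.stringify(msg);
    if (this.peer !== null && this.peer.readyState === "open") {
      this.peer.send(data);
      return "peer";
    }
    this.server.send(data);
    return "server";
  }
}
```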

As you can see, there are many choices, each with its own trade-offs. Knowing those trade-offs, you can decide on the appropriate transport for your use case.

Deciding which protocol to use is either a “no-brainer” or a case of “not so fast”. If the paramount objective is to work with an already deployed system, and the WebRTC app is just another access mechanism, then there is nothing more to consider: it is optimal simply to use the signaling procedure of the deployed system, and that is that.

Otherwise, it is better to start from scratch and ask the questions differently. From the days of Q.931 in ISDN Basic Access up to and including SIP, the standards bodies have focused on defining protocols that ensure interoperability between two autonomous systems. Since the end points will have different capabilities and present different user experiences, the best a standard can do is to design a protocol that drives a basic user interface. Thus, for example, when the far end places a call on hold, the near end is not notified; it is not clear how to abstract such a notification so that every variation in the UI can be handled.

Next, let me quickly dismiss a faux use case, albeit one that is widely considered. It is known as the “trapezoidal connection”. Here, each of the two end points is connected to its own WebRTC app, and the two apps federate with each other. The fact that the end points use WebRTC for access is incidental; the real crux is that the two apps are federating and have agreed on a protocol for it. So the apps’ choice of signaling protocol falls in the “no-brainer” category: they will select the protocol that works best with the agreed-upon federation protocol.

So the really interesting use case is the one where the end points are directly connected to the app server, the so-called “triangular connection”. Since both end points are directly connected to the app server, and the server can dynamically download the signaling procedures to the browser as JavaScript, it is in a position to offer a rich user experience by dynamically driving UI elements. The app designer is free to devise whatever signaling procedures are needed; conforming to a standard is not critical. A good analogy is the choice between paint by numbers and free-form painting. At first glance, paint by numbers looks straightforward, but in fact it is tedious, leaves no room for error and is not very expressive. Free-form painting, on the other hand, if you are good at it, is fluid, very expressive and gives lots of freedom. If the only choice were free-form painting, I would end up with just a blank canvas; with paint by numbers, there is hope that I will have something that looks like a painting. So I say: to each his own.
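To make the “free-form” option concrete, here is a TypeScript sketch of an app-defined signaling vocabulary that carries the UI-level events mentioned earlier (mute and hold). The message names and the `CallView` fields are hypothetical; the point is only that, when the app server delivers both ends of the protocol as JavaScript, each message can map directly onto a UI change.

```typescript
type Device = "mic" | "speaker";

// App-defined signaling messages, including UI-level events such as
// "you are on hold" that a federation-oriented standard might not carry.
type AppSignal =
  | { type: "hold"; by: string }
  | { type: "resume"; by: string }
  | { type: "mute"; by: string; device: Device }
  | { type: "unmute"; by: string; device: Device };

// Hypothetical UI state driven directly by those messages.
interface CallView {
  onHold: boolean;
  banner: string;
  peerMuted: Device[]; // devices the far end has muted
}

// Pure reducer: returns the new view for one incoming message.
function applySignal(view: CallView, sig: AppSignal): CallView {
  switch (sig.type) {
    case "hold":
      return { ...view, onHold: true, banner: `${sig.by} put you on hold` };
    case "resume":
      return { ...view, onHold: false, banner: "" };
    case "mute": {
      const device = sig.device;
      return {
        ...view,
        banner: `${sig.by} muted their ${device}`,
        peerMuted: [...view.peerMuted.filter((d) => d !== device), device],
      };
    }
    case "unmute": {
      const device = sig.device;
      return { ...view, banner: "", peerMuted: view.peerMuted.filter((d) => d !== device) };
    }
  }
}
```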

Posted by aswath at February 17, 2014 03:43 PM
