September 11, 2005

Voice Services without Intermediaries

Last week Skype announced partnership agreements with three VoiceXML technology providers so that third parties can offer voice services to Skype users. The announcement has been widely commented on, with some observing that the fees are exorbitantly high: almost a third is taken by Skype and another third by the technology partner. Skype claims that it brings in the audience; I leave it to the third parties to decide for themselves whether they need Skype for that. Skype also takes responsibility for collecting the fees, even though alternative mechanisms are available that charge much less. But the focus of this post is to suggest that there is an alternative way to offer content without these VoiceXML technology providers, and that it can be done with other clients as well.

First, we should note that these technology platforms have two components: playing out content stored on a website, and recognizing the user’s speech to decide what needs to be played next.

The former requires a media player and a text-to-speech (TTS) conversion utility. Inexpensive TTS utilities and good-quality voice engines are available on the market. But let me caution you that voice engines require licensing fees. The lowest I have seen is $200 per year.

The latter requires an automatic speech recognition (ASR) utility, and it needs to be of high quality. This is where companies like Tellme excel. ASR is required if the medium of interaction is the standard telephone. But IM clients like Skype, Google Talk, Yahoo Messenger et al. do not require it. The customer can simply type preferences into the text window rather than speak them and have them interpreted later by an ASR engine.

In other words, I visualize the following method of interaction when a customer wants to access speech-based content. The user initiates an IM session with the content provider and indicates a preference via the text window. The content provider plays out speech, possibly listing some choices. The user in turn selects an item, again indicating the choice in the text window, and so on. In this way, the user sends information on the “text channel” and the content provider delivers information on the “speech channel”. This is another way of using a multimodal communication system.
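To make the exchange concrete, here is a minimal sketch of the provider side: each typed message selects the next audio clip to play out. Everything in it is an assumption made for illustration, including the MenuNode class, the handle_text function, the menu keywords, and the clip file names; actually delivering the clip over an IM client's voice channel is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class MenuNode:
    # Hypothetical clip name that would be played on the "speech channel".
    prompt_audio: str
    # Typed keyword (from the "text channel") -> next MenuNode.
    choices: dict = field(default_factory=dict)


def build_menu():
    """Build a tiny two-item menu; the contents are purely illustrative."""
    news = MenuNode("news_headlines.wav")
    weather = MenuNode("weather_report.wav")
    return MenuNode("main_menu.wav", {"news": news, "weather": weather})


def handle_text(node, message):
    """Return (next_node, clip_to_play) for one typed message."""
    key = message.strip().lower()
    next_node = node.choices.get(key)
    if next_node is None:
        # Unrecognized input: stay put and replay the current prompt.
        return node, node.prompt_audio
    return next_node, next_node.prompt_audio
```

Note that no speech recognition appears anywhere: the typed keyword is matched directly, which is the whole point of letting the user answer in the text window.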

By the way, this scheme does not work with the current crop of ATAs, as they bring the user experience down to that of a POTS phone. This is one of my long-standing gripes about ATAs.

If the network is truly “stupid” and the intelligence has really moved to the end, then let us resist inserting intermediaries as much as possible. Let us not turn into Bellheads!

Posted by aswath at September 11, 2005 10:17 AM

Comments

Oy! Your description of multi-modal communication reminds me of the early days of Yahoo Audio Chat in Yahoo Messenger. Often, only one of the parties could talk, and the other had to resort to typing. That was fun!

What could be somewhat realistically multi-modal is the Yahoo! Shopping or photo sharing in the newer versions.

Posted by: Manoj Sati at September 13, 2005 01:27 PM

Have to agree, it's a bit perplexing. Shouldn't they be trying to do multimodal stuff that the telcos can't copy? Why compete with the incumbents when you can be a market of 1?

Posted by: Martin Geddes at September 13, 2005 01:34 PM

It might be difficult for telcos, but it is in the realm of possibility. After all, they had ADSI phones a long time back. But the difficulty is in engaging the incumbents in what is possible.

Posted by: Aswath at September 13, 2005 05:08 PM



Copyright © 2003-2014 Moca Educational Products.