Manolis Perakakis world

News, diary, journal, whatever

Prime time for Distributed Speech Recognition?

February 23, 2009

As an undergraduate student a few years ago, I worked on Distributed Speech Recognition (DSR). The main purpose of DSR is to compress the acoustic features used by a speech recognizer and transmit them over a data (instead of voice) network to a recognition server, thus saving bandwidth (cost effective) and allowing the use of full speech recognition from mobile terminals. Because it compresses acoustic features for speech recognition purposes, not for speech signal transmission or reproduction, it can achieve very low bit rates. You can think of it as analogous to what MP3 is for music transmission and storage.

Depicted next is a simple overview of a DSR architecture (model 2). Note that the mobile terminals shown are Symbian’s reference devices, corresponding to smartphone, handheld and PDA respectively. (Oops, the images are too old, they must date back to 2001; I should upgrade to something like an iPhone or Android ...)

My work with Prof. V. Digalakis concluded that one can successfully take advantage of DSR with a coding rate of only 2 kbps, which is an extremely low data rate. After that I ported the DSR engine to a Zaurus Linux PDA and made it work in real time (on a 200 MHz StrongARM processor with 16 MB of memory).
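To get a feel for what 2 kbps means in practice, here is a rough back-of-the-envelope sketch. Only the 2 kbps figure comes from the post; the frame rate (100 feature vectors per second), the 13 coefficients per frame and the raw telephone-quality PCM format used for comparison are assumptions chosen for illustration, not the settings of the actual DSR engine.

```python
# Back-of-the-envelope bit budget for a 2 kbps feature stream.
# Only BIT_RATE_BPS comes from the post; everything else is an assumption.

BIT_RATE_BPS = 2000        # 2 kbps, as stated in the post
FRAME_RATE_HZ = 100        # assumed: one feature vector every 10 ms
FEATURES_PER_FRAME = 13    # assumed: e.g. 13 cepstral coefficients per frame

# Assumed reference point: uncompressed 8 kHz, 16-bit mono PCM speech.
PCM_SAMPLE_RATE_HZ = 8000
PCM_BITS_PER_SAMPLE = 16


def bits_per_frame(bit_rate_bps: int, frame_rate_hz: int) -> int:
    """Whole number of bits available to encode one feature vector."""
    return bit_rate_bps // frame_rate_hz


def compression_ratio(raw_bps: int, coded_bps: int) -> float:
    """How many times smaller the coded stream is than the raw one."""
    return raw_bps / coded_bps


frame_budget = bits_per_frame(BIT_RATE_BPS, FRAME_RATE_HZ)
bits_per_coefficient = frame_budget / FEATURES_PER_FRAME
raw_pcm_bps = PCM_SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE
ratio = compression_ratio(raw_pcm_bps, BIT_RATE_BPS)

print(f"bits per frame:       {frame_budget}")
print(f"bits per coefficient: {bits_per_coefficient:.2f}")
print(f"raw PCM bit rate:     {raw_pcm_bps} bps")
print(f"ratio vs. raw PCM:    {ratio:.0f}x")
```

Under these assumptions each feature vector gets about 20 bits, which is less than 2 bits per coefficient. A budget that tight is why feature coding at such rates cannot simply store each coefficient as an ordinary number.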

Although my recent work focuses on multimodal (speech) interfaces, I still keep an eye on DSR. It seems that with the emergence of powerful mobile terminals and Google's announcement of speech recognition support for Android and iPhone, DSR might soon become a hot topic!

P.S. I just found out my DSR page is ranked 3rd by Google after W3C and ETSI. Holy moly!



2 Responses to “Prime time for Distributed Speech Recognition?”

  1. Mike Says:

    Just passing by. By the way, your website has great content!


  2. […] enrich (or almost supersede) the poor (of that time) mobile interaction experience by working on distributed speech recognition. Look ma(!) touch modality just won the game; it was so much simpler as a technology (well by […]

