Manolis Perakakis world

News, diary, journal, whatever

Prime time for Distributed Speech Recognition? February 23, 2009

While an undergraduate student a few years ago, I worked on Distributed Speech Recognition (DSR). The main purpose of DSR is to compress the acoustic features used by a speech recognizer and transmit them over a data (instead of voice) network, thus saving bandwidth (which is cost effective) and allowing the use of full speech recognition on mobile terminals. Because it compresses acoustic features for speech recognition purposes (not for speech signal transmission/reproduction), it can achieve very low bit rates. You can think of it as analogous to what MP3 is for music transmission and storage.

Depicted next is a simple overview of a DSR architecture (model 2). Note that the mobile terminals depicted are Symbian’s reference devices corresponding to smartphone, handheld and PDA respectively (oops, the images are too old, dating back to 2001; I should upgrade to something like an iPhone or Android device …)

My work with Prof. V. Digalakis concluded that one can successfully take advantage of DSR with only 2 kbps coding, which is an extremely low data rate. After that I ported the DSR engine to a Zaurus Linux PDA (16 MB, 200 MHz StrongARM processor) and made it work in real time.
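To get a feel for what a 2 kbps budget means, here is a minimal sketch of a bit-rate calculation together with a plain uniform scalar quantizer. It is not the coder from the work described above: the frame rate, the number of coefficients, the bit split in `BITS_PER_COEFF` and the [-1, 1] value range are all assumptions picked only so that the arithmetic lands on 2 kbps.

```python
# Illustrative only: every number here is an assumption, not the actual DSR coder.
FRAME_RATE_HZ = 100  # assumed: one feature vector every 10 ms
BITS_PER_COEFF = [3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1]  # assumed split, 20 bits per frame


def bit_rate_bps(frame_rate_hz, bits_per_coeff):
    """Bits per second needed to send one quantized frame per time step."""
    return frame_rate_hz * sum(bits_per_coeff)


def quantize(value, lo, hi, bits):
    """Uniform scalar quantizer: map a value in [lo, hi] to an index in [0, 2**bits - 1]."""
    levels = 2 ** bits
    clipped = min(max(value, lo), hi)
    index = int((clipped - lo) / (hi - lo) * levels)
    return min(index, levels - 1)  # value == hi would otherwise fall outside the range


def dequantize(index, lo, hi, bits):
    """Reconstruct the midpoint of the quantization cell that `index` refers to."""
    step = (hi - lo) / (2 ** bits)
    return lo + (index + 0.5) * step


def encode_frame(features, lo=-1.0, hi=1.0):
    """Quantize one feature vector (terminal side)."""
    return [quantize(x, lo, hi, b) for x, b in zip(features, BITS_PER_COEFF)]


def decode_frame(indices, lo=-1.0, hi=1.0):
    """Rebuild an approximate feature vector (server side)."""
    return [dequantize(i, lo, hi, b) for i, b in zip(indices, BITS_PER_COEFF)]


if __name__ == "__main__":
    frame = [0.3, -0.7, 0.1, 0.9, -0.2, 0.0, 0.5, -0.5, 1.0, -1.0, 0.25, -0.25]
    print("bit rate (bps):", bit_rate_bps(FRAME_RATE_HZ, BITS_PER_COEFF))
    print("indices:", encode_frame(frame))
    print("decoded:", decode_frame(encode_frame(frame)))
```

With these assumed numbers the budget is 100 frames per second times 20 bits per frame, i.e. 2000 bits per second. A real coder would spend those bits far more cleverly than a per-coefficient uniform quantizer; the sketch only shows how small the per-frame budget is.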

Although my recent work focuses on multimodal (speech) interfaces, I still keep an eye on DSR. It seems that with the emergence of powerful mobile terminals and the announcement of speech recognition support for Android and iPhone by Google, DSR might soon become a hot topic!

P.S. I just found out my DSR page is ranked 3rd by Google after W3C and ETSI. Holy moly!

Coolness factor: ?

 

The year of Augmented Reality February 23, 2009

Filed under: android,augmented reality,mobile — perak @ 5:24 am

Wikitude AR Travel Guide

Until now there was too much hype around augmented reality, since apart from some really cool demos and research prototypes no real end-user apps existed. Well, it seems that with the emergence of powerful mobile devices, augmented reality will find its way to the public, with mobile users being the first. The Wikitude Android app is one of the first, with many more following this year.

Coolness factor 5/5!

 

My new geek blog … January 26, 2009

Filed under: personal,technology — perak @ 7:20 am

Although I will keep updating this blog, I will be posting my geek-related stuff to a new blog entitled: Manolis Perakakis geek Universe, enjoy!

 

Firefox cloudlet plugin January 21, 2009

Filed under: web — perak @ 8:49 am

In a previous post I said that Opera is one of the few non-open source programs I use, due to its speed, standards compliance (100% Acid test), and simple yet intuitive and extremely configurable interface.

I have (at last) moved to Firefox, since it has become fast and secure and has an enormous set of useful plugins. Some of the plugins I use try to resemble Opera a bit :

There are some really invaluable plugins like greasemonkey, zotero and ubiquity, but the coolest one I have found so far is the cloudlet search plugin. It filters Google searches by tag or site, letting you not only narrow down your query results but also discover very similar content!

Coolness factor : 5/5!

 

Opera 9.5 beta March 29, 2008

Filed under: technology,web — perak @ 8:53 pm

In a previous post I said that Opera is one of the few non-open source programs I use, and I gave their browser high praise. Well, since I upgraded to the 9.5 beta 1 (build 1643) I am reconsidering. It is by far the worst version of the Opera browser I have ever used! I really can’t believe they released such a buggy version to the public!

Here are some issues :

  • It doesn’t respect the cache limits: it gradually took over all the (limited) free space of my home partition and I lost my bookmarks (>5000) and contacts files!!! What could be worse, really?
  • It doesn’t respect the address size limit: trying to type in the address bar may take a few minutes!!!
  • Opening loaded pages takes too much time and 100% of the CPU
  • The program periodically freezes, also freezing everything else!!! I get this nice message all the time: “(operapluginwrapper:9762): Gtk-CRITICAL **: gtk_widget_destroy: assertion `GTK_IS_WIDGET (widget)’ failed”
  • I upgraded to this latest version in order to be able to surf sites using the latest Flash, but most of the time this doesn’t work either, so I use Firefox instead.
  • Dragging a web address to bookmarks doesn’t save the creation time
  • Many more bugs that I am too tired to list here…

Please, Opera people get serious!

 

Ten reasons why we love Linux March 24, 2008

Filed under: Linux — perak @ 4:00 pm

I first used Linux in ’95 for the operating systems course. A couple of years later I permanently switched to Linux, getting rid of the Windows crap for two main reasons:

First, Windows was too damn unstable, to the point of being useless. I was tired of the “blue screens of death” and maintenance was hell. Microsoft was even advertising “innovations” such as sym links and a file system with more than 16 character long names… (no comment here!)

Second, Linux was a much better development platform and a nice learning tool for wannabe geeks, one you could not resist playing with!

I tried for a long time to persuade people and friends to give Linux a try. This turned out to be difficult even inside an academic institution, so I gradually gave up.

Although much has changed since then (Windows finally became somewhat stable,  Linux got easier installation methods/better hardware support and wider adoption), the debate is not over yet!

Dan Martin, in his post “Things I can do in Linux that I can’t do on Windows”, gives 11 reasons why he likes Linux. Although I disagree with item 6 (hence the 10 in the post title), overall it is a nice article that can help some people to finally change their minds :)

 

My 15 minutes of fame! March 11, 2008

Filed under: HCI,interfaces,Multimodal,Speech,technology — perak @ 8:41 pm

Our work in the Telecommunications Lab at the Technical University of Crete (TUC) was featured in the “Orizontes” documentary series of the Kydon TV channel. We showed some of our demos :

  • My work on multimodal interfaces (part of my PhD), including a multimodal (GUI + speech) travel reservation application running on a Zaurus Linux PDA
  • The automatic video summarizer system (part of the MUSCLE NoE European research project showcases).
  • An audio-visual (AV) recognition system (also part of the MUSCLE NoE European research project showcases).
  • The multi-mic robust speech recognition demo (part of the Hiwire European research project showcases).

We could not showcase the augmented-reality demo that we developed in cooperation with VTT (speech recognition integration), since we currently lack the appropriate hardware. I hope we get it soon.

Some of these demos will go public, either by posting videos on YouTube or by releasing the source as open source on SourceForge/Google Code.

More on this, as well as a more detailed description of the demos, in future posts!

Stay tuned!

 

Aibo, Lego mindstorms, Wii remote (wiimote), iPhone & Google’s Android! March 11, 2008

Filed under: HCI,interfaces,Multimodal,programming,robotics,Speech — perak @ 8:12 pm

What do all these have in common? They will be my playground for a while …

I will have the chance to play with all of them during this semester!

As far as the Aibo and Mindstorms are concerned, I will use them for the two robotics-related courses I have enrolled in. Some possible projects I am thinking of :

  • Distributed speech recognition (DSR) : enhance the limited speech recognition capabilities of the Aibo by exploiting the wireless link and a speech recognition server.
  • Distributed image processing : enhance the Aibo’s limited machine vision capabilities by exploiting the wireless link and a machine vision server (similarly to DSR)
  • Robot localization using multiple input modalities : machine vision + audio
  • Enhanced gesture-based or multimodal (speech + gesture) interfaces

Wiimote hacks for enhanced HCI, similar to these demos from CMU.

The iPhone will be used to add the gesture modality to my speech & GUI multimodal interface prototype, which already runs on the Zaurus PDA.

Finally, I can’t resist playing with Google’s new Android platform, to port various apps I have in mind.

Whoa, my hacker alter ego will definitely be back for good!!!

 

 

Fetch aibo, fetch! March 4, 2008

Filed under: technology — perak @ 9:22 pm

These Sony Aibo robots are really great, so cute and smart!

I just had the chance to play with them for a while (Autonomous Agents course). They are part of the TUC Aibo RoboCup team. They do rock! Depicted in the following pictures is Paionaios, named after one of the five Kouretes brothers, who were ancient Cretan warriors.

Aibo
Aibo looking for ibone!

In the right picture, the Aibo is looking for its ibone; not so hungry, it seems!

Coolness factor : 5!

 

My last course enrollments – Robotics! March 4, 2008

Filed under: technology — perak @ 8:43 pm

Perhaps it is because spring has already arrived and my energy levels are going straight up. Perhaps it is because I will hopefully finish my PhD this year and won’t have the chance to enroll in any more courses.

Despite being extremely busy these days, I decided to enroll in three more university courses :

  • Computer Graphics and Virtual Reality
  • Autonomous Agents (AI + Machine vision)
  • Algorithms for Robotic Problems

Although robotics is interesting from a Pattern Recognition/Machine Learning/Machine Perception point of view, I am also interested in viewing it from a Human-Computer Interaction perspective. For example, Aibos have very limited-vocabulary speech recognition. By using Distributed Speech Recognition one can efficiently offload the speech recognition task and thus use a much richer speech interface. Other modalities such as gestures and face tracking may also be applied to human-robot communication. Emotion is also of high importance; refer to the Johnnie Walker commercial.

Back to the more “pure” AI/robotics problems. One interesting issue is the localization and map building problem, e.g., how autonomous robots can identify their location in an unknown environment.

Although most solutions are based on machine vision, it would be interesting to investigate how helpful it would be to exploit multimodal information, e.g., sensor information from vision, sonars and sounds. Another idea that came to me is to exploit the sensor information from more than one robot, to mitigate erroneous beliefs an individual robot may have about its location.
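One simple way to picture the multi-robot idea is inverse-variance weighting, a standard way of combining independent noisy estimates of the same quantity. The sketch below is only an illustration under strong assumptions: each robot reports an (x, y) estimate of the same target in a shared frame together with a single variance expressing how unsure it is, and the errors are independent. The function name `fuse_estimates` and the data layout are made up for this example.

```python
def fuse_estimates(estimates):
    """Fuse independent position estimates by inverse-variance weighting.

    `estimates` is a list of ((x, y), variance) pairs, one per robot
    (hypothetical layout; variance must be > 0). Estimates with a small
    variance (confident robots) get a large weight.
    Returns the fused (x, y) and the variance of the fused estimate.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, estimates)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, estimates)) / total
    return (x, y), 1.0 / total


if __name__ == "__main__":
    # Two robots roughly agree; a third is badly off but also very unsure.
    reports = [((1.0, 2.0), 0.1), ((1.2, 2.2), 0.1), ((3.0, 0.0), 10.0)]
    position, variance = fuse_estimates(reports)
    print("fused position:", position)
    print("fused variance:", variance)
```

The robot with the large variance barely moves the result, which is the "mitigate individual erroneous beliefs" effect in its simplest form. A robot that is confidently wrong would still pull the estimate, so a real system would need outlier handling on top of this.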

The robotics team in our university is really a great one; they took second place in the Florida RoboCup competition with the Kouretes team last year!

Coolness factor : 5.0!

 

 