Manolis Perakakis world

News, diary, journal, whatever

Multimodal mobile interaction – blending speech and GUI input (iphone demo) October 15, 2010

Update: Since Apple yesterday (Oct 5, 2011) announced full integration of the Siri personal assistant in iOS 5, I think the title of this post could become: A Siri-like (personal assistant) interface developed as part of my PhD research (focus on multimodal interaction), circa 2009 🙂

Well, it was about time for a new blog post after errrr…. almost 2 years!

These recent years have been so exciting for mobile interaction … I wonder how cool-est(!) the following years may be.

A few years ago, while working on distributed speech recognition, I envisioned how the speech modality would enrich (or almost supersede) the poor (at that time) mobile interaction experience. Look ma(!), the touch modality just won the game; it was so much simpler as a technology (well, by today’s standards), error-free & intuitive. The iPhone really revolutionized the mobile interface by exploiting multi-touch input, but speech as a modality still has a bright future, not by replacing but by enriching mobile interaction.

So the question is: how to build interfaces that combine more than one modality? Generally speaking, to successfully combine multiple modalities, one has to exploit the synergies that emerge when mixing them. For example, in blending speech and GUI (touch) modalities the following synergies arise:

  • visual output (GUI) is much faster (and more informative) than speech output, which is sequential; this is due to the information bandwidth of the visual and audio channels of the human brain
  • speech input is usually much faster than GUI input (and also the more natural form of communication). A spoken sentence can convey info that would require many GUI actions to enter, e.g. “I want to fly from Athens to London”
  • speech input is inconsistent due to recognition errors! The same utterance spoken twice can yield different recognition results & fixing errors solely through speech may be difficult. Allow for easy error correction through an extra modality instead (e.g. GUI input)!

Multimodal interfaces (interfaces that support more than one interaction modality) may thus offer a richer user experience; they are more flexible and robust, at the cost of greater design and implementation complexity.

The video shows a multimodal mobile interaction application demonstrating how to exploit speech and GUI (touch) modalities to enrich the user experience. The application scenario is a travel reservation service. The user can use either GUI or speech input at each interaction turn, that is, selecting values from a list by touch or directly speaking, e.g. “I want to fly from Orlando to Chicago next Friday evening”.
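
To make the "either modality at each turn" idea concrete, here is a minimal sketch in Python (not the code of the actual demo). The field names, the helper functions and the toy regular expression are all assumptions made for illustration; the regular expression merely stands in for a real speech recognizer and its grammar. It shows one spoken sentence filling several form fields at once, followed by a single-field correction by touch.

```python
import re

# Hypothetical form fields for the travel scenario (names are assumptions).
FIELDS = ("origin", "destination", "date")

# Toy pattern standing in for the speech recognizer and its grammar.
CITY = r"[A-Z][\w-]*(?: [A-Z][\w-]*)*"
UTTERANCE = re.compile(
    rf"from (?P<origin>{CITY}) to (?P<destination>{CITY})"
    r"(?: (?P<date>next \w+(?: \w+)?))?"
)


def speech_input(form, utterance):
    """Fill every field the utterance mentions, in a single turn."""
    match = UTTERANCE.search(utterance)
    if match:
        form.update({k: v for k, v in match.groupdict().items() if v})
    return form


def gui_input(form, field, value):
    """Fill (or correct) exactly one field, as a list selection would."""
    if field not in FIELDS:
        raise KeyError(field)
    form[field] = value
    return form


form = dict.fromkeys(FIELDS)
speech_input(form, "I want to fly from Orlando to Chicago next Friday evening")
# Suppose the origin was misrecognized: fix just that field by touch.
gui_input(form, "origin", "Boston")
print(form)
```

The point of the sketch is the asymmetry: speech sets three fields in one turn, while the GUI changes one field per action but does so without recognition errors.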

This specific demonstration showcases 4 different interaction modes, one unimodal (GUI-only input) and 3 different multimodal ones:

  • “Click-to-Talk”: user clicks speech button to talk
  • “Open-Mike”: speech input using voice activity detection
  • “Modality-selection”: the default input modality is chosen based on modality efficiency; the system switches between
    “Click-to-Talk” & “Open-Mike” depending on the current context, to favor GUI or speech input respectively, e.g. GUI input might be faster for short lists like date.
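
As a rough illustration of the “Modality-selection” mode, the sketch below (Python) picks the default mode per form field. The `Field` class, the `select_mode` function and the list-length threshold are hypothetical: list length is used here only as a crude proxy for modality efficiency, and it is not the criterion used in the actual system.

```python
from dataclasses import dataclass

CLICK_TO_TALK = "click-to-talk"  # GUI is the default; speech needs a button press
OPEN_MIKE = "open-mike"          # speech is the default; voice activity detection listens


@dataclass
class Field:
    name: str
    options: int  # number of values shown in the GUI list


def select_mode(field, short_list=10):
    """Pick the default interaction mode for the current field.

    Assumption: a short list is quicker to tap than to speak,
    a long list is quicker to speak than to scroll.
    """
    return CLICK_TO_TALK if field.options <= short_list else OPEN_MIKE


for field in (Field("date", 7), Field("city", 200)):
    print(field.name, "->", select_mode(field))
```

Whatever the selection rule, the other modality stays available at every turn; only the default changes.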

Note that the same scenario (New-York to Chicago, etc.), which is also the simplest possible one (e.g. a one-way trip without car/hotel reservation), is demonstrated for all the different interaction modes (of course, everything you can do with GUI you can do with speech). This video was shot to showcase the port to the iPhone platform (done with the help of V Kouloumenta); the platform has also been running on PCs and various PDAs (e.g. Zaurus) since 2006.

This demo is part of my PhD work at the Electronics & Computer Engineering Dept, Technical University of Crete, under the supervision of A. Potamianos. For more info you may refer to:
M. Perakakis and A. Potamianos. A study in efficiency and modality usage in multimodal form filling systems. IEEE Transactions on Audio, Speech and Language Processing, 2008.


Trolltech releases Qt Jambi! July 28, 2006

Filed under: C++,embedded,GUI,Java,programming — perak @ 11:31 pm

Trolltech just released Qt Jambi, a Java library for the desktop version of Qt. Development with Qt & Qtopia (QtEmbedded) was a really exciting experience for me in the past (well, that was 2002, with my QtJim Qtopia Jabber client). It was a cool paradigm shift away from Swing’s mess of that time.

Sometime in 2005 I ported QtJim to SWT in just a day! I found SWT to be an extremely nice API too. And yes, I am still waiting for those SWT bindings for Qt/Qtopia (SWT uses Gtk bindings). Some people also proposed that IBM buy Trolltech, so it seems Jambi was a way for Trolltech to fight back!

Now, I think it’s really nice to use the power of Qt directly from Java. I recall how painful porting an AWT application to the Zaurus was: I had to fight different Qt bindings for CVM/J9, and the worst part was that I could not use the OpenZaurus ROM (I had to stick with Sharp’s original or TheKompany’s ROM).

But will Jambi be a success or just add fragmentation?
Well, time will tell.

pros :

  • exposes the excellent Qt to Java programmers (Java on the desktop might finally get real)
  • eliminates JVM-vendor Qt-bindings lock-in

cons :

  • Only for J2SE 5!! (the real usefulness would be in the embedded space, Trolltech folks! Hope they make it for embedded too!)

coolness factor : 4.25!