This essay summarizes the current state of speech and pen input technology, identifies its strengths and limitations, and reports on the key multimodal research challenges.

“As applications generally become more complex, a single modality does not permit the user to interact effectively across all tasks and environments. A multi-modal interface offers the user freedom to use a combination of modalities or to switch to a better-suited modality, depending on the specifics of the task environment.”

        Multimodal technology can be useful in many different environments, such as interaction for people with disabilities and interaction for distributed applications. Multimodal systems are emerging in which the user will be able to employ natural communication modalities, including voice, hand and pen-based gesture, eye tracking, and body movement.

        Multimodality makes optimal use of human communication capacities. Multimodal interfaces aim to integrate several means of communication in a harmonious way, bringing computer behavior closer to human communication paradigms, which in turn makes them easy to learn and use.

        Major advances in new input technologies and algorithms, hardware speed, distributed computing, and spoken language technology in particular have all supported the emergence of more transparent and natural communication with this new class of multimodal systems (Designing the user interface for multimodal speech and pen-based gesture applications, 2002, p. 422).

        Pen input technology has the advantage of allowing users to engage in more powerfully expressive and transparent information-seeking dialogues through human language technology. Speech is the preferred medium for subject, verb, and object expression. Compared with speech-only interaction on visual-spatial tasks, multimodal pen and voice interaction can result in 10 percent faster completion times, 36 percent fewer task-critical errors, 50 percent fewer spontaneous disfluencies, shorter and simpler linguistic constructions, and a 90 to 100 percent user preference for interacting this way.

        Compared with a unimodal recognition-based interface, a multimodal interface has one particularly advantageous feature: it can support superior error recovery. There are both user-centered and system-centered reasons why a multimodal system facilitates error recovery. First, in a multimodal interface users intuitively pick the mode that is less error-prone. Second, user language is often simplified. Third, users intuitively switch modes after an error, so the same problem is not repeated. Fourth, users report less subjective frustration with errors when interacting multimodally, even when errors are as common as in a unimodal interface. Lastly, a well-designed multimodal architecture can support mutual disambiguation, in which the input from one mode helps resolve ambiguity in the other.
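
        The last point, mutual disambiguation, can be illustrated with a minimal sketch. It assumes that each recognizer returns a ranked list of hypotheses with confidence scores and that the system knows which spoken commands make sense with which pen gestures. The command names, scores, compatibility table, and the fuse function are all hypothetical and chosen only for illustration; real systems use richer integration methods.

```python
from itertools import product

# Hypothetical n-best lists: (hypothesis, recognizer confidence).
speech_nbest = [("delete", 0.48), ("zoom", 0.40), ("move", 0.12)]
pen_nbest = [("circle", 0.55), ("cross_out", 0.45)]

# Hypothetical table of spoken-command / pen-gesture pairs that make sense together.
COMPATIBLE = {
    ("zoom", "circle"),
    ("move", "circle"),
    ("delete", "cross_out"),
}


def fuse(speech, pen, compatible):
    """Return compatible joint interpretations sorted by combined score, best first."""
    joint = [
        (s_score * p_score, s_hyp, p_hyp)
        for (s_hyp, s_score), (p_hyp, p_score) in product(speech, pen)
        if (s_hyp, p_hyp) in compatible
    ]
    return sorted(joint, reverse=True)


# Pick the best joint interpretation across both modes.
best_score, best_speech, best_pen = fuse(speech_nbest, pen_nbest, COMPATIBLE)[0]
print(best_speech, best_pen)
```

        In this made-up example the speech recognizer's top hypothesis ("delete") does not fit the pen recognizer's top hypothesis ("circle"), so the joint ranking settles on "zoom" with "circle" instead. Neither recognizer alone would have signaled the problem.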

        Because individuals differ widely in how they prefer to communicate or interact with a computer, a multimodal interface lets users control and make their own selection of how to do so.

        In conclusion, interest in multimodal interface design is growing, driven largely by the goal of supporting more transparent, flexible, efficient, easy-to-use, and powerfully expressive means of human-computer interaction. Multimodal interfaces are important today not only because they are useful for people of different ages and skill levels and for people with disabilities, but also because they are useful in business environments. With a multimodal interface, business tasks can be carried out more efficiently, for example word processing using speech recognition or pen input technology.

        However, multimodal systems have several limitations. Speech and pen input systems are still relatively expensive in terms of software, additional hardware, and memory requirements, so some care is needed before deciding that speech and pen input will benefit a particular user. Multimodal interfaces also need to adapt so that their robustness can be enhanced. The two candidates for such system adaptation are user-centered and environmental parameters.

Reference List

  • Bolt, R.A. (1980). Put-that-there: Voice and gesture at the graphics interface. Computer Graphics, 14, 3, 262-270.
  • Cohen, P.R., McGee, D., Oviatt, S., Wu, L., Clow, J., King, R., Julier, S., and Rosenblum, L. (1999). Multimodal interaction for 2D and 3D environments. IEEE Computer Graphics and Applications, 19, 4, 10-13. IEEE Press.
  • Landay, J., Larson, J., and Ferro, D. (2002). Designing the user interface for multimodal speech and pen-based gesture applications: State-of-the-art systems and future research directions. In Carroll, J.M. (Ed.), Human-Computer Interaction in the New Millennium. New York: ACM Press, Addison-Wesley.
  • http://www.lobby7.com/press_121001.htm

