Interacting with VoiceXML applications via a Voice User Interface.

The Web has revolutionized how people communicate and share information. Businesses deploy millions of web services to consumers with Internet access. Internet and telephony used to be two separate technologies requiring a specialized telecom expert to build applications accessible over the phone. VoiceXML bridges the gap; it leverages the existing web infrastructure and enables web developers to build voice-enabled web applications accessible from any telephone, by anyone, anywhere, anytime.

Users interact with VoiceXML applications via a Voice User Interface (VUI), much as they interact with traditional web applications via a Graphical User Interface (GUI). A poorly designed VUI frustrates users quickly, leading them to ask for operator assistance or hang up. No matter how powerful the application is or how many features it supports, it fails if users cannot or will not use it. Therefore, a well-designed VUI is essential to the success of any voice application.

VoiceXML is designed for the rapid development of voice web applications, but it does not address usability. A quality VoiceXML application requires a well-designed VUI. This paper discusses the VoiceXML application development lifecycle focusing on application usability. The paper also describes key roles and skills required in each phase of the development cycle. It looks at how the HP VoiceXML tools help developers simplify the development process and improve the usability of their VoiceXML applications before deploying to the .

This paper targets Java/J2EE developers who want to put a voice interface on their existing web applications or develop new voice applications leveraging their web skills. For VoiceXML background information, please see . (Log-in required)

VoiceXML applications require humans to converse with a computer. Designing a usable VUI is more difficult than designing a GUI. Challenges include:

  • The VUI is invisible. Information is transient; callers forget what they just heard.
  • Conversation is linear. Callers become impatient when they must listen to long prompts or hear the same message repeatedly.
  • The caller is "lost in space." Callers become disoriented when they are distracted or when there are many layers of spoken menus.
  • The caller can say anything. Humans make mistakes, so the challenge is to cue the caller with wording clear enough to elicit the expected responses.

Callers hang up when they get confused, become impatient with long prompts, or think the application is not working.
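
The challenges above can be made concrete with a small VoiceXML sketch. It is only an illustration, assuming a VoiceXML 2.0 platform: the menu items, form ids, and wording are hypothetical and are not taken from any real application. The prompt is short and names the expected answers, the error handlers stay brief, and repeated failures route the caller away instead of looping.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <field name="choice">
      <!-- Short, directed prompt: it names the answers the grammar expects. -->
      <prompt>Say balance, payments, or agent.</prompt>
      <option>balance</option>
      <option>payments</option>
      <option>agent</option>

      <!-- Caller said nothing: apologize briefly, then repeat the prompt. -->
      <noinput count="1">
        Sorry, I did not hear you. <reprompt/>
      </noinput>
      <!-- Caller said something outside the three options. -->
      <nomatch count="1">
        Sorry, I did not understand. <reprompt/>
      </nomatch>
      <!-- Third failure of either kind: stop looping and hand off. -->
      <noinput count="3"><goto next="#operator"/></noinput>
      <nomatch count="3"><goto next="#operator"/></nomatch>

      <filled>
        <prompt>You chose <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>

  <!-- Hypothetical placeholder for a real operator hand-off. -->
  <form id="operator">
    <block>
      <prompt>Please hold for an agent.</prompt>
      <exit/>
    </block>
  </form>
</vxml>
```

The count attributes let the application respond differently as errors accumulate, so a confused caller reaches help after a few attempts rather than hanging up.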

Speech recognition is another VUI challenge. Speech recognition technologies have advanced dramatically, but they are still not perfect. Background noise, poor telephone connections, and low-quality microphones can all degrade the audio and cause the speech engine to misinterpret the caller's input, leading the application to perform incorrect actions.
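
One common way to limit the damage from misrecognition is to confirm only the results the recognizer is unsure about. The sketch below assumes a VoiceXML 2.0 platform and uses the standard confidence shadow variable of a field; the city names, the form id, and the 0.7 threshold are hypothetical and would need tuning against real calls.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="get_city">
    <field name="city">
      <prompt>Which city: Boston, Denver, or Austin?</prompt>
      <option>Boston</option>
      <option>Denver</option>
      <option>Austin</option>
      <filled>
        <!-- High-confidence result: skip the confirmation field by
             giving its form item variable a value.
             0.7 is an illustrative threshold, not a recommendation. -->
        <if cond="city$.confidence &gt;= 0.7">
          <assign name="confirmed" expr="'yes'"/>
        </if>
      </filled>
    </field>

    <!-- Visited only when the recognizer was unsure about the city. -->
    <field name="confirmed">
      <prompt>I heard <value expr="city"/>. Is that right? Say yes or no.</prompt>
      <option>yes</option>
      <option>no</option>
      <filled>
        <if cond="confirmed == 'no'">
          <!-- Wrong guess: forget both answers and ask for the city again. -->
          <clear namelist="city confirmed"/>
        </if>
      </filled>
    </field>

    <block>
      <prompt>Looking up <value expr="city"/>.</prompt>
    </block>
  </form>
</vxml>
```

Confirming every answer would make the linear conversation even longer, so the extra question is asked only when confidence is low.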

The VoiceXML application development life cycle is similar to that of a web application but includes VUI design and speech recognition tuning. The development cycle consists of six phases:

  1. Definition
  2. Design
  3. Development
  4. Testing
  5. Pilot/Tuning
  6. Deployment/Monitoring

VUI design and usability testing are critical for developing a successful VoiceXML application. Iterative usability tuning is required throughout the development cycle to build useful VoiceXML applications.

Key roles and responsibilities

Although VoiceXML is easy to learn, building a successful VoiceXML application requires not only software development skills but also an understanding of human factors for the telephone interface, linguistics, speech recognition, and audio production.

Definition phase

During the definition phase, the business managers and application architect collaborate to address the following key issues:

  • What is the purpose of developing the application (e.g., to provide a new service, solve a customer problem, or reduce operational cost)?
  • What is the business case?
  • What is the callers' profile (age, gender, and general background such as immigrant status or education level)? What is their usage profile? How will they use the application, and what are their expectations?
  • What are the use cases, features, and required functionality?
  • What is the project scope?
  • What are the performance, capacity, ...
