With version 8, Microsoft’s Windows Phone (WP) lets apps include voice recognition based on a controlled vocabulary. While this can be used for any type of interaction between the user and the app, I am particularly interested in its potential for serving up Help content. Although voice-enabled interactions were previously available in WP, iOS, and Android, WP8 offers the first API that supports voice control for any third-party app. Siri on the iPhone and iPad, for example, works only with the core Apple-controlled apps.

The Build conference that Microsoft held in Seattle in late October provided many useful technical and design updates for Windows and Windows Phone. One session of particular interest to me was presented by F. Avery Bishop: Speech Enabling Apps for Windows Phone 8.

The Build session begins with a demonstration of a voice request to complete a specific task in a WP app. The interaction format is for the user to first state the name of the app and then say the task to execute. In the demo, the example for an app called “MagicMemo” was “MagicMemo, show memo number 5”.

That statement is handled by a combination of the voice recognition technology built into WP and a controlled vocabulary created by the app developer. A controlled vocabulary is a list of words and phrases that the user is likely to speak. A content strategist assembles this list based on user research or search query data.

The controlled vocabulary resides in what WP calls a Voice Command Definition (VCD) file. This is an XML file in which each planned voice interaction is represented as a separate section. Each section includes a command name, an example, one or more phrases to listen for, feedback to give the user, and the action to take in response to the command. The VCD is included with the app build and installed the first time the app is run.
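To make the structure of a VCD file concrete, here is a minimal Python sketch that reads a small VCD document and pulls out the pieces described above. The sample content is an illustrative stand-in written for this sketch, not the Microsoft sample; the command name, page target, and phrases are made up, and the namespace URI is quoted from memory and should be checked against the MSDN schema.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for version 1.0 VCD files (verify against MSDN).
VCD_NS = {"v": "http://schemas.microsoft.com/voicecommands/1.0"}

# Illustrative VCD content; names, phrases, and the target page are hypothetical.
SAMPLE_VCD = """
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
  <CommandSet xml:lang="en-US">
    <CommandPrefix>Contoso Widgets</CommandPrefix>
    <Example> show best sellers </Example>
    <Command Name="showWidgets">
      <Example> show best sellers </Example>
      <ListenFor> [Show] {widgetViews} </ListenFor>
      <Feedback> Showing {widgetViews} </Feedback>
      <Navigate Target="/favorites.xaml"/>
    </Command>
    <PhraseList Label="widgetViews">
      <Item> best sellers </Item>
      <Item> today's specials </Item>
    </PhraseList>
  </CommandSet>
</VoiceCommands>
""".strip()


def describe_commands(vcd_text):
    """Return one dict per Command element with the fields a VCD section carries."""
    root = ET.fromstring(vcd_text)
    commands = []
    for cmd in root.iterfind("v:CommandSet/v:Command", VCD_NS):
        feedback = cmd.find("v:Feedback", VCD_NS)
        navigate = cmd.find("v:Navigate", VCD_NS)
        commands.append({
            "name": cmd.get("Name"),
            "example": cmd.findtext("v:Example", default="", namespaces=VCD_NS).strip(),
            "listen_for": [e.text.strip() for e in cmd.findall("v:ListenFor", VCD_NS)],
            "feedback": feedback.text.strip() if feedback is not None else None,
            "target": navigate.get("Target") if navigate is not None else None,
        })
    return commands


if __name__ == "__main__":
    for command in describe_commands(SAMPLE_VCD):
        print(command)
```

A listing like this is also a convenient way for a writer to review the voice commands in a build without reading the raw XML.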

An example VCD file from the Microsoft documentation is shown below.

A Help scenario might be something like:

User: “Contoso Widgets, how do I get a text message alert for specials”

WP: “Showing Help for text messages.” (The screen displays the appropriate procedure.)

The associated VCD code might be:

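<!-- The Command Name and Example below are placeholders for illustration -->
<Command Name="showHelpAlerts">
  <Example> text messages </Example>
  <ListenFor> [How] {widgetViews} </ListenFor>
  <Feedback> Showing Help for {widgetViews} </Feedback>
  <Navigate Target="/help_alerts.xaml"/>
</Command>
<PhraseList Label="widgetViews">
  <Item> alerts </Item>
  <Item> text messages </Item>
</PhraseList>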
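To see which spoken phrases a ListenFor pattern actually covers, it can help to expand it by hand or with a short script. The Python sketch below is a simplified model, not the phone's recognizer: it assumes square brackets mark an optional word, curly braces refer to a PhraseList, and only phrases the pattern can produce are accepted. The function names are hypothetical.

```python
import itertools
import re


def expand_listen_for(pattern, phrase_lists):
    """Return the set of lowercase phrases a ListenFor pattern can produce."""
    # Tokens are [optional words], {phraseListLabel}, or plain required words.
    tokens = re.findall(r"\[[^\]]+\]|\{[^}]+\}|[^\s\[\]{}]+", pattern)
    choices = []
    for token in tokens:
        if token.startswith("["):
            # Optional text: either spoken or skipped.
            choices.append([token[1:-1].strip().lower(), ""])
        elif token.startswith("{"):
            label = token[1:-1].strip()
            choices.append([item.strip().lower() for item in phrase_lists[label]])
        else:
            choices.append([token.lower()])
    return {
        " ".join(word for word in combo if word)
        for combo in itertools.product(*choices)
    }


def matches(utterance, prefix, pattern, phrase_lists):
    """Check an utterance of the form '<app name>, <command>' against one pattern."""
    said = " ".join(utterance.lower().split())
    lead = prefix.lower()
    if not said.startswith(lead):
        return False
    command = said[len(lead):].lstrip(" ,")
    return command in expand_listen_for(pattern, phrase_lists)


if __name__ == "__main__":
    lists = {"widgetViews": ["alerts", "text messages"]}
    print(sorted(expand_listen_for("[How] {widgetViews}", lists)))
    print(matches("Contoso Widgets, text messages", "Contoso Widgets",
                  "[How] {widgetViews}", lists))
```

Under this model the pattern `[How] {widgetViews}` covers only "alerts", "how alerts", "how text messages", and "text messages". The longer question in the scenario would need additional ListenFor lines that include its connecting words, which is a useful check to run on a vocabulary before testing it on a device.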

Design and Production Issues

The technical aspects of this voice command system are fairly straightforward. Success, however, depends on the effective design of the controlled vocabulary.

Coming up with a phrase list that anticipates words spoken by the user is the key element. Existing index entries and search logs would be the place to start. As with indexing, attention to synonyms and acronyms is important. A usability test or survey of typical users could help to refine the phrase list. The content strategist will also need to identify the sentence structure to use for listening and feedback.
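As a rough illustration of mining search logs for a phrase list, the Python sketch below groups logged queries by topic, using a synonym map, and keeps the topics that come up often enough to matter. The synonym map, the sample queries, and the threshold are all made-up assumptions; a real project would substitute its own log data and editorial judgment.

```python
from collections import Counter, defaultdict

# Hypothetical synonym map maintained by the content strategist:
# spoken or typed variant -> preferred topic name.
SYNONYMS = {
    "sms": "text messages",
    "texts": "text messages",
    "notifications": "alerts",
}


def candidate_phrases(queries, synonyms, min_count=2):
    """Group queries by topic and return {topic: sorted variants} for frequent topics."""
    totals = Counter()
    variants = defaultdict(set)
    for query in queries:
        spoken = " ".join(query.lower().split())   # normalize case and spacing
        topic = synonyms.get(spoken, spoken)       # fold synonyms into one topic
        totals[topic] += 1
        variants[topic].add(spoken)
    # Most frequent topics first; rare topics are dropped.
    return {
        topic: sorted(variants[topic])
        for topic, count in totals.most_common()
        if count >= min_count
    }


if __name__ == "__main__":
    log = ["SMS", "text messages", "Texts", "alerts", "notifications", "coupons"]
    for topic, spoken_forms in candidate_phrases(log, SYNONYMS).items():
        print(topic, "->", spoken_forms)
```

Keeping the variants alongside each topic matters for voice: users will say "texts" or "SMS" as readily as "text messages", so each variant is a candidate phrase list item rather than something to discard.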

The scenario above shows the app responding by displaying a XAML Help topic. A smarter, more complicated design might have the feedback say what to do, rather than simply confirm the request. For example: “To set alerts for specials, say ‘Enable Text Alerts for Specials.’”

As the power of voice-enabled apps grows, we will have to consider how to add support for the data in our content management systems. The example above represents commands and phrases in a hard-coded text file. A more efficient approach will have that information authored and stored in a database. The VCD file will become one more output format for our content management systems.

The use of a CMS will also be important for supporting translated content. The VCD file supports multiple language versions.
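As a sketch of what "VCD as an output format" could look like, the Python example below builds a VCD document from structured records, writing one CommandSet per language. The record layout, the command name, and the Spanish strings are placeholders invented for illustration (not reviewed translations), and the namespace URI is quoted from memory and should be verified against the MSDN schema.

```python
import xml.etree.ElementTree as ET

VCD_NS = "http://schemas.microsoft.com/voicecommands/1.0"  # assumed; verify on MSDN
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical records as they might come out of a CMS query.
HELP_COMMAND_SETS = {
    "en-US": {
        "prefix": "Contoso Widgets",
        "commands": [{
            "name": "showHelpAlerts",
            "example": "text messages",
            "listen_for": ["[How] {widgetViews}"],
            "feedback": "Showing Help for {widgetViews}",
            "target": "/help_alerts.xaml",
        }],
        "phrase_lists": {"widgetViews": ["alerts", "text messages"]},
    },
    "es-ES": {  # placeholder translations
        "prefix": "Contoso Widgets",
        "commands": [{
            "name": "showHelpAlerts",
            "example": "mensajes de texto",
            "listen_for": ["[Cómo] {widgetViews}"],
            "feedback": "Mostrando la Ayuda de {widgetViews}",
            "target": "/help_alerts.xaml",
        }],
        "phrase_lists": {"widgetViews": ["alertas", "mensajes de texto"]},
    },
}


def _child(parent, tag, text=None, attrs=None):
    """Append a namespaced child element with optional text and attributes."""
    node = ET.SubElement(parent, "{%s}%s" % (VCD_NS, tag), attrs or {})
    if text is not None:
        node.text = text
    return node


def build_vcd(command_sets):
    """Serialize {language: record} data as a VCD document string."""
    ET.register_namespace("", VCD_NS)  # write the VCD namespace as the default xmlns
    root = ET.Element("{%s}VoiceCommands" % VCD_NS)
    for lang, data in command_sets.items():
        command_set = _child(root, "CommandSet", attrs={XML_LANG: lang})
        _child(command_set, "CommandPrefix", data["prefix"])
        _child(command_set, "Example", data["commands"][0]["example"])
        for cmd in data["commands"]:
            node = _child(command_set, "Command", attrs={"Name": cmd["name"]})
            _child(node, "Example", cmd["example"])
            for pattern in cmd["listen_for"]:
                _child(node, "ListenFor", pattern)
            _child(node, "Feedback", cmd["feedback"])
            _child(node, "Navigate", attrs={"Target": cmd["target"]})
        for label, items in data["phrase_lists"].items():
            phrase_list = _child(command_set, "PhraseList", attrs={"Label": label})
            for item in items:
                _child(phrase_list, "Item", item)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_vcd(HELP_COMMAND_SETS))
```

With the phrases held as data, adding a language means adding records and translations rather than hand-editing XML, and the same records could feed the Help index or search synonyms.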

I am preparing a live demo of a voice-Help scenario to be presented at the WritersUA 2013 event in Seattle.


There are several useful articles posted on MSDN.

A very detailed companion article by Bishop:

The step-by-step documentation for voice commands is also available on MSDN:

Here are some best practices for designing the voice content: