3 Tips on Designing for Voice

“For voice recognition technology, grasping all the necessary contextual factors and assumptions in this brief exchange is next to impossible.” Ditte Mortensen, UX Researcher

Why start off with this dire statement?

It establishes up front that designing bots means embracing the pursuit, but not the expectation, of perfection. Because bots cannot retain context the way humans naturally can, we must ask the right questions at the right time and be transparent about why we’re asking them.

Those designing the conversations we can have with a voice interface strive for:

  1. Making conversations sound natural

  2. Enabling users to complete an action with the fewest number of steps, and therefore, as little friction as possible

  3. Asking users for the right information and avoiding assumptions, so the bot can provide customized, relevant results.

With these goals in mind, here is my check-and-balance system for designing bots:

1. Use time wisely.

App builders fight for space on mobile. Alexa skill builders fight for time: how long people will pay attention while listening to a prompt.

This goes for both text-based and voice user interfaces. Just as space is of the utmost importance in text bots, time makes or breaks a voice experience. While we try to mimic natural conversations, we have to keep in mind that people have less patience with bots than with other humans.

When we see that someone is nervous, we can feel empathy for them and tell ourselves to be a bit more patient while we wait for an answer. We don’t extend the same leniency to devices.

So a bot must get to the point right away, and anecdotes that show personality should be used sparingly, only where they serve the bot’s purpose.

It’s the same when talking with humans, right?

We apply the expectations we have of back-and-forth conversations with humans to bots as well. For example, we don’t appreciate customer service representatives who go on and on about their personal lives, what they had for dinner, and so on, when all we want is to get our router fixed. This applies to machine interactions as well, perhaps even more so. Make it clear what a user can do through the interface, provide action items, then confirm before the bot takes any decisive action.

2. Take advantage of sounds and conversation markers.

In some ways, designing for voice can feel easier than designing for text. Designers can take advantage of the fact that they are designing for audio and the UI is simply words. Of course, there are separate factors in how these words are delivered: speech breaks, tone, pitch, volume, female vs. male voice, etc. I outline them in detail here, but here’s an example:

Speech breaks

After a lengthy sentence, put in breaks so you give the user a bit more time to comprehend what they were told or what they were asked to do.

For example, if someone asks where to find the verification number on their credit card to confirm a purchase, the bot must deliver the instructions at the pace at which the user can carry them out.

[Screenshot: Screen Shot 2018-11-13 at 11.42.59 AM.png]

Wouldn’t you do the same when you explain a step-by-step process to someone?
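As a rough illustration of the speech-break idea, here is a minimal Python sketch that builds an SSML response with a pause after each instruction. The `<speak>` and `<break time="..."/>` elements are standard SSML; everything else is an assumption made for the example: the function name `with_breaks`, the 800 ms pause, and the wording of the steps are made up, and the right pause length would need to be tuned by listening to the result.

```python
from xml.sax.saxutils import escape


def with_breaks(steps, pause_ms=800):
    """Join instruction steps into one SSML string, pausing after each step.

    pause_ms is an assumed starting value, not a recommended setting.
    """
    parts = []
    for step in steps:
        parts.append(escape(step))  # escape &, <, > so the SSML stays valid
        parts.append(f'<break time="{pause_ms}ms"/>')
    return "<speak>" + " ".join(parts) + "</speak>"


# Hypothetical wording for the credit card example.
steps = [
    "Turn your card over.",
    "Find the signature strip.",
    "Read out the short number printed next to it.",
]
ssml = with_breaks(steps)
print(ssml)
```

Keeping each step as its own short sentence, followed by a pause, gives the listener time to act before the next instruction arrives.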

3. Guide the user and ask for what you need, but justify it.

This is where the designer’s role matters: making the most of a bot’s capabilities while minimizing the impact of its limitations, such as its difficulty retaining the context of a conversation.

For a user asking for restaurant recommendations, the conversation may go something like this:

[Screenshot: Screen Shot 2018-11-13 at 11.39.06 AM.png]

Here’s where the bot could run into trouble. At this point, the user may change the topic by asking how the bot makes these recommendations:

[Screenshot: Screen Shot 2018-11-13 at 11.39.14 AM.png]

The user now has her question about how the bot works answered, so she wants to go back to getting food recommendations. To see what else is around, she asks,

“Well, how about Mexican restaurants?”

Uh-oh, here’s where the trouble begins. The Alexa skill did not store the fact that the person is currently in San Francisco, so it’ll ask,

“Got it, what city should I look for Mexican places in?”

The user will think:

... I just told you.

Instead, to justify why it is asking for this and to show transparency, the bot can give a quick explanation.

Yes, it’s not ideal. But it shows transparency and makes your questions reasonable.
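The two behaviors in this example, remembering the city when possible and justifying the question when not, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `session` is a plain dictionary standing in for whatever per-conversation storage the platform provides, `next_prompt` is a hypothetical helper rather than part of any Alexa API, and the prompt wording is invented.

```python
def next_prompt(session, cuisine, city=None):
    """Return the bot's next line, reusing a remembered city when there is one.

    session: a plain dict standing in for per-conversation storage (assumption).
    """
    if city:
        session["city"] = city  # remember the city for later turns
    session["cuisine"] = cuisine

    if "city" in session:
        return f"Here are {cuisine} places in {session['city']}."

    # No city on record: ask for it, and say why the answer is needed.
    return (
        f"Got it. What city should I look for {cuisine} places in? "
        "I ask so that I only suggest places near you."
    )


session = {}
first = next_prompt(session, "Mexican")                          # city unknown
second = next_prompt(session, "Mexican", city="San Francisco")   # user answers
third = next_prompt(session, "Thai")                             # city is reused
print(first, second, third, sep="\n")
```

In the third turn the bot no longer asks for the city, which is the behavior the user in the example expected all along.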

We apply these principles in designing bots because an automated system, whichever medium it’s on, needs to earn the right to be wrong.

The bot has earned this right if it has tried to clarify and has asked relevant questions. It has not earned it if it never asked the questions that would help the user, and then expects the user to tolerate repetitive questions without any explanation of how the answers will be used.

Maximizing shared knowledge is key to creating better conversations.

Without this mutual understanding, each conversation we have with a machine creates more opportunities for errors that waste our time. In the end, we as users have to dig ourselves out of the dark hole we’ve fallen into because the right questions were not asked and, as a result, the wrong information was provided. As bot designers, by putting ourselves in the user’s shoes with each iteration, we can design bots that serve our needs with a higher success rate each time.