Why Bots Fail: The Importance of Usability Testing

“Some of these companies need to focus at least as much on testing their products as they do drafting press releases.”


Thomas Gouritin calls the lack of focus on user experience “AI bulls***”: specifically, getting one error message after another for tasks that logically should work.

His theory for this widespread problem?

Not enough focus on testing the product before launch, and way too much on selling it. Specifically, he points to press releases that tout Artificial Intelligence-powered chatbots (capitalized, mind you) that in reality are programmed questions and answers.

His commentary is caustic, but accurate.

Reserving enough time to test is beyond valuable. It is essential to avoid becoming a screenshot of a failed bot on Twitter.

The goal is to account for both predictable and unpredictable errors. For the latter, the bot does not need to have a solution, but needs to route the user to a place or person with the solution.

1. Predictable Errors

Test ideal user paths and account for unforeseen edge cases to determine impact vs. effort of changes.

Happy paths are ones in which the user is able to complete a task smoothly by providing expected answers. These paths must absolutely work, because the bot is otherwise useless.

Example of a happy path: finding a restaurant using quick replies and expected inputs.

In edge cases, a user types or says something the bot was not designed to answer. For each edge case discovered during usability testing, we need to weigh the value of handling it against the time and effort required, which may compromise timely delivery.

Ex) When the bot asks the user what type of cuisine they’re looking for, the user may ask for restaurants with the shortest wait time. Here, the bot should tell the user that it can only give recommendations based on cuisine preference.

The user may assume your bot is broken if you don’t explain that the feature is not supported at this time and offer a call-to-action. However, in order to tell them why something isn’t working, the bot must be trained to recognize why the error is happening. This is why training the bot to understand common user inputs is important for the overall user experience.

Otherwise, they’ll keep trying the same thing over and over again.

The novelty surrounding chatbots makes people’s expectations much greater than what most bots can handle at the moment, and I’ve seen this first-hand in testing.

The point of usability testing is to gather insights, prioritize, then iterate; it is not to make changes to account for every finding.

2. Unpredictable Errors

Reserve time for these to arise, then prioritize again.

Then there are errors that we simply can’t predict, such as platform issues. The restaurant recommendation bot may have trouble pulling data from Yelp and return an error message. Testing should ideally be done within a specified time window, after which you prioritize solving usability issues over nice-to-have features.
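To make the routing idea concrete, here is a minimal sketch of how a bot might hand the user off when a data lookup fails. Everything in it is an assumption for illustration: the recommend function, the fetch callback standing in for whatever service supplies restaurant data, and the message wording are all hypothetical, not a real Yelp integration.

```python
from typing import Callable, List

# Hypothetical copy; real wording should come out of usability testing.
HANDOFF_MESSAGE = (
    "Sorry, I can't look up restaurants right now. "
    "Want me to connect you with a person who can help?"
)


def recommend(cuisine: str, fetch: Callable[[str], List[str]]) -> str:
    """Return a recommendation, or a handoff message if the lookup fails."""
    try:
        results = fetch(cuisine)
    except Exception:
        # Unpredictable failure (e.g. the data provider is down). The bot has
        # no solution itself, so it routes the user to someone who does.
        return HANDOFF_MESSAGE
    if not results:
        return f"I couldn't find any {cuisine} restaurants nearby."
    return f"Try {results[0]}!"


def broken_fetch(cuisine: str) -> List[str]:
    """Stand-in for a provider outage, used only to exercise the error path."""
    raise ConnectionError("provider unavailable")


if __name__ == "__main__":
    print(recommend("thai", lambda c: ["Thai Garden"]))
    print(recommend("thai", broken_fetch))
```

Passing the lookup in as a parameter makes it easy to simulate an outage during testing instead of waiting for one to happen in production.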

How do you train the bot to respond to these errors?

The data should not be pretty; it should be comprehensive and reliable, like this example.

Organize a spreadsheet that categorizes each intent* the bot should understand, then gather a list of common phrases that users would type or say for each intent. One intent could be “unsupported feature,” under which you’d list features that are commonly requested but unavailable in the bot. You can then train the bot to respond with copy explaining that the feature is not supported at this time.

Example of a spreadsheet categorizing user intents and corresponding phrases

*Intents: what we determine the user is requesting when they say something to the bot
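The spreadsheet described above can be sketched as a simple lookup table. This is only an illustration: the intent names, phrases, and responses are made up, and plain substring matching stands in for whatever NLP training your platform actually provides.

```python
from typing import Dict, List

# Each key is one row category from the (hypothetical) spreadsheet; each list
# holds phrases users commonly type for that intent. "unsupported_feature" is
# listed first so it wins when a message also mentions a cuisine.
INTENT_PHRASES: Dict[str, List[str]] = {
    "unsupported_feature": ["wait time", "reservation", "delivery"],
    "find_by_cuisine": ["italian", "sushi", "thai", "mexican"],
}

RESPONSES: Dict[str, str] = {
    "find_by_cuisine": "Great, let me find some places for you.",
    "unsupported_feature": (
        "Sorry, that isn't supported yet. I can recommend restaurants "
        "by cuisine. What are you in the mood for?"
    ),
    "fallback": "Oops, I didn't get that! Try telling me a cuisine, like sushi.",
}


def detect_intent(message: str) -> str:
    """Return the first intent whose phrase list matches, else 'fallback'."""
    text = message.lower()
    for intent, phrases in INTENT_PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return "fallback"


def respond(message: str) -> str:
    """Look up the reply copy for whichever intent was detected."""
    return RESPONSES[detect_intent(message)]


if __name__ == "__main__":
    print(respond("Which restaurants have the least wait time?"))
    print(respond("I want sushi"))
```

Note that the unsupported-feature reply pairs the explanation with a call-to-action, so the user knows what to try next instead of repeating the same request.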

How to get the right help for testing

Take advantage of different expertise within your team: UI design, copy, NLP, and more. (Credit: Framer)

Allow each team member to use their domain expertise and get involved in the right steps, so that you can create a prototype, deploy to a testing environment, test specific user flows, then iterate. Though you should leave time for design changes, accounting for different scenarios as much as possible from the start will cut the time required to iterate.

Let’s take this scenario: when a user types “help” while shopping for shoes, what options should the bot offer? Are they asking for “help” with navigating the bot or with the item they’re looking at, thus asking for a customer service rep?

The NLP Trainer would work with the Bot Designer to come up with possible solutions:

  1. Direct user to menu options (one of which is talking to a human agent)

  2. Connect user to a human agent directly

  3. Ask what exactly user needs help with, then redirect

As you may have guessed, I usually go with the third option to avoid assuming the user’s intention.
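As a rough sketch of that third option, the bot can ask a clarifying question and only redirect once the user has chosen. The function name, the prompt wording, and the two choices below are hypothetical; a real flow would plug into your bot platform's conversation state.

```python
from typing import Tuple

# Hypothetical copy for the clarifying question (option 3 above).
CLARIFY_PROMPT = (
    "Happy to help! What do you need?\n"
    "1. Help using this bot\n"
    "2. Help with the item you're viewing"
)

# Where each answer redirects the user.
REDIRECTS = {
    "1": "Here are the menu options: browse, track an order, or talk to an agent.",
    "2": "Connecting you to a customer service rep.",
}


def handle(message: str, awaiting_choice: bool) -> Tuple[str, bool]:
    """Handle one turn; return (reply, whether we still await a choice)."""
    text = message.strip().lower()
    if awaiting_choice:
        if text in REDIRECTS:
            return REDIRECTS[text], False
        # Unrecognized answer: ask again rather than guess the intention.
        return CLARIFY_PROMPT, True
    if text == "help":
        return CLARIFY_PROMPT, True
    # Placeholder for the normal shopping flow, which is out of scope here.
    return "Got it.", False


if __name__ == "__main__":
    reply, waiting = handle("help", False)
    print(reply)
    reply, waiting = handle("2", waiting)
    print(reply)
```

The cost of this choice is one extra turn for every user who asks for help, which is worth checking against task completion time in usability testing.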

Getting the answer to “what exactly are they likely asking for?” correct as often as possible — through usability testing and analytics — makes the bot “sticky,” encouraging users to use the bot again and again.

Like any good design process, the decisions must be collaborative and iterative.

In conclusion, you need to consider:

1. Realistic Users

Testing with the intended users informs designers of tweaks, and sometimes entire redesigns, that need to be made.

This includes users in the worst-case scenario.

For chatbots, the worst-case users are actually not people who have never used the chatbot, but those who have used it and hated it.

We want to observe how those who are biased from previous experiences and those who are brand-new would use the bot.

2. Language

The tone should be in line with how the user speaks, and avoid information fatigue.

For text-based chatbots, the designer’s job is to ensure that text fatigue doesn’t hinder task completion.

As UX Designer Eunji Seo says, “Don’t make users go TL;DR.” Generally, anything more than three lines of text is too long.
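One way to catch overly long messages before testing is a quick automated check on the bot’s copy. The sketch below assumes a chat bubble fits roughly 40 characters per line, which is a guess that varies by platform and device; the three-line limit comes from the guideline above.

```python
import textwrap

MAX_LINES = 3        # guideline from the text: more than three lines is too long
CHARS_PER_LINE = 40  # assumed bubble width; adjust for your platform


def is_too_long(message: str) -> bool:
    """Return True if the message wraps to more than MAX_LINES lines."""
    wrapped = textwrap.wrap(message, width=CHARS_PER_LINE)
    return len(wrapped) > MAX_LINES


if __name__ == "__main__":
    print(is_too_long("What cuisine are you in the mood for?"))
    print(is_too_long("word " * 50))
```

A check like this only flags candidates for review; whether a message actually causes fatigue still has to be observed with real users.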

3. Handling Errors

The goal is to have as few fallback messages* (“Oops, I didn’t get that!”) as possible.

As mentioned above, this requires the designer to adjust wording or change the order of messages so the conversation feels natural and helps users achieve the task quickly.

*Fallback messages: messages noting that the request can’t be understood, often followed by menu options

4. Task Completion Rate

One way to test fatigue is through the 60-second test.

“Can users perform a certain number of tasks with just one hand in under 60 seconds?”

Customer satisfaction can be achieved when a bot strategy is carefully developed. At Wizeline, we define chatbot objectives, identify the software integrations that work best, build types of experience, and design intents and flows. After validating the chatbot experience with clients, we involve them in testing to check whether we need to add NLP, edit the message copy, shift actions when certain messages are displayed, and so on. We also determine a marketing strategy for the launch so that the bot is discoverable.