How we analyzed 200K messages to design a chatbot

This piece was originally published at Chatbots Magazine.

We took 200,000 messages that a major real estate company received from site visitors over six months and did a deep-dive analysis to design a chatbot that addresses those visitors’ needs.

We needed to determine:

  1. The most frequently asked questions and requested services

  2. The topics of these questions, and how the questions were distributed among those topics

  3. What visitors were trying to achieve by asking about these topics (their intents)

Why did we decide to do this?

Put simply, we want to make data-driven decisions.

A client came to us with the chat history of 200,000 messages that their prospective or existing customers sent to a live chat service when they visited the website. These included both messages answered by a live agent and messages that were missed during offline hours.


We ran a clustering algorithm to group these messages into clusters of similar information. At that stage we could not yet confidently treat the clusters as intents; it took human effort to turn them into the intents we designed the chatbot to fulfill.

The 200K messages boiled down to 90 FAQs, which we sorted by hand into 3 buckets to design the user experience:

  1. Questions that can have an automated answer

  2. Questions that require a series of follow-up questions to gather context before answering

  3. Questions that require the bot to connect the visitor to a human agent for the accurate answer
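
The three buckets amount to a routing decision the bot makes once it has matched a visitor’s question to an FAQ. The sketch below is a minimal illustration of that idea, not the production design: the names `Bucket`, `FAQ`, and `next_bot_action`, and the sample answer text, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Bucket(Enum):
    """The three ways the bot can handle an FAQ."""
    AUTOMATED = auto()   # answer directly with a stored response
    FOLLOW_UP = auto()   # ask context-gathering questions before answering
    HANDOFF = auto()     # connect the visitor to a human agent


@dataclass
class FAQ:
    question: str
    bucket: Bucket
    answer: str = ""                                     # used by AUTOMATED
    follow_ups: list[str] = field(default_factory=list)  # used by FOLLOW_UP


def next_bot_action(faq: FAQ) -> str:
    """Return the bot's first move for a matched FAQ."""
    if faq.bucket is Bucket.AUTOMATED:
        return faq.answer
    if faq.bucket is Bucket.FOLLOW_UP:
        # Open the string of follow-up questions with the first one.
        return faq.follow_ups[0]
    return "Let me connect you with one of our agents."


hours = FAQ("What are your office hours?", Bucket.AUTOMATED, answer="9am to 6pm.")
print(next_bot_action(hours))  # prints: 9am to 6pm.
```

A real FOLLOW_UP flow would keep asking the remaining questions and only then answer; the sketch returns just the opening move to keep the routing visible.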


What problem does it solve?

Previously, we had to guess when deciding which conversation flows to build into chatbots. We had no data on which dialogs would be most useful for users, because we didn’t know what they wanted to know.

How did we do this?

It’s a process called intent clustering, which we use to analyze chat histories between two humans, in this case site visitors and live agents. We run those conversations through an algorithm that finds clusters of sentences with similar meaning, a technique known as semantic clustering. This helps us prioritize which phrases to train the chatbot to understand so it can carry on a conversation (NLP training), and which questions should get a static, command-and-response answer from a question-and-answer service, in this case Microsoft’s QnA Maker.
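
QnA Maker is a hosted service, and its API is not shown here. As a rough local stand-in, the sketch below illustrates the command-and-response idea: match a visitor’s message against stored FAQ questions with Python’s standard-library `difflib`, and fall back to the NLP-trained flow when nothing is close enough. The FAQ entries, the `static_answer` helper, and the 0.6 similarity cutoff are assumptions for illustration.

```python
import difflib
from typing import Optional

# Hypothetical knowledge base: normalized FAQ question -> static answer.
QNA_PAIRS = {
    "what are your office hours": "Our sales office is open 9am to 6pm.",
    "do you have homes with a pool": "Several of our communities offer homes with pools.",
}


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so small wording differences matter less."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def static_answer(message: str, cutoff: float = 0.6) -> Optional[str]:
    """Return a stored answer if the message closely matches a known question.

    None means no FAQ is similar enough, so the message should go to the
    NLP-trained conversational flow instead.
    """
    matches = difflib.get_close_matches(
        normalize(message), list(QNA_PAIRS), n=1, cutoff=cutoff
    )
    return QNA_PAIRS[matches[0]] if matches else None
```

String similarity only catches near-verbatim phrasings; that limitation is exactly why the more varied requests are routed to NLP training instead.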

Who uses this?

UX/conversation designers. Analysis like this enables user-centered design because it shows which user paths to build out. Our goal is to provide contextual answers to questions, something chatbots are known to mishandle.

Bot trainers. They now know which intents matter most, so they can prioritize training the bot to understand the different phrasings of the topics visitors ask about most.

Clients. We provide an overview summary with major data points that show the type and frequency of mentioned topics, the number of messages that the live agents were not able to answer, and more.

What is the step-by-step process, in layman’s terms?

  1. The client or their third-party chat vendor provided the conversation scripts.

  2. We ran an algorithm that takes the conversations and processes the sentences within each one through a machine learning model that allows us to categorize the sentences by semantic meaning.

  3. We then ran another algorithm that groups sentences predicted to have similar intents. Note that at this point a cluster may contain sentences that are not fully related to each other; they only contain similar information.

  4. Next, a person manually reviews the sentences in each cluster and assigns tags that summarize its content. The result is a list of separate intents.

  • Clusters vs. intents: The two have to be distinguished. A cluster is a group of sentences with similar semantic meaning; you may find, say, 10 sentences that all seem to be about buying property. But because the clustering algorithm is unsupervised and has no human context, a person has to review each cluster and confirm that it can be turned into an intent.

  5. The designer uses these intents and FAQs to design the workflows that guide people when they ask a question or make a request.
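
The distinction between clusters and intents can be made concrete with a small data structure. In this hedged sketch, the sentences, cluster IDs, and intent names are hypothetical; the `review` table stands in for the human step, where several clusters can merge into one intent and an incoherent cluster can be rejected.

```python
from collections import defaultdict
from typing import Optional

# Output of the clustering step: (cluster_id, sentence). Hypothetical sample data.
clustered = [
    (0, "how much is a 3 bedroom home"),
    (0, "what is the price range"),
    (1, "what are the prices in the north community"),
    (2, "ok thanks"),
    (3, "what is the tax rate there"),
]

# Filled in by a human reviewer after reading each cluster. Two clusters map to
# the same intent; a cluster with no useful meaning maps to None.
review: dict[int, Optional[str]] = {
    0: "ask_home_price",
    1: "ask_home_price",
    2: None,  # small talk, not an intent
    3: "ask_tax_rate",
}


def build_intents(clustered, review):
    """Group sentences by the intent a human assigned to their cluster."""
    intents = defaultdict(list)
    for cluster_id, sentence in clustered:
        intent = review.get(cluster_id)
        if intent is not None:  # skip rejected or not-yet-reviewed clusters
            intents[intent].append(sentence)
    return dict(intents)
```

The sentences collected under each intent can then serve as example phrasings when training the bot.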

We gave the client an overview of the main findings:

  • Main topics asked by customers and number of messages in each topic

  • Within each topic, the breakdown by subtopic (e.g., the number of people asking for agents’ contact information out of everyone asking about agents; other subtopics could be agent availability or hours of operation)

  • Frequency and distribution of messages among the top 10 topics

  • Comparison of topics asked about when agents were online vs. offline (either because it was after work hours or no agent was available)
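
Once each message carries a topic tag, most of the figures in this overview are simple counts. The sketch below assumes a hypothetical input format, a list of `(topic, agent_online)` pairs, and shows one way to produce topic counts, each topic’s share of the top N, and the online/offline split with `collections.Counter`.

```python
from collections import Counter


def summarize(messages, top_n=10):
    """Build the client-facing summary from tagged messages.

    `messages` is a list of (topic, agent_online) pairs; agent_online is True if
    a live agent was available when the message arrived.
    """
    counts = Counter(topic for topic, _ in messages)
    top = counts.most_common(top_n)
    top_total = sum(n for _, n in top)
    return {
        # Main topics and the number of messages in each.
        "top_topics": dict(top),
        # Each top topic's share of all messages in the top N.
        "share_of_top": {topic: n / top_total for topic, n in top},
        # Topic counts split by whether an agent was online.
        "online": Counter(t for t, is_online in messages if is_online),
        "offline": Counter(t for t, is_online in messages if not is_online),
    }
```

Subtopic breakdowns work the same way if each message is tagged with a `(topic, subtopic)` pair instead of a single topic.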

We found surprising data about the most popular topics:

  • Among the top 10 message topics, three (home prices, amenities, and locations) were responsible for half of visitors’ inquiries to the live chat service.

  • The next biggest topic was tax rates for each location.

  • The least popular topic was home lots, accounting for only 3 percent of inquiries.

How do these algorithms work?

One algorithm takes the sentences from each conversation and vectorizes them, turning each sentence into a list of numbers. At a high level, vectorizing captures relations between words by placing related words close together in that numeric space; it’s what allows the algorithm to associate vegetarians with vegetables, for example.
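
The article does not name the vectorization model. As a hedged illustration of the idea, the sketch below uses a tiny hand-made table of word vectors (hypothetical values; real models learn hundreds of dimensions from large text corpora) and represents a sentence as the average of its word vectors, which is one simple approach among many. Cosine similarity then shows that related words and sentences point in similar directions.

```python
import math

# Toy, hand-written word vectors (hypothetical). Related words get nearby vectors.
WORD_VECTORS = {
    "vegetarians": [0.9, 0.8, 0.0],
    "vegetables": [0.8, 0.9, 0.1],
    "eat": [0.5, 0.5, 0.0],
    "mortgage": [0.0, 0.1, 0.9],
    "rates": [0.1, 0.0, 0.8],
}


def sentence_vector(sentence):
    """Average the vectors of known words; unknown words are ignored."""
    vecs = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity: near 1.0 for similar directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With these values, "vegetarians eat vegetables" scores much closer to "vegetables" than to "mortgage rates".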

We then use another algorithm that, based on these vectors, groups the sentences into clusters of similar semantic meaning.
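
The article also does not say which clustering algorithm was used. K-means is a common choice for grouping vectors, so the sketch below swaps it in for illustration: it alternates between assigning each vector to its nearest centroid and moving each centroid to the mean of its members. The function name and parameters are hypothetical, and `k`, the number of clusters, has to be chosen up front.

```python
import math
import random


def kmeans(vectors, k, iterations=100, seed=0):
    """Group vectors into k clusters; returns one label (0..k-1) per vector."""
    rng = random.Random(seed)
    # Start from k input vectors chosen at random as the initial centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels
```

With real sentence vectors, each label is a cluster ID that a person then reviews before it becomes an intent.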

What challenges did we face?

Accurately assigning intents to clusters requires manual human effort. With 100 clusters, that is no easy feat for the team.

In addition, sorting through 90 FAQs to come up with workflows that address user intents requires cross-functional collaboration. We kept coming back to one goal: a user journey that serves visitors’ needs and complements the company’s internal processes, for an efficient, pleasant experience.

By using the best of automation and human effort, we’re proud to have enabled technology that solves a real problem.

Made possible by the dream team at Klug: developers Marcial Puchi, Manuel Candelaria, and Luis Amador, and NLP trainer Aldo Zuniga.