Building Better Phone Trees With Twilio Using Multivariate Testing


In our earlier blog post, Twilio for Phone Trees, we showed you how to create a phone tree using Twilio, Ruby, and Sinatra. In this post, we are going to apply one of the web developer’s most powerful tools, multivariate testing, to one of many companies’ favourite communications tools, the phone tree.

Multivariate testing will allow us to answer a lot of different questions about our phone tree:

  • Are the options in the phone tree as clear as possible?
  • How many callers complete their call using self service options?
  • Are there steps we could improve in the phone tree?
  • How many steps does a user need to go through on average?
  • Are we improving the experience with each new release?

The Science

The idea of multivariate testing is simple: you start with a hypothesis that changing one aspect of your website will improve a metric that determines success. For example, suppose we have a website that sells toy owls. We currently have a picture of a Barn Owl on the site, but our hypothesis is that a Snowy Owl would lead to more sales.

To test our hypothesis, we would show half our users the Snowy Owl and the other half the Barn Owl. Then for each sale we record the type of owl displayed. Once we have a few weeks’ worth of data, we can draw up a simple comparison:

Type of Owl Displayed    Number of Visits    Number of Sales    Conversion
Snowy Owl                500                 75                 15%
Barn Owl                 500                 42                 8.4%
Total                    1000                117                11.7%

From this we can see that the Snowy Owl leads to almost twice as many sales as the Barn Owl. Our hypothesis has been validated by measurable facts. We should replace the Barn Owl with a Snowy Owl. We’re using the scientific method to improve our product or service.
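As a small worked example, the conversion figures in the table can be reproduced in a few lines of Ruby. The helper name `conversion` is made up for illustration; the numbers are the ones from the table.

```ruby
# Visits and sales per variant, taken from the owl comparison table.
results = {
  "Snowy Owl" => { visits: 500, sales: 75 },
  "Barn Owl"  => { visits: 500, sales: 42 }
}

# Conversion is sales divided by visits, expressed as a percentage.
def conversion(visits:, sales:)
  (sales.to_f / visits * 100).round(1)
end

rates = results.transform_values { |r| conversion(**r) }
rates.each { |owl, rate| puts "#{owl}: #{rate}%" }

# How many times better the Snowy Owl converts than the Barn Owl.
uplift = (rates["Snowy Owl"] / rates["Barn Owl"]).round(2)
puts "Snowy Owl converts #{uplift}x as well as Barn Owl"
```

The same calculation applies to any metric you record per variant, which is what we will do for phone calls later on.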

Applying the Science

How can we use multivariate testing to make our IVR better?
What if we were able to have two phone trees and randomly assign customers between them? We could do all kinds of interesting experiments with multivariate tests:

  • How many customers complete their call entirely on self-service?
  • How many speak to an operator?
  • How long do the calls last?
  • What is the mood of the caller when they speak to an operator?

We don’t actually need two complete phone systems. We can use a single phone tree and adapt it to use different content for certain parts of the system. Rather than build a new IVR from scratch, we can get the code from my previous post, Twilio for Phone Trees. In that post we described how to build a simple IVR system in Ruby and Sinatra. The code in that post provides a good framework that we can modify to serve different variants of each step in the menu, so we can scientifically test our hypothesis.

The step in the menu we want to test allows a user to request a specific piece of information. Your phone menu may allow callers to ‘hear your account balance’, but my phone menu allows them to hear ‘how many owls there are’. My hypothesis is that the following is not a very clear menu option:

You need to hear how many strigiformes we have, 1.

I believe that the following text is much clearer:

To hear how many Owls we have, press 1.


Applying the Code

In order to get started we need a way of storing the variants. My preferred approach is to separate the content from the structure of the tree. Let’s start by changing the Step class we made last time. Previously, we used this class to generate both the structure and the content of our IVR systems. So that we can easily A/B test the content, we will create a new class called Content.

We now need to remove a line from the Step class that contains the ‘content’ of each step, and add a link to multiple Content records.

Instead of storing the spoken text for each step, we will simply associate one or more Content objects with a Step.
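To make the split concrete, here is a minimal sketch of the two classes. It uses plain Ruby objects as a stand-in for the database-backed models, and the field names (`variant`, `text`, `voice`, `language`, `digit`) and the `add_content` helper are assumptions for illustration rather than the exact code from the repository.

```ruby
# Content holds what is spoken; the field names are illustrative assumptions.
Content = Struct.new(:variant, :text, :voice, :language, keyword_init: true)

# Step holds only the structure of the tree and links to its Content records.
class Step
  attr_reader :id, :digit, :children, :contents

  def initialize(id:, digit: nil)
    @id = id
    @digit = digit   # key the caller presses to reach this step
    @children = []
    @contents = []   # one Content per variant, instead of a single text field
  end

  # Hypothetical helper: attach a Content record to this step.
  def add_content(**attrs)
    content = Content.new(**attrs)
    @contents << content
    content
  end
end

root = Step.new(id: 1)
root.add_content(variant: "a", text: "Welcome to the owl line.",
                 voice: "alice", language: "en-GB")
puts root.contents.map(&:text).inspect
```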

The ‘create_tree’ method of the Step class can now be simplified. We first create the Step objects that define the structure of our IVR:
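A structure-only tree might be built along these lines. This is a sketch with a top-level `create_tree` and a made-up `add_child` helper; the step ids and digits are assumptions, and the real method lives on the Step class.

```ruby
# Structure-only steps: no spoken text lives here any more.
Step = Struct.new(:id, :parent, :digit, :children, keyword_init: true) do
  # Hypothetical helper: attach a child step reached by pressing `digit`.
  def add_child(id:, digit:)
    child = Step.new(id: id, parent: self, digit: digit, children: [])
    children << child
    child
  end
end

def create_tree
  root = Step.new(id: 1, parent: nil, digit: nil, children: [])
  root.add_child(id: 2, digit: "1") # e.g. hear how many owls there are
  root.add_child(id: 3, digit: "2") # e.g. speak to an operator
  root
end

tree = create_tree
tree.children.each { |s| puts "press #{s.digit} -> step #{s.id}" }
```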

Next we can create the content for each step. To allow richer experiments we can also specify the language and voice that we use to generate the TwiML:

Finally we can add a variant of one of these content objects to use in our experiment. In this case, we will test some different text for option 1:
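As a sketch, the default content and the experimental variant could be set up like this. The variant names "a" (default) and "b", the step ids, and the welcome and operator texts are assumptions; `alice` and `en-GB` are shown only as example values for the voice and language of `<Say>`.

```ruby
Content = Struct.new(:step_id, :variant, :text, :voice, :language, keyword_init: true)

# Default ("a") content for each step. Step ids, voice and language are assumptions.
contents = [
  Content.new(step_id: 1, variant: "a", voice: "alice", language: "en-GB",
              text: "Welcome to the owl information line."),
  Content.new(step_id: 2, variant: "a", voice: "alice", language: "en-GB",
              text: "You need to hear how many strigiformes we have, 1."),
  Content.new(step_id: 3, variant: "a", voice: "alice", language: "en-GB",
              text: "To speak to an operator, press 2.")
]

# The experiment: a second variant ("b") for the option-1 step only.
contents << Content.new(step_id: 2, variant: "b", voice: "alice", language: "en-GB",
                        text: "To hear how many Owls we have, press 1.")

by_step = contents.group_by(&:step_id)
by_step.each { |id, list| puts "step #{id}: variants #{list.map(&:variant).join(', ')}" }
```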

We now have a complete tree with complete content. One of the steps has two variants of the content. We could use the same approach to change the voice or language we’re using. We could also add a variant to the Step class to allow us to easily perform experiments on the structure of the IVR. Let’s just work on the content for now.

Previously, we used the Step class to generate the TwiML. To keep our code clean we’ll use the Content class to generate the ‘<Say>’ in our TwiML. We will need a to_twiml method on our Content class:
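A minimal sketch of such a method follows. It uses REXML from Ruby's standard distribution as a stand-in for whatever XML builder your app uses to produce TwiML, so the `parent` here is a `REXML::Element`; the Content fields are assumptions.

```ruby
require "rexml/document"

Content = Struct.new(:text, :voice, :language, keyword_init: true) do
  # Append a <Say> element for this content to the given parent XML element.
  def to_twiml(parent)
    say = parent.add_element("Say", { "voice" => voice, "language" => language })
    say.text = text
    say
  end
end

doc = REXML::Document.new
response = doc.add_element("Response")
Content.new(text: "To hear how many Owls we have, press 1.",
            voice: "alice", language: "en-GB").to_twiml(response)
puts doc.to_s
```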

In this method, the parent parameter is the parent XML element in the TwiML response. Before we modify the existing version of to_twiml in the Step class, we need a way of getting the correct Content record.
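One way to write that lookup is sketched below. The default variant name "a" and the in-memory `@contents` array are assumptions standing in for the real database association.

```ruby
DEFAULT_VARIANT = "a" # assumed name for the default variant

Content = Struct.new(:variant, :text, keyword_init: true)

class Step
  def initialize(contents)
    @contents = contents
  end

  # Return the content for the requested variant, falling back to the default.
  def get_content(variant)
    @contents.find { |c| c.variant == variant } ||
      @contents.find { |c| c.variant == DEFAULT_VARIANT }
  end
end

tested = Step.new([Content.new(variant: "a", text: "strigiformes"),
                   Content.new(variant: "b", text: "Owls")])
untested = Step.new([Content.new(variant: "a", text: "Goodbye.")])

puts tested.get_content("b").text   # this step has a "b" variant
puts untested.get_content("b").text # this step only has the default
```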

The purpose of this method is to get the correct Content record based on the stated variant. If there is no content for that variant we fail over to the default. As a result we can have multiple experiments running at once that only affect a tiny aspect of the tree.

We need to use the new get_content method of the Step class, and the to_twiml method of the Content class to render the TwiML. This is still handled by the Step class:
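The sketch below shows the shape of that rendering, again with REXML standing in for the TwiML builder. The `/step/:id/:variant` URL matches the route described later in this post; the class internals and the `numDigits` value are assumptions.

```ruby
require "rexml/document"

Content = Struct.new(:variant, :text, keyword_init: true) do
  # Append this content to the parent element as a <Say>.
  def to_twiml(parent)
    parent.add_element("Say").text = text
  end
end

class Step
  def initialize(id:, contents:)
    @id = id
    @contents = contents
  end

  # Content for the variant, or the default ("a") when none exists.
  def get_content(variant)
    @contents.find { |c| c.variant == variant } ||
      @contents.find { |c| c.variant == "a" }
  end

  # Render the step, carrying the variant forward in the <Gather> action URL.
  def to_twiml(variant)
    doc = REXML::Document.new
    response = doc.add_element("Response")
    gather = response.add_element("Gather",
                                  { "action" => "/step/#{@id}/#{variant}", "numDigits" => "1" })
    get_content(variant).to_twiml(gather)
    doc.to_s
  end
end

step = Step.new(id: 2, contents: [
  Content.new(variant: "a", text: "You need to hear how many strigiformes we have, 1."),
  Content.new(variant: "b", text: "To hear how many Owls we have, press 1.")
])
xml = step.to_twiml("b")
puts xml
```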

There are three core changes here. We use the get_content method to find the Content record and then call the to_twiml method of the instance it returns. A subtle change is that the ‘action’ attribute on the <Gather> verb now includes the variant in the URL, which keeps our application stateless. We also accept a variant parameter on the to_twiml method of the Step object. Where does this come from? We’ll get this from the call_handler script itself.

Now that we’re finished with the Step and Content classes, we have to make a few changes to the actions in our call_handler.rb. We need a global variable that holds all the variants we have configured. It would be much more sensible to store this in the database, but the hard-coded variable will suffice for this simple example.

Next, we need to modify the default ‘/step’ action for new calls to use the variant. We randomly select a variant using Array#sample. You may not want a 50/50 split, so use whatever algorithm you prefer to allocate a variant. We then need to pass this to the Metric we create for the call, and to the Step object.
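Stripped of the Sinatra routing, the logic for a new call might look like the sketch below. The `VARIANTS` names, the in-memory `METRICS` hash (standing in for the database), and the `start_call` and `weighted_variant` helpers are all assumptions for illustration.

```ruby
VARIANTS = %w[a b].freeze # hard-coded here; variant names are assumptions

Metric = Struct.new(:call_sid, :variant, :step_id, :state, keyword_init: true)
METRICS = {} # stand-in for a database table, keyed by CallSid

# Models what the '/step' action does for a brand-new call.
def start_call(call_sid, rng: Random.new)
  variant = VARIANTS.sample(random: rng) # even split across variants
  METRICS[call_sid] = Metric.new(call_sid: call_sid, variant: variant,
                                 step_id: 1, state: "in-progress")
  variant
end

# One alternative to an even split: send roughly 10% of calls to "b".
def weighted_variant(rng: Random.new)
  rng.rand < 0.1 ? "b" : "a"
end

variant = start_call("CA123")
puts "call CA123 assigned variant #{variant}"
```

Passing the random number generator in as a parameter is optional, but it makes the allocation easy to seed in tests.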

We now need to change the second action that handles subsequent steps. We will change the URL to respond by Step ID and Variant. We’re going to change the Metric object to update the call record, rather than log every step. Finally we pass the variant (this time from the URL) to the Step class to render the content correctly.
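The following sketch models that action as a plain method rather than a Sinatra route. `CallSid` and `Digits` are the parameters Twilio sends with a `<Gather>` request; the `NEXT_STEP` lookup, the `METRICS` hash and the `handle_step` helper are hypothetical stand-ins.

```ruby
Metric = Struct.new(:call_sid, :variant, :step_id, keyword_init: true)
METRICS = { "CA123" => Metric.new(call_sid: "CA123", variant: "b", step_id: 1) }

# Hypothetical step lookup: which step a digit leads to from a given step.
NEXT_STEP = { 1 => { "1" => 2, "2" => 3 } }.freeze

# Models the '/step/:id/:variant' action: the variant arrives in the URL,
# and the call's single Metric record is updated rather than a new row added.
def handle_step(path, params)
  _, _, id, variant = path.split("/")
  next_id = NEXT_STEP.dig(id.to_i, params["Digits"]) || id.to_i
  metric = METRICS.fetch(params["CallSid"])
  metric.step_id = next_id
  { step_id: next_id, variant: variant }
end

result = handle_step("/step/1/b", { "CallSid" => "CA123", "Digits" => "1" })
puts result.inspect
```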

We need to make a few simple changes to the Metric class. We had previously used it to track every step of the call, but now we are using it to track the outcome of each call. This means we have to update the records rather than create new ones. We have already changed the ‘/step/:id/:variant’ action above so now we need to change the ‘/fin’ action. This is used to update the state of the call after it completes using the Twilio Status Callback.
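A sketch of the update performed by ‘/fin’ is shown below. `CallSid`, `CallStatus` and `CallDuration` are parameters Twilio includes in the status callback; the `Metric` fields, the in-memory `METRICS` hash and the `finish_call` helper are assumptions.

```ruby
Metric = Struct.new(:call_sid, :variant, :state, :duration, keyword_init: true)
METRICS = { "CA123" => Metric.new(call_sid: "CA123", variant: "b", state: "in-progress") }

# Models the '/fin' action: update the existing record with the final outcome.
def finish_call(params)
  metric = METRICS[params["CallSid"]]
  return nil unless metric # unknown call; nothing to update

  metric.state = params["CallStatus"]
  metric.duration = params["CallDuration"].to_i
  metric
end

finish_call({ "CallSid" => "CA123", "CallStatus" => "completed", "CallDuration" => "42" })
puts METRICS["CA123"].to_h.inspect
```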

We only need to change a single line in the Metric class. We need to add a new property called ‘variant’.
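With the variant stored on each call record, outcomes can be grouped per variant. The sketch below assumes a hypothetical `outcome` field and uses made-up records purely to show the grouping; they are not real measurements.

```ruby
Metric = Struct.new(:call_sid, :variant, :outcome, keyword_init: true)

# Made-up records purely for illustration; not real measurements.
metrics = [
  Metric.new(call_sid: "CA1", variant: "a", outcome: "operator"),
  Metric.new(call_sid: "CA2", variant: "a", outcome: "self_service"),
  Metric.new(call_sid: "CA3", variant: "b", outcome: "self_service"),
  Metric.new(call_sid: "CA4", variant: "b", outcome: "self_service")
]

# Share of calls per variant that finished without an operator, as a percentage.
def self_service_rate(metrics)
  metrics.group_by(&:variant).transform_values do |calls|
    done = calls.count { |c| c.outcome == "self_service" }
    (done.to_f / calls.size * 100).round(1)
  end
end

rates = self_service_rate(metrics)
rates.each { |variant, rate| puts "variant #{variant}: #{rate}% self-service" }
```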


…we’re done! There are quite a few changes so I’ve committed them all to a new GitHub repository. You can see the diff of all changes here.

If you’d like to try out a live version of this app you can call in and hear it on one of the numbers below:

+1 312-234-0386

+44 333 344 1070

You can easily see which of the two variants is more successful in the graph below.

That is how we can easily use Twilio to apply multivariate testing to our phone systems. We can use this to constantly make improvements and changes to our phone menus and provide ever-improving customer service. Neat!