Build Scalable Interactive Voice Response with Twilio, AWS and Python

June 09, 2020
Written by
Meryll Dindin
Contributor
Opinions expressed by Twilio contributors are their own
Reviewed by
Paul Kamp
Twilion

Calaster 911 IVR Triage

Today, we're happy to host Meryll Dindin of CalAster, who prototyped an IVR to help emergency call centers cope with saturation during rapidly changing situations. You can read more about the team's journey here, and you can find the GitHub repository backing this tutorial here. CalAster’s core technology helps emergency call takers avoid saturation by classifying and routing incoming 911 emergency calls.

During crisis situations such as the current COVID-19 pandemic, call centers are overwhelmed and lack the human resources to deal with the sudden uptick in calls. However, new technology is available to build scalable filters that help avoid saturation.

Among other possibilities, there are IVR (Interactive Voice Response) systems that react to either voice or typed digits. In the case of COVID-19, life-threatening calls could be diverted to medical services, COVID questions redirected to green numbers set up by the government, and symptom checks partially automated and sent to medical students.

In this tutorial, we will build an autonomous IVR using Twilio Programmable Voice, AWS services, and the Python Flask framework.

Prepare your environment for a cloud-based Interactive Voice Response System

This tutorial is based on two major service providers: Twilio and Amazon’s AWS. We rely on Twilio to process inbound calls through a phone number we purchased, while AWS is used both to host the voice samples and to run the background decision-making.

Good news for us – both services come with many free features that I highly encourage you to get familiar with! If you haven’t yet, set up free accounts with both Twilio and AWS.

Purchase and configure a phone number

Once our accounts are set up, the first step on our quest to build this IVR is to purchase a phone number. Twilio makes it pretty straightforward to get one, independently from the country you live in.

Some countries have different regulations though, and purchasing a number in France requires more paperwork than purchasing a phone number in the United States. Nonetheless, go through the process and you will be equipped with a brand new phone number in a matter of a few clicks.

Flow diagram to purchase a number on Twilio

Set up a public AWS S3 bucket

We have one step left until we’re ready. The last requirement is to build an S3 bucket on AWS with public access enabled.

To do so, log into the AWS console, look for the S3 service (or skip to the console here), and click on ‘create bucket’. In the creation wizard, give the bucket a name, choose its region (the bucket will be reachable from anywhere, so this is mostly a latency optimization), and set its public/private settings. Pick whatever name you like, and uncheck the “Block all public access” checkbox.
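Unchecking “Block all public access” only allows public access; the objects themselves still need a policy (or per-object permissions) that grants it. As a minimal sketch, assuming you want every object in the bucket to be readable by anyone, the following helper prints a bucket policy you could paste into the bucket's Permissions tab. The function name and the bucket name `my-ivr-audio` are hypothetical and not part of the original project.

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Return a JSON bucket policy granting anonymous read access to all objects."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                # Read-only: callers can fetch objects but not list or modify them
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # "my-ivr-audio" is a placeholder bucket name
    print(public_read_policy("my-ivr-audio"))
```

Only store audio you are comfortable exposing publicly in such a bucket.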

We are now all set with the prerequisites. Let’s dive into the fun parts!

Interface your telecommunications with TwiML Bins

An IVR assumes some interactivity. Interactivity, in turn, assumes user input and a corresponding output. There are multiple ways of doing that, but in this post we’ll lean towards the simple and powerful – TwiML Bins.

Practically, a TwiML Bin consists of XML depicting if-else decision trees in a language called TwiML. You can ‘Say’ a batch of words, ‘Gather’ user inputs (either voice or DTMF), or even ‘Play’ audio files – among many other possibilities. In our IVR, we will set up a TwiML Bin that will take advantage of that functionality.

Set up your TwiML Bin

TwiML Bin flow diagram from sidebar

Once in your Twilio console, follow the clicks laid out above to create your first TwiML Bin. (You can also start from the TwiML Bin console and click the button.)

Before filling in the TwiML Bin configuration, we need to think through our scenario. Our use-case template is provided below, and depicts the following:

  1. Play a hosted introduction audio file to explain to our user what this IVR is all about thanks to the <Play> TwiML verb.
  2. Play a ‘beep’ sound to announce that we will be gathering our user’s input through Twilio’s <Gather> verb. This verb takes a few attributes: the input will be speech for human voice (dtmf is also supported by Twilio), expected to be in French (fr-FR), with a speech timeout of 3 seconds.
  3. After the timeout, if voice was captured, Twilio will send the transcribed response to a specific callback URL through a POST request. Otherwise, if the <Gather> response was empty, it will play another audio file to handle the error case and retry the process.
  4. If this second attempt also ends up with an empty input, then the IVR plays a final audio message before hanging up.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Play>{ audio file - introduction }</Play>
  <Play>{ audio file - beep sound }</Play>
  <Gather input="speech" speechTimeout="3" language="fr-FR" method="POST" action={ callback }>
  </Gather>
  <Play>{ audio file - first error }</Play>
  <Play>{ audio file - beep sound }</Play>
  <Gather input="speech" speechTimeout="3" language="fr-FR" method="POST" action={ callback }>
  </Gather>
  <Play>{ audio file - final error }</Play>
</Response>

If you’ve got a sharp eye, you’ll see that we missed the definition of two objects of relative importance: the hosted audio files and the callback URL. The first part is solved through the S3 bucket we initially created, while the second will require a bit more explanation.

Go ahead and save the TwiML Bin now even though it will warn you about ‘Invalid TwiML’. We’ll return to it in a minute.
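If you would rather generate the Bin content than type it by hand, the template above can be sketched in Python with the standard library. This is only an illustration under assumptions: the function name and its parameters are hypothetical, and the URLs you pass in are your own.

```python
import xml.etree.ElementTree as ET


def build_entry_twiml(intro_url, beep_url, first_error_url, final_error_url,
                      callback_url, language="fr-FR", speech_timeout="3"):
    """Build the entry TwiML: introduction, one retry on empty input, final error."""
    response = ET.Element("Response")

    def play(url):
        # <Play> streams the audio file hosted at the given URL
        ET.SubElement(response, "Play").text = url

    def gather():
        # <Gather> captures speech and POSTs the transcription to the callback
        ET.SubElement(response, "Gather", {
            "input": "speech",
            "speechTimeout": speech_timeout,
            "language": language,
            "method": "POST",
            "action": callback_url,
        })

    # First attempt
    play(intro_url)
    play(beep_url)
    gather()
    # Second attempt, reached only if the first <Gather> captured nothing
    play(first_error_url)
    play(beep_url)
    gather()
    # Final message before Twilio hangs up
    play(final_error_url)
    return ET.tostring(response, encoding="unicode")
```

Building the tree with ElementTree also takes care of quoting the action attribute, which is easy to forget when editing XML by hand.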

Make your audio recordings accessible anywhere

Before even thinking about our recordings, we have to properly draft the decision tree that will define our IVR. That is the key to an organized response – drawing the multiple routes will lead us to a prototype faster.

Adaptation of SAMU guideline IVR flow

In Calaster’s case for our demo IVR, we adapted the guidelines provided by the SAMU (medical emergency responders) in France for the COVID-19 pandemic, as they were suffering from saturated lines.

Their protocol relies on this storyline:

  • if your emergency is not about the coronavirus, then we will keep you queued
  • if you have simple questions about the virus then we will redirect you to a green number set up by the government
  • if you are worried about several symptoms, we will run a quick six-question symptom check whose answers will either redirect you to the emergency service (if they are serious enough) or divert you to a medical student on a specific line, who will assess the gravity of your situation with a medical judgment.

This simple decision tree provided us with a list of questions and a list of actions. We could then begin recording the corresponding audio files – which is great, since a human voice is generally better received than a generated one.
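Before recording anything, it can help to write the storyline down as code. The sketch below is a hypothetical encoding of the three branches described above. The outcome names, the function signature, and in particular the rule that a single severe answer escalates to the emergency service are assumptions made for illustration; they are not the SAMU protocol.

```python
from enum import Enum


class Outcome(Enum):
    STAY_QUEUED = "keep the caller queued for the emergency service"
    GREEN_NUMBER = "redirect to the government green number"
    EMERGENCY = "redirect to the emergency service"
    MEDICAL_STUDENT = "divert to a medical student"


def triage(about_coronavirus: bool, has_symptoms: bool, severe_answers: int = 0) -> Outcome:
    """Walk the decision tree and return the action to take."""
    if not about_coronavirus:
        return Outcome.STAY_QUEUED
    if not has_symptoms:
        # Simple questions about the virus
        return Outcome.GREEN_NUMBER
    # Assumption: any severe answer in the symptom check escalates the call
    if severe_answers > 0:
        return Outcome.EMERGENCY
    return Outcome.MEDICAL_STUDENT
```

Each outcome then maps to one recording, and each question in the tree maps to one route of the application.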

Once your recordings are done, you need to upload them to your S3 bucket. Now, those files are available everywhere in the world – given that the right URL is provided to access them!

In our case, the URL had the following standardized format:

 https://calaster-funnel.s3.eu-west-3.amazonaws.com/{audio-file}

Example S3 bucket with audio recordings

Yours will be a bit different, but note the recording path since we’ll be adding it to the TwiML Bin in a few minutes.
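As a small sketch of that URL format, the helper below assembles the address of a recording from a bucket name, a region, and a file name. The function name is hypothetical; the format simply mirrors the URL shown above.

```python
from urllib.parse import quote


def audio_url(bucket: str, region: str, filename: str) -> str:
    """Return the public URL of a recording stored in an S3 bucket."""
    # quote() percent-encodes characters such as spaces in the file name
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(filename)}"


if __name__ == "__main__":
    print(audio_url("calaster-funnel", "eu-west-3", "introduction.mp3"))
```

Keeping this in one place avoids scattering hard-coded bucket URLs across your routes.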

Build a Python Flask application for interactivity

This section will be a bit more technical, and require basic knowledge of Python, Flask, and AWS. As we previously defined our questions and our actions, we have to add the logic that will give life to interactivity.

At the macro level, we are looking for a responsive endpoint that tells our TwiML Bin what to do depending on the input provided by our user. Meanwhile, let’s keep in mind that this response, as specified by Twilio, has to respect the initial TwiML Bin XML formatting.

Initialize your Flask app

If you haven’t yet, install Flask, then initialize a Flask application in an application.py file. Make sure that cross-origin is handled to efficiently communicate with Twilio and AWS services – in this app, that means you should also install Flask CORS.

Finally, install the lxml module. We’ll use its functionalities to build XML responses.

Now you’re ready. Insert the following code in your application.py file:

import os, re
from lxml import etree
from flask import Flask, request, Response
from flask_cors import CORS

# Server instantiation
app = Flask(__name__)
# Add a secret key to your application
app.secret_key = os.getenv('FLASK_SECRET_KEY')
# Handle cross origin properly - AWS requirement
CORS(app)

Template your XML response

Next, build an action template template.xml that gathers the shared XML framework that we will be sending to Twilio as a callback.

This template serves two purposes: walking deeper in our decision tree through the first <Gather>, while handling empty inputs with the second <Gather> and the final error message. In the end, respecting our initial TwiML Bin’s template will ensure the continuity of our application.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Play>{0}</Play>
  <Play>https://www.soundjay.com/button/sounds/beep-07.mp3</Play>
  <Gather input="speech" speechTimeout="3" language="fr-FR" method="POST" action="https://funnel.dev-calaster.com/{1}">
  </Gather>
  <Play>https://calaster-funnel.s3.eu-west-3.amazonaws.com/error-01.mp3</Play>
  <Play>https://www.soundjay.com/button/sounds/beep-07.mp3</Play>
  <Gather input="speech" speechTimeout="3" language="fr-FR" method="POST" action="https://funnel.dev-calaster.com/{1}">
  </Gather>
  <Play>https://calaster-funnel.s3.eu-west-3.amazonaws.com/error-02.mp3</Play>
</Response>
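Positional placeholders are easy to miscount, so here is a hedged alternative that uses named placeholders and checks that the result is well-formed XML. The template is deliberately shortened to a single <Play> and <Gather>, and the names `STEP_TEMPLATE` and `render_step` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Shortened template: one prompt and one speech capture
STEP_TEMPLATE = (
    "<Response>"
    "<Play>{sound}</Play>"
    '<Gather input="speech" speechTimeout="3" language="fr-FR" '
    'method="POST" action="{action}"/>'
    "</Response>"
)


def render_step(sound_url: str, base_url: str, route: str) -> str:
    """Fill the template and fail early if the result is not well-formed XML."""
    action = base_url.rstrip("/") + "/" + route
    xml = STEP_TEMPLATE.format(
        # Escape &, < and > so URLs with query strings stay valid XML
        sound=escape(sound_url),
        action=escape(action, {'"': "&quot;"}),
    )
    ET.fromstring(xml)  # raises xml.etree.ElementTree.ParseError if malformed
    return xml
```

Validating at render time means a broken template fails in your logs rather than in the middle of a call.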

Build the Flask app routes

Our Flask application is now working as a web server, but we still have to build the routes constituting our service.

The initial route (POST enabled) will handle the Twilio POST request on our defined callback URL, taking as input the embedded transcription results provided by Twilio. Given our previously presented decision tree, this introduction route is used to distinguish between COVID and non-COVID emergencies. In the latter case, it keeps the user queued to the initial emergency service, while in the former it starts screening for a need for medical assistance.

All those routes will be added to the previously defined application.py file, and therefore to your app server. Due to Twilio’s expected formatting, the responses provided by those routes have to respect the format of TwiML Bins – that is to say, XML documents.

An example of such a route is provided below – simply paste it under the content you have in application.py to add it to your app. (More examples are provided in our GitHub repository.)

Pretty convenient isn’t it?

# Decision Tree: Step 01
@app.route('/', methods=['POST'])
def step_01():

    # Retrieve the output of Twilio transcription (empty string if missing)
    src = request.values.get('SpeechResult', '').lower()
    # Keep ASCII letters only (note: this also drops accented characters)
    src = re.sub('[^a-zA-Z]+', ' ', src)

    # This is a test, in your application you would write the full logic
    coronavirus_related = True

    # Test condition
    if coronavirus_related:

        snd = 'https://calaster-funnel.s3.eu-west-3.amazonaws.com/divert-samu.mp3'
        xml = '<Response><Play>{}</Play></Response>'.format(snd)
        xml = etree.tostring(etree.XML(xml), method='xml')

    # Walk deeper inside the decision tree
    else:
        snd = 'https://calaster-funnel.s3.eu-west-3.amazonaws.com/classification.mp3'
        url = 'need-for-medical-assistance'
        # Read the template, stripping indentation and line breaks
        with open('template.xml') as f:
            xml = "".join(e.strip() for e in f)
        # The template has one <Play> placeholder and two <Gather> placeholders.
        # Encode to bytes: lxml rejects str input that carries an encoding declaration
        xml = xml.format(snd, url, url).encode('utf-8')
        xml = etree.tostring(etree.XML(xml), method='xml')

    # Attach the XML mimetype
    arg = {'status': 200, 'mimetype': 'text/xml'}
    return Response(xml, **arg)
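The route above hard-codes its test condition. As a minimal sketch of what the "full logic" could look like, the helper below flags a transcription as coronavirus-related when it contains one of a few keywords. The keyword list and function names are hypothetical and purely illustrative; a production system would need something far more robust. Accents are folded to plain letters so that French words such as "épidémie" still match.

```python
import re
import unicodedata

# Hypothetical keyword list, written without accents
COVID_KEYWORDS = {"coronavirus", "covid", "virus", "epidemie", "pandemie"}


def normalize(text: str) -> str:
    """Lowercase, strip accents, and keep only letters separated by single spaces."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks left by decomposition (é becomes e)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z]+", " ", without_accents).strip()


def is_coronavirus_related(speech_result) -> bool:
    """Return True if the transcription mentions at least one keyword."""
    words = set(normalize(speech_result or "").split())
    return bool(words & COVID_KEYWORDS)
```

In the route, `coronavirus_related = is_coronavirus_related(request.values.get('SpeechResult'))` would then replace the hard-coded value.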

Time to launch our app

Finally, it is time to launch our application locally and see whether everything is running smoothly. Since the Flask application and its routes live in application.py, we only need a small entry point to start the server, saved as worker.py:

from application import app

if __name__ == '__main__': app.run(host='0.0.0.0')

Next, if you haven’t yet, set an environment variable FLASK_SECRET_KEY. For example:

export FLASK_SECRET_KEY=hunter2

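Then start the server with python worker.py and, from another terminal, send a POST request containing a SpeechResult field to http://localhost:5000/. You should see some cryptic XML output – our instructions to Twilio about how to handle the condition. Well done!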

<Response><Play>https://calaster-funnel.s3.eu-west-3.amazonaws.com/divert-samu.mp3</Play></Response>
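To exercise the route without placing a real call, you can imitate Twilio's form-encoded POST from Python. This is a sketch under assumptions: the helper names are hypothetical, the server is assumed to be running locally on Flask's default port 5000, and only the SpeechResult field is sent, whereas Twilio sends many more.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_speech_request(base_url: str, speech_result: str) -> Request:
    """Build a form-encoded POST carrying a SpeechResult field."""
    body = urlencode({"SpeechResult": speech_result}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(base_url, data=body, headers=headers, method="POST")


def fetch_twiml(base_url: str, speech_result: str) -> str:
    """Send the request to a running server and return the XML it answers with."""
    with urlopen(build_speech_request(base_url, speech_result)) as resp:
        return resp.read().decode("utf-8")
```

With the app running, `print(fetch_twiml("http://localhost:5000/", "j'ai le coronavirus"))` should display the XML returned by the route.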

Take our Flask app to production

Having an application that runs locally is great, but having it hosted in the cloud is a whole different story. My usual go-to is AWS Elastic Beanstalk, which makes it super convenient to run Python applications in the cloud. (You can read more about creating Elastic Beanstalk applications directly through AWS or another article of mine.)

I prefer the command line myself, so let’s walk through a classic EB creation together. Ensure you have added your current files to git already, and made at least one commit.

If it is your first time working with the EB CLI, you will need to configure it first (running eb init in your project folder walks you through this). The default configuration will run your application on a t2.micro instance, whose specifications are 1 vCPU and 1 GiB of RAM.

$ pip install awsebcli

$ eb create

Enter Environment Name
(default is tmp-dev): ENTER
Enter DNS CNAME prefix
(default is tmp-dev): ENTER
Select a load balancer type
1) classic
2) application
3) network
(default is 2): ENTER
...

 

You may ask for even greater control and robustness – in that case, I would recommend using Docker and gunicorn. The general build is presented on the project repo, but the Dockerfile given here depicts the steps to follow.

This setup will pull a Python 3.7 image, install gunicorn and the packages required by our project (all from a requirements.txt file), and run 2 concurrent workers of the application exposed in worker.py (that number of workers is highly dependent on the specs of the instance you decide to spawn on AWS).

First, create the requirements.txt file with the PyPI package ‘pipreqs’:

pip install pipreqs --upgrade
pipreqs --force .

Now you can create the Dockerfile:

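FROM python:3.7-slim

# MAINTAINER is deprecated in favor of a label
LABEL maintainer="Meryll Dindin <meryll@calaster.com>"

# WORKDIR creates the folder if it does not exist
WORKDIR /app

# Install dependencies first so this layer is cached between code changes
COPY requirements.txt /app/requirements.txt
RUN pip install gunicorn
RUN pip install -r requirements.txt
COPY . /app

EXPOSE 5000

# Pass each flag and its value as separate exec-form arguments
CMD ["gunicorn", "-w", "2", "-b", "0.0.0.0:5000", "--worker-class=gthread", "worker:app"]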
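The Dockerfile above fixes the worker count at 2, which suits a small instance. If you move to a larger one, the gunicorn documentation suggests (2 x number of cores) + 1 as a starting point. The helper below is a small sketch of that rule of thumb; the function name is hypothetical, and the right value for your workload should be found by load testing.

```python
import os


def suggested_workers(cpu_count=None) -> int:
    """Return a starting worker count following the (2 x cores) + 1 rule of thumb."""
    # Fall back to the number of cores seen by Python, or 1 if it is unknown
    cores = cpu_count or os.cpu_count() or 1
    return 2 * cores + 1
```

For a single-core instance this gives 3, so the value of 2 used above is a slightly more conservative choice.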

Connect the dots and wire the app

Now that we have hosted our application on Elastic Beanstalk, our Flask endpoint will have an AWS URL of the form http://{application}.{vpc}.{region}.elasticbeanstalk.com.

It is official – we now have set up an access door to our audio files as well as our decision making algorithm!

Our interactivity is just missing one thing: the ability to properly inform Twilio how to handle things. Go back to the TwiML Bin console and edit your Bin, and update the placeholders. Add your S3 bucket’s endpoint (with the mp3s you recorded), as well as our callback application's first URL.

This will give you something similar to this code:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Play>https://calaster-funnel.s3.eu-west-3.amazonaws.com/introduction.mp3</Play>
  <Play>https://www.soundjay.com/button/sounds/beep-07.mp3</Play>
  <Gather input="speech" speechTimeout="3" language="fr-FR" method="POST" action="https://funnel.dev-calaster.com">
  </Gather>
  <Play>https://calaster-funnel.s3.eu-west-3.amazonaws.com/error-01.mp3</Play>
  <Play>https://www.soundjay.com/button/sounds/beep-07.mp3</Play>
  <Gather input="speech" speechTimeout="3" language="fr-FR" method="POST" action="https://funnel.dev-calaster.com">
  </Gather>
  <Play>https://calaster-funnel.s3.eu-west-3.amazonaws.com/error-02.mp3</Play>
</Response>

Final content of our TwiML Bin attached to our phone number.

You can try it with my code above, but remember to replace https://funnel.dev-calaster.com and https://calaster-funnel.s3.eu-west-3.amazonaws.com/ with your own URLs.

Configure our Twilio phone number with our TwiML Bin

This next step is done on Twilio: update the configuration of our recently acquired phone number. By following the steps presented below (or, alternatively, starting from the Phone Numbers console), we redirect the voice calls towards the TwiML Bin that we configured above.

Webhook flow from Twilio console sidebar

The key is setting the Voice & Fax section to accept Phone Calls and to configure it with Webhooks, TwiML Bins, Functions, Studio, or Proxy. Then under A Call Comes In, select TwiML Bin and drag the right menu down to the Bin you created. (Our TwiML Bin was given the name calaster-funnel in this case.)

Now, by calling this phone number, you’ll be able to hear our sweet pre-recorded voice tree as hosted in our publicly accessible S3 Bucket! You did it – you are all done with the IVR!

Tricks & Tips: Securing and naming your AWS Elastic Beanstalk application

As you’ve seen, this application is hosted on a specific domain name (in my case, dev-calaster.com) and uses SSL (encryption in transit).

To get those fancy upgrades, you’ll need to do two things: purchase a new domain name and configure the Elastic Beanstalk instance. AWS did a great job describing the first step in this tutorial. Regarding the second, here are the required files to configure your instance (FYI: You will have to rename .os-ebextensions and .os-elasticbeanstalk given in the GitHub repository into .ebextensions and .elasticbeanstalk, respectively):

Example folder layout for Elastic Beanstalk project

The configuration files that matter the most here are https.config, which tells our NGINX server how and where to configure the protocol, and listeners.config, which tells your instance load balancer to redirect inbound traffic on port 443 to port 80 of our instance (then connected to our Docker image).

Both files are also available on the project GitHub. Using this configuration will enable you to use your custom domain name, activate the related SSL certificates, and rely on pure HTTPS transmission of data.

(Security will only be fully in place once you add encryption between your load balancer and your Docker container, as well as encryption at rest. But that is beyond the scope of this post.)

option_settings:

  aws:elb:listener:80:
    InstancePort: 80
    InstanceProtocol: HTTP
    ListenerProtocol: HTTP
    ListenerEnabled: 'false'

  aws:elb:listener:443:
    InstancePort: 80
    InstanceProtocol: HTTP
    ListenerProtocol: HTTPS
    SSLCertificateId: <ssl-certificate-arn>

Building a cloud IVR using Twilio, AWS, and Python Flask

This tutorial provided a straightforward overview of what it takes to have a fully functioning IVR hosted in the cloud using Twilio and AWS. Retrospectively, both services are very user-friendly and make the prototyping very smooth.

Once developers are equipped, the rest is all about creativity! ;)

In our case, the services allowed us to build an IVR that had the potential to help desaturate an emergency response service, while allowing authorities to better dispatch resources to the callers. If you’d like to push your prototyping even further, I would suggest looking into the usage of multiple languages through Twilio (as easy as replacing ‘fr-FR’ with ‘en-US’) or the usage of presigned URLs to keep your audio files fully secured at all times.

Meryll Dindin is an entrepreneur who loves taking on a variety of different roles and responsibilities. At the end of the day, he’s most concerned with transforming the world in an impactful and helpful way. To get in touch with Meryll, find him on one of these sites: