Adding Automated Speech Recognition for Phone Calls to Ruby on Rails Applications

May 30, 2018
Written by
Daniel Phillips
Contributor
Opinions expressed by Twilio contributors are their own


With the advent of personal digital assistants and in-home, voice-controlled gadgets, voice technologies are on the rise. Working with voice and speech recognition is a valuable skill, not just for emerging technologies but also for robust, existing applications. We will integrate Twilio’s Automated Speech Recognition (ASR) service into a simple Ruby on Rails application, so that you can see how the same approach could apply to your own projects.

In this tutorial, we will build a simple “Feedback Service” that receives, responds to, and stores voice messages from user phone calls—using speech recognition—and then displays them for review at a later time.

Getting Started with Our Rails App

For this walk-through, we’ll be using Ruby 2.5.0, best installed and managed using either rvm or rbenv. We will also use PostgreSQL for storage and Bundler for package management. If you’re on a Mac, I’d recommend using Homebrew to install these; otherwise, see the install instructions for your environment in each tool’s documentation. You’ll need the Rails gem installed as well, which you can do with:

 

gem install rails

 

All set. Start a new Rails application with:

 

rails new feedback_service -d postgresql
cd feedback_service

 

For this project we will need to install the twilio-ruby gem. This library allows us to interact with the Twilio REST API, as well as generate valid TwiML.

In your Gemfile, add:

 

gem 'twilio-ruby', '~> 5.7.2'

Then, on the command line, run:

bundle install

 

Lastly, we will need to install the command line tool ngrok. This tool allows us to expose our local development server to the internet by generating an endpoint that can be interacted with as if it were a production server. We will need this capability when we start to actually make test calls to our application. Instructions for installing ngrok on your machine are here.

Development in Rails

At a high level, the interactions between the user, Twilio, and our app will look like this:

Note: A code-complete example of this project is located on GitHub.

If you’ve worked with Rails before, you will be familiar with the different application environments: ‘Development’, ‘Production’, and ‘Test’. In this post, we will only be dealing with ‘Development’, though certain details would need to change if you were moving the work done here into a production environment.

Create a local database for development:

 

rails db:create

 

Our last piece of development setup is to confirm that ngrok is installed correctly and routes requests to our local server. After installing ngrok (see above), run the ngrok command in your terminal:

 

ngrok http 3000

 

You should see some type of ngrok log, something like:

This tells ngrok to forward every request it receives at the generated ngrok URL (yours will be different and unique) to your localhost:3000.

But wait, there’s nothing listening on your localhost:3000! To rectify this, in a new terminal, type:

 

rails s

 

Cool, but we need to make sure it’s working. In your browser, paste in the ngrok url and you should see:


Great! We are ready to get started building our Twilio integration.

Twilio Phone Numbers

We need to sign up for a free Twilio account to obtain phone numbers. Once you are signed in, navigate to the Twilio Console and select ‘create a new project’ from the top left dropdown menu. There are many ways to customize your project settings, but for now, let’s choose ‘skip project settings’:
        

A fundamental piece of programming with Twilio is the programmable phone number. Acquiring a Twilio number is what allows the connection between the Twilio API, your users, and your app.

We need to get a number. In the Console, choose “manage numbers” and follow the prompts for getting “your first Twilio number.”

Once you have purchased (for free on a trial account) your Twilio number, return to “manage phone numbers” and click on your newly-purchased number. You should see something like:


We need to tell Twilio that we want to “send” incoming phone calls to this number, and we need to be able to do this in a development environment (i.e. probably your local machine, rather than a production server). This is where ngrok comes in handy. When someone calls this number, we want to send the caller to our app, at a specific endpoint, and then we can further build how our app interacts with the caller from there.

Grab your ngrok address (you still have your local server and ngrok running, right?) to use as the base URL. Using the webhook option, tell Twilio to route incoming phone calls to /messages as a GET request, with your own ngrok-provided base URL in front. As in:

Great work! Time to write the code for our feedback service.

Building the Integration

Start thinking about the web service interactions that will make this app possible. We need to have an interface for incoming calls. When a call comes in, TwiML is rendered, which tells the Twilio API what to do and what to listen for next. In our config/routes.rb let’s add:

 

get '/messages', to: 'twilio#index'

 

Now we need a corresponding controller and action in app/controllers. Let’s generate a controller with an index action:

 

rails generate controller Twilio index

 

Open the file app/controllers/twilio_controller.rb and add skip_before_action :verify_authenticity_token under the class definition. The file should look like this:

 

class TwilioController < ApplicationController
  skip_before_action :verify_authenticity_token

  def index
  end
end

 

This gives us an appropriate action for incoming requests. The skip_before_action :verify_authenticity_token line turns off the CSRF protection that Rails provides out of the box. While we don’t need CSRF protection for a webhook, we would eventually want to verify that requests to this controller came from Twilio, but we can move on for now.
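As a rough illustration of what that verification involves: Twilio signs each webhook and sends the result in the X-Twilio-Signature header. The sketch below reimplements, in plain Ruby, the scheme Twilio documents for form-encoded webhooks (HMAC-SHA1 keyed with your auth token, computed over the full URL followed by the POST parameters sorted by name, then Base64-encoded). The helper names, the token, and the URL are hypothetical, and in a real app you would more likely lean on the twilio-ruby gem’s Twilio::Security::RequestValidator than on hand-rolled code.

```ruby
require 'openssl'

# Hypothetical helper: recompute the signature for one webhook request.
# url    - the full URL Twilio requested (including any query string)
# params - the POST parameters as a Hash (empty for a GET request)
def expected_twilio_signature(auth_token, url, params)
  # Append every parameter name and value, sorted by name, to the URL.
  sorted = params.sort_by { |key, _value| key.to_s }
  data = url + sorted.map { |key, value| "#{key}#{value}" }.join
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), auth_token, data)
  [digest].pack('m0') # strict Base64 without a trailing newline
end

# Compare two strings without stopping at the first differing byte.
def secure_equal?(left, right)
  return false unless left.bytesize == right.bytesize

  left.bytes.zip(right.bytes).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

# True when the header value matches the signature we compute ourselves.
def valid_twilio_request?(auth_token, url, params, signature_header)
  expected = expected_twilio_signature(auth_token, url, params)
  secure_equal?(expected, signature_header.to_s)
end
```

In a Rails controller, the inputs would likely come from request.original_url, request.request_parameters, and request.headers['X-Twilio-Signature']. Treat that wiring as an assumption to check against your own setup, since a proxy such as ngrok can change the URL scheme your app sees.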

We have an endpoint for requests, but we need to have this endpoint return something that the Twilio number will be able to interact with. Enter: TwiML.

From the TwiML docs:

Twilio Markup Language (TwiML) “is a set of instructions you can use to tell Twilio what to do when you receive an incoming call or SMS.”

“When someone makes a call or sends an SMS to one of your Twilio numbers, Twilio will look up the URL associated with that phone number and make a request to that URL. Twilio will read TwiML instructions at that URL to determine what to do: record the call, play a message for the caller, prompt the caller to press digits on their keypad, etc.”

With that in mind, we need to take advantage of TwiML to both respond to a call, and listen for input, which in our case, will be speech from the caller.

What allows us to execute such magic is the TwiML <Gather> verb. <Gather> allows us to receive and collect different types of input from a caller, and determine what to do with it. We want our Markup returned from this endpoint to look like this:

 

<Response>
  <Gather action="/messages" input="speech" method="POST" timeout="2">
    <Say>What is your message for Daniels Banana Cabana?</Say>
  </Gather>
</Response>

 

The attributes on <Gather> tell Twilio how to handle the call. action="/messages" and method="POST" instruct Twilio to send the gathered speech as a POST request to /messages. input="speech" indicates spoken input, and timeout="2" tells Twilio to wait up to 2 seconds for input before giving up; because no other verb follows the <Gather>, the call then ends (Twilio’s documented default for timeout is 5 seconds). The nested <Say> verb is the speech that will be played to the caller.

This is a simple example, but TwiML, like other Markups, can become a drag to write on larger projects. Luckily, we have the twilio-ruby gem, which abstracts TwiML and lets us write clean Ruby code that creates TwiML for us!

To keep responsibilities discrete, we will build a dedicated service for the Twilio interface layer. In app/, add a services/ folder containing a twilio_service.rb file.

We can use the tools given to us by the twilio-ruby gem to build TwiML on an instance of this class; as such, the instance methods we write will describe how we want to interact with the caller. Thinking about the behaviors of our caller and our app, we will build out the structure of this class accordingly:

 

class TwilioService
  def initialize
  end

  def get_speech
  end

  def say_goodbye
  end
end

We know, at the very least, we need to “get” the speech from our caller, and “say” goodbye.

Time to build our response object. In our initialize method, type:

 

  def initialize
    @response = Twilio::TwiML::VoiceResponse.new
  end

 

This gives us an instance to work with to build our Twilio response interface. Remember the markup we looked at above? Well, we can build that in Ruby like this:

 

  def get_speech
    @response.gather(input: 'speech', timeout: 2, action: '/messages', method: 'POST') do |gather|
      gather.say('What is your message for Daniels Banana Cabana?')
    end
  end

 

To put this to work in our twilio_controller, type:

 

  def index
    @speech = TwilioService.new.get_speech
    render :xml => @speech
  end

 

To make sure Rails detects our Twilio Service, let’s restart our dev server. With both our local dev server running and ngrok listening for requests, open up a browser tab and navigate to /messages. You should see:

 

<Response>
  <Gather action="/messages" input="speech" method="POST" timeout="2">
    <Say>What is your message for Daniels Banana Cabana?</Say>
  </Gather>
</Response>

 

We are almost ready for a test call, but we are still missing one piece. In the above markup, we are still telling Twilio to send the gathered speech to /messages with a POST request. In our config/routes.rb we’ll add:

 

post '/messages', to: 'twilio#create'

 

And back in our Twilio Controller:

 

  def create
  end

 

Great! Now we are ready to receive calls.

Our First Test Call

We are set up to receive calls to our app, but we don’t really know what the data looks like as it comes in. To find out, let’s take advantage of one of the coolest debugging tools in the Ruby ecosystem: pry.

In your Gemfile, add:

 

gem 'pry'

 

Then on the command line:

 

bundle install

 

We can now catch a call with our debugger. In our empty create action above, throw in our pry debugger:

 

  def create
    binding.pry
  end

 

Let’s recap the process before we run a test call.

  1. Someone calls your Twilio number.
  2. Twilio re-routes this call to your app’s specified endpoint; in our case, it is our /messages endpoint as a GET request.
  3. When the request comes in, TwiML is returned. This tells Twilio how to handle the call.
  4. The TwiML that we rendered using our twilio-ruby library indicated that the speech from the caller that is recorded (and parsed) by our <Gather> verb should be sent to a /messages endpoint as a POST request. This is where our ‘create’ action exists on our Twilio controller, which currently only houses our debugger.
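The round trip in the steps above can be sketched without Rails or Twilio at all. The toy dispatcher below is only an illustration: handle_webhook is a hypothetical stand-in for the router and controller, and the hand-written TwiML string stands in for what the twilio-ruby gem generates.

```ruby
# Hand-written copy of the TwiML our index action returns.
GATHER_TWIML = <<~XML
  <Response>
    <Gather action="/messages" input="speech" method="POST" timeout="2">
      <Say>What is your message for Daniels Banana Cabana?</Say>
    </Gather>
  </Response>
XML

# Hypothetical stand-in for the Rails router plus TwilioController.
def handle_webhook(http_method, path, params = {})
  case [http_method, path]
  when ['GET', '/messages']
    # Steps 2 and 3: the call arrives and we answer with TwiML.
    GATHER_TWIML
  when ['POST', '/messages']
    # Step 4: Twilio posts back what <Gather> heard.
    "Received: #{params['SpeechResult']}"
  else
    'Not found'
  end
end

puts handle_webhook('GET', '/messages')
puts handle_webhook('POST', '/messages', 'SpeechResult' => 'This is a test message.')
```

The second request only happens because the first response told Twilio where to send the gathered speech.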

Cool? Let’s make a call.

After calling, you should eventually hear ‘What is your message for Daniels Banana Cabana?’ or whatever text you passed to say. During a test call, you could say something like “This is a test message”. If everything worked according to the process we just outlined, we should have hit our debugger in the terminal running the Rails server:

 

pry(#<TwilioController>)>

 

The <Gather> verb sends its results as request parameters, and Rails exposes them through the params object. Look up the SpeechResult key and you should see the spoken message:

 

pry(#<TwilioController>)> params['SpeechResult']
=> "This is a test message."
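For a feel of where those values come from: Twilio sends webhook POSTs as form-encoded data, and Rails decodes that body into params for us. The plain-Ruby sketch below performs the same decoding by hand. The sample body and the phone number are made up for illustration, and parse_twilio_body is a hypothetical helper, not part of Rails or twilio-ruby.

```ruby
require 'uri'

# Hypothetical raw request body, shaped like the POST Twilio makes to /messages.
SAMPLE_BODY = 'SpeechResult=This+is+a+test+message.&Confidence=0.83758247&From=%2B15555550100'

# Decode a form-encoded body into a Hash of String keys and String values.
def parse_twilio_body(raw_body)
  # decode_www_form returns [name, value] pairs; to_h turns them into a Hash.
  URI.decode_www_form(raw_body).to_h
end

twilio_params = parse_twilio_body(SAMPLE_BODY)
puts twilio_params['SpeechResult'] # prints: This is a test message.
puts twilio_params['From']         # prints: +15555550100
```

Every value arrives as a String, including the numeric-looking ones.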

 

Also worth looking at is params['Confidence'], a score between 0.0 and 1.0 that Twilio provides to indicate the likelihood that the transcription is correct.

 

pry(#<TwilioController>)> params['Confidence']
=> "0.83758247"
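Because the score arrives as a String, it needs converting before you can compare it with anything. A minimal sketch of one way to flag shaky transcriptions follows; the 0.5 cut-off and the helper name are assumptions made for illustration, not values recommended by Twilio.

```ruby
# Hypothetical cut-off; tune it against real calls to your own number.
MIN_CONFIDENCE = 0.5

# True when the Confidence parameter is present and meets the threshold.
# A missing or non-numeric value converts to 0.0 and counts as not confident.
def confident_transcription?(params, threshold = MIN_CONFIDENCE)
  params['Confidence'].to_f >= threshold
end

puts confident_transcription?('Confidence' => '0.83758247') # prints: true
puts confident_transcription?('Confidence' => '0.2')        # prints: false
```

An app could use a check like this to mark a message for manual review rather than discarding it.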

 

With that, your app has automated speech recognition! Next we’ll talk through storing and displaying these messages.

Storing and Displaying Messages

We’re not going to ship an app with a debugger in it, and what good is a message service if we can’t store and see our messages? So, let’s do the following:

In your console, type: 

 

rails g model Message caller body

 

This will generate a migration for a Messages Table, along with a Message Model, with the attributes “caller” and “body” on the Model. Run the migration with:

 

rails db:migrate

 

Now, back in our create action in our twilio_controller.rb, we should take out our debugger and replace it with something that works to persist our message to our new message table.

 

  def create
    if params['SpeechResult']
      asr_message = params['SpeechResult']
      asr_caller = params['From']
      Message.create!(caller: asr_caller, body: asr_message)
    end
  end

 

This creates a message only when a “SpeechResult” parameter is present.
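One edge case the presence check does not cover is a SpeechResult that is present but empty. A small helper can separate “is there anything to save?” from the saving itself. This is a sketch only: message_attributes is a hypothetical name, and it works on any Hash-like object with String keys.

```ruby
# Hypothetical helper: build the attributes for a new Message, or return nil
# when the caller said nothing usable.
def message_attributes(params)
  body = params['SpeechResult'].to_s.strip
  return nil if body.empty?

  { caller: params['From'], body: body }
end

p message_attributes('SpeechResult' => ' This is a test message. ', 'From' => '+15555550100')
p message_attributes('SpeechResult' => '   ')
```

In the controller, this could be used as `attrs = message_attributes(params)` followed by `Message.create!(attrs) if attrs`.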

We still don’t have a graceful way to end the call, so let’s add a “say goodbye” method to our TwilioService class:

 

  def say_goodbye
    @response.say('Thank you, this call will now end')
    @response.hangup
  end

 

And in our controller,

 

  def create
    if params['SpeechResult']
      asr_message = params['SpeechResult']
      asr_caller = params['From']
      Message.create!(caller: asr_caller, body: asr_message)
    end
    @speech = TwilioService.new.say_goodbye
    render :xml => @speech
  end

 

We can think of our message service as a way for users of an app to call and leave feedback about a product or service. To make it more useful, we can incorporate the idea of an “inbox” that an admin could review.

Generate a new controller with:

 

rails generate controller Inbox index

 

In routes.rb, replace the line get 'inbox/index' with:

 

get '/inbox', to: 'inbox#index'

 

Open app/controllers/inbox_controller.rb and load the messages in the index action:

 

class InboxController < ApplicationController
  def index
    @messages = Message.all
  end
end

 

This code makes the collection of messages available to the view template. To complete the view, open app/views/inbox/index.html.erb (created by the generator, or copy it from GitHub) and replace its contents with the following markup.

 

<% @messages.each do |message| %>
  <div class="comment">
    <div class="comment-avatar"></div>
    <div class="comment-author">Caller: <%= message.caller %></div>
    <div class="comment-text">
      <%= message.body %>
      <div class="comment-date"><%= message.created_at %></div>
    </div>
  </div>
<% end %>

 

Add to your application.css file as well:

 

body {
    font-family: "PT Sans", sans-serif;
    font-size: 14px;
    color: #777777;
}

.comment {
    position: relative;
    margin-top: 30px;
    margin-right: 20px;
    margin-left: 50px;
}

.comment-avatar {
    width: 70px;
    height: 70px;
    background: #7f8c8c url("http://icons.iconarchive.com/icons/icons8/windows-8/48/Mobile-Phone-icon.png") no-repeat 50% 50%;
    position: absolute;
    left: -40px;
    top: 0;
}

.comment-author {
    margin-bottom: 5px;
    padding-left: 45px;
    padding-right: 20px;
    font-size: 16px;
    font-weight: bold;
}

.comment-text {
    padding: 12px;
    padding-left: 45px;
    background-color: #f8f8f8;
    border-bottom: 5px solid #e5e6e6;
}

.comment-date {
    margin-top: 5px;
    font-size: 12px;
    color: #bdc3c7;
}

 

Make a call, and leave a message for “Daniel’s Banana Cabana”, saying:

“I love your amazing Cabana!!”

Navigate to localhost:3000/inbox (or your ngrok endpoint) and you should see:

What’s Next?

Congrats! You just integrated Twilio ASR into a Ruby on Rails Application. This is just the tip of the iceberg of what’s possible with Twilio’s voice services. If you want to dig deeper into automated speech recognition with Twilio, I’d recommend looking into adding Partial Result Callback to fine-tune your speech recognition integration.