Transcripts, Search, and Keyword Spotting With Twilio And VoiceBase

Here’s a little exercise. When you get off an important call, try to remember the last thing you said. Your words have a funny way of escaping you. VoiceBase makes it easy to programmatically pull up call recordings made with Twilio and find the information you need in them. In this blog series, the VoiceBase team walks you through integrating and managing VoiceBase in your app.

The following is a guest post from VoiceBase.

Your Twilio app is up and running, thousands of connections are being made daily, and your library of recordings is growing fast. For your customers, knowing what is said in these recordings is critical for monitoring agent compliance, spotting keywords, detecting trends, and much more. By analyzing this content, VoiceBase gives Twilio users tools to identify and distinguish between hot leads, opportunities, complaints, and other kinds of calls. Wouldn’t it be great to deliver this valuable information straight into your customers’ laps? VoiceBase adds value for any Twilio user, whether you run a contact center, a CRM platform, a sales organization, or a conferencing solution.

In the past, the only way to gather spoken information was to have a person listen to every recording, write descriptions, and manually tag calls or fill out scorecards. That process is expensive and time-consuming. Fortunately, there is a better way: Twilio’s simple integration with VoiceBase lets you bring speech analytics into many layers of a business at very low cost.

VoiceBase fully indexes calls, making all of your content searchable and discoverable within minutes of upload. Users can search within a recording’s timeline and play back the exact part of any audio or video file they need. VoiceBase also provides a Web SDK that makes it easy to build a polished end-user display.

The VoiceBase API is cloud-based, with zero upfront costs. Simple API calls let VoiceBase fetch .wav files from Twilio recording URLs, and VoiceBase uses parallel processing to index each recording quickly. Machine transcripts, time-stamped keywords, and search are then available within minutes, and all of this data is returned as JSON.

We caught up with Greenrope CEO, Lars Helgeson, to talk with him about his own experiences with Twilio’s VoiceBase integration.

Here we focus on how to use the VoiceBase API efficiently with Twilio to index a recording. See the VoiceBase landing page for more on retrieving keywords, topics, and transcripts, and on building end-user displays with VoiceBase and Twilio.

Indexing Content with TwiML

In part 1, we specify an action URL in the TwiML so that your application is notified when a recording is complete. We then write code that receives this end-of-call event and calls the VoiceBase API to upload and index the recording, simply passing along the recording URL from the Twilio event as a parameter. Finally, we specify a second callback in the VoiceBase API call so we are notified when indexing is complete.

Your TwiML often uses the Dial verb, for example to join a conference room. Twilio lets you record the call with the Dial verb’s record attribute and use its action attribute to name the URL Twilio requests when the call completes.
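A minimal Python sketch of that TwiML, built with the standard library’s xml.etree.ElementTree instead of hand-written XML. The room name and action URL are placeholders. The value record="record-from-answer" is one of those listed in Twilio’s current Dial documentation; older examples often used record="true". Check that reference for how recording behaves when the dialed noun is a Conference.

```python
import xml.etree.ElementTree as ET


def dial_conference_twiml(room: str, action_url: str) -> str:
    """Build TwiML that joins a conference room through the Dial verb,
    records the call, and names the URL Twilio requests when Dial ends."""
    response = ET.Element("Response")
    # record: ask Twilio to record this call leg.
    # action: URL Twilio requests after the Dial verb completes
    # (here, the indexing script); recording details are sent along.
    dial = ET.SubElement(response, "Dial", record="record-from-answer", action=action_url)
    conference = ET.SubElement(dial, "Conference")
    conference.text = room
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(dial_conference_twiml("MyRoom", "https://example.com/indexConferenceRecording.php"))
```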

You can also record the conference itself by setting the record attribute on the Conference noun.
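The conference-level variant can be sketched the same way. This sketch relies on two Conference attributes, record="record-from-start" and recordingStatusCallback. The room name and callback URL are placeholders. Recording at the conference level captures the mixed audio of all participants, and Twilio requests the callback URL once that recording is ready.

```python
import xml.etree.ElementTree as ET


def recorded_conference_twiml(room: str, recording_callback_url: str) -> str:
    """Build TwiML that records the whole conference rather than one caller's leg."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    conference = ET.SubElement(
        dial,
        "Conference",
        # Record the mixed conference audio from the moment it starts.
        record="record-from-start",
        # Twilio requests this URL when the recording is available,
        # passing RecordingUrl and RecordingSid among other parameters.
        recordingStatusCallback=recording_callback_url,
    )
    conference.text = room
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(recorded_conference_twiml("MyRoom", "https://example.com/indexConferenceRecording.php"))
```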

In the callback, you have access to a link to the recording. The callback is where you initiate the indexing request to VoiceBase, as described below. Alternatively, you can find links to Twilio recordings through the Twilio REST API.
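If you take the REST route, here is a sketch against the Recordings resource of Twilio’s 2010-04-01 API. It builds a recording’s media URL (.wav for WAV, .mp3 for MP3) and fetches one page of recordings with HTTP Basic auth. Pagination and error handling are left out. In production, Twilio’s official helper library is usually the simpler choice.

```python
import base64
import json
import urllib.request

TWILIO_API_BASE = "https://api.twilio.com/2010-04-01"


def recording_media_url(account_sid: str, recording_sid: str, fmt: str = "wav") -> str:
    """URL of a recording's audio; Twilio picks WAV or MP3 from the extension."""
    if fmt not in ("wav", "mp3"):
        raise ValueError("fmt must be 'wav' or 'mp3'")
    return f"{TWILIO_API_BASE}/Accounts/{account_sid}/Recordings/{recording_sid}.{fmt}"


def list_recordings(account_sid: str, auth_token: str) -> list:
    """Fetch the first page of an account's recordings (HTTP Basic auth)."""
    url = f"{TWILIO_API_BASE}/Accounts/{account_sid}/Recordings.json"
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {credentials}"})
    with urllib.request.urlopen(request, timeout=30) as resp:
        # The list response holds the recordings under the "recordings" key.
        return json.load(resp)["recordings"]
```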

Let’s take a look at the code in indexConferenceRecording.php, which initiates the request to VoiceBase that eventually creates a transcript of your recording, indexes it so it is searchable and also extracts keywords from the recording.

First, we extract some important values that Twilio sends as POST form parameters.
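Here is a Python sketch of this step. It is not the original indexConferenceRecording.php. It assumes the raw form-encoded request body is available as a string; in most web frameworks you would read the same keys from the parsed form instead. AccountSid, CallSid, RecordingUrl, and RecordingSid are parameters Twilio sends when a recording exists. Which of them the original script reads is an assumption.

```python
from urllib.parse import parse_qs

REQUIRED_FIELDS = ("AccountSid", "RecordingUrl", "RecordingSid")


def extract_recording_fields(body: str) -> dict:
    """Pull the values this walkthrough needs from Twilio's form-encoded POST body."""
    # parse_qs returns a list per key; Twilio sends each parameter once.
    form = {key: values[0] for key, values in parse_qs(body).items()}
    missing = [name for name in REQUIRED_FIELDS if name not in form]
    if missing:
        # For example, the call was not recorded, so no RecordingUrl was sent.
        raise ValueError(f"missing Twilio parameters: {', '.join(missing)}")
    return {
        "account_sid": form["AccountSid"],
        "call_sid": form.get("CallSid"),
        "recording_url": form["RecordingUrl"],
        "recording_sid": form["RecordingSid"],
    }
```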

Then we set some of the values we will want to include in the API call to the VoiceBase upload method that will initiate the processing on VoiceBase servers.
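One way to derive those values from the Twilio fields, as a sketch. It relies on Twilio serving WAV for the bare recording URL and MP3 when .mp3 is appended. It also reuses the RecordingSid as the externalId, as the example at the end of this post does. The dictionary keys follow the parameter names listed below.

```python
def voicebase_values_from_twilio(recording_url: str, recording_sid: str) -> dict:
    """Derive uploadMedia values from a Twilio recording URL and RecordingSid.

    Twilio serves WAV for the bare URL (or .wav) and MP3 for .mp3.
    """
    base = recording_url
    # Strip any extension so both variants can be built from one base URL.
    for ext in (".wav", ".mp3"):
        if base.endswith(ext):
            base = base[: -len(ext)]
    return {
        "mediaUrl": base + ".wav",    # WAV for transcription and indexing
        "sourceUrl": base + ".mp3",   # MP3 if you stream from Twilio's storage
        "externalId": recording_sid,  # Twilio's RecordingSid as your own unique ID
    }
```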

Next we do the HTTP GET (POST is also supported).
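Here is a sketch of the request itself. VOICEBASE_ENDPOINT is a hypothetical placeholder; substitute the endpoint from the VoiceBase API documentation. The function encodes whatever parameters it is given and nothing more. The POST branch is worth considering because a GET puts the API key and password in the URL, where they can end up in server and proxy logs.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the endpoint from the VoiceBase API docs.
VOICEBASE_ENDPOINT = "https://api.voicebase.example/services"


def build_get_url(params: dict, endpoint: str = VOICEBASE_ENDPOINT) -> str:
    """Encode the API parameters into a GET query string."""
    return f"{endpoint}?{urllib.parse.urlencode(params)}"


def call_voicebase(params: dict, endpoint: str = VOICEBASE_ENDPOINT,
                   use_post: bool = False, timeout: float = 30) -> str:
    """Send the request as GET (default) or form-encoded POST; return the raw body."""
    if use_post:
        data = urllib.parse.urlencode(params).encode("utf-8")
        request = urllib.request.Request(endpoint, data=data)  # a body makes this a POST
    else:
        request = urllib.request.Request(build_get_url(params, endpoint))
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```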

And finally we get the JSON response from the API call.
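Handling the response could look like this. The mediaId field is described at the end of this post. The requestStatus and statusMessage fields are assumptions about the VoiceBase v1.1 JSON format, so check them against a real response before relying on them.

```python
import json


def parse_upload_response(body: str) -> str:
    """Return the mediaId from an uploadMedia JSON response, or raise on failure.

    Assumed (unverified) status fields: requestStatus and statusMessage.
    """
    data = json.loads(body)
    if data.get("requestStatus") != "SUCCESS":
        raise RuntimeError(f"VoiceBase upload failed: {data.get('statusMessage', body)}")
    return data["mediaId"]
```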

A few words about the VoiceBase API’s upload method. Every method shares four required parameters:

  • API key
  • Password
  • Version: the current version is 1.1
  • Action: in this case, the action is uploadMedia
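The shared parameters can be collected in one helper, sketched below. The key names apikey and password are assumptions, because the post names these parameters but not their exact spelling. The version and action values come from the list above.

```python
REQUIRED_COMMON = ("apikey", "password", "version", "action")


def common_params(api_key: str, password: str, action: str, version: str = "1.1") -> dict:
    """The four parameters every VoiceBase v1.1 call carries.

    The key names 'apikey' and 'password' are assumptions; check the API docs.
    """
    params = {"apikey": api_key, "password": password, "version": version, "action": action}
    # Fail early instead of sending a request the API would reject.
    missing = [name for name in REQUIRED_COMMON if not params[name]]
    if missing:
        raise ValueError(f"empty required parameters: {', '.join(missing)}")
    return params
```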

For uploadMedia, we pass in the following parameters:

  • mediaUrl: Location of the recording, used at upload time to build the transcript, extract keywords, and index the recording. Use the wav file.
  • sourceUrl: Use this setting if you plan to store the mp3 yourself. VoiceBase will use this URL for streaming and will not transcode the audio to mp4.
    If it is not set, VoiceBase transcodes the audio into a standard mp4 format and stores it for later streaming. If you plan to have Twilio store the recording and stream the mp3, we recommend setting this.
  • transcriptType: The transcript can be either machine or human transcribed. It is machine by default.
    Human transcripts are more costly but can be useful for important calls.
  • externalId: This ID is external from VoiceBase’s point of view. It lets you use your own unique IDs to reference the recording, its transcript, and other metadata. It is your responsibility to make sure externalIds are unique.
  • SearchHitUrl: By default, the search method returns a link to the recording’s player page on the VoiceBase server.
    Use this field to override that URL. This is useful if you plan to use the VoiceBase Web SDK, which includes a search component and a player component. More about this in a future blog post.
  • machineReadyCallBack: Similar to the Twilio action attribute, this is a way to get notified
    when the machine transcript and associated processing have been completed.
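Putting the uploadMedia parameters together might look like the sketch below. It assumes "machine" and "human" are the literal values for transcriptType. It leaves SearchHitUrl out, and it expects the common parameters as a ready-made dict.

```python
from typing import Optional


def upload_media_params(common: dict, media_url: str, external_id: str,
                        machine_ready_callback: str,
                        source_url: Optional[str] = None,
                        transcript_type: str = "machine") -> dict:
    """Merge the common parameters with the uploadMedia-specific ones."""
    # Assumed literal values; the post only says machine (default) or human.
    if transcript_type not in ("machine", "human"):
        raise ValueError("transcript_type must be 'machine' or 'human'")
    params = dict(common)  # copy so the caller's dict is not mutated
    params.update({
        "mediaUrl": media_url,          # WAV used for transcription and indexing
        "externalId": external_id,      # your own unique ID, e.g. a RecordingSid
        "transcriptType": transcript_type,
        "machineReadyCallBack": machine_ready_callback,
    })
    if source_url:
        # Only set when the mp3 is streamed from your side (e.g. Twilio storage);
        # otherwise VoiceBase transcodes to mp4 and stores it for streaming.
        params["sourceUrl"] = source_url
    return params
```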

The response from VoiceBase includes a mediaId, which you can store and use to reference recordings, transcripts, and keywords later. The externalId lets you use your own unique ID instead, and it can be used in place of the mediaId in any VoiceBase call. In this example, we set the VoiceBase externalId to Twilio’s RecordingSid, so we do not have to store the mediaId and maintain a mapping between Twilio IDs and VoiceBase IDs. You can do the same or use another identifier, as long as you ensure it is unique. Or you can simply use the unique mediaIds that VoiceBase assigns.
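Because any follow-up call can use either identifier, a small helper can build whichever one you hold. This sketch does not name specific follow-up actions, which this post does not cover. Its duplicate check is in-process only; across servers you would need a shared store, such as a database unique constraint.

```python
from typing import Optional


def media_reference(external_id: Optional[str] = None, media_id: Optional[str] = None) -> dict:
    """Identify a recording in a later VoiceBase call by externalId or mediaId."""
    if (external_id is None) == (media_id is None):
        raise ValueError("pass exactly one of external_id or media_id")
    if external_id is not None:
        return {"externalId": external_id}
    return {"mediaId": media_id}


class ExternalIdRegistry:
    """In-process guard against reusing an externalId.

    Only covers IDs seen by this process; not a substitute for a shared store.
    """

    def __init__(self) -> None:
        self._seen = set()

    def claim(self, external_id: str) -> str:
        """Record an externalId as used, or raise if it was already claimed."""
        if external_id in self._seen:
            raise ValueError(f"externalId already used: {external_id}")
        self._seen.add(external_id)
        return external_id
```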