More Accurate Call Transcriptions Available Now

December 08, 2016
Written by
Billy Chia


Phone call transcription is a must-have for any modern contact center. Not only do transcriptions play a role in training and quality assurance, but they also offer important insights into customer experience. Having the greatest possible accuracy is critical, because these words are driving business decisions. Anyone who’s experienced a failed autocorrect knows how errors in a conversation can change the meaning of what’s being said.

Today, we are increasing the accuracy of call transcriptions by making dual-channel recordings and the VoiceBase High Accuracy Transcription Add-on publicly available. For anyone using Twilio today, adding dual-channel recordings is a simple one-line change to your code and integrating VoiceBase takes only a few clicks in the Console. There’s no need to patch together multiple technologies and vendors to make it work. Read on to learn more about dual-channel transcriptions or get started now by jumping straight to the code.

What makes dual-channel recording transcription more accurate?

Background noise, spotty mobile coverage, and people talking with different accents can all contribute to a poor transcription. In a mono-channel recording, audio from both the caller and the agent is combined into one track. This is less accurate because noise from one side can interfere with speech on the other side. Mono-channel recording also introduces new problems like cross talk: when both parties talk at the same time, the transcription can become garbled.


With a dual-channel recording, each side of the conversation is recorded in a separate track. This is just like a stereo audio recording with different audio on the right and left channels. Some issues like cross talk are completely eliminated. Additionally, dual-channel helps to reduce errors from background noise because noise from one side of the call won’t interfere with the other side. All of this adds up to create a more accurate transcription.
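To make the stereo analogy concrete, here is a minimal Python sketch that splits interleaved two-channel PCM audio into one track per side. It assumes uncompressed samples (16-bit by default), and split_stereo is an illustrative name of my own, not part of any Twilio API. Which side of the call lands on which channel is not something this sketch knows.

```python
def split_stereo(frames: bytes, sample_width: int = 2) -> tuple[bytes, bytes]:
    """De-interleave two-channel PCM frames into (first_channel, second_channel)."""
    frame_size = sample_width * 2  # each frame holds one sample per channel
    if len(frames) % frame_size != 0:
        raise ValueError("frame data is not a whole number of stereo frames")
    first = bytearray()
    second = bytearray()
    for offset in range(0, len(frames), frame_size):
        # The first sample in a frame belongs to channel one, the next to channel two.
        first += frames[offset:offset + sample_width]
        second += frames[offset + sample_width:offset + frame_size]
    return bytes(first), bytes(second)
```

With a downloaded WAV file, you could read the raw frames using Python’s standard wave module (readframes and getsampwidth) and pass them to this function.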

Where do accurate transcriptions count?

Duplicate your best agents – Call recording is a great mechanism to see what’s working and what isn’t without the need to be live on a call with an agent. Transcribing recorded audio into text takes this a step further, greatly reducing the time needed to consume recorded calls.

Using machine learning to parse all of your call transcriptions is the best way to leverage this data at scale. With Twilio dual-channel recordings and the VoiceBase Add-on, you’ll automatically receive keywords extracted from your transcripts. This allows you to measure keyword density across many calls and note which keywords your most successful agents use.
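As a rough illustration of measuring keywords across many calls, the Python sketch below computes the share of calls in which each keyword appears. The input shape (one list of keyword strings per call) is my assumption for the example and not the Add-on’s actual output format, and "share of calls" is only one simple reading of keyword density.

```python
from collections import Counter

def keyword_call_share(calls: list[list[str]]) -> dict[str, float]:
    """Return the fraction of calls in which each keyword appears at least once."""
    if not calls:
        return {}
    counts: Counter = Counter()
    for keywords in calls:
        # Use a set so a keyword repeated within one call is counted once.
        counts.update({keyword.lower() for keyword in keywords})
    return {keyword: n / len(calls) for keyword, n in counts.items()}
```

Running this separately on calls handled by your top agents and on everyone else’s calls gives two tables you can compare side by side.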

Hear your customer’s voice – When a customer mentions a competitor on a sales call, it means something completely different than when an agent mentions a competitor’s name. With Twilio dual-channel recordings and the VoiceBase Add-on, you get speaker identification in the transcript. This makes it easy to use the agent side of conversations for training purposes and the caller side to hear your customer’s voice. Learn what your customers truly love and what they don’t like about your products.


The pricing for dual-channel recordings is $0.0025 per minute for generation (the same as standard mono recording). Storage of dual-channel recordings is $0.001 per minute (double the price of mono recording because the file is twice the size). VoiceBase transcriptions are priced at $0.015 per minute.
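Here is a small Python sketch of the arithmetic, using only the per-minute rates quoted above. It is a simplification: it charges storage once per recorded minute, and estimate_cost is an illustrative name of my own.

```python
from decimal import Decimal

# Per-minute rates quoted in this post (USD).
GENERATION_RATE = Decimal("0.0025")    # dual-channel recording generation
DUAL_STORAGE_RATE = Decimal("0.001")   # dual-channel recording storage
VOICEBASE_RATE = Decimal("0.015")      # VoiceBase transcription

def estimate_cost(minutes: int) -> Decimal:
    """Estimate the cost of recording, storing, and transcribing `minutes` of calls."""
    return minutes * (GENERATION_RATE + DUAL_STORAGE_RATE + VOICEBASE_RATE)
```

The three rates add up to $0.0185 per minute, so 1,000 minutes of recorded and transcribed calls comes to $18.50 under these assumptions.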

Get started with dual-channel and VoiceBase


There are currently two ways to record a Twilio call with dual-channel:

Note: Dual-channel transcriptions are currently supported only by the VoiceBase High Accuracy Transcription Add-on. Twilio’s native transcription and the IBM Watson Speech to Text Add-on can process dual-channel recordings, but do not take advantage of the two channels. (Dual-channel support for the IBM Watson Speech to Text Add-on is coming soon.)

I’ll walk through how to set up a VoiceBase dual-channel transcription with TwiML.

Configure a voice phone number

First we’ll set up a phone number with a simple call-forwarding TwiML Bin. Head to the Console and buy a new phone number or configure an existing one. In the Voice section, set Configure with to Webhooks/TwiML, set A call comes in to TwiML, and click the plus sign to add a new TwiML Bin.



Give your TwiML Bin a recognizable name. I’ll use “Dual-chan Record.” Then paste the following code into it:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
   <Dial record="record-from-answer-dual">+14155551234</Dial>
</Response>

Replace +14155551234 with your phone number. Be sure to include your country code, in E.164 format. Click Create to save your new TwiML Bin. (Remember, you can always edit or update this TwiML Bin in the dev tools section of the Console.) Finally, select your newly created TwiML Bin and Save the phone number configuration.
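If you would rather serve this TwiML from your own application than from a TwiML Bin, a sketch like the following generates the same kind of document with Python’s standard library. The function name build_dual_channel_twiml is my own, and this is only one possible approach, not the official helper library.

```python
import xml.etree.ElementTree as ET

def build_dual_channel_twiml(forward_to: str) -> str:
    """Build call-forwarding TwiML that records both sides on separate channels."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial", {"record": "record-from-answer-dual"})
    dial.text = forward_to  # destination number in E.164 format
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling build_dual_channel_twiml("+14155551234") returns a string you can send back as the body of your voice webhook response.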


Configure the VoiceBase Add-on

Head over to the VoiceBase High Accuracy Transcription Add-on section of the Console. Click to install the Add-on, accept the Terms of Service, then click Agree & Install.


On the Configure tab, select the recording option(s) you’d like to use. In this case, that’s <Dial>. Then set the callback URL to a web server that can receive POST requests. RequestBin is a great tool that lets you configure and test webhooks quickly with no need for a server. Using ngrok with Twilio webhooks works nicely as well. If you’d like to set up a local server, you can follow this tutorial to process Add-ons Recording Webhooks using Python.
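For a sense of what a callback server has to do, here is a minimal Python sketch using only the standard library. It assumes the webhook arrives as a form-encoded POST with an AddOns field holding JSON, and that each result carries a payload list of entries with a url key. Those field names are assumptions on my part, so check them against the webhook you actually receive. The handler class, port, and function names are all illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

def extract_payload_urls(body: bytes) -> list[str]:
    """Pull payload URLs out of a form-encoded Add-on webhook body."""
    form = parse_qs(body.decode("utf-8"))
    # Assumption: the Add-on results arrive as JSON in an "AddOns" form field.
    add_ons = json.loads(form.get("AddOns", ["{}"])[0])
    urls = []
    for result in add_ons.get("results", {}).values():
        for item in result.get("payload", []):
            if "url" in item:
                urls.append(item["url"])
    return urls

class AddOnWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        urls = extract_payload_urls(self.rfile.read(length))
        print("Payload URLs:", urls)
        self.send_response(204)  # acknowledge receipt with an empty response
        self.end_headers()

def run_server(port: int = 8000) -> None:
    """Serve webhooks until interrupted (hypothetical local test server)."""
    HTTPServer(("", port), AddOnWebhookHandler).serve_forever()
```

Call run_server() to start listening locally, then expose the port with a tunnel such as ngrok and use that public URL as the callback URL.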


And that’s it! Setting up Twilio dual-channel + VoiceBase is about as difficult as microwaving a Hot Pocket.

Test it out

Have a friend dial your Twilio number. It’ll forward to your phone and you both can chat.

Important Tip: Be sure to let them know they’re being recorded. Depending on your region it may be illegal to record a call without informing the caller. I’m not authorized to give legal advice in any region, so be sure to double check your local laws regarding phone call recording.

You’ll get a webhook with a JSON response similar to this:


To get the results of your transcription, make an authenticated GET request to the payload URL with your Twilio API key and secret. You’ll receive a JSON object similar to the one below, with each speaker identified. (I’ve truncated the individual words for readability.) You can find more info in the Add-ons Payload Subresource Docs.
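Here is one way to sketch that authenticated GET in Python with the standard library. It assumes HTTP Basic authentication with the API key as the username and the secret as the password. The function name and the example URL are placeholders of my own.

```python
import base64
import urllib.request

def build_payload_request(payload_url: str, api_key: str, api_secret: str) -> urllib.request.Request:
    """Create a GET request for the payload URL with an HTTP Basic auth header."""
    credentials = f"{api_key}:{api_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(payload_url, headers={"Authorization": f"Basic {token}"})
```

Pass the returned request to urllib.request.urlopen and parse the response body with json.load to get the transcription object. Keep the key and secret in environment variables rather than in source code.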

         "text":"Speaker 2: Check Speaker 1: Test test test this is the side of one of the conversation Speaker 2:.  Yes this is side.  Two of the conversation.  Speaker 1:. ",
         "srt":"1\n00:00:08,41 --> 00:00:09,32\nSpeaker 2: Check\n\n2\n00:00:11,34 --> 00:00:16,27\nSpeaker 1: Test test test this is\nthe side of one of the conversation\n\n3\n00:00:21,02 --> 00:00:26,70\nSpeaker 2:. Yes this is side.\nTwo of the conversation.\n\n4\n00:00:28,82 --> 00:00:28,82\nSpeaker 1:.\n\n"
            "descriptive":"28.0 sec",

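To work with each side of the conversation separately, you could split the text field on its speaker labels. The Python sketch below assumes labels of the form "Speaker N:" as seen in the sample above. split_by_speaker is an illustrative name, and which speaker number is the agent or the caller is something you would need to confirm for your own calls.

```python
import re

# Matches labels such as "Speaker 1:" and captures the label without the colon.
SPEAKER_TAG = re.compile(r"(Speaker \d+):")

def split_by_speaker(text: str) -> list[tuple[str, str]]:
    """Split a transcript string into (speaker, utterance) turns in order."""
    parts = SPEAKER_TAG.split(text)
    # parts[0] is any text before the first label; after that,
    # labels and utterances alternate.
    turns = []
    for speaker, utterance in zip(parts[1::2], parts[2::2]):
        turns.append((speaker, utterance.strip()))
    return turns
```

Filtering the resulting turns by speaker gives you one side of the call, for example for agent training or for reviewing what customers said.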

Introducing the Recording Status Callback

So, in addition to the transcript, you’d also like to get a copy of the recording file?

The transcription webhook will include a Recording field that contains the URL where your file can be downloaded. But you can also be notified as soon as the recording is ready.

In the past, you had to poll the API to know when your recording was done, but now we support a recording-specific webhook. recordingStatusCallback contains all the relevant recording-related information and is now supported for <Dial>, <Conference>, the Outbound API, and <Record> recordings.

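Here’s our TwiML from above, updated with the recording status callback. Set recordingStatusCallback to the URL of your own webhook endpoint:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
   <Dial record="record-from-answer-dual" recordingStatusCallback="">+14155551234</Dial>
</Response>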

And the parameters sent to your callback look something like this:

CallSid: CAaabbcc
RecordingSource: DialVerb
RecordingChannels: 2
RecordingStatus: completed
AccountSid: ACaabbcc
RecordingSid: REaabbcc
RecordingDuration: 4
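A handler for this callback mostly needs to read those parameters and decide whether the recording is ready. Here is a minimal Python sketch, assuming the callback arrives as a form-encoded body. Both function names are my own.

```python
from urllib.parse import parse_qs

def parse_recording_status(body: str) -> dict[str, str]:
    """Flatten a form-encoded callback body into a simple name-to-value dict."""
    return {key: values[0] for key, values in parse_qs(body).items()}

def is_dual_channel_ready(params: dict[str, str]) -> bool:
    """True when the recording is completed and has two channels."""
    return (
        params.get("RecordingStatus") == "completed"
        and params.get("RecordingChannels") == "2"
    )
```

Once is_dual_channel_ready returns True, the RecordingSid value identifies the recording you can go fetch.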


What are you building?

We’re very excited about the release of dual-channel recordings, improvements like recordingStatusCallback, and the availability of Recording Analysis Add-ons from VoiceBase and IBM. I’d love to hear more about what you’re building with these new tools and answer any questions you have. Tell me what you’re up to by leaving a comment below or pinging @billychia on Twitter.