Voice Intelligence is currently available as a public beta release. Some features are not yet implemented and others may be changed before the product is declared as Generally Available. Beta products are not covered by a Twilio SLA.
Learn more about beta product support.
Twilio Voice Intelligence transcribes your call recordings and generates data insights from your conversations. This document covers best practices for working with the recordings intended for transcription, assigning Participants to the transcripts, and using webhooks.
Using dual-channel recordings with Voice Intelligence not only provides higher accuracy but also lets you map and override participants with additional metadata for search and business reporting. The following guides explain how to enable dual-channel recordings in different Twilio products.
- Twilio Call Recordings are dual-channel by default. Learn more about dual-channel recordings in this blog post.
- To record a phone call, follow the guide to recording phone calls.
- If you are a Flex user, learn more about how to enable dual-channel recordings with Flex.
Any custom implementations that use Conferences to orchestrate a meeting need to change how the recordings are created.
By default, the conference recording is single-channel. To get a dual-channel recording, we recommend recording the Participant leg of the call when the Participant joins the Conference. Learn more about how to create a Conference Participant with Record set to
The recorded call leg is on the first (left) channel of the recording, and all other participants are mixed on the second channel. When choosing which call leg to record, we recommend the leg with the most call time to avoid incomplete recordings. For example, for an inbound call, recording the customer's leg ensures that all customer utterances are captured, even if the agent has not yet joined the conference.
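The request for a recorded Participant leg can be sketched as follows. This is a minimal illustration, not a definitive implementation: it only builds the URL and form body and does not send anything. The SIDs and phone numbers are hypothetical placeholders, and the endpoint path and the From, To, and Record parameter names (including the value "true" for Record) are assumptions based on the public Conference Participant REST resource, so verify them against the current API reference.

```python
from urllib.parse import urlencode

# Assumed base URL of the Twilio REST API.
API_BASE = "https://api.twilio.com/2010-04-01"


def build_participant_request(account_sid, conference_sid, from_number, to_number):
    """Return (url, form_body) for adding a recorded Participant to a Conference."""
    url = (
        f"{API_BASE}/Accounts/{account_sid}"
        f"/Conferences/{conference_sid}/Participants.json"
    )
    form = {
        "From": from_number,
        "To": to_number,
        # Assumption: "true" asks Twilio to record this participant's leg.
        "Record": "true",
    }
    return url, urlencode(form)


# Hypothetical placeholder values, for illustration only.
url, body = build_participant_request(
    "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "CFXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "+15017122661",
    "+15558675310",
)
print(url)
print(body)
```

The returned URL and body could then be sent as an authenticated POST with whichever HTTP client or helper library you already use.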
Voice Intelligence supports third-party media recordings. If your call recordings aren't stored in Twilio and you want to use them with Voice Intelligence, the recordings need to be publicly accessible for the duration of transcription. The recordings can be hosted publicly or, preferably, shared through a time-limited pre-signed URL. For example, to share a recording from an existing AWS S3 bucket, follow this guide. Then add the public recording URL to the media_url when creating a
If you use Twilio Video and want to transcribe the audio of a Twilio Video recording, it needs additional processing to create an audio recording that can be submitted for transcription.
To create a dual-channel audio recording, first transcode a separate audio-only composition for each participant in the Video Room.
Next, download the media from these compositions and merge them into a single stereo audio file.
If the recording duration is different for each participant, add a delay so that the audio tracks do not overlap incorrectly. Use ffmpeg to create a single stereo audio track with a delay that covers the difference in track length. For example, if one audio track lasts 63 seconds and the other 67 seconds, use ffmpeg to create a stereo file in which the first track has four seconds of delay to match the length of the second track.
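The 63-second and 67-second example can be sketched as a small helper that computes the delay and assembles an ffmpeg command line. This is one possible filter graph, not the command from the original guide. It assumes the shorter track belongs to the participant who joined later, so the delay goes at the start of that track, and it assumes each input is downmixed to mono before the merge. The file names are hypothetical; aformat, adelay, and join are standard ffmpeg audio filters.

```python
def build_ffmpeg_args(short_track, long_track, short_seconds, long_seconds, output):
    """Build an ffmpeg argument list that delays the shorter track and merges
    both tracks into one stereo file (shorter track left, longer track right)."""
    delay_ms = int(round((long_seconds - short_seconds) * 1000))
    if delay_ms < 0:
        raise ValueError("short_seconds must not exceed long_seconds")
    filter_graph = (
        # Downmix the shorter track to mono and prepend silence (adelay is in ms).
        f"[0:a]aformat=channel_layouts=mono,adelay={delay_ms}[left];"
        # Downmix the longer track to mono, no delay needed.
        "[1:a]aformat=channel_layouts=mono[right];"
        # Join the two mono streams into a single stereo stream.
        "[left][right]join=inputs=2:channel_layout=stereo[out]"
    )
    return [
        "ffmpeg",
        "-i", short_track,
        "-i", long_track,
        "-filter_complex", filter_graph,
        "-map", "[out]",
        output,
    ]


# Hypothetical file names; durations taken from the example in the text.
args = build_ffmpeg_args("participant1.wav", "participant2.wav", 63, 67, "stereo.wav")
print(" ".join(args))
```

To run the command, pass the list to subprocess.run(args, check=True) on a machine with ffmpeg installed. If the shorter track instead ended early, the silence would belong at the end of the track rather than the start.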
Finally, send a CreateTranscript request to Voice Intelligence by providing a publicly accessible URL for this audio file as
By default, Voice Intelligence assumes Participant One is on channel One and Participant Two is on channel Two, and associates a phone number from the recording. Because a recording can be created in different ways, this assumption may not hold for all use cases.
In such cases, or when you need to attach additional metadata to call participants, we recommend using the Voice Intelligence APIs to create a Transcript, providing optional Participant metadata and mapping each participant to the correct audio channel.
Using CustomerKey with the CreateTranscript API allows you to map a Transcript to an internal identifier known to you. This can be a unique identifier within your system to track the transcripts. The CustomerKey is also included in the webhook callback when the results for Operators are available. This is an optional field and cannot be substituted for the Transcript Sid in APIs.
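A CreateTranscript request that maps participants to channels and carries a CustomerKey might be assembled as sketched below. The code only builds the form fields and sends nothing. The endpoint URL, the ServiceSid, Channel, and CustomerKey form fields, and the JSON keys inside Channel (media_properties, media_url, participants, channel_participant, role, full_name) are assumptions based on the public Voice Intelligence API and should be checked against the current reference. The Service SID, media URL, names, and customer key are hypothetical.

```python
import json

# Assumed endpoint for creating Transcripts.
TRANSCRIPTS_URL = "https://intelligence.twilio.com/v2/Transcripts"


def build_create_transcript_form(service_sid, media_url, participants, customer_key=None):
    """Return the form fields for a CreateTranscript request."""
    channel = {
        "media_properties": {"media_url": media_url},
        "participants": participants,
    }
    form = {
        "ServiceSid": service_sid,
        # The channel description is sent as a JSON-encoded string.
        "Channel": json.dumps(channel),
    }
    if customer_key is not None:
        # Optional internal identifier; it does not replace the Transcript Sid.
        form["CustomerKey"] = customer_key
    return form


# Hypothetical values: the customer is on channel 1, the agent on channel 2.
form = build_create_transcript_form(
    "GAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "https://example.com/recordings/stereo.wav",
    [
        {"channel_participant": 1, "role": "Customer", "full_name": "Lee Example"},
        {"channel_participant": 2, "role": "Agent", "full_name": "Sam Example"},
    ],
    customer_key="order-1234",
)
print(TRANSCRIPTS_URL)
print(form)
```

Keeping the CustomerKey equal to an identifier you already store makes it straightforward to match the later webhook callback to your own record.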
Use the webhook callback to know when a create Transcript request has completed and when the results are available. This is preferable to polling the GET /Transcript endpoint. The webhook callback URL can be configured on the Voice Intelligence