Call Transcription

Call transcription is the conversion of the audio track of a voice or video call into written words, stored as plain text in conversational language. Transcription can be live, as a call or event happens, or based on a recording of a past conversation.

The Importance of Speech-to-Text Transcription

Call transcription is an important and powerful tool for business, training, medical, and legal purposes. Because text is far easier to search and analyze than audio, a text-based history of conversations is necessary (or at least preferable) for many use cases. Additionally, real-time speech-to-text transcription services (such as closed captioning) increase accessibility, improving understanding for people who are hard of hearing or new to a language.

Using Call Transcription In Your Business

For voice calls, transcription is often used in a business context, for example to improve training and feedback for call center employees. Logging the context and words spoken in a call can help you identify business problems algorithmically, making it easier to deploy resources in an evidence-based manner. Call transcriptions and recordings are also valuable for legal purposes, where contemporaneous transcriptions, recordings, and notes generally carry more weight than records created after the fact.

Twilio makes it easy to add call transcription to applications built on our Programmable Voice product. For recorded calls, you can use our REST API's provisions to transcribe recordings to text. Twilio also has a real-time transcription service with support for multiple languages, contextual analysis, and Natural Language Processing. Talk to Sales about your call transcription requirements for information on that product.

Legality of Call Transcriptions

Note that the legality of call transcription differs by locality. In some localities, recording calls, transcribing recorded calls, or even transcribing real-time speech over a call or video is banned or requires informed consent from some or all parties in a conversation. Twilio cannot comment on the specifics of your local laws; you'll have to read the relevant laws or consult your legal representation about your situation.

Dual Channel vs. Single Channel Recordings for Transcriptions

Because of differences in volume, accents, timing, and connection quality, the final mixed track of a voice or video call can often be unintelligible even for professional human transcribers. So-called Single-Channel Recordings only store the one final mixed track pre-transcription, which can vastly increase the eventual number of transcription errors - especially if participants are speaking at the same time.

Dual-Channel Recording and Call Transcription Flow

With the highest-accuracy call transcription solutions, both (or all) sides of the call are recorded separately. Because each party has its own track, a Dual-Channel Recording solution (or Multi-Channel Recording solution) avoids the cross-talk and cancellation noise that would otherwise degrade a mixed track. It also prevents most (or all) misattribution errors, since every word can be tied to the channel it was spoken on.
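
As a rough illustration of why separate channels help, the sketch below splits a two-channel WAV file into one mono track per side so that each can be transcribed on its own. It assumes the recording is an uncompressed PCM WAV with one party per channel. The function name `split_stereo` is hypothetical and is not part of any Twilio library.

```python
import io
import wave


def split_stereo(wav_bytes: bytes) -> tuple:
    """Split a two-channel PCM WAV into two mono WAVs (left, right).

    Assumption: each party to the call occupies exactly one channel.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a two-channel recording")
        width = src.getsampwidth()   # bytes per sample
        rate = src.getframerate()    # samples per second
        frames = src.readframes(src.getnframes())

    # Stereo PCM is interleaved: [left sample][right sample] per frame.
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + width]
        right += frames[i + width:i + frame_size]

    def to_wav(pcm: bytes) -> bytes:
        # Wrap raw mono samples in a WAV container with the same format.
        buf = io.BytesIO()
        with wave.open(buf, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(pcm)
        return buf.getvalue()

    return to_wav(bytes(left)), to_wav(bytes(right))
```

Each returned track contains only one speaker, so a transcript produced from it can be attributed to that speaker without guessing who said what.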

See more about our dual-channel call transcription options here.

Getting Started With Call Transcription

Twilio makes it easy to get started with call transcription. The Gather and Record TwiML Voice verbs both support transcription, while our Phone Call Speech Transcription product can help with your real-time requirements. You can also speak to Sales about Natural Language Processing and determining caller intent or sentiment in real time.
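
A minimal sketch of what TwiML using these two verbs might look like, built with Python's standard library rather than a Twilio helper library. The callback URLs, prompts, and function names are placeholders chosen for illustration. The attributes used (`transcribe`, `transcribeCallback`, and `maxLength` on Record; `input` and `action` on Gather) should be checked against the current TwiML reference before use.

```python
import xml.etree.ElementTree as ET


def record_with_transcription(callback_url: str) -> str:
    """TwiML that records the caller and requests a transcription."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "Please leave a message after the tone."
    ET.SubElement(response, "Record", {
        "transcribe": "true",                # ask for a transcript of the recording
        "transcribeCallback": callback_url,  # hypothetical URL notified when it is ready
        "maxLength": "120",                  # stop recording after 120 seconds
    })
    return ET.tostring(response, encoding="unicode")


def gather_speech(action_url: str) -> str:
    """TwiML that collects spoken input and posts the result to action_url."""
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {
        "input": "speech",     # collect speech rather than keypad digits
        "action": action_url,  # hypothetical URL that receives the recognized text
    })
    ET.SubElement(gather, "Say").text = "How can we help you today?"
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(record_with_transcription("https://example.com/transcripts"))
    print(gather_speech("https://example.com/speech"))
```

Record suits voicemail-style flows where the transcript can arrive after the call, while Gather suits prompts where the application needs the caller's words before deciding what to do next.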
