Twilio Changelog | Oct. 23, 2025
Multi-language Detection Public Beta for Twilio Real-Time Transcriptions
TL;DR: Twilio’s Real-Time Transcriptions product has added support for Deepgram’s Nova-3 Multi-language speech model, now in Public Beta, for use either with webhook-delivered speech results or with persisted Transcript resources and Twilio’s Conversational Intelligence.
What are Twilio Real-Time Transcriptions?
Twilio Real-Time Transcriptions allows you to transcribe live calls in real time. When Twilio executes the <Start><Transcription> TwiML instruction during a call, the Twilio platform forks the raw audio stream to a speech-to-text Transcription Engine, which streams speech recognition responses back for each of the caller’s uttered phrases. Developers can send that stream of speech recognition results to their downstream application through Twilio Programmable Voice using webhooks, or they can instead send the results to a configured persisted Transcript resource on the Twilio platform. In either case, Developers can choose either Google or Deepgram as the Transcription Engine that produces the transcribed speech results.
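As a minimal sketch of the webhook flow described above, the TwiML might look like the following (the statusCallbackUrl value is a placeholder, and attribute names should be confirmed against the <Transcription> TwiML documentation linked below):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- Fork the call audio to a Transcription Engine and stream the
       speech recognition results to the webhook URL (placeholder). -->
  <Start>
    <Transcription statusCallbackUrl="https://example.com/transcription-callback" />
  </Start>
  <!-- The call proceeds normally while transcription runs. -->
  <Dial>+15558675310</Dial>
</Response>
```

With this in place, Twilio posts speech recognition results to the callback URL as the Transcription Engine returns them during the live call.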
What are the New Multi-Language-Detecting Capabilities of Real-Time Transcriptions?
Now, as a Public Beta capability of Real-Time Transcriptions, Developers who opt for the Deepgram Transcription Engine and Deepgram’s Nova-3 speech model can also select “Language = multi”. Deepgram’s multilingual Nova-3 speech model will then detect (and programmatically return) the languages being spoken on the call, from among the 10-plus languages that the Nova-3 Multi-language model supports today, and will transcribe all the speech on the call as text in each of those detected languages.
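A hedged TwiML sketch of opting into the new capability follows; the transcriptionEngine, speechModel, and languageCode attribute values shown are illustrative assumptions and should be verified against the speechModel documentation linked below, and the callback URL is a placeholder:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <!-- Select Deepgram as the Transcription Engine, the Nova-3
         speech model, and multi-language detection. The exact
         attribute values here are assumptions; confirm them against
         Twilio's speechModel documentation. -->
    <Transcription transcriptionEngine="deepgram"
                   speechModel="nova-3-general"
                   languageCode="multi"
                   statusCallbackUrl="https://example.com/transcription-callback" />
  </Start>
</Response>
```

With this configuration, the streamed speech results carry both the transcribed text and the detected language for each utterance, even when callers switch languages mid-call.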
Customer benefits
With the streaming speech recognition capabilities of <Start><Transcription>, businesses can capture the full text of what their customers are saying – whether to a human agent or to an automated, self-service AI agent or LLM – and now, with this Public Beta language detection feature, even across a mixed set of languages.
Multi-language detection and transcription are a perfect fit for capturing conversations:
where multilingual agents may speak with customers in any one of a set of languages they are fluent in, but an individual customer’s language is not known a priori,
where callers themselves mix languages – such as a mix of Spanish and English, switching back and forth multiple times during the call depending on their language comfort and word complexity – and then adding all of that speech data accurately to the caller’s customer record, whether in a CRM or in another application or system built by the developer, and
where the objective is customer data collection via programmable outbound calling – for example, for follow-up, post-service, or post-care surveys – no matter what combination of languages is used in the prompting or in the called parties’ responses.
Twilio Real-Time Transcriptions lets developers automate the capture of customer speech data, programmatically, for each and every call (instead of having data for only an ad hoc sampling of calls); create a repository of structured data for those voice conversations with customers; and easily and cost-effectively stream the speech results to downstream applications during calls with customers.
More information:
https://www.twilio.com/en-us/speech-recognition
https://www.twilio.com/docs/voice/twiml/transcription#speechmodel
https://www.twilio.com/docs/voice/api/realtime-transcription-resource
https://www.twilio.com/docs/conversational-intelligence
https://www.twilio.com/en-us/voice/pricing/us (See “Conversational Intelligence - Transcription, Streaming (Real-Time) Transcription”)