Twilio Changelog | Jan. 28, 2026

Word Timings and Multi-language Detection for Twilio Real-Time Transcriptions now Generally Available

TL;DR: Twilio Real-Time Transcriptions now supports individual word timings and Deepgram’s Nova-3 multi-language speech model, for use with either webhook-delivered speech results or Persisted Transcript Resources and Twilio Conversational Intelligence.

 

What are Twilio Real-Time Transcriptions?
Twilio Real-Time Transcriptions lets you transcribe live calls in real time. When Twilio executes the <Start><Transcription> instruction during a call, the Twilio platform forks the raw audio stream to a speech-to-text Transcription Engine, which streams speech recognition results back for each phrase the caller utters. Developers can send that stream of results to their downstream app through Twilio Programmable Voice using webhooks, or send it to a configured persisted transcript resource on the Twilio platform. In either case, Developers can choose Google or Deepgram as the Transcription Engine that provides the transcribed speech results.
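
As a rough illustration of the <Start><Transcription> instruction described above, the sketch below builds a TwiML response using only Python's standard library. The attribute names statusCallbackUrl and transcriptionEngine and the engine value "deepgram" are assumptions here, and the callback URL is a placeholder; check the TwiML reference for the exact attributes before relying on them.

```python
import xml.etree.ElementTree as ET


def build_transcription_twiml(callback_url: str, engine: str = "google") -> str:
    """Return a TwiML document that starts a real-time transcription.

    Attribute names below are assumptions for illustration, not confirmed
    by this changelog entry.
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    ET.SubElement(
        start,
        "Transcription",
        {
            # Assumed attribute: where streamed results are sent by webhook.
            "statusCallbackUrl": callback_url,
            # Assumed attribute: which provider transcribes the forked audio.
            "transcriptionEngine": engine,
        },
    )
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL; replace with your own webhook endpoint.
    print(build_transcription_twiml("https://example.com/transcripts", "deepgram"))
```

Returning this document from the webhook that handles the call would, under those assumptions, ask Twilio to fork the call audio and deliver results to the callback URL.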

What are the New Word Timings and Multi-Language Detection Capabilities of Real-Time Transcriptions?
Now in Real-Time Transcriptions, when Developers set enableProviderData to true, Twilio also passes through timing information from our transcription providers for each word in a transcribed utterance, along with a confidence score for each word. Each timing is an offset in seconds from ProviderConnectTime, which is also provided. This gives Developers a more granular picture of when each utterance occurred, for example to build a “texting bubble”-style presentation of the inbound (caller) and outbound (agent) tracks of a conversation.
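
To make the offset arithmetic concrete, here is a minimal sketch that converts per-word offsets into absolute timestamps by adding them to ProviderConnectTime. The payload shape is hypothetical: the field names word, start, end, and confidence, and the ISO 8601 format of the connect time, are assumptions for illustration and are not taken from the provider data Twilio actually returns.

```python
from datetime import datetime, timedelta


def absolute_word_times(provider_connect_time: str, words: list[dict]) -> list[dict]:
    """Attach absolute start/end timestamps to each word.

    `provider_connect_time` is assumed to be an ISO 8601 string, and each
    word dict is assumed (hypothetically) to carry `start` and `end`
    offsets in seconds plus a `confidence` score.
    """
    base = datetime.fromisoformat(provider_connect_time)
    timed = []
    for item in words:
        timed.append(
            {
                "word": item["word"],
                "confidence": item["confidence"],
                # Offsets are seconds elapsed since the provider connected.
                "start_time": base + timedelta(seconds=item["start"]),
                "end_time": base + timedelta(seconds=item["end"]),
            }
        )
    return timed


if __name__ == "__main__":
    sample = [
        {"word": "hello", "start": 0.5, "end": 0.9, "confidence": 0.98},
        {"word": "there", "start": 1.0, "end": 1.4, "confidence": 0.91},
    ]
    for row in absolute_word_times("2026-01-28T12:00:00+00:00", sample):
        print(row["word"], row["start_time"].isoformat(), row["confidence"])
```

With absolute times on both the inbound and outbound tracks, words from the two speakers can be interleaved in order, which is the basis of a texting-bubble style view.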

 

When Developers use the Deepgram Transcription Engine with Deepgram’s Nova-3 speech model, they can now also select “Language = multi”. Deepgram’s multi-lingual Nova-3 speech model then detects (and programmatically returns) the languages being spoken on the call, from among the 10-plus languages the Nova-3 multi-language model supports today, and transcribes all speech on the call as text in each detected language.
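
The constraint described above, that the multi setting applies only with the Deepgram engine and the Nova-3 model, can be sketched as a small configuration check. The attribute names transcriptionEngine, speechModel, and languageCode, and the values "deepgram" and "nova-3", are assumptions for illustration; only the "multi" language value comes from this announcement.

```python
def transcription_attributes(engine: str, speech_model: str, language: str) -> dict:
    """Build a (hypothetical) attribute set for a transcription request.

    Raises ValueError if multi-language detection is requested with any
    combination other than Deepgram plus Nova-3.
    """
    if language == "multi" and not (engine == "deepgram" and speech_model == "nova-3"):
        raise ValueError(
            "language 'multi' is assumed to require engine 'deepgram' "
            "and speech model 'nova-3'"
        )
    return {
        "transcriptionEngine": engine,  # assumed attribute name
        "speechModel": speech_model,    # assumed attribute name
        "languageCode": language,       # assumed attribute name
    }


if __name__ == "__main__":
    print(transcription_attributes("deepgram", "nova-3", "multi"))
```

Validating the combination before generating call instructions surfaces a misconfiguration in your own code rather than at call time.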


Customer benefits 

With the streaming speech recognition capabilities of <Start><Transcription>, businesses can capture the full text of what their customers are saying, whether to a human agent or to an automated self-service AI agent or LLM, and now even across a mix of languages with this language detection feature.


Multi-language detection and transcription is well suited to capturing conversations:

  • where multi-lingual agents may be speaking any one of a set of languages they are fluent in with customers, but an individual customer’s language is not known a priori,

  • where callers themselves mix languages, such as switching back and forth between Spanish and English multiple times during a call depending on language comfort and word complexity, and all of that speech data needs to be added accurately to the caller’s customer record, whether in a CRM or another application or system built by the developer, and

  • where customer data collection via programmable outbound calling is the objective, such as follow-up, post-service, or post-care surveys, no matter what combination of languages is used in the prompts or in the called parties’ responses.

Twilio Real-Time Transcriptions allows developers to programmatically capture customer speech data for every call (instead of just an ad hoc sampling of calls), create a repository of structured data for those voice conversations, and easily and cost-effectively stream the speech results to downstream applications during calls.

 

Voice API GA
