TwiML™ Voice: <Transcription>
Legal notice
Real-Time Transcriptions, including the <Transcription> TwiML noun and API, use artificial intelligence or machine learning technologies. By enabling or using any of the features or functionalities within Programmable Voice that are identified as using artificial intelligence or machine learning technology, you acknowledge and agree that your use of these features or functionalities is subject to the terms of the Predictive and Generative AI/ML Features Addendum.
The <Transcription> TwiML noun allows you to transcribe live calls in near real-time. It is used in conjunction with <Start>. When Twilio executes the <Start><Transcription> instruction during a call, it forks the raw audio stream to a speech-to-text transcription engine that can provide streaming responses almost instantly.
This page covers <Transcription>'s supported attributes and provides sample code.
Important Notes
The <Transcription> TwiML noun is associated with Twilio's Real-Time Transcriptions product. It is not to be confused with Recording Transcriptions.
Consumers of <Transcription> should leverage the statusCallbackUrl webhook for live processing of conversation utterances in their applications.
Real-Time Transcription persistence and post-call language intelligence support come from integration with Conversational Intelligence. To store your transcripts with Twilio or run Language Operators after the call, add the intelligenceService attribute when starting a Real-Time Transcription session. Note: When using either Deepgram or Google as the transcriptionEngine value, Twilio supports persisted transcripts.
Below is a basic example of <Start><Transcription>:
```xml
<Start>
  <Transcription statusCallbackUrl="https://example.com/your-callback-url"/>
</Start>
```
The table below lists <Transcription>'s supported attributes, which modify the <Transcription> behavior. All attributes are optional.
| Attribute Name | Allowed Values | Default Value |
|---|---|---|
| name | Unique name for the Real-Time Transcription | None |
| statusCallbackUrl | An absolute URL | None |
| languageCode | A BCP-47 standard language code that identifies human languages (e.g., en-US) | en-US |
| track | inbound_track, outbound_track, both_tracks | both_tracks |
| inboundTrackLabel | An alphanumeric label to associate with the inbound track being transcribed | None |
| outboundTrackLabel | An alphanumeric label to associate with the outbound track being transcribed | None |
| transcriptionEngine | Name of the speech-to-text transcription provider, e.g., google or deepgram | google |
| speechModel | Any speechModel value from the list of Twilio-supported Google STT v2 speech models (except Chirp2 models and languages supported only by Chirp2), or nova-2 or nova-3 for Deepgram | telephony |
| profanityFilter | (Google only) true or false | true |
| partialResults | (Google only) true or false | false |
| hints | (Google, Deepgram nova-2, and Deepgram nova-3 monolingual variants only) Comma-separated list of expected phrases or keywords for recognition | None |
| enableAutomaticPunctuation | (Google only) true or false | true |
| intelligenceService | (Google only) The Intelligence Service SID or unique name for persisting transcripts and running Language Operators | None |
The name attribute is the user-specified name of this Real-Time Transcription. This name can be used to stop the Real-Time Transcription.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', name: 'Contact center transcription'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" name="Contact center transcription" />
  </Start>
</Response>
```
The statusCallbackUrl attribute is the absolute URL of an endpoint. Twilio sends Real-Time Transcription status updates and the call's transcript data to this URL.
Twilio sends a POST request to this URL whenever one of the following occurs:
- A Real-Time Transcription session starts. This is called the transcription-started event.
- Utterances (partial or final) of transcribed audio are available. This is called the transcription-content event.
- A Real-Time Transcription session stops. This is called the transcription-stopped event. This event occurs when a Real-Time Transcription session is stopped via API or TwiML, or when the call ends.
- An error occurs. This is called the transcription-error event.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url"/>
  </Start>
</Response>
```
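When Twilio sends these callbacks, it POSTs the parameters described in the event tables below as form-encoded data. The following is a minimal sketch of a statusCallbackUrl receiver, assuming an Express app (the route path mirrors the example URL above and is not prescribed by Twilio):

```js
const express = require('express');

const app = express();
// Twilio sends status callback parameters as form-encoded POST data
app.use(express.urlencoded({ extended: false }));

app.post('/your-callback-url', (req, res) => {
  const { TranscriptionEvent, TranscriptionSid } = req.body;

  switch (TranscriptionEvent) {
    case 'transcription-started':
      console.log(`Session ${TranscriptionSid} started`);
      break;
    case 'transcription-content': {
      // TranscriptionData arrives as a JSON string (see the tables below)
      const data = JSON.parse(req.body.TranscriptionData);
      console.log(`[${req.body.Track}] ${data.transcript}`);
      break;
    }
    case 'transcription-stopped':
      console.log(`Session ${TranscriptionSid} stopped`);
      break;
    case 'transcription-error':
      console.error(`Error ${req.body.TranscriptionErrorCode}: ${req.body.TranscriptionError}`);
      break;
  }

  // Acknowledge receipt so Twilio doesn't record a webhook failure
  res.sendStatus(200);
});

app.listen(3000);
```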
When a Real-Time Transcription is started and a session is created, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-started event. This event provides initial details about the transcription session.
These HTTP requests contain the properties listed below.
| Property | Description | Example |
|---|---|---|
| AccountSid | Twilio Account SID | AC11b76cdc7d217e72a72be6422d46a7ca |
| CallSid | Twilio Call SID | CA57af2620f427810cb4e430371e8d6e0f |
| TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
| Timestamp | Time of the event in UTC ISO 8601 timestamp | 2023-10-19T22:33:22.611Z |
| SequenceId | Integer sequence number of the event | 1 |
| TranscriptionEvent | The event type | transcription-started |
| ProviderConfiguration | JSON string of the transcription provider's configuration | {\"profanityFilter\":\"true\",\"speechModel\":\"telephony\",\"enableAutomaticPunctuation\":\"true\",\"hints\":\"Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback\"} |
| TranscriptionEngine | The name of the transcription engine | google |
| Name | Friendly name of the Real-Time Transcription session | session1 |
| Track | The track being transcribed: inbound_track, outbound_track, or both_tracks | inbound_track |
| InboundTrackLabel | Label associated with the inbound track | customer |
| OutboundTrackLabel | Label associated with the outbound track | agent |
| PartialResults | Whether partial results are enabled (true or false) | true |
| LanguageCode | The language code for the transcription | en-US |
Example of a transcription-started event payload:
1{2"TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",3"Timestamp": "2024-06-25T18:45:12.135751Z",4"AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",5"ProviderConfiguration": "{\"profanityFilter\":\"true\",\"speechModel\":\"telephony\",\"enableAutomaticPunctuation\":\"true\",\"hints\":\"Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback\"}",6"Name": "Chris Transcription",7"OutboundTrackLabel": "agent",8"LanguageCode": "en-US",9"PartialResults": "false",10"InboundTrackLabel": "customer",11"TranscriptionEvent": "transcription-started",12"CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",13"TranscriptionEngine": "google",14"Track": "both_tracks",15"SequenceId": "1"16}
When an individual utterance (partial or final) of audio is transcribed, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-content event. This event provides TranscriptionData results for the transcribed audio.
Stability and Confidence
Stability and Confidence depend on partialResults. For example, if partialResults is true, then the stability property will be included in the event payload, and confidence will not. However, if partialResults is false, the opposite will be true. Always refer to Google's specific documentation (examples) for more details on each of these properties.
These HTTP requests contain the properties listed below.
| Property | Description | Example |
|---|---|---|
| AccountSid | Twilio Account SID | AC11b76cdc7d217e72a72be6422d46a7ca |
| CallSid | Twilio Call SID | CA57af2620f427810cb4e430371e8d6e0f |
| TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
| Timestamp | Time of the event in UTC ISO 8601 timestamp | 2023-10-19T22:33:22.611Z |
| SequenceId | Integer sequence number of the event | 2 |
| TranscriptionEvent | The event type | transcription-content |
| LanguageCode | A BCP-47 standard language code (e.g. "en-US") | en-US |
| Track | The track being transcribed: inbound_track or outbound_track | inbound_track |
| TranscriptionData | JSON string containing transcription content. Note that TranscriptionData.Confidence is a decimal number. | {\"Transcript\":\"to be or not to be\",\"Confidence\":0.96823084} |
| Stability | A string representing an estimate of the likelihood that Google will not change the guess it made about this partial result transcript. This property is only provided when partialResults is true. | Range between 0.0 (unstable) and 1.0 (stable). Example: 0.8 |
| Final | Boolean value indicating whether this event contains a final utterance (true) or a partial utterance (false) | false |
Example of a transcription-content event payload when partialResults is equal to false:
1{2"LanguageCode": "en-US",3"TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",4"TranscriptionEvent": "transcription-content",5"CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",6"TranscriptionData": "{\"transcript\":\"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for quality purposes. How can I assist you today?\",\"confidence\":0.9956335}",7"Timestamp": "2024-06-25T18:45:21.454203Z",8"Final": "true",9"AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",10"Track": "outbound_track",11"SequenceId": "2"12}
Example of a transcription-content event payload when partialResults is equal to true:
1{2"LanguageCode": "en-US",3"TranscriptionSid": "GT6ebb54a123f0c86b70605a4925836f69",4"Stability": "0.9",5"TranscriptionEvent": "transcription-content",6"CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",7"TranscriptionData": "{\"transcript\":\"Hello, this is Sam from Horizon Financial Services. Just letting you know this call may be recorded for\"}",8"Timestamp": "2024-06-25T16:30:21.600697Z",9"Final": "false",10"AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",11"Track": "outbound_track",12"SequenceId": "70"13}
When a Real-Time Transcription session is stopped or ends, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-stopped event. This event provides final details about the transcription session.
These HTTP requests contain the properties listed below.
| Property | Description | Example |
|---|---|---|
| AccountSid | Twilio Account SID | AC11b76cdc7d217e72a72be6422d46a7ca |
| CallSid | Twilio Call SID | CA57af2620f427810cb4e430371e8d6e0f |
| TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
| Timestamp | Time of the event, in UTC ISO 8601 format | 2023-10-19T22:33:22.611Z |
| SequenceId | Integer sequence number of the event | 3 |
| TranscriptionEvent | The event type | transcription-stopped |
An example of the transcription-stopped event payload:
1{2"TranscriptionSid": "GT8fbf72a043b98407a3ce68331cd0030a",3"TranscriptionEvent": "transcription-stopped",4"CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",5"Timestamp": "2024-06-25T18:45:23.839266Z",6"AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",7"SequenceId": "3"8}
When an error occurs during a Real-Time Transcription session, Twilio sends an HTTP POST request to your statusCallbackUrl for the transcription-error event.
Error Documentation
Documentation on Real-Time Transcription errors can be found in the Error and Warning Dictionary; error codes range from 32650 to 32655. Errors are also viewable in the Twilio Console.
These HTTP requests contain the properties listed below.
| Property | Description | Example |
|---|---|---|
| AccountSid | Twilio Account SID | ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX |
| CallSid | Twilio Call SID | CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX |
| TranscriptionSid | Unique identifier for this Real-Time Transcription session | GT20dfa03c8cf8aa8d0c4aeccde5558b66 |
| Timestamp | Time of the event in UTC ISO 8601 timestamp | 2023-10-19T22:33:22.611Z |
| SequenceId | Integer sequence number of the event | 3 |
| TranscriptionEvent | The event type | transcription-error |
| TranscriptionErrorCode | Error code | 32655 |
| TranscriptionError | Error description | Provider Unavailable |
Example of a transcription-error event payload:
1{2"TranscriptionSid": "GT20dfa03c8cf8aa8d0c4aeccde5558b66",3"Timestamp": "2023-10-19T22:33:22.611Z",4"AccountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",5"SequenceId": "3",6"TranscriptionEvent": "transcription-error",7"CallSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",8"TranscriptionErrorCode": "32655",9"TranscriptionError": "Provider Unavailable"10}
The languageCode attribute specifies the language in which the transcription should be performed. It accepts a BCP-47 standard language code, such as en-US for American English. This attribute is useful for ensuring that the transcription engine correctly understands and processes the spoken language.
The following TwiML example demonstrates how to specify the languageCode attribute for a Mexican Spanish transcription. This ensures that the transcription is performed in the specified language, which is particularly useful for calls in languages other than English.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', languageCode: 'es-MX'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" languageCode="es-MX" />
  </Start>
</Response>
```
The track attribute specifies which audio track should be transcribed. It can take one of the following values: inbound_track, outbound_track, or both_tracks. This attribute is useful for determining whether to transcribe the audio coming from the caller, the callee, or both.
The following TwiML example demonstrates how to specify the track attribute for a transcription.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', track: 'inbound_track'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" track="inbound_track" />
  </Start>
</Response>
```
The inboundTrackLabel attribute allows you to associate an alphanumeric label with the inbound track being transcribed. This can be useful for identifying and differentiating the inbound audio stream in the transcription results. Using labels helps to clearly identify who is speaking, especially in multi-party conversations or call center scenarios.
Refer to the Track labels section below to understand the importance of using labels.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', inboundTrackLabel: 'agent', outboundTrackLabel: 'customer'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" inboundTrackLabel="agent" outboundTrackLabel="customer" />
  </Start>
</Response>
```
In an inbound call scenario, the call is initiated by the customer and received by the agent or business person. Here, the inbound audio track (agent's speech) is labeled for clarity in the transcription results.
```xml
<Response>
  <Start>
    <Transcription track="inbound_track" inboundTrackLabel="agent" />
  </Start>
</Response>
```
In this example, the inbound audio track is labeled as "agent". This is useful for scenarios like customer support calls, where distinguishing the agent's responses from the customer's speech is crucial for understanding the interaction.
In an outbound call scenario, the call is initiated by the agent or business person and received by the customer. Here, the inbound audio track (customer's speech) is labeled for clarity in the transcription results.
```xml
<Response>
  <Start>
    <Transcription track="inbound_track" inboundTrackLabel="customer" />
  </Start>
</Response>
```
In this example, the inbound audio track is labeled as "customer". This is useful for scenarios like sales calls, where distinguishing the customer's speech in the transcription can help in analyzing customer feedback and engagement.
The outboundTrackLabel attribute allows you to associate an alphanumeric label with the outbound track being transcribed. This can be useful for identifying and differentiating the outbound audio stream in the transcription results. Using labels helps to clearly identify who is speaking, especially in multi-party conversations or call center scenarios.
Refer to the Track labels section below to understand the importance of using labels.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', inboundTrackLabel: 'agent', outboundTrackLabel: 'customer'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" inboundTrackLabel="agent" outboundTrackLabel="customer" />
  </Start>
</Response>
```
In an inbound call scenario, the call is initiated by the customer and received by the agent or business person. Here, the outbound audio track (customer's speech) is labeled for clarity in the transcription results.
```xml
<Response>
  <Start>
    <Transcription track="outbound_track" outboundTrackLabel="customer" />
  </Start>
</Response>
```
In this example, the outbound audio track is labeled as "customer". This is useful for scenarios like customer support calls, where distinguishing the customer's speech from the agent's responses is crucial for understanding the interaction.
In an outbound call scenario, the call is initiated by the agent or business person and received by the customer. Here, the outbound audio track (agent's speech) is labeled for clarity in the transcription results.
```xml
<Response>
  <Start>
    <Transcription track="outbound_track" outboundTrackLabel="agent" />
  </Start>
</Response>
```
In this example, the outbound audio track is labeled as "agent". This is useful for scenarios like sales calls, where distinguishing the agent's speech in the transcription can help in analyzing the effectiveness of the sales pitch.
To leverage specific features or optimizations that different transcription engines offer, set the transcriptionEngine attribute. For details about each provider's speech models, see speechModel in the following section. Both the google and deepgram transcription engines support persisted Transcript resources.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', transcriptionEngine: 'google'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" transcriptionEngine="google" />
  </Start>
</Response>
```
The speechModel attribute allows you to specify which speech model to use for the transcription.
Different speech models can optimize for different use cases, such as phone calls, video, or enhanced models for higher accuracy.
If Google is used as the transcriptionEngine, this maps to Transcription Model in Google terminology. Refer to the Google documentation to understand each speech model's specific capabilities and configurations.
The telephony speech model is optimized for phone call audio and can provide better accuracy for this type of audio.
The long speech model is optimized for long-form audio, such as lectures or extended conversations, and can provide better accuracy for lengthy audio.
When you set transcriptionEngine to google, Twilio only supports speech models and languages available on Google's global STT API endpoints. For the list of supported languages, see the Google STT v2 API Language List. This list excludes Chirp2 models and languages that only those models support.
If you use Deepgram as the transcriptionEngine, Real-Time Transcriptions rely on the nova-2 speech models or the nova-3 monolingual speech models in their supported languages. Support for Nova-3's language-detecting multilingual models is in public beta. Public beta products are not covered by a Twilio Service Level Agreement.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', speechModel: 'telephony', transcriptionEngine: 'google'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" speechModel="telephony" transcriptionEngine="google" />
  </Start>
</Response>
```
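For comparison, here is a sketch of the same instruction using Deepgram's nova-2 speech model, per the Deepgram model support described above:

```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', speechModel: 'nova-2', transcriptionEngine: 'deepgram'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" speechModel="nova-2" transcriptionEngine="deepgram" />
  </Start>
</Response>
```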
The profanityFilter attribute allows you to enable or disable the filtering of profane words in the transcription. When enabled, the transcription engine attempts to mask or omit any detected profanities in the transcription results.
Warning
By default, the Google Transcription Engine enables the profanityFilter for all calls. The medical_conversation speechModel doesn't support profanityFilter. When using the medical_conversation speechModel, set the profanityFilter attribute to false. Deepgram's profanity filter only works for some languages.
The example below demonstrates how to explicitly set the profanity filter for the transcription. When the filter is enabled, any profane language is masked or omitted in the transcription output; this example disables it.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', profanityFilter: false, transcriptionEngine: 'google'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" profanityFilter="false" transcriptionEngine="google" />
  </Start>
</Response>
```
The partialResults attribute maps to StreamingRecognitionResult in Google terminology, specifically when is_final is false. It allows you to enable or disable the delivery of interim transcription results. When enabled, the transcription engine sends partial (interim) results as the transcription progresses, providing more immediate feedback before the final result is available.
The example below demonstrates how to enable partial results for the transcription.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', partialResults: true, transcriptionEngine: 'google'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" partialResults="true" transcriptionEngine="google" />
  </Start>
</Response>
```
The hints attribute contains a list of words or phrases that the transcription provider can expect to encounter during a Real-Time Transcription. Using the hints attribute can improve the transcription provider's recognition of words or phrases you expect from your callers.
Hints are not supported with the Nova-3 multilingual speech model
Twilio doesn't support setting hints in real-time transcriptions with the Nova-3 multilingual speech model. You can set hints with other Nova-3 models.
You may provide up to 500 words or phrases in the list of hints, separating each entry with a comma. Each hint may be up to 100 characters long; separate each word in a phrase with a space. For example:
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', hints: 'Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" hints="Alice Johnson, Bob Martin, ACME Corp, XYZ Enterprises, product demo, sales inquiry, customer feedback" />
  </Start>
</Response>
```
The hints attribute also supports Google's class token list to improve recognition. You can pass a class token directly in the hints attribute, as shown in the example below.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', hints: '$OOV_CLASS_ALPHANUMERIC_SEQUENCE'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" hints="$OOV_CLASS_ALPHANUMERIC_SEQUENCE" />
  </Start>
</Response>
```
The enableAutomaticPunctuation attribute maps to Automatic Punctuation in Google terminology. It allows you to enable or disable automatic punctuation in the transcription. When enabled, the transcription engine automatically inserts punctuation marks such as periods, commas, and question marks, improving the readability of the transcribed text.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', enableAutomaticPunctuation: true, transcriptionEngine: 'google'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" enableAutomaticPunctuation="true" transcriptionEngine="google" />
  </Start>
</Response>
```
The intelligenceService attribute allows you to opt in to sending your Real-Time Transcript to Twilio Conversational Intelligence for integrated post-processing. By enabling storage and analysis of calls transcribed in real time, this feature helps you extract actionable insights from transcripts. It runs in parallel to the statusCallbackUrl webhook, which streams utterance-level data and other session lifecycle events to your app during the call.
When enabled, this feature performs the following functions:
- Persists Live Transcripts: Stores real-time transcriptions in Conversational Intelligence's historical log for future reference and analysis.
- Runs Post-Call Language Operators: Triggers Language Operators configured in the referenced Intelligence Service. After the call ends, the Intelligence Service generates AI-powered insights and performs actions.
To use this feature, you need to meet the following conditions.
- Have or create an Intelligence Service.
- Set the intelligenceService parameter to the Intelligence Service SID or unique name.
Important Notes:
- To transcribe a call without recording it, pass an intelligenceService parameter without passing a statusCallbackUrl parameter.
- Language Operators are executed after the Real-Time Transcription session concludes, either automatically through the call ending or manually by stopping the live transcription.
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({statusCallbackUrl: 'https://example.com/your-callback-url', intelligenceService: 'GAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription statusCallbackUrl="https://example.com/your-callback-url" intelligenceService="GAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" />
  </Start>
</Response>
```
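Following the notes above, the sketch below starts a session that only persists to Conversational Intelligence: because no statusCallbackUrl is set, no webhook events are streamed to your application (the SID is a placeholder):

```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const start = response.start();
start.transcription({intelligenceService: 'GAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'});

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Start>
    <Transcription intelligenceService="GAaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" />
  </Start>
</Response>
```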
Twilio's transcription service supports a variety of languages and models. The following examples are specific to Google Speech-to-Text. Depending on the language, certain attributes like speechModel, profanityFilter, and enableAutomaticPunctuation may have different levels of support.
To verify support for languages and speech models, see the following resources:
- Google speech-to-text supported languages
- Deepgram Nova-2 supported languages
- Deepgram Nova-3 supported (monolingual) language variants documentation.
Warning
These examples are subject to change. To verify support for languages and speech models, customers should always refer back to the Google Speech-to-Text Supported Languages, the Deepgram nova-2 Supported Languages, or the Deepgram nova-3 Supported (monolingual) Language variants documents, as appropriate.
This example demonstrates how to configure transcription for Chinese (Simplified, China) using the Chirp Model with support for automatic punctuation.
```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="cmn-Hans-CN"
      speechModel="chirp"
      enableAutomaticPunctuation="true" />
  </Start>
</Response>
```
In this configuration, the profanityFilter attribute, the hints attribute, and other advanced features are not supported.
This example demonstrates how to configure transcription for Spanish (Spain) using the telephony model with full support for all attributes.
```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="es-ES"
      speechModel="telephony"
      profanityFilter="true"
      enableAutomaticPunctuation="true" />
  </Start>
</Response>
```
In this example, the telephony model supports automatic punctuation and profanity filter, but not model adaptation (e.g., hints).
This example demonstrates how to configure transcription for Hindi (India) using the short model with support for specific attributes.
```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="hi-IN"
      speechModel="short"
      enableAutomaticPunctuation="true"
      profanityFilter="true"
      hints="संपर्क, सेवा, समर्थन, ग्राहक"
      modelAdaptation="true" />
  </Start>
</Response>
```
In this example, the short model supports automatic punctuation, profanity filter, model adaptation, and hints.
This example demonstrates how to configure transcription for French (Canada) using the long model with support for specific attributes.
```xml
<Response>
  <Start>
    <Transcription
      transcriptionEngine="google"
      languageCode="fr-CA"
      speechModel="long"
      hints="service à la clientèle, rendez-vous, commande" />
  </Start>
</Response>
```
In this example, the long model supports model adaptation through hints, but does not support automatic punctuation, profanity filter, or spoken punctuation.
If specifying inboundTrackLabel or outboundTrackLabel, the call direction mapping table below can be used as a guide.
| Track | Call Direction | Call Resource Mapping | TrackLabel |
|---|---|---|---|
| inbound_track | Outbound | TO # | Label for "who is being called" in an outbound call from Twilio (e.g., inboundTrackLabel="customer"). |
| outbound_track | Outbound | FROM # | Label for "who is calling" in an outbound call from Twilio (e.g., outboundTrackLabel="agent"). |
| inbound_track | Inbound | FROM # | Label for "who is being called" in an inbound call to Twilio (e.g., inboundTrackLabel="agent"). |
| outbound_track | Inbound | TO # | Label for "who is calling" in an inbound call to Twilio (e.g., outboundTrackLabel="customer"). |
Note: A call that has an "outbound" direction is a call that is outbound from Twilio, i.e., from Twilio to a customer.
If you provided a name attribute when starting a Real-Time Transcription session, you can stop a Real-Time Transcription using TwiML or via API.
Given a Real-Time Transcription that was started with the following TwiML instructions:
```xml
<Response>
  <Start>
    <Transcription name="Contact center transcription" />
  </Start>
</Response>
```
You can stop the Real-Time Transcription with the following TwiML instructions:
```js
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const stop = response.stop();
stop.transcription({name: 'Contact center transcription'});

console.log(response.toString());
```
Output
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Stop>
    <Transcription name="Contact center transcription" />
  </Stop>
</Response>
```
If a name was not provided, you can stop an in-progress Real-Time Transcription via the API using the Transcription SID. Learn more about the Transcriptions subresource.
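As a sketch, stopping a session by SID with the Node.js helper library might look like the following (placeholder credentials and SIDs; this assumes a helper library version that exposes the Realtime Transcriptions subresource on calls):

```js
const twilio = require('twilio');

// Placeholder credentials: use your Account SID and Auth Token
const client = twilio('ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'your_auth_token');

client.calls('CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
  .transcriptions('GT20dfa03c8cf8aa8d0c4aeccde5558b66')
  .update({ status: 'stopped' })
  .then((transcription) => console.log(transcription.status));
```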
HIPAA eligibility and PCI compliance vary depending on your selected speech model and whether you use webhooks or persisted transcripts. To determine whether your implementation may be HIPAA eligible or PCI compliant, see the following table.
| Transcription engine | Speech model | Transcript destination | HIPAA eligible | PCI compliant |
|---|---|---|---|---|
| Google | Any supported model | Webhooks | Yes | Yes |
| Google | Any supported model | Persisted Transcript | Yes | No |
| Deepgram | nova-2 or nova-3 monolingual variants | Webhooks | Yes | Yes |
| Deepgram | nova-2 or nova-3 monolingual variants | Persisted Transcript | Yes | No |
| Deepgram | nova-3 multilingual | Webhooks or Persisted Transcript | No | No |
AI Nutrition Facts
Real-Time Transcription, including the <Transcription> TwiML noun and API, uses third-party artificial intelligence and machine learning technologies.
Twilio's AI Nutrition Facts provide an overview of the AI feature you're using, so you can better understand how the AI is working with your data. Real-Time Transcriptions AI qualities are outlined in the following Speech to Text Transcriptions - Programmable Voice Nutrition Facts label. For more information and the glossary regarding the AI Nutrition Facts Label, please refer to Twilio's AI Nutrition Facts page.
AI Nutrition Facts
Speech to Text Transcriptions - Programmable Voice, Twilio Video, and Conversational Intelligence
- Description
- Generate speech to text voice transcriptions (real-time and post-call) in Programmable Voice, Twilio Video, and Conversational Intelligence.
- Privacy Ladder Level
- N/A
- Feature is Optional
- Yes
- Model Type
- Generative and Predictive - Automatic Speech Recognition
- Base Model
- Deepgram Speech-to-Text, Google Speech-to-Text, Amazon Transcribe
- Base Model Trained with Customer Data
- No
- Customer Data is Shared with Model Vendor
- No
- Training Data Anonymized
- N/A
- Data Deletion
- Yes
- Human in the Loop
- Yes
- Data Retention
- Until the customer deletes
- Logging & Auditing
- Yes
- Guardrails
- Yes
- Input/Output Consistency
- Yes
- Other Resources
- https://www.twilio.com/docs/conversational-intelligence
Trust Ingredients
Conversational Intelligence, Programmable Voice, and Twilio Video only use the default Base Model provided by the Model Vendor. The Base Model is not trained using customer data.
Base Model is not trained using any customer data.
Transcriptions are deleted by the customer using the Conversational Intelligence API or when a customer account is deprovisioned.
The customer views output in the Conversational Intelligence API or Transcript Viewer.
Compliance
The customer can listen to the input (recording) and view the output (transcript).
The customer is responsible for human review.
Learn more about this label at nutrition-facts.ai