
<ConversationRelay> TwiML noun



Legal notice

ConversationRelay, including the <ConversationRelay> TwiML noun and API, uses artificial intelligence or machine learning technologies. By enabling or using any features or functionalities within Programmable Voice that Twilio identifies as using artificial intelligence or machine learning technology, you acknowledge and agree to certain terms. Your use of these features or functionalities is subject to the terms of the Predictive and Generative AI or ML Features Addendum.

ConversationRelay isn't compliant with the Payment Card Industry (PCI) and doesn't support Voice workflows that are subject to PCI.


Info

Before using ConversationRelay, you need to complete the onboarding steps and agree to the Predictive and Generative AI/ML Features Addendum. See the ConversationRelay Onboarding Guide for more details.

The <ConversationRelay> TwiML noun under the <Connect> verb routes a call to Twilio's ConversationRelay service, providing advanced AI-powered voice interactions. ConversationRelay handles the complexities of live, synchronous voice calls, such as Speech-to-Text (STT) and Text-to-Speech (TTS) conversions, session management, and low-latency communication with your application. This approach allows your system to focus on processing conversational AI logic and sending back responses effectively.

In a typical setup, <ConversationRelay> connects to your AI application through a WebSocket, allowing real-time and event-based interaction. Your application receives transcribed caller speech in structured messages and sends responses as text, which ConversationRelay converts to speech and plays back to the caller. This setup is commonly used for customer service, virtual assistants, and other scenarios that require real-time, AI-based voice interactions.
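As a sketch of that message flow, the handler below takes a parsed incoming message from the WebSocket and returns the JSON-serializable reply to send back. This is a hypothetical helper, not part of any Twilio SDK: the "prompt" and "text" message shapes reflect the ConversationRelay messages described on this page, and the echo logic stands in for your actual AI application.

```javascript
// Hypothetical sketch of a ConversationRelay WebSocket message handler.
// In a real application you would call this from your WebSocket server's
// "message" event and send the returned object back as a JSON string.
function handleRelayMessage(message) {
  switch (message.type) {
    case 'setup':
      // First message of the session; carries identifiers such as callSid.
      // Nothing needs to be sent in response.
      return null;
    case 'prompt':
      // Transcribed caller speech arrives as a "prompt" message.
      // Replace this echo with a call into your conversational AI logic;
      // "last: true" marks the end of this reply's text tokens.
      return { type: 'text', token: `You said: ${message.voicePrompt}`, last: true };
    default:
      // Other message types (interrupt, dtmf, error, ...) are ignored here.
      return null;
  }
}
```

In practice your application would stream multiple text tokens with `last: false` as your language model generates them, sending `last: true` only on the final token of a reply.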


<ConversationRelay> attributes


The <ConversationRelay> noun supports the following attributes:

  • url (required): The URL of your WebSocket server (must use wss://).
  • welcomeGreeting (optional): The message automatically played to the caller after Twilio answers the call and establishes the WebSocket connection.
  • welcomeGreetingInterruptible (optional, default "any"): Specifies whether the caller can interrupt the welcomeGreeting with speech. Values can be "none", "dtmf", "speech", or "any". For backward compatibility, Boolean values are also accepted: true = "any" and false = "none".
  • language (optional, default "en-US"): The language code (for example, "en-US") that applies to both Speech-to-Text (STT) and Text-to-Speech (TTS). Setting this attribute is equivalent to setting both ttsLanguage and transcriptionLanguage.
  • ttsLanguage (optional): The default language code to use for TTS when the text token message doesn't specify a language. If you set both attributes, this one overrides the language attribute. You can modify it during the session via the ttsLanguage field in the language message you send through the Service Provider Interface (SPI).
  • ttsProvider (optional, default "ElevenLabs"): The provider for TTS. Available choices are "Google", "Amazon", and "ElevenLabs".
  • voice (optional): The voice used for TTS. Options vary based on the ttsProvider. For details, refer to the Twilio TTS voices. Additional voices are available for ConversationRelay. Defaults: "UgBBYS2sOqTuMpoF3BR0" (ElevenLabs), "en-US-Journey-O" (Google), "Joanna-Neural" (Amazon).
  • transcriptionLanguage (optional): The language code to use for STT when the session starts. If you set both attributes, this one overrides the language attribute for the transcription language. You can modify it during the session via the transcriptionLanguage field in the language message you send through the SPI.
  • transcriptionProvider (optional, default "Google"): The provider for STT (speech recognition). Available choices are "Google" and "Deepgram".
  • speechModel (optional): The speech model used for STT. Choices vary based on the transcriptionProvider; refer to the provider's documentation for an accurate list. Defaults: "telephony" (Google), "nova-2-general" (Deepgram).
  • interruptible (optional, default "any"): Specifies whether caller speech can interrupt TTS playback. Values can be "none", "dtmf", "speech", or "any". For backward compatibility, Boolean values are also accepted: true = "any" and false = "none".
  • dtmfDetection (optional): Specifies whether the system sends dual-tone multi-frequency (DTMF) keypresses over the WebSocket. Set to true to turn on DTMF events.
  • reportInputDuringAgentSpeech (optional, default "none"): Specifies whether your application receives prompts and DTMF events while the agent is speaking. Values can be "none", "dtmf", "speech", or "any". Note: Before May 2025, the default value was "any"; it's now "none".
  • preemptible (optional, default false): Specifies whether text tokens from the subsequent talk cycle can interrupt the TTS of the current talk cycle.
  • hints (optional): A comma-separated list of words or phrases that helps Speech-to-Text recognition of uncommon words, product names, or domain-specific terminology. Works similarly to the hints attribute in <Gather>.
  • debug (optional): A space-separated list of options for subscribing to debugging messages. Options are debugging, speaker-events, and tokens-played. The debugging option provides general debugging information; speaker-events notifies your application about agentSpeaking and clientSpeaking events; tokens-played provides messages about what has just been played over TTS.
  • elevenlabsTextNormalization (optional, default "off"): Specifies whether to apply text normalization when using the ElevenLabs TTS provider. Options are "on", "auto", or "off". "auto" has the same effect as "off" for ConversationRelay voice calls.
  • intelligenceService (optional): A Conversational Intelligence Service SID or unique name for persisting conversation transcripts and running Language Operators for virtual agent observability. See this guide for more details.

Include nested elements within <ConversationRelay> for more granular configuration. For more information on configuring ConversationRelay, refer to the ConversationRelay Onboarding Guide.

<Language> element


The <Language> element maps a language code to specific TTS and STT settings. Use this element to configure multiple languages for your session.

Example

Connect a Programmable Voice call to Twilio's ConversationRelay service.
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const connect = response.connect();
const conversationrelay = connect.conversationRelay({
  url: 'wss://mywebsocketserver.com/websocket'
});
conversationrelay.language({
  code: 'sv-SE',
  ttsProvider: 'amazon',
  voice: 'Elin-Neural',
  transcriptionProvider: 'google',
  speechModel: 'long'
});
conversationrelay.language({
  code: 'en-US',
  ttsProvider: 'google',
  voice: 'en-US-Journey-O'
});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <ConversationRelay url="wss://mywebsocketserver.com/websocket">
      <Language code="sv-SE" ttsProvider="amazon" voice="Elin-Neural" transcriptionProvider="google" speechModel="long"/>
      <Language code="en-US" ttsProvider="google" voice="en-US-Journey-O"/>
    </ConversationRelay>
  </Connect>
</Response>

Attributes

  • code (required): The language code (for example, "en-US") that applies to both STT and TTS.
  • ttsProvider (optional, inherited from <ConversationRelay>): The provider for TTS. Choices are "Google", "Amazon", and "ElevenLabs".
  • voice (optional, inherited from <ConversationRelay>): The voice used for TTS. Choices vary based on the ttsProvider.
  • transcriptionProvider (optional, inherited from <ConversationRelay>): The provider for STT. Choices are "Google" and "Deepgram".
  • speechModel (optional, inherited from <ConversationRelay>): The speech model used for STT. Choices vary based on the transcriptionProvider.
  • language (optional, default "en-US"): The language code for the session (for example, "en-US").
  • customParameter (optional): Custom parameters to be sent in the setup message.

Notes

  • If you specify the same language code in both <ConversationRelay> and <Language>, the settings in <Language> take precedence.
  • ConversationRelay provides default settings for commonly used languages.

<Parameter> element

The <Parameter> element allows you to send custom parameters from the TwiML directly into the initial "setup" message sent over the WebSocket. These parameters appear under the customParameters field in the JSON message.

Example

Connect a Programmable Voice call to Twilio's ConversationRelay service.
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const connect = response.connect();
const conversationrelay = connect.conversationRelay({
  url: 'wss://mywebsocketserver.com/websocket'
});
conversationrelay.parameter({
  name: 'foo',
  value: 'bar'
});
conversationrelay.parameter({
  name: 'hint',
  value: 'Annoyed customer'
});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <ConversationRelay url="wss://mywebsocketserver.com/websocket">
      <Parameter name="foo" value="bar"/>
      <Parameter name="hint" value="Annoyed customer"/>
    </ConversationRelay>
  </Connect>
</Response>

Resulting Setup Message

{
  "type": "setup",
  "sessionId": "VX00000000000000000000000000000000",
  "callSid": "CA00000000000000000000000000000000",
  "...": "...",
  "customParameters": {
    "foo": "bar",
    "hint": "Annoyed customer"
  }
}

Generating TwiML for <ConversationRelay>

Connect a Programmable Voice call to Twilio's ConversationRelay service.
const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
const connect = response.connect({
  action: 'https://myhttpserver.com/connect_action'
});
connect.conversationRelay({
  url: 'wss://mywebsocketserver.com/websocket',
  welcomeGreeting: 'Hi! Ask me anything!'
});

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect action="https://myhttpserver.com/connect_action">
    <ConversationRelay url="wss://mywebsocketserver.com/websocket" welcomeGreeting="Hi! Ask me anything!" />
  </Connect>
</Response>
  • action (optional): The URL that Twilio will request when the <Connect> verb ends.
  • url (required): The URL of your WebSocket server (must use the wss:// protocol).
  • welcomeGreeting (optional): The message automatically played to the caller after we answer the call and establish the WebSocket connection.

When the TwiML execution is complete, Twilio will make a callback to the action URL with call information and the return parameters from ConversationRelay.
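For illustration, here is a minimal sketch of the action webhook's decision logic, assuming a plain object of POST parameters like the example payloads shown later on this page. The function name and the returned TwiML strings are hypothetical, not a prescribed pattern.

```javascript
// Hypothetical sketch: choose the TwiML to return from the <Connect>
// action callback based on what ConversationRelay reports back.
function twimlForActionCallback(params) {
  if (params.SessionStatus === 'ended' && params.HandoffData) {
    // The application ended the session and passed handoff data along.
    const handoff = JSON.parse(params.HandoffData);
    return `<Response><Say>Connecting you to an agent. Reason: ${handoff.reason}</Say></Response>`;
  }
  if (params.SessionStatus === 'failed') {
    // An error ended the session; apologize and hang up.
    return '<Response><Say>Sorry, something went wrong.</Say><Hangup/></Response>';
  }
  // "completed": the caller hung up, so no further TwiML is needed.
  return '<Response/>';
}
```

Whatever TwiML your action URL returns continues the call; an empty `<Response/>` simply lets the call end.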


Language settings and their default values


Language settings refer to configurations for both Text-to-Speech and Speech-to-Text:

  • Text-to-Speech (TTS) settings:
    • ttsLanguage
    • ttsProvider
    • voice
  • Speech-to-Text (STT) settings:
    • transcriptionLanguage
    • transcriptionProvider
    • speechModel

Configure language settings


Configure language settings in two places:

  1. Attributes of <ConversationRelay>: These serve as the default settings used when the session starts.
  2. Within <Language> Elements: Each <Language> element configures settings for a specific language code. You can include multiple <Language> elements to support multiple languages.

Handle defaults and overrides

  • In <ConversationRelay>, the ttsLanguage attribute overrides the language attribute for the default TTS language.
  • In <ConversationRelay>, the transcriptionLanguage attribute overrides the language attribute for the STT language.
  • If a <Language> element specifies the same code attribute as in <ConversationRelay>, the <Language> element's settings take precedence.
  • The system uses default values when you don't provide specific settings.
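As an illustration of these precedence rules (the WebSocket URL is a placeholder), the nested <Language> element below shares the code "en-US" with the session default, so its Amazon settings override the Google settings on <ConversationRelay> whenever the session uses en-US:

```xml
<Response>
  <Connect>
    <!-- Session defaults: Google TTS for en-US -->
    <ConversationRelay url="wss://example.com/websocket" language="en-US" ttsProvider="Google" voice="en-US-Journey-O">
      <!-- Same code as the default language, so these settings take precedence -->
      <Language code="en-US" ttsProvider="Amazon" voice="Joanna-Neural"/>
    </ConversationRelay>
  </Connect>
</Response>
```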

Default Values

  • language: Defaults to en-US if not specified.
  • ttsProvider: Defaults to ElevenLabs if not specified.
  • transcriptionProvider: Defaults to Google if not specified.
  • If you set the ttsProvider attribute without the voice attribute, the system uses a default voice for that provider.
  • If you set the transcriptionProvider attribute without the speechModel attribute, the system uses a default model for that provider.
  • If you set the voice attribute without the ttsProvider attribute, the system infers the provider from the default or specified ttsProvider.
  • If you set the speechModel attribute without the transcriptionProvider attribute, the system infers the provider from the default or specified transcriptionProvider.

For Speech-to-Text (STT) settings:

  • At session start, the service uses the transcriptionLanguage attribute to initiate the STT session.
  • If the combination of the transcriptionProvider and speechModel attributes is invalid, the call disconnects, and the system reports an error in the action callback and error notifications.
  • You can change the transcriptionLanguage attribute during the session via the language message you send through the Service Provider Interface (SPI).
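For reference, a language message of the kind described above might look like the following sketch (the field values are illustrative; check the SPI message documentation for the authoritative schema):

```json
{
  "type": "language",
  "ttsLanguage": "sv-SE",
  "transcriptionLanguage": "sv-SE"
}
```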

For Text-to-Speech (TTS) settings:

  • When the lang property is present in the text token message from the SPI, the service uses it to select the TTS voice.
  • If the combination of the ttsProvider and voice attributes is invalid, the system sends an error message over the SPI.
  • If you don't specify the lang property in the text token, the service uses the current TTS language settings.
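For example, a text token that explicitly selects a TTS language might look like this sketch (the token text is illustrative):

```json
{
  "type": "text",
  "token": "Hej! Hur kan jag hjälpa dig?",
  "lang": "sv-SE",
  "last": true
}
```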

Result of TwiML execution


<Connect> action URL callback


When an action URL is specified in the <Connect> verb, ConversationRelay will make a request to that URL when the <Connect> verb ends. The request includes call information and session details.

Example Payloads

Session ended by application example

{
  "AccountSid": "AC00000000000000000000000000000000",
  "CallSid": "CA00000000000000000000000000000000",
  "CallStatus": "in-progress",
  "From": "client:caller",
  "To": "test:conversationrelay",
  "Direction": "inbound",
  "ApplicationSid": "AP00000000000000000000000000000000",
  "SessionId": "VX00000000000000000000000000000000",
  "SessionStatus": "ended",
  "SessionDuration": "25",
  "HandoffData": "{\"reason\": \"The caller requested to talk to a real person\"}"
}

Error occurred during session example

{
  "AccountSid": "AC00000000000000000000000000000000",
  "CallSid": "CA00000000000000000000000000000000",
  "CallStatus": "in-progress",
  "From": "client:caller",
  "To": "test:conversationrelay",
  "Direction": "inbound",
  "ApplicationSid": "AP00000000000000000000000000000000",
  "SessionId": "VX00000000000000000000000000000000",
  "SessionStatus": "failed",
  "SessionDuration": "10",
  "ErrorCode": "39001",
  "ErrorMessage": "Network connection to WebSocket server failed."
}

Session completed normally (caller hung up) example

{
  "AccountSid": "AC00000000000000000000000000000000",
  "CallSid": "CA00000000000000000000000000000000",
  "CallStatus": "completed",
  "From": "client:caller",
  "To": "test:conversationrelay",
  "Direction": "inbound",
  "ApplicationSid": "AP00000000000000000000000000000000",
  "SessionId": "VX00000000000000000000000000000000",
  "SessionStatus": "completed",
  "SessionDuration": "35"
}

AI nutrition facts

ConversationRelay, including the <ConversationRelay> TwiML noun and API, uses artificial intelligence or machine learning technologies.

Our AI Nutrition Facts for ConversationRelay provide an overview of the AI features you're using, so you can better understand how the AI works with your data. The AI Nutrition Labels below detail the AI qualities of ConversationRelay. For more information, including a glossary for the AI Nutrition Facts Label, refer to our AI Nutrition Facts page.

Deepgram AI nutrition facts


AI Nutrition Facts

ConversationRelay (STT and TTS) - Programmable Voice - Deepgram

Description
Convert speech to text in real time through a WebSocket API in Programmable Voice.
Privacy Ladder Level
N/A
Feature is Optional
Yes
Model Type
Automatic Speech Recognition
Base Model
Deepgram Nova2

Trust Ingredients

Base Model Trained with Customer Data
No

ConversationRelay uses the Default Base Model provided by the Model Vendor. The Base Model is not trained using Customer Data.

Customer Data is Shared with Model Vendor
No

ConversationRelay uses the Default Base Model provided by the Model Vendor. The Base Model is not trained using Customer Data.

Training Data Anonymized
N/A

Base Model is not trained using any Customer Data.

Data Deletion
N/A

Customer Data is not stored or retained in the Base Model.

Human in the Loop
Yes

Customer can view and listen to the input and output in the customer's own terminal.

Data Retention
N/A

Compliance

Logging & Auditing
Yes

Customer can view and listen to the input and output in the customer's own terminal.

Guardrails
Yes

Customer can view and listen to the input and output in the customer's own terminal.

Input/Output Consistency
Yes

Customer is responsible for human review.

Other Resources
Learn more about this label at nutrition-facts.ai

Google AI nutrition facts


AI Nutrition Facts

ConversationRelay (STT and TTS) - Programmable Voice - Google AI

Description
Convert speech to text in real time and text into natural-sounding speech through a WebSocket API in Programmable Voice.
Privacy Ladder Level
N/A
Feature is Optional
Yes
Model Type
Generative and Predictive - Automatic Speech Recognition and Text-to-Speech
Base Model
Google Speech-to-Text; Google Text-to-Speech

Trust Ingredients

Base Model Trained with Customer Data
No

ConversationRelay uses the Default Base Model provided by the Model Vendor. The Base Model is not trained using Customer Data.

Customer Data is Shared with Model Vendor
No

ConversationRelay uses the Default Base Model provided by the Model Vendor. The Base Model is not trained using Customer Data.

Training Data Anonymized
N/A

Base Model is not trained using any Customer Data.

Data Deletion
N/A

Customer Data is not stored or retained in the Base Model.

Human in the Loop
Yes

Customer can view and listen to the input and output in the customer's own terminal.

Data Retention
N/A

Compliance

Logging & Auditing
Yes

Customer can view and listen to the input and output in the customer's own terminal.

Guardrails
Yes

Customer can view and listen to the input and output in the customer's own terminal.

Input/Output Consistency
Yes

Customer is responsible for human review.

Other Resources
Learn more about this label at nutrition-facts.ai

Amazon AI nutrition facts


AI Nutrition Facts

ConversationRelay (STT and TTS) - Programmable Voice - Amazon AI

Description
Convert text into natural-sounding speech through a WebSocket API in Programmable Voice.
Privacy Ladder Level
N/A
Feature is Optional
Yes
Model Type
Generative and Predictive
Base Model
Amazon Polly Text-to-Speech

Trust Ingredients

Base Model Trained with Customer Data
No

ConversationRelay uses the Default Base Model provided by the Model Vendor. The Base Model is not trained using Customer Data.

Customer Data is Shared with Model Vendor
No

ConversationRelay uses the Default Base Model provided by the Model Vendor. The Base Model is not trained using Customer Data.

Training Data Anonymized
N/A

Base Model is not trained using any Customer Data.

Data Deletion
N/A

Customer Data is not stored or retained in the Base Model.

Human in the Loop
Yes

Customer can view and listen to the input and output in the customer's own terminal.

Data Retention
N/A

Compliance

Logging & Auditing
Yes

Customer can view and listen to the input and output in the customer's own terminal.

Guardrails
Yes

Customer can view and listen to the input and output in the customer's own terminal.

Input/Output Consistency
Yes

Customer is responsible for human review.

Other Resources
Learn more about this label at nutrition-facts.ai

ElevenLabs nutrition facts


AI Nutrition Facts

ConversationRelay (STT and TTS) - Programmable Voice - ElevenLabs

Description
Convert text into a human-sounding voice using speech synthesis technology from ElevenLabs.
Privacy Ladder Level
N/A
Feature is Optional
Yes
Model Type
Predictive
Base Model
ElevenLabs Text-To-Speech: Flash 2 and Flash 2.5

Trust Ingredients

Base Model Trained with Customer Data
No

The Base Model is not trained using any Customer Data.

Customer Data is Shared with Model Vendor
No

Programmable Voice uses the default Base Model provided by the Model Vendor. The Base Model is not trained using customer data.

Training Data Anonymized
N/A

Base Model is not trained using any Customer Data.

Data Deletion
N/A

The Base Model is not trained using any Customer Data.

Human in the Loop
Yes

Customers can view text input and listen to the audio output.

Data Retention
Customer can review TwiML logs, including <Say> Logs, to debug and troubleshoot for up to 30 days.

Compliance

Logging & Auditing
Yes

Customers can view text input and listen to the audio output.

Guardrails
Yes

Customers can view text input and listen to the audio output.

Input/Output Consistency
Yes

Customer is responsible for human review.

Other Resources
Learn more about this label at nutrition-facts.ai