Picking a voice
Picking a voice for your ConversationRelay application is an important step towards creating an engaging user experience. Twilio supports text-to-speech voices from Google, Amazon Polly, and ElevenLabs. Text-to-Speech (TTS) voice quality varies significantly by provider and voice type. While generative voices often offer higher fidelity and more natural-sounding responses, they can introduce additional latency and process TTS at a slower rate.
For voices from Google or Amazon (including generative options), refer to our Twilio TTS Voices documentation. Each provider offers a variety of languages and styles, enabling you to tailor your application's voice experience to your specific needs.
- Browse the available voices in the Available voices and languages table. Test them using the Twilio Console to find the one that best fits your application's requirements.
- Copy the voice ID from the table (for example, en-US-Wavenet-D).
- Configure the <ConversationRelay> noun in TwiML: Set ttsProvider to Google or Amazon and use the copied voice ID in the voice attribute.
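As a minimal sketch of the last step, the snippet below builds the TwiML with Python's standard library. The helper name conversation_relay_twiml and the WebSocket URL are placeholders chosen for illustration, not part of any Twilio SDK; the attribute names and the voice ID come from the steps above, and the sketch assumes the usual <Response> root element around <Connect>.

```python
import xml.etree.ElementTree as ET


def conversation_relay_twiml(url, tts_provider, voice):
    """Return TwiML that connects a call to ConversationRelay with a chosen voice.

    Hypothetical helper for illustration; it only assembles XML.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(
        connect,
        "ConversationRelay",
        {"url": url, "ttsProvider": tts_provider, "voice": voice},
    )
    return ET.tostring(response, encoding="unicode")


# Placeholder WebSocket URL; the voice ID is the example from the list above.
twiml = conversation_relay_twiml(
    "wss://example.com/websocket", "Google", "en-US-Wavenet-D"
)
print(twiml)
```

Swapping "Google" for "Amazon" and passing an Amazon Polly voice ID from the same table should follow the same pattern.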
ElevenLabs uses the Flash 2.5 model by default for text-to-speech. Use the interface below to search and filter through a wide selection of ElevenLabs voices by language, accent, age, and more. Each voice entry includes a voice ID that you can copy and paste into your <ConversationRelay> configuration.
- Search or filter: Pick a voice using the tool below that matches your requirements.
- Copy the voice ID: From the search results, copy the voice ID (for example, NYC9WEgkq1u4jiqBseQ9).
- Configure the <ConversationRelay> noun: In your TwiML, set ttsProvider="ElevenLabs" and use the copied voice ID in the voice attribute.
- Pick an audio model (optional): The voices from ElevenLabs use the Flash 2.5 model by default. Other models are available and could improve the quality or performance of your application depending on your use case. You can use a different model by appending a hyphen to the voice ID followed by the model ID. The supported model IDs include flash_v2, turbo_v2_5, turbo_v2, and the default, flash_v2_5. Some models only work with a specific set of languages. You can learn about the strengths and the supported languages of each model on the ElevenLabs website.
- Customize your ElevenLabs voice (recommended): You can adjust the speed and other characteristics of your chosen ElevenLabs voice. To do that, add a hyphen to the end of the voice attribute followed by an underscore-separated string with values for speed, stability, and similarity respectively. The speed should be a value between 0.7 and 1.2, and the stability and similarity values can range from 0.0 to 1.0. For example, a voice attribute of XrExE9yKIg1WjnnlVkGX-1.2_0.6_0.8 will set the speed to 1.2, the stability to 0.6, and the similarity to 0.8. See the ElevenLabs documentation to learn more about how these settings affect your application's voice.
Example:
<Connect>
  <ConversationRelay url="wss://example.com/websocket" ttsProvider="ElevenLabs" voice="NYC9WEgkq1u4jiqBseQ9-turbo_v2_5-0.8_0.8_0.6" ... />
</Connect>
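The composite voice value in the example can be assembled programmatically. The following is a sketch only: the function name elevenlabs_voice and the choice to pass all three settings together as one tuple are assumptions made for illustration. The model IDs, value ranges, and the order of the parts (voice ID, then model, then settings) are taken from the steps and example above.

```python
# Model IDs listed in the steps above.
SUPPORTED_MODELS = {"flash_v2", "flash_v2_5", "turbo_v2", "turbo_v2_5"}


def elevenlabs_voice(voice_id, model=None, settings=None):
    """Compose a voice attribute value from a voice ID, model, and settings.

    Hypothetical helper. `settings` is a (speed, stability, similarity)
    tuple; this sketch assumes the three values are always given together.
    """
    parts = [voice_id]
    if model is not None:
        if model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model ID: {model}")
        parts.append(model)
    if settings is not None:
        speed, stability, similarity = settings
        if not 0.7 <= speed <= 1.2:
            raise ValueError("speed must be between 0.7 and 1.2")
        for name, value in (("stability", stability), ("similarity", similarity)):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0")
        # Underscore-separated: speed, stability, similarity, in that order.
        parts.append(f"{speed}_{stability}_{similarity}")
    # Hyphens separate the voice ID, the model ID, and the settings string.
    return "-".join(parts)


print(elevenlabs_voice("XrExE9yKIg1WjnnlVkGX", settings=(1.2, 0.6, 0.8)))
# XrExE9yKIg1WjnnlVkGX-1.2_0.6_0.8
print(elevenlabs_voice("NYC9WEgkq1u4jiqBseQ9", "turbo_v2_5", (0.8, 0.8, 0.6)))
# NYC9WEgkq1u4jiqBseQ9-turbo_v2_5-0.8_0.8_0.6
```

Validating the ranges before building the string is a design choice of this sketch; it surfaces an out-of-range value in your own code rather than at call time.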
If you don't explicitly specify the voice attribute in your <ConversationRelay> configuration, ConversationRelay automatically applies a default voice based on the language setting (as defined by the language or ttsLanguage attribute) and the selected TTS provider (default is ElevenLabs). Below is the complete list of default voice settings:
Language | Voice ID | TTS provider | Speech model | Transcription provider |
---|---|---|---|---|
bg-BG | AB9XsbSA4eLG12t2myjN | ElevenLabs | long | |
cs-CZ | uYFJyGaibp4N2VwYQshk | ElevenLabs | long | |
da-DK | ygiXC2Oa1BiHksD3WkJZ | ElevenLabs | long | |
de-DE | FTNCalFNG5bRnkkaP5Ug | ElevenLabs | telephony | |
en-AU | 9Ft9sm9dzvprPILZmLJl | ElevenLabs | telephony | |
en-GB | Fahco4VZzobUeiPqni1S | ElevenLabs | telephony | |
en-IN | mCQMfsqGDT6IDkEKR20a | ElevenLabs | long | |
en-US | UgBBYS2sOqTuMpoF3BR0 | ElevenLabs | telephony | |
es-ES | 6xftrpatV0jGmFHxDjUv | ElevenLabs | telephony | |
es-US | CaJslL1xziwefCeTNzHv | ElevenLabs | telephony | |
fi-FI | 6xPz2opT0y5qtoRh1U1Y | ElevenLabs | long | |
fr-CA | IPgYtHTNLjC7Bq7IPHrm | ElevenLabs | telephony | |
fr-FR | a5n9pJUnAhX4fn7lx3uo | ElevenLabs | telephony | |
hi-IN | IvLWq57RKibBrqZGpQrC | ElevenLabs | long | |
hu-HU | TumdjBNWanlT3ysvclWh | ElevenLabs | long | |
id-ID | 1k39YpzqXZn52BgyLyGO | ElevenLabs | long | |
it-IT | uScy1bXtKz8vPzfdFsFw | ElevenLabs | telephony | |
ja-JP | 3JDquces8E8bkmvbh6Bc | ElevenLabs | telephony | |
kn-IN | kn-IN-Standard-A | | long | |
ko-KR | uyVNoMrnUku1dZyVEXwD | ElevenLabs | telephony | |
ml-IN | ml-IN-Standard-A | | long | |
mr-IN | mr-IN-Standard-A | | long | |
nl-BE | s7Z6uboUuE4Nd8Q2nye6 | ElevenLabs | telephony | |
nl-NL | UNBIyLbtFB9k7FKW8wJv | ElevenLabs | telephony | |
pl-PL | W0sqKm1Sfw1EzlCH14FQ | ElevenLabs | long | |
pt-BR | CstacWqMhJQlnfLPxRG4 | ElevenLabs | telephony | |
pt-PT | TsZfI8Nbn2Xd7ArC76n9 | ElevenLabs | telephony | |
ro-RO | OlBp4oyr3FBAGEAtJOnU | ElevenLabs | long | |
ru-RU | AB9XsbSA4eLG12t2myjN | ElevenLabs | long | |
sv-SE | 4xkUqaR9MYOJHoaC1Nak | ElevenLabs | long | |
ta-IN | ZhJ5LanYnCmLKQUXvsV7 | ElevenLabs | long | |
te-IN | te-IN-Standard-A | | long | |
th-TH | th-TH-Standard-A | | long | |
tr-TR | IuRRIAcbQK5AQk1XevPj | ElevenLabs | long | |
uk-UA | nCqaTnIbLdME87OuQaZY | ElevenLabs | long | |
vi-VN | foH7s9fX31wFFH2yqrFa | ElevenLabs | long | |
Our internal configuration defines these default settings and updates them periodically. Refer to the Twilio TTS Voices documentation for a complete and current list of supported languages, default voices, and detailed settings.
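Because the defaults can change over time, an application that wants a stable voice may prefer to set the voice attribute explicitly. The sketch below illustrates one way to do that under stated assumptions: the dictionary is a hand-copied excerpt of a few rows from the table above (a snapshot, not a live source), and the names DEFAULT_VOICE_SNAPSHOT and voice_attributes are hypothetical.

```python
# Snapshot of a few rows from the default-voice table above.
# Assumption: copied by hand; it will not track later changes to the defaults.
DEFAULT_VOICE_SNAPSHOT = {
    "en-US": "UgBBYS2sOqTuMpoF3BR0",
    "en-GB": "Fahco4VZzobUeiPqni1S",
    "es-ES": "6xftrpatV0jGmFHxDjUv",
    "fr-FR": "a5n9pJUnAhX4fn7lx3uo",
}


def voice_attributes(language, pin=True):
    """Return <ConversationRelay> attributes for a language.

    With pin=True and a language present in the snapshot, the voice is set
    explicitly. Otherwise the voice attribute is omitted, so ConversationRelay
    applies its own default for that language.
    """
    attrs = {"language": language}
    voice = DEFAULT_VOICE_SNAPSHOT.get(language) if pin else None
    if voice is not None:
        attrs["ttsProvider"] = "ElevenLabs"
        attrs["voice"] = voice
    return attrs


print(voice_attributes("en-US"))
print(voice_attributes("kn-IN"))  # not in the snapshot, so no voice attribute
```

Whether to pin is a trade-off: pinning keeps the voice stable, while omitting the attribute lets the application pick up whatever default is current.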