Picking a voice


Picking a voice for your ConversationRelay application is an important step towards creating an engaging user experience. Twilio supports text-to-speech (TTS) voices from Google, Amazon Polly, and ElevenLabs. Voice quality varies significantly by provider and voice type. Generative voices often offer higher fidelity and more natural-sounding responses, but they can introduce additional latency and synthesize speech at a slower rate.


Google and Amazon Polly voices

For voices from Google or Amazon (including generative options), refer to our Twilio TTS Voices documentation. Each provider offers a variety of languages and styles, enabling you to tailor your application's voice experience to your specific needs.

How to use Google and Amazon Polly voices

  1. Browse the available voices in the Available voices and languages table. Test them using the Twilio Console to find the one that best fits your application's requirements.
  2. Copy the voice ID from the table (for example, en-US-Wavenet-D).
  3. Configure the <ConversationRelay> noun in TwiML: Set ttsProvider to Google or Amazon and use the copied voice ID in the voice attribute.
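
The steps above can be sketched in Python. This is a minimal illustration rather than Twilio's helper library: the function name conversation_relay_twiml is hypothetical, the WebSocket URL is a placeholder, and the snippet assumes the TwiML is wrapped in the usual <Response> root element. It uses only the standard library to emit the markup.

```python
import xml.etree.ElementTree as ET


def conversation_relay_twiml(url, tts_provider, voice):
    """Return a TwiML string containing <Connect><ConversationRelay .../></Connect>.

    Hypothetical helper for illustration; it sets only the three attributes
    discussed in the steps above.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(
        connect,
        "ConversationRelay",
        {"url": url, "ttsProvider": tts_provider, "voice": voice},
    )
    return ET.tostring(response, encoding="unicode")


# Voice ID copied from the example in step 2; the URL is a placeholder.
print(conversation_relay_twiml("wss://example.com/websocket", "Google", "en-US-Wavenet-D"))
```

For an Amazon Polly voice, the same call would pass "Amazon" as the provider and the Polly voice ID copied from the table.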

ElevenLabs voices

ElevenLabs uses the Flash 2.5 model by default for text-to-speech. Use the interface below to search and filter a wide selection of ElevenLabs voices by language, accent, age, and more. Each voice entry includes a voice ID that you can copy and paste into your <ConversationRelay> configuration.

How to use ElevenLabs voices

  1. Search or filter: Use the tool below to pick a voice that matches your requirements.

  2. Copy the voice ID: From the search results, copy the voice ID (for example, NYC9WEgkq1u4jiqBseQ9).

  3. Configure the <ConversationRelay> noun: In your TwiML, set ttsProvider="ElevenLabs" and use the copied voice ID in the voice attribute.

  4. Pick an audio model (optional): ElevenLabs voices use the Flash 2.5 model by default. Other models are available and could improve the quality or performance of your application, depending on your use case. To use a different model, append a hyphen and the model ID to the voice ID (for example, NYC9WEgkq1u4jiqBseQ9-turbo_v2_5). The supported model IDs include flash_v2, turbo_v2_5, turbo_v2, and the default, flash_v2_5. Some models only work with a specific set of languages. You can learn about the strengths and supported languages of each model on the ElevenLabs website.

  5. Customize your ElevenLabs voice (recommended): You can adjust the speed and other characteristics of your chosen ElevenLabs voice. To do that, append a hyphen to the voice attribute value, followed by an underscore-separated string with values for speed, stability, and similarity, in that order. Speed must be between 0.7 and 1.2; stability and similarity can range from 0.0 to 1.0.

    For example, a voice attribute of XrExE9yKIg1WjnnlVkGX-1.2_0.6_0.8 sets the speed to 1.2, the stability to 0.6, and the similarity to 0.8. See the ElevenLabs documentation to learn more about how these settings affect your application's voice.
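
Steps 4 and 5 together define a small string format for the voice attribute, which can be easy to get wrong by hand. The sketch below composes and range-checks such a value. The function name build_elevenlabs_voice and the KNOWN_MODELS set are hypothetical; the model-before-settings ordering follows the example on this page; and requiring all three settings together is an assumption based on their positional format.

```python
# Model IDs named in step 4 above. Twilio may support others; this sketch rejects them.
KNOWN_MODELS = {"flash_v2_5", "flash_v2", "turbo_v2_5", "turbo_v2"}


def build_elevenlabs_voice(voice_id, model=None, speed=None, stability=None, similarity=None):
    """Compose a voice attribute value: ID[-model][-speed_stability_similarity]."""
    parts = [voice_id]

    if model is not None:
        if model not in KNOWN_MODELS:
            raise ValueError(f"unknown model ID: {model}")
        parts.append(model)

    settings = (speed, stability, similarity)
    if any(value is not None for value in settings):
        # Assumption: the three settings are positional, so all must be present.
        if any(value is None for value in settings):
            raise ValueError("speed, stability, and similarity must be given together")
        if not 0.7 <= speed <= 1.2:
            raise ValueError("speed must be between 0.7 and 1.2")
        for name, value in (("stability", stability), ("similarity", similarity)):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0")
        parts.append("_".join(str(float(value)) for value in settings))

    return "-".join(parts)


print(build_elevenlabs_voice("XrExE9yKIg1WjnnlVkGX", speed=1.2, stability=0.6, similarity=0.8))
# prints: XrExE9yKIg1WjnnlVkGX-1.2_0.6_0.8
```

Validating the ranges before building the string surfaces a bad value in your own code instead of at call time.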

Example:

<Connect>
  <ConversationRelay url="wss://example.com/websocket" ttsProvider="ElevenLabs" voice="NYC9WEgkq1u4jiqBseQ9-turbo_v2_5-0.8_0.8_0.6" ... />
</Connect>

If you don't explicitly specify the voice attribute in your <ConversationRelay> configuration, ConversationRelay automatically applies a default voice based on the language setting (as defined by the language or ttsLanguage attribute) and the selected TTS provider (default is ElevenLabs). Below is the complete list of default voice settings:

| Language | Voice ID | TTS provider | Speech model | Transcription provider |
|----------|----------|--------------|--------------|------------------------|
| bg-BG | AB9XsbSA4eLG12t2myjN | ElevenLabs | long | Google |
| cs-CZ | uYFJyGaibp4N2VwYQshk | ElevenLabs | long | Google |
| da-DK | ygiXC2Oa1BiHksD3WkJZ | ElevenLabs | long | Google |
| de-DE | FTNCalFNG5bRnkkaP5Ug | ElevenLabs | telephony | Google |
| en-AU | 9Ft9sm9dzvprPILZmLJl | ElevenLabs | telephony | Google |
| en-GB | Fahco4VZzobUeiPqni1S | ElevenLabs | telephony | Google |
| en-IN | mCQMfsqGDT6IDkEKR20a | ElevenLabs | long | Google |
| en-US | UgBBYS2sOqTuMpoF3BR0 | ElevenLabs | telephony | Google |
| es-ES | 6xftrpatV0jGmFHxDjUv | ElevenLabs | telephony | Google |
| es-US | CaJslL1xziwefCeTNzHv | ElevenLabs | telephony | Google |
| fi-FI | 6xPz2opT0y5qtoRh1U1Y | ElevenLabs | long | Google |
| fr-CA | IPgYtHTNLjC7Bq7IPHrm | ElevenLabs | telephony | Google |
| fr-FR | a5n9pJUnAhX4fn7lx3uo | ElevenLabs | telephony | Google |
| hi-IN | IvLWq57RKibBrqZGpQrC | ElevenLabs | long | Google |
| hu-HU | TumdjBNWanlT3ysvclWh | ElevenLabs | long | Google |
| id-ID | 1k39YpzqXZn52BgyLyGO | ElevenLabs | long | Google |
| it-IT | uScy1bXtKz8vPzfdFsFw | ElevenLabs | telephony | Google |
| ja-JP | 3JDquces8E8bkmvbh6Bc | ElevenLabs | telephony | Google |
| kn-IN | kn-IN-Standard-A | Google | long | Google |
| ko-KR | uyVNoMrnUku1dZyVEXwD | ElevenLabs | telephony | Google |
| ml-IN | ml-IN-Standard-A | Google | long | Google |
| mr-IN | mr-IN-Standard-A | Google | long | Google |
| nl-BE | s7Z6uboUuE4Nd8Q2nye6 | ElevenLabs | telephony | Google |
| nl-NL | UNBIyLbtFB9k7FKW8wJv | ElevenLabs | telephony | Google |
| pl-PL | W0sqKm1Sfw1EzlCH14FQ | ElevenLabs | long | Google |
| pt-BR | CstacWqMhJQlnfLPxRG4 | ElevenLabs | telephony | Google |
| pt-PT | TsZfI8Nbn2Xd7ArC76n9 | ElevenLabs | telephony | Google |
| ro-RO | OlBp4oyr3FBAGEAtJOnU | ElevenLabs | long | Google |
| ru-RU | AB9XsbSA4eLG12t2myjN | ElevenLabs | long | Google |
| sv-SE | 4xkUqaR9MYOJHoaC1Nak | ElevenLabs | long | Google |
| ta-IN | ZhJ5LanYnCmLKQUXvsV7 | ElevenLabs | long | Google |
| te-IN | te-IN-Standard-A | Google | long | Google |
| th-TH | th-TH-Standard-A | Google | long | Google |
| tr-TR | IuRRIAcbQK5AQk1XevPj | ElevenLabs | long | Google |
| uk-UA | nCqaTnIbLdME87OuQaZY | ElevenLabs | long | Google |
| vi-VN | foH7s9fX31wFFH2yqrFa | ElevenLabs | long | Google |

Our internal configuration defines these default settings and updates them periodically. Refer to the Twilio TTS Voices documentation for a complete and current list of supported languages, default voices, and detailed settings.
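
If your application needs to know which voice to expect when the voice attribute is omitted (for logging or tests, say), a small lookup can mirror the defaults. The sketch below is only an illustration: it copies three rows from the table above as a snapshot that may go stale, the name expected_default_voice is hypothetical, and ConversationRelay itself performs the actual resolution.

```python
# A few rows copied from the default-voice table: language -> (voice ID, TTS provider).
# This is a snapshot; the real defaults are updated periodically.
DEFAULT_VOICES = {
    "en-US": ("UgBBYS2sOqTuMpoF3BR0", "ElevenLabs"),
    "fr-FR": ("a5n9pJUnAhX4fn7lx3uo", "ElevenLabs"),
    "th-TH": ("th-TH-Standard-A", "Google"),
}


def expected_default_voice(language):
    """Return (voice_id, tts_provider) for a language, or None if not in the snapshot."""
    return DEFAULT_VOICES.get(language)


voice_id, provider = expected_default_voice("th-TH")
print(f"Expecting {provider} voice {voice_id}")
```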