Text-to-Speech (TTS)

Text-to-Speech (TTS), also known as speech synthesis, converts text into a human-sounding voice. TTS is a popular choice for developers and business users building IVR (Interactive Voice Response) solutions and other voice applications, because it shortens time to production. Recorded prompts require a person to record each message, whereas TTS prompts can be generated dynamically from raw text.

Get Started with Text-to-Speech

Using the <Say> verb, you provide text and Twilio synthesizes speech in real time and plays the audio back on any call. For example, the following TwiML says "Hello World!":

<Response>
   <Say>Hello World!</Say>
</Response>
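If your application builds this TwiML in code rather than serving a static file, the same document can be produced with an XML library. The sketch below uses only Python's standard library; the function name `say_twiml` is hypothetical and not part of any Twilio SDK.

```python
import xml.etree.ElementTree as ET


def say_twiml(text):
    """Return a TwiML string that speaks `text` with the account's default voice."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    # ElementTree escapes &, < and > when the tree is serialized.
    say.text = text
    return ET.tostring(response, encoding="unicode")


hello = say_twiml("Hello World!")
print(hello)
```

Building the tree instead of concatenating strings means that text containing characters such as `&` is escaped for you.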

When using <Say>, you can choose between the Man, Woman, Alice, and Amazon Polly voices. To use one of these voices, either configure the Text-to-Speech settings in the Twilio Console or provide the voice attribute on <Say>. The Console is also where you view and change your account's default voice.

Text-to-Speech Console Page

The Text-to-Speech page in the Twilio Console allows you to configure your account's Text-to-Speech (TTS) voice and locale.

All Twilio accounts use the Amazon Polly Provider by default.

In the Console, you can also change the default voice for a specific locale. For example, the default voice for en-GB is Amy. To change the voice to Emma, click on English (British) (en-GB) under the Locale Mapping table and select Emma under the Voice drop-down.

Once you've made these changes, the following TwiML will use Emma:

<Response>
   <Say language="en-GB">Hello I am Emma!!</Say>
</Response>

As a developer, you can override the default voice and locale on your account by providing attributes on the <Say> verb. For example, if the default voice for your account is Amazon Polly Salli and you’d like to use Amazon Polly Joanna for a specific call, you can provide Polly.Joanna as the voice attribute:

<Response>
   <Say voice="Polly.Joanna">Hello I am Joanna!</Say>
</Response>

You can also use an Amazon Polly Neural voice for languages that have Neural voices available.

<Response>
   <Say voice="Polly.Joanna-Neural">Hello I am Joanna!</Say>
</Response>
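The examples above differ only in the attributes placed on <Say>. As a rough sketch, assuming you generate TwiML with Python's standard library, the voice name and the optional `-Neural` suffix can be assembled like this. The helper names `polly_voice` and `say_with_voice` are hypothetical.

```python
import xml.etree.ElementTree as ET


def polly_voice(name, neural=False):
    """Build a voice attribute value such as Polly.Joanna or Polly.Joanna-Neural."""
    return f"Polly.{name}-Neural" if neural else f"Polly.{name}"


def say_with_voice(text, voice=None, language=None):
    """Return TwiML for <Say>, setting only the attributes that were supplied."""
    attrs = {}
    if voice:
        attrs["voice"] = voice
    if language:
        attrs["language"] = language
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", attrs)
    say.text = text
    return ET.tostring(response, encoding="unicode")


neural_twiml = say_with_voice("Hello I am Joanna!", voice=polly_voice("Joanna", neural=True))
locale_twiml = say_with_voice("Hello I am Emma!!", language="en-GB")
print(neural_twiml)
print(locale_twiml)
```

Omitting both attributes falls back to the account defaults configured in the Console. The helper does not check that a Neural variant exists for the chosen voice.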

You can learn more about these attributes in the <Say> API Docs page.

Amazon Polly Text-to-Speech

Amazon Polly is a leading provider of lifelike text-to-speech and offers Standard and Neural voices across many languages and locales. It supports SSML, which lets developers control many aspects of the synthesized speech. The Amazon Polly Neural TTS (NTTS) system can produce higher-quality, more natural and human-like voices than the Standard voices.

Polly Standard and Neural Voices

The following table lists the Polly Standard and Neural voices that can be used with the voice attribute on <Say>, grouped by language and locale.

Polly Voice                       Gender

Arabic (arb)
Polly.Zeina                       Female

Arabic (Gulf) (ar-AE)
Polly.Hala-Neural                 Female

Catalan (ca-ES)
Polly.Arlet-Neural                Female

Chinese (Cantonese) (yue-CN)
Polly.Hiujin-Neural               Female

Chinese (Mandarin) (cmn-CN)
Polly.Zhiyu                       Female
Polly.Zhiyu-Neural                Female

Danish (da-DK)
Polly.Mads                        Male
Polly.Naja                        Female

Dutch (nl-NL)
Polly.Lotte                       Female
Polly.Ruben                       Male
Polly.Laura-Neural                Female

English (Australian) (en-AU)
Polly.Nicole                      Female
Polly.Russell                     Male
Polly.Olivia-Neural               Female

English (British) (en-GB)
Polly.Amy                         Female
Polly.Brian                       Male
Polly.Emma                        Female
Polly.Amy-Neural                  Female
Polly.Emma-Neural                 Female
Polly.Brian-Neural                Male
Polly.Arthur-Neural               Male

English (Indian) (en-IN)
Polly.Aditi                       Female
Polly.Raveena                     Female
Polly.Kajal-Neural                Female

English (New Zealand) (en-NZ)
Polly.Aria-Neural                 Female

English (US) (en-US)
Polly.Ivy                         Female
Polly.Joanna                      Female
Polly.Joey                        Male
Polly.Justin                      Male
Polly.Kendra                      Female
Polly.Kimberly                    Female
Polly.Matthew                     Male
Polly.Salli                       Female
Polly.Ivy-Neural                  Female
Polly.Joanna-Neural*              Female
Polly.Kendra-Neural               Female
Polly.Kevin-Neural                Male (child)
Polly.Kimberly-Neural             Female
Polly.Salli-Neural                Female
Polly.Joey-Neural                 Male
Polly.Justin-Neural               Male
Polly.Matthew-Neural*             Male

English (South African) (en-ZA)
Polly.Ayanda-Neural               Female

English (Welsh) (en-GB-WLS)
Polly.Geraint                     Male

Finnish (fi-FI)
Polly.Suvi-Neural                 Female

French (fr-FR)
Polly.Céline/Polly.Celine         Female
Polly.Léa/Polly.Lea               Female
Polly.Mathieu                     Male
Polly.Lea-Neural                  Female

French (Canadian) (fr-CA)
Polly.Chantal                     Female
Polly.Gabrielle-Neural            Female
Polly.Liam-Neural                 Male

German (de-DE)
Polly.Hans                        Male
Polly.Marlene                     Female
Polly.Vicki                       Female
Polly.Vicki-Neural                Female
Polly.Daniel-Neural               Male

German (Austrian) (de-AT)
Polly.Hannah-Neural               Female

Hindi (hi-IN)
Polly.Aditi                       Female
Polly.Kajal-Neural                Female

Icelandic (is-IS)
Polly.Dóra/Polly.Dora             Female
Polly.Karl                        Male

Italian (it-IT)
Polly.Bianca                      Female
Polly.Carla                       Female
Polly.Giorgio                     Male
Polly.Bianca-Neural               Female

Japanese (ja-JP)
Polly.Mizuki                      Female
Polly.Takumi                      Male
Polly.Takumi-Neural               Male

Korean (ko-KR)
Polly.Seoyeon                     Female
Polly.Seoyeon-Neural              Female

Norwegian (nb-NO)
Polly.Liv                         Female
Polly.Ida-Neural                  Female

Polish (pl-PL)
Polly.Jacek                       Male
Polly.Jan                         Male
Polly.Ewa                         Female
Polly.Maja                        Female
Polly.Ola-Neural                  Female

Portuguese (Brazilian) (pt-BR)
Polly.Camila                      Female
Polly.Ricardo                     Male
Polly.Vitória/Polly.Vitoria       Female
Polly.Camila-Neural               Female
Polly.Vitoria-Neural              Female

Portuguese (European) (pt-PT)
Polly.Cristiano                   Male
Polly.Inês/Polly.Ines             Female
Polly.Ines-Neural                 Female

Romanian (ro-RO)
Polly.Carmen                      Female

Russian (ru-RU)
Polly.Maxim                       Male
Polly.Tatyana                     Female

Spanish (Castilian) (es-ES)
Polly.Conchita                    Female
Polly.Enrique                     Male
Polly.Lucia                       Female
Polly.Lucia-Neural                Female

Spanish (Mexican) (es-MX)
Polly.Mia                         Female
Polly.Mia-Neural                  Female

US Spanish (es-US)
Polly.Lupe                        Female
Polly.Miguel                      Male
Polly.Penélope/Polly.Penelope     Female
Polly.Lupe-Neural                 Female
Polly.Pedro-Neural                Male

Swedish (sv-SE)
Polly.Astrid                      Female
Polly.Elin-Neural                 Female

Turkish (tr-TR)
Polly.Filiz                       Female

Welsh (cy-GB)
Polly.Gwyneth                     Female
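Because Neural variants exist for only some voices and locales, it can help to look voices up by locale before setting the voice attribute. The sketch below hard-codes a three-locale excerpt of the table above. The dictionary and the function name `voices_for` are illustrative and do not correspond to a Twilio API.

```python
# Excerpt of the voice table above; extend it with the locales you need.
VOICES_BY_LOCALE = {
    "en-GB": [
        "Polly.Amy", "Polly.Brian", "Polly.Emma",
        "Polly.Amy-Neural", "Polly.Emma-Neural",
        "Polly.Brian-Neural", "Polly.Arthur-Neural",
    ],
    "ro-RO": ["Polly.Carmen"],
    "fi-FI": ["Polly.Suvi-Neural"],
}


def voices_for(locale, neural=None):
    """List voices for a locale; neural=True/False filters, None returns all."""
    voices = VOICES_BY_LOCALE.get(locale, [])
    if neural is None:
        return list(voices)
    return [v for v in voices if v.endswith("-Neural") == neural]


print(voices_for("en-GB", neural=True))
print(voices_for("ro-RO", neural=True))  # empty: only a Standard voice is listed
```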

SSML with Amazon Polly

Speech Synthesis Markup Language (SSML) is a W3C specification that lets developers use an XML-based markup language to guide the generation of synthesized speech. Twilio has partnered with Amazon Polly so that you can use SSML inside <Say> to control the synthesized speech.

Per the SSML spec, the root element of an SSML document is <speak>. When you use SSML with <Say>, however, you can skip <speak> and insert the rest of the SSML directly inside <Say>. For example:

<Response>
  <Say voice="Polly.Joanna">
     <prosody rate="fast">
     Speech Synthesis Markup Language (SSML) is a W3C specification that allows developers to use XML-based markup language for assisting the generation of synthesized speech.
     </prosody>
  </Say>
</Response>
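If you already have SSML documents that include the <speak> root, one way to reuse them is to move the children of <speak> into <Say>. This is a minimal sketch using Python's standard library. The function name `ssml_to_say` is hypothetical, and the sketch assumes the input has no XML namespace declared on <speak>.

```python
import xml.etree.ElementTree as ET


def ssml_to_say(ssml, voice):
    """Wrap SSML in <Response><Say>, dropping a <speak> root if present."""
    root = ET.fromstring(ssml)
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", {"voice": voice})
    if root.tag == "speak":
        # Keep any leading text and all child elements of <speak>.
        say.text = root.text
        say.extend(list(root))
    else:
        say.append(root)
    return ET.tostring(response, encoding="unicode")


wrapped = ssml_to_say(
    '<speak><prosody rate="fast">Hello there.</prosody></speak>',
    voice="Polly.Joanna",
)
print(wrapped)
```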

Below are a few SSML tags and examples of how you can use them with <Say>.

<prosody> controls the volume, rate, and pitch of synthesized speech.

<Response>
   <Say voice="Polly.Joanna">
      Prosody can be used to change the way words sound. The following words are
      <prosody volume="x-loud">quite a bit louder than the rest of this passage.</prosody>
      Each morning when I wake up,
      <prosody rate="x-slow">I speak slowly and deliberately until I have my coffee.</prosody>
      I can also change the pitch of my voice using prosody. Do you like
      <prosody pitch="+5%">speech with a pitch higher,</prosody>
      or
      <prosody pitch="-10%">is a lower pitch preferable?</prosody>
   </Say>
</Response>

Neural voices support the volume and rate attributes, but don’t support the pitch attribute. Learn more about using SSML with Amazon Polly in the Amazon Polly Developer Guide.
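The pitch restriction is easy to miss when switching an existing prompt from a Standard to a Neural voice. As a sketch, a small pre-flight check can flag <prosody> elements that set pitch under a Neural voice. The function name is hypothetical, and the check relies only on the `-Neural` suffix convention used in the voice table.

```python
import xml.etree.ElementTree as ET


def unsupported_pitch_values(say_xml):
    """Return the pitch values used inside a <Say> that targets a Neural voice."""
    say = ET.fromstring(say_xml)
    if not say.get("voice", "").endswith("-Neural"):
        return []  # Standard voices accept volume, rate, and pitch.
    return [p.get("pitch") for p in say.iter("prosody") if "pitch" in p.attrib]


neural_say = (
    '<Say voice="Polly.Joanna-Neural">Do you like '
    '<prosody pitch="+5%">a higher pitch</prosody> or '
    '<prosody rate="x-slow">slow speech?</prosody></Say>'
)
print(unsupported_pitch_values(neural_say))
```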

<say-as> allows you to indicate information about the type of text contained within the element and to specify the level of detail for rendering the contained text.

For example, if you are trying to repeat a phone number and do not use <say-as>, the following <Say> will play “John’s phone number is ... four billion one hundred fifty five million five hundred fifty one thousand two hundred twelve”.

<Response>
   <Say>John’s phone number is, 4155551212</Say>
</Response>

To synthesize the text so that the phone number is read back correctly, you can rewrite the <Say> as follows, so that you hear “John’s phone number is ... four one five ... five five five ... one two one two”.

<Response>
   <Say voice="Polly.Joanna">John’s phone number is, <say-as interpret-as="telephone">4155551212</say-as></Say>
</Response>
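When the number comes from your own data, the same markup can be generated programmatically. The sketch below builds the mixed text-and-element content with Python's standard library. The function name `say_phone_number` is hypothetical.

```python
import xml.etree.ElementTree as ET


def say_phone_number(prefix, number, voice="Polly.Joanna"):
    """Return TwiML that reads `number` as a telephone number after `prefix`."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", {"voice": voice})
    say.text = prefix  # plain text that precedes the <say-as> element
    say_as = ET.SubElement(say, "say-as", {"interpret-as": "telephone"})
    say_as.text = number
    return ET.tostring(response, encoding="unicode")


phone_twiml = say_phone_number("John's phone number is, ", "4155551212")
print(phone_twiml)
```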

Generating SSML via Helper Libraries

You can generate TwiML with SSML within the <Say> verb using one of our helper libraries for C#, Java, Node.js, PHP, Python, or Ruby.

        
        
        

SSML with Helper Library Example

Using SSML in Studio Flows

Studio supports embedding SSML directly in the Text to Say field of Say/Play and Gather Input on Call widgets.

Amazon Polly SSML Support

While the W3C specification covers many capabilities, Amazon Polly currently only supports the following SSML. Click on individual actions to learn more.

Limits with Amazon Polly Text-to-Speech

The following limits apply when using Amazon Polly Voices.

1. There is a 3,000 character limit on text that <Say> can process with Polly Voices.
2. Amazon-specific SSML tags such as <amazon:auto-breath> are not currently supported.
3. Lexicons are not supported.
4. Neural voices are only available in specific locales.
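For long passages, one way to stay under the 3,000 character limit is to split the text into several <Say> verbs. The sketch below splits plain text at sentence boundaries. The function name is hypothetical, and the sketch assumes the limit applies to the text you pass in; it does not account for SSML markup.

```python
import re

POLLY_CHAR_LIMIT = 3000  # limit stated in the list above


def split_for_say(text, limit=POLLY_CHAR_LIMIT):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = " ".join(["This is a sentence."] * 400)
chunks = split_for_say(long_text)
print(len(long_text), [len(c) for c in chunks])
```

Each chunk can then be emitted as its own <Say> inside the same <Response>.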

Pricing

Amazon Polly Neural Pricing

Amazon Polly Neural price starts at $0.0032/100 neural characters with the following volume discounts.

Characters Min     Characters Max     *Price per 100 Characters
0                  5,000,000          $0.0032
5,000,001          50,000,000         $0.0029
50,000,001         100,000,000        $0.0027
100,000,001        (no maximum)       $0.0025

Amazon Polly Pricing

Amazon Polly price starts at $0.0008/100 characters with the following volume discounts.

Characters Min     Characters Max     *Price per 100 Characters
0                  5,000,000          $0.00080
5,000,001          50,000,000         $0.00072
50,000,001         100,000,000        $0.00068
100,000,001        (no maximum)       $0.00064

* Usage is rounded at the end of the call and priced in blocks of 100 characters. For example, if 546 characters are used on a call, then you’re charged $0.004 for the use of Polly Voices on that call.
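To estimate costs in code, the two pricing tables can be expressed as tier lookups. The sketch below only maps a character volume to the listed price per 100 characters. It deliberately does not compute a per-call charge, because the tables do not state how volume is accumulated or how partial blocks are rounded. The names are hypothetical.

```python
# (upper bound of tier in characters, price in USD per 100 characters)
STANDARD_TIERS = [
    (5_000_000, 0.00080),
    (50_000_000, 0.00072),
    (100_000_000, 0.00068),
    (None, 0.00064),  # 100,000,001 characters and up
]
NEURAL_TIERS = [
    (5_000_000, 0.0032),
    (50_000_000, 0.0029),
    (100_000_000, 0.0027),
    (None, 0.0025),
]


def rate_per_100(characters, neural=False):
    """Return the listed price per 100 characters for a given character volume."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    for upper, rate in (NEURAL_TIERS if neural else STANDARD_TIERS):
        if upper is None or characters <= upper:
            return rate


print(rate_per_100(2_000_000))
print(rate_per_100(60_000_000, neural=True))
```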

Commit to a monthly volume and receive a significant discount beyond standard volume discounts. Contact the Twilio sales team to learn more.
