Text-to-Speech

Text to Speech (TTS), also known as speech synthesis, converts text into a human-sounding voice. TTS is a popular choice among developers and business users building IVR (Interactive Voice Response) solutions and other voice applications because it shortens time to production: recorded prompts require a person to record each message, whereas TTS prompts are generated dynamically from raw text.

Getting Started

Using the <Say> verb, you can provide text and Twilio will synthesize speech in real time and play back the audio in any call. For example, the following TwiML says "Hello World!".

<Response>
   <Say>Hello World!</Say>
</Response>

When using <Say>, you can choose between the Man, Woman, Alice, or Amazon Polly voices. To select a voice, either set the default in the Text to Speech settings of the Twilio Console, or provide the voice attribute on <Say>.

Text to Speech Console

The TTS console page allows you to test different voices and set the default TTS voice and locale for your account. To get started, navigate to https://www.twilio.com/console/voice/twiml/text-to-speech.

All Twilio accounts use the Basic Provider by default. To change from Basic to Amazon Polly, navigate to the console page and select Amazon Polly as the provider.

Change Default TTS Provider

Default TTS Provider to Polly

Once you change the default provider to Amazon Polly, the default voice and locale become Salli and en-US, respectively. After this change, Twilio will synthesize the following TwiML using the Salli voice:

<Response>
   <Say>Hello I am Salli!</Say>
</Response>

Using the TTS console, you can select the voice and locale for your account without writing any code. For example, to change the default locale to French, select "French" under Locale and press Save.

Change Default TTS Voice to French

You can also change the default voice for an individual locale from the console page. For example, the default voice for en-GB is set to Amy. To change the voice to Emma, click on "English (British) (en-GB)" under the Locale Mapping table and select "Emma" under the Voice drop-down.

Changing TTS British Voice

Test TTS British Voice

Once you've made these changes, return the following TwiML to hear text in Emma’s voice:

<Response>
   <Say language="en-GB">Hello I am Emma!!</Say>
</Response>

As a developer, you can override the default voice and locale on your account by providing attributes on the <Say> verb. For example, if the default voice and locale are set to Salli and en-US, but you’d like to use Joanna for a specific call, you can provide that as the voice attribute:

<Response>
   <Say voice="Polly.Joanna">Hello I am Joanna!</Say>
</Response>

You can also use an Amazon Polly Neural voice for languages that have Neural voices available.

<Response>
   <Say voice="Polly.Joanna-Neural">Hello I am Joanna!</Say>
</Response>
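
The TwiML above can also be generated from code. The following is a minimal sketch using only Python's standard library; the function name `build_say_twiml` is hypothetical and not part of any Twilio helper library. It writes the voice and language attributes only when they are supplied, so omitting them leaves the account defaults from the console in effect.

```python
import xml.etree.ElementTree as ET


def build_say_twiml(text, voice=None, language=None):
    """Return a TwiML document containing a single <Say>, as a string.

    `voice` and `language` are written only when given; without them the
    account-level defaults apply.
    """
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    if voice is not None:
        say.set("voice", voice)
    if language is not None:
        say.set("language", language)
    say.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(response, encoding="unicode")


print(build_say_twiml("Hello I am Joanna!", voice="Polly.Joanna-Neural"))
```

Building the document with an XML library rather than string concatenation means that characters such as `&` in the spoken text are escaped automatically.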

You can learn more about these attributes on the <Say> API docs page.

Amazon Polly

Amazon Polly is one of the leading providers of lifelike text to speech and offers voices across many languages and locales. It supports SSML, which lets developers control many aspects of the synthesized speech. Its Neural TTS (NTTS) system produces higher-quality, more natural-sounding voices than its standard voices.

Polly Standard and Neural Voices

The following table contains the list of Polly and Neural voices that can be used with the voice attribute on <Say>.

Language (Locale)                 Polly Voice                    Gender

Danish (da-DK)                    Polly.Mads                     Male
                                  Polly.Naja                     Female
Dutch (nl-NL)                     Polly.Lotte                    Female
                                  Polly.Ruben                    Male
English (Australian) (en-AU)      Polly.Nicole                   Female
                                  Polly.Russell                  Male
English (British) (en-GB)         Polly.Amy                      Female
                                  Polly.Brian                    Male
                                  Polly.Emma                     Female
                                  Polly.Amy-Neural               Female
                                  Polly.Emma-Neural              Female
                                  Polly.Brian-Neural             Male
English (Indian) (en-IN)          Polly.Raveena                  Female
English (US) (en-US)              Polly.Ivy                      Female
                                  Polly.Joanna                   Female
                                  Polly.Joey                     Male
                                  Polly.Justin                   Male
                                  Polly.Kendra                   Female
                                  Polly.Kimberly                 Female
                                  Polly.Matthew                  Male
                                  Polly.Salli                    Female
                                  Polly.Ivy-Neural               Female
                                  Polly.Joanna-Neural*           Female
                                  Polly.Kendra-Neural            Female
                                  Polly.Kimberly-Neural          Female
                                  Polly.Salli-Neural             Female
                                  Polly.Joey-Neural              Male
                                  Polly.Justin-Neural            Male
                                  Polly.Matthew-Neural*          Male
English (Welsh) (en-GB-WLS)       Polly.Geraint                  Male
French (fr-FR)                    Polly.Céline/Polly.Celine      Female
                                  Polly.Mathieu                  Male
French (Canadian) (fr-CA)         Polly.Chantal                  Female
German (de-DE)                    Polly.Hans                     Male
                                  Polly.Marlene                  Female
                                  Polly.Vicki                    Female
Icelandic (is-IS)                 Polly.Dóra/Polly.Dora          Female
                                  Polly.Karl                     Male
Italian (it-IT)                   Polly.Carla                    Female
                                  Polly.Giorgio                  Male
Japanese (ja-JP)                  Polly.Mizuki                   Female
                                  Polly.Takumi                   Male
Norwegian (nb-NO)                 Polly.Liv                      Female
Polish (pl-PL)                    Polly.Jacek                    Male
                                  Polly.Jan                      Male
                                  Polly.Ewa                      Female
                                  Polly.Maja                     Female
Portuguese (Brazilian) (pt-BR)    Polly.Ricardo                  Male
                                  Polly.Vitória/Polly.Vitoria    Female
                                  Polly.Camila-Neural            Female
Portuguese (European) (pt-PT)     Polly.Cristiano                Male
                                  Polly.Inês/Polly.Ines          Female
Romanian (ro-RO)                  Polly.Carmen                   Female
Russian (ru-RU)                   Polly.Maxim                    Male
                                  Polly.Tatyana                  Female
Spanish (Castilian) (es-ES)       Polly.Conchita                 Female
                                  Polly.Enrique                  Male
Spanish (Latin American) (es-US)  Polly.Miguel                   Male
                                  Polly.Penélope/Polly.Penelope  Female
                                  Polly.Lupe-Neural              Female
Swedish (sv-SE)                   Polly.Astrid                   Female
Turkish (tr-TR)                   Polly.Filiz                    Female
Welsh (cy-GB)                     Polly.Gwyneth                  Female
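
The voice names listed here follow a visible pattern: a `Polly.` prefix, the voice name, and a `-Neural` suffix for Neural voices. As a rough sketch of how an application might take such a value apart (for example, to log whether a call used a Neural voice), the hypothetical helper below splits a name into its parts. It checks only the naming pattern and does not verify that the voice actually exists.

```python
from typing import NamedTuple


class PollyVoice(NamedTuple):
    name: str     # e.g. "Joanna"
    neural: bool  # True when the "-Neural" suffix is present


def parse_polly_voice(value):
    """Split a voice attribute value such as "Polly.Joanna-Neural".

    Raises ValueError when the value does not match the
    "Polly.<Name>" or "Polly.<Name>-Neural" pattern.
    """
    prefix, suffix = "Polly.", "-Neural"
    if not value.startswith(prefix):
        raise ValueError(f"not a Polly voice name: {value!r}")
    name = value[len(prefix):]
    neural = name.endswith(suffix)
    if neural:
        name = name[: -len(suffix)]
    if not name:
        raise ValueError(f"missing voice name in: {value!r}")
    return PollyVoice(name, neural)


print(parse_polly_voice("Polly.Joanna-Neural"))
print(parse_polly_voice("Polly.Mads"))
```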

SSML

Speech Synthesis Markup Language (SSML) is a W3C specification that gives developers an XML-based markup language for guiding the generation of synthesized speech. Twilio has partnered with Amazon Polly so that you can use SSML inside <Say> to control the synthesized speech.

Per the SSML spec, the root element of an SSML document is <speak>; however, when you use SSML with <Say>, you can skip <speak> and insert the rest of the SSML inside <Say>. For example:

<Response>
   <Say voice="Polly.Joanna"><prosody rate="fast">
     Speech Synthesis Markup Language (SSML) is a W3C specification that allows developers to use XML-based markup language for assisting the generation of synthesized speech.
   </prosody></Say>
</Response>
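
If your SSML was authored as a standalone document with a <speak> root, the wrapper has to be dropped before the content goes inside <Say>. The sketch below shows one way to do that with Python's standard library. The function name `ssml_to_say` is hypothetical, and the sketch assumes the SSML document declares no XML namespace (with a namespace, the parsed root tag would not be the plain string "speak").

```python
import xml.etree.ElementTree as ET


def ssml_to_say(ssml, voice="Polly.Joanna"):
    """Move the content of a <speak> document into a TwiML <Say>."""
    speak = ET.fromstring(ssml)
    if speak.tag != "speak":
        raise ValueError("expected <speak> as the SSML root element")
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", {"voice": voice})
    say.text = speak.text     # leading text before the first child, if any
    say.extend(list(speak))   # child elements keep their own text and tails
    return ET.tostring(response, encoding="unicode")


print(ssml_to_say('<speak><prosody rate="fast">Hello there</prosody></speak>'))
```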

Below are a few SSML tags and examples of how you can use them with <Say>.

<prosody>

You can use <prosody> to control the volume, rate, and pitch of synthesized speech.

<Response>
   <Say voice="Polly.Joanna">
      Prosody can be used to change the way words sound. The following words are
      <prosody volume="x-loud">quite a bit louder than the rest of this passage.</prosody>
      Each morning when I wake up,
      <prosody rate="x-slow">I speak slowly and deliberately until I have my coffee.</prosody>
      I can also change the pitch of my voice using prosody. Do you like
      <prosody pitch="+5%">speech with a pitch higher,</prosody>
      or
      <prosody pitch="-10%">is a lower pitch preferable?</prosody>
   </Say>
</Response>

<say-as>

The <say-as> element tells the synthesizer what kind of text it contains (for example, a telephone number) and how much detail to use when rendering that text.

For example, if you are trying to repeat a phone number and do not use <say-as>, the following <Say> will play “John’s phone number is ... four billion one hundred fifty five million five hundred fifty one thousand two hundred twelve”.

<Response>
   <Say>John’s phone number is, 4155551212</Say>
</Response>

To synthesize the text so that the phone number is read back correctly, you can rewrite the <Say> as follows, so that you hear “John’s phone number is ... four one five ... five five five ... one two one two”.

<Response>
   <Say voice="Polly.Joanna">John’s phone number is, <say-as interpret-as="telephone">4155551212</say-as></Say>
</Response>
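
When the phone number comes from application data, the same markup can be assembled in code. The following is a minimal sketch using Python's standard library; `say_phone_number` is a hypothetical helper name, and the digits-only check is an assumption made for this example rather than a documented requirement of <say-as>.

```python
import xml.etree.ElementTree as ET


def say_phone_number(intro, number, voice="Polly.Joanna"):
    """Build TwiML that reads `number` back as a telephone number."""
    # Assumption for this sketch: accept plain ASCII digits only.
    if not (number.isascii() and number.isdigit()):
        raise ValueError("number must contain ASCII digits only")
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", {"voice": voice})
    say.text = intro
    say_as = ET.SubElement(say, "say-as", {"interpret-as": "telephone"})
    say_as.text = number
    return ET.tostring(response, encoding="unicode")


print(say_phone_number("The phone number is, ", "4155551212"))
```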

Generating SSML via Helper Libraries

You can generate TwiML with SSML within the <Say> verb using one of our helper libraries for C#, Java, Node.js, PHP, Python, or Ruby.

        
        
        
        

Amazon Polly SSML Support

While the W3C specification covers many capabilities, Amazon Polly currently only supports the following SSML. Click on individual actions to learn more.

Pricing

Amazon Polly Neural Pricing

Amazon Polly Neural price starts at $0.0032/100 neural characters with the following volume discounts.

Characters Min   Characters Max   *Price per 100 Characters
0                5,000,000        $0.0032
5,000,001        50,000,000       $0.0029
50,000,001       100,000,000      $0.0027
100,000,001                       $0.0025

Amazon Polly Pricing

Amazon Polly price starts at $0.0008/100 characters with the following volume discounts.

Characters Min   Characters Max   *Price per 100 Characters
0                5,000,000        $0.00080
5,000,001        50,000,000       $0.00072
50,000,001       100,000,000      $0.00068
100,000,001                       $0.00064

* Usage is rounded at the end of the call and priced in blocks of 100 characters. For example, if 546 characters are used on a call, then you’re charged $0.004 for the use of Polly Voices on that call.
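
The block pricing can be sketched in a few lines of Python. The tier values are copied from the pricing tables above; the function names are hypothetical. Two points are assumptions: that a tier is selected by total character volume (the billing period is not stated here), and the rounding direction. The worked figure in the footnote ($0.004 for 546 characters) equals five full blocks at $0.0008, so the sketch takes the rounding direction as an explicit flag rather than choosing one.

```python
from decimal import Decimal

# (upper bound of the tier in characters, price per 100 characters).
# None marks the open-ended top tier.
STANDARD_TIERS = [
    (5_000_000, Decimal("0.00080")),
    (50_000_000, Decimal("0.00072")),
    (100_000_000, Decimal("0.00068")),
    (None, Decimal("0.00064")),
]
NEURAL_TIERS = [
    (5_000_000, Decimal("0.0032")),
    (50_000_000, Decimal("0.0029")),
    (100_000_000, Decimal("0.0027")),
    (None, Decimal("0.0025")),
]


def tier_price(volume_characters, tiers):
    """Return the per-100-character price for a given character volume."""
    for upper, price in tiers:
        if upper is None or volume_characters <= upper:
            return price
    raise ValueError("tiers must end with an open-ended (None) tier")


def call_cost(characters, price_per_block, *, round_up):
    """Price one call's usage in blocks of 100 characters.

    round_up=True bills a partial block as a full block;
    round_up=False bills only complete blocks.
    """
    blocks, remainder = divmod(characters, 100)
    if round_up and remainder:
        blocks += 1
    return blocks * price_per_block


price = tier_price(0, STANDARD_TIERS)
print(call_cost(546, price, round_up=False))
print(call_cost(546, price, round_up=True))
```

Using `Decimal` rather than floats keeps the small per-block prices exact when they are multiplied and summed.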

Commit to a monthly volume and receive a significant discount beyond standard volume discounts. Contact our sales team to learn more.

Limits

The following limits apply when using Amazon Polly Voices.

1. There is a 3,000 character limit on text that <Say> can process with Polly Voices.
2. Amazon-specific SSML tags such as <amazon:auto-breath> are not currently supported.
3. Lexicons are not supported.
4. Neural voices are only available in specific locales.
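
One way to stay under the 3,000 character limit is to split long text into several pieces and emit one <Say> per piece. The sketch below is a simple whitespace-based splitter; `split_for_say` is a hypothetical name. It assumes the limit applies to the plain text passed in (whether SSML markup counts toward the limit is not stated here), and it collapses runs of whitespace into single spaces.

```python
def split_for_say(text, limit=3000):
    """Split text into chunks of at most `limit` characters.

    Breaks happen at whitespace; a single word longer than `limit`
    is cut into `limit`-sized pieces.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks = []
    current = ""
    for word in text.split():
        # Hard-split any word that cannot fit in a chunk on its own.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


long_text = "This sentence is repeated to build a long prompt. " * 100
for chunk in split_for_say(long_text):
    print(len(chunk))
```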