
Text-to-Speech

Text-to-Speech (TTS), also known as speech synthesis, is a process in which text is converted into a human-sounding voice. TTS has been a popular choice for developers and business users alike when building IVR (Interactive Voice Response) solutions and other voice applications, because it shortens time to production by removing the need to record audio files with a human voice. With recorded files, changing a message means re-recording it. TTS prompts, by contrast, can be generated dynamically from raw text.

Getting Started

Twilio’s <Say> verb makes it easy to synthesize speech. You provide the text, and Twilio synthesizes speech in real time and plays back the audio on any call. For example, the following TwiML plays back "Hello World!". By default, the text is spoken in US English using Twilio’s default male voice.

<Response>
<Say>Hello World!</Say>
</Response>
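Because TwiML is plain XML, you can also build it in code when the prompt text comes from your application at run time. The sketch below uses only Python’s standard-library `xml.etree.ElementTree`, not a Twilio helper library, and the function name `say_twiml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def say_twiml(text: str) -> str:
    """Return a TwiML document that speaks `text` with the account's default voice."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    # ElementTree escapes characters such as & and < in the text automatically.
    say.text = text
    return ET.tostring(response, encoding="unicode")


print(say_twiml("Hello World!"))
# prints <Response><Say>Hello World!</Say></Response>
```

Escaping matters for dynamic prompts: text such as "Terms & Conditions" must reach Twilio as `&amp;` to keep the TwiML well-formed, and building the document with an XML library handles that for you.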

When using <Say>, you can choose among the man, woman, and alice voices or the Amazon Polly voices. To use one of these voices, either configure the Text-to-Speech settings in the Twilio Console or provide the voice attribute on <Say>.

Text to Speech Console

The TTS console page makes it easy to test different voices and to set the default TTS voice and locale for your account. To get started, navigate to https://www.twilio.com/console/voice/twiml/text-to-speech

All Twilio accounts use the Basic provider by default when they are created. To switch the provider from Basic to Amazon Polly, navigate to the console page and select Amazon Polly as the provider.

Change Default TTS Provider

Default TTS Provider to Polly

Once the default provider is changed to Amazon Polly, the default voice and locale change to Salli and en-US, respectively. After this change, Twilio synthesizes the text in the following TwiML using the Salli voice:

<Response>
<Say>Hello I am Salli!</Say>
</Response>

In the past, developers were forced to use attributes on <Say> to synthesize text with a different voice or locale. While this option is still available, the TTS console makes it easy to select the voice and locale for your account so that no code change is required. To change the default voice, click the edit link next to Default Voices and select the appropriate default locale and voice for your account. For example, to change the default locale to French, select French under Locale and press Save.

Change Default TTS Voice to French

In addition, you can change the voice that Twilio assigns by default to each locale from the console page. For example, the default voice for en-GB is Amy. To change it to Emma, click English (British) (en-GB) in the Locale Mapping table and select Emma from the Voice drop-down.

Changing TTS British Voice

Test TTS British Voice

Once these changes are made, return the following TwiML to hear the text in Emma’s voice:

<Response>
<Say language="en-GB">Hello I am Emma!!</Say>
</Response>

As a developer, you can always override the default voice and locale on your account by providing attributes on the <Say> verb. For example, if the default voice and locale on your account are set to Salli and en-US, and you’d like to use Joanna on a specific call, provide the voice attribute:

<Response>
<Say voice="Polly.Joanna">Hello I am Joanna!</Say>
</Response>
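A per-call override like this is easy to generate in code. The following is a minimal sketch using Python’s standard-library `xml.etree.ElementTree`; the helper name `say_twiml` and its optional `voice` and `language` parameters are hypothetical. Attributes are emitted only when supplied, so the console defaults still apply otherwise.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def say_twiml(text: str, voice: Optional[str] = None, language: Optional[str] = None) -> str:
    """Build TwiML for <Say>, optionally overriding the account's default voice and locale."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    # Only set attributes that were passed in; omitted ones fall back to console defaults.
    if voice is not None:
        say.set("voice", voice)
    if language is not None:
        say.set("language", language)
    say.text = text
    return ET.tostring(response, encoding="unicode")


print(say_twiml("Hello I am Joanna!", voice="Polly.Joanna"))
# prints <Response><Say voice="Polly.Joanna">Hello I am Joanna!</Say></Response>
```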

You can learn more about these attributes on the <Say> API docs page.

Amazon Polly

Amazon Polly is one of the leading providers of lifelike text-to-speech. It offers voices across many languages and locales and supports SSML, which allows developers to control many aspects of the synthesized speech.

Voices

The following table lists the Polly voices that can be used with the voice attribute on <Say>.

Language (locale): Polly voice (gender)

Danish (da-DK): Polly.Mads (Male), Polly.Naja (Female)
Dutch (nl-NL): Polly.Lotte (Female), Polly.Ruben (Male)
English (Australian) (en-AU): Polly.Nicole (Female), Polly.Russell (Male)
English (British) (en-GB): Polly.Amy (Female), Polly.Brian (Male), Polly.Emma (Female)
English (Indian) (en-IN): Polly.Raveena (Female)
English (US) (en-US): Polly.Ivy (Female), Polly.Joanna (Female), Polly.Joey (Male), Polly.Justin (Male), Polly.Kendra (Female), Polly.Kimberly (Female), Polly.Matthew (Male), Polly.Salli (Female)
English (Welsh) (en-GB-WLS): Polly.Geraint (Male)
French (fr-FR): Polly.Céline/Polly.Celine (Female), Polly.Mathieu (Male)
French (Canadian) (fr-CA): Polly.Chantal (Female)
German (de-DE): Polly.Hans (Male), Polly.Marlene (Female), Polly.Vicki (Female)
Icelandic (is-IS): Polly.Dóra/Polly.Dora (Female), Polly.Karl (Male)
Italian (it-IT): Polly.Carla (Female), Polly.Giorgio (Male)
Japanese (ja-JP): Polly.Mizuki (Female), Polly.Takumi (Male)
Norwegian (nb-NO): Polly.Liv (Female)
Polish (pl-PL): Polly.Jacek (Male), Polly.Jan (Male), Polly.Ewa (Female), Polly.Maja (Female)
Portuguese (Brazilian) (pt-BR): Polly.Ricardo (Male), Polly.Vitória/Polly.Vitoria (Female)
Portuguese (European) (pt-PT): Polly.Cristiano (Male), Polly.Inês/Polly.Ines (Female)
Romanian (ro-RO): Polly.Carmen (Female)
Russian (ru-RU): Polly.Maxim (Male), Polly.Tatyana (Female)
Spanish (Castilian) (es-ES): Polly.Conchita (Female), Polly.Enrique (Male)
Spanish (Latin American) (es-US): Polly.Miguel (Male), Polly.Penélope/Polly.Penelope (Female)
Swedish (sv-SE): Polly.Astrid (Female)
Turkish (tr-TR): Polly.Filiz (Female)
Welsh (cy-GB): Polly.Gwyneth (Female)
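If your application lets callers or administrators pick a voice, one option is to mirror this table in a lookup structure and validate the voice attribute before emitting TwiML. The sketch below is hypothetical: `POLLY_VOICES`, `voices_for`, and `is_known_voice` are not part of any Twilio library, and the dictionary holds only a few rows copied from the table.

```python
from typing import Dict, List, Optional

# Hypothetical lookup built from a few rows of the voice table; extend as needed.
POLLY_VOICES: Dict[str, Dict[str, str]] = {
    "en-GB": {"Polly.Amy": "Female", "Polly.Brian": "Male", "Polly.Emma": "Female"},
    "en-US": {"Polly.Joanna": "Female", "Polly.Matthew": "Male", "Polly.Salli": "Female"},
    "fr-FR": {"Polly.Celine": "Female", "Polly.Mathieu": "Male"},
}


def voices_for(locale: str, gender: Optional[str] = None) -> List[str]:
    """List voice attribute values for a locale, optionally filtered by gender."""
    voices = POLLY_VOICES.get(locale, {})
    return [name for name, g in voices.items() if gender is None or g == gender]


def is_known_voice(voice: str) -> bool:
    """True if `voice` appears in the lookup under any locale."""
    return any(voice in voices for voices in POLLY_VOICES.values())


print(voices_for("en-GB", gender="Female"))  # prints ['Polly.Amy', 'Polly.Emma']
```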

SSML

Speech Synthesis Markup Language (SSML) is a W3C specification that allows developers to use an XML-based markup language to assist the generation of synthesized speech. We are excited to bring these capabilities to you through our partnership with Amazon Polly, so that you can easily use <Say> to control the synthesized speech.

As per the SSML spec, the root element of an SSML document is <speak>. However, when you’re using SSML with <Say>, you can omit <speak> and insert the rest of the SSML directly inside <Say>. For example:

<Response>
<Say><prosody rate="fast">Speech Synthesis Markup Language (SSML) is a W3C specification that allows developers to use XML-based markup language for assisting the generation of synthesized speech.</prosody></Say>
</Response>
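When generating this TwiML in code, SSML has to be added as nested XML elements inside <Say>. Assigning the markup as a string (for example `say.text = '<prosody rate="fast">...'`) would not work, because the library escapes the angle brackets and the tags would reach Twilio as literal text. A minimal sketch with Python’s standard-library `xml.etree.ElementTree` follows; the function name `prosody_say_twiml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def prosody_say_twiml(text: str, rate: str) -> str:
    """Build <Response><Say><prosody rate=...>text</prosody></Say></Response>."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    # SSML goes in as a child element of <Say>; no <speak> root is added.
    prosody = ET.SubElement(say, "prosody", rate=rate)
    prosody.text = text
    return ET.tostring(response, encoding="unicode")


print(prosody_say_twiml("This sentence is spoken quickly.", rate="fast"))
# prints <Response><Say><prosody rate="fast">This sentence is spoken quickly.</prosody></Say></Response>
```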

Let’s take a quick look at a few SSML tags and how you can use them with <Say>.

<prosody>

You can use <prosody> to control the volume, rate, and pitch of the synthesized speech.

<Response>
<Say>Prosody can be used to change the way words sound. The following words are
<prosody volume="x-loud">quite a bit louder than the rest of this passage.</prosody>
Each morning when I wake up, <prosody rate="x-slow">I speak slowly and deliberately until I have my coffee.</prosody>
I can also change the pitch of my voice using prosody. Do you like <prosody pitch="+5%">speech with a higher pitch,</prosody> or <prosody pitch="-10%">is a lower pitch preferable?</prosody></Say>
</Response>

<say-as>

As per the W3C spec, the say-as element allows you to indicate information on the type of text construct contained within the element and to help specify the level of detail for rendering the contained text.

For example, if you read back a phone number without <say-as>, as in the following <Say>, you will hear “John’s phone number is ... four billion one hundred fifty-five million five hundred fifty-one thousand two hundred twelve”.

<Response>
<Say>John’s phone number is, 4155551212</Say>
</Response>

To have the phone number read back digit by digit, rewrite the <Say> as follows. You will then hear “John’s phone number is ... four one five ... five five five ... one two one two”.

<Response>
<Say>John’s phone number is, <say-as interpret-as="telephone">4155551212</say-as></Say>
</Response>
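If prompts are built from free text, you could wrap phone numbers in <say-as> automatically. The sketch below assumes that any standalone run of exactly 10 digits is a phone number, which is a simplification for illustration only; `say_with_phone_numbers` is a hypothetical helper using Python’s standard-library `re` and `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a standalone run of exactly 10 digits is a phone number.
PHONE = re.compile(r"\b\d{10}\b")


def say_with_phone_numbers(text: str) -> str:
    """Wrap each 10-digit number in <say-as interpret-as="telephone"> inside <Say>."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = ""
    last = None  # most recently appended <say-as> element
    pos = 0
    for match in PHONE.finditer(text):
        chunk = text[pos:match.start()]
        # Text before the first match belongs to <Say>; later text is the previous element's tail.
        if last is None:
            say.text += chunk
        else:
            last.tail = chunk
        last = ET.SubElement(say, "say-as", {"interpret-as": "telephone"})
        last.text = match.group()
        pos = match.end()
    rest = text[pos:]
    if last is None:
        say.text += rest
    else:
        last.tail = rest
    return ET.tostring(response, encoding="unicode")


print(say_with_phone_numbers("John's phone number is, 4155551212"))
```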

Generating SSML via Helper Libraries

You can generate TwiML with SSML within the <Say> verb using one of our helper libraries for C#, Java, Node.js, or Python. Support in the PHP and Ruby helper libraries is coming soon.

Pricing

Amazon Polly pricing starts at $0.0008 per 100 characters, with the following volume discounts.

Characters (min to max): *price per 100 characters
0 to 5,000,000: $0.00080
5,000,001 to 50,000,000: $0.00072
50,000,001 to 100,000,000: $0.00068
100,000,001 and above: $0.00064

* Usage is rounded at the end of each call and priced in blocks of 100 characters. For example, if 546 characters are used on a call, you’re charged $0.004 (five blocks at $0.0008) for the use of Polly voices on that call.
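The per-call arithmetic can be sketched in Python. This is not Twilio’s billing logic: `call_cost` and `price_per_block` are hypothetical names, partial blocks are assumed to be rounded down (consistent with the 546-character example, but not stated outright), and a volume is assumed to be priced entirely at the rate of the tier it falls into rather than on a graduated basis. `Decimal` avoids floating-point rounding in currency math.

```python
from decimal import Decimal

# Tiers from the pricing table: (minimum characters, price per 100 characters).
# Assumption: a volume is priced entirely at the rate of the tier it falls into.
TIERS = [
    (100_000_001, Decimal("0.00064")),
    (50_000_001, Decimal("0.00068")),
    (5_000_001, Decimal("0.00072")),
    (0, Decimal("0.00080")),
]


def price_per_block(volume: int) -> Decimal:
    """Return the price per 100 characters for the tier containing `volume`."""
    for minimum, price in TIERS:
        if volume >= minimum:
            return price
    raise ValueError("volume must be non-negative")


def call_cost(characters: int, rate: Decimal = Decimal("0.00080")) -> Decimal:
    """Polly cost for one call, billed in whole blocks of 100 characters.

    Assumption: partial blocks are rounded down, which matches the
    546-character example (5 blocks).
    """
    blocks = characters // 100
    return blocks * rate


print(call_cost(546))  # prints 0.00400
```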

Commit to a monthly volume and receive a significant discount beyond the standard volume discounts. Contact our sales team to learn more.

Limits

The following limits apply when using Amazon Polly voices:

1. There is a 3,000-character limit on the text that <Say> can process with Polly voices.
2. Amazon-specific SSML tags, such as <amazon:auto-breaths>, are not currently supported.
3. Lexicons are not supported.
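You might check the first two limits before returning TwiML. In this hypothetical sketch, `check_polly_say` takes the content of a <Say> element as a string; it assumes that SSML markup counts toward the 3,000-character limit, which may be stricter than Twilio’s actual rule, and it flags any tag in the `amazon:` namespace. Lexicon use cannot be detected this way.

```python
import re
from typing import List

MAX_POLLY_CHARS = 3000
# Matches opening or closing tags in the amazon: namespace, e.g. <amazon:auto-breaths>.
AMAZON_TAG = re.compile(r"<\s*/?\s*amazon:")


def check_polly_say(say_body: str) -> List[str]:
    """Return a list of problems with the content of a <Say> that uses a Polly voice."""
    problems = []
    # Assumption: markup counts toward the limit; it may apply only to spoken text.
    if len(say_body) > MAX_POLLY_CHARS:
        problems.append(f"{len(say_body)} characters exceeds the {MAX_POLLY_CHARS}-character limit")
    if AMAZON_TAG.search(say_body):
        problems.append("Amazon-specific SSML tags are not supported")
    return problems


print(check_polly_say("<amazon:auto-breaths>Hello</amazon:auto-breaths>"))
# prints ['Amazon-specific SSML tags are not supported']
```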
