Text-to-Speech (TTS)
Text-to-Speech (TTS), also known as speech synthesis, converts text into human-sounding speech. TTS is a popular choice among developers and business users building IVR (Interactive Voice Response) solutions and other voice applications because it shortens time to production: recorded prompts require a human to record each message, whereas TTS prompts can be generated dynamically from raw text.
Get Started with Text-to-Speech
Using the <Say> verb, you can provide text, and Twilio will synthesize speech in real time and play the audio back on any call. For example, the following TwiML says "Hello World!".
<Response>
<Say>Hello World!</Say>
</Response>
When using <Say>, you can choose between the Man, Woman, Alice, or Amazon Polly voices. To use one of these voices, either configure the Text-to-Speech settings in the Twilio Console or provide the voice attribute on <Say>. You can view and change the default voice in the Twilio Console.
Text-to-Speech Console Page
The Text-to-Speech page in the Twilio Console allows you to configure your account's Text-to-Speech (TTS) voice and locale.
All Twilio accounts use the Amazon Polly Provider by default.
In the Console, you can also change the default voice for a specific locale. For example, the default voice for en-GB is Amy. To change the voice to Emma, click English (British) (en-GB) in the Locale Mapping table and select Emma in the Voice drop-down.
Once you've made these changes, the following TwiML will use Emma:
<Response>
<Say language="en-GB">Hello I am Emma!!</Say>
</Response>
As a developer, you can override the default voice and locale on your account by providing attributes on the <Say> verb. For example, if the default voice for your account is Amazon Polly Salli and you’d like to use Amazon Polly Joanna for a specific call, you can provide Polly.Joanna as the voice attribute:
<Response>
<Say voice="Polly.Joanna">Hello I am Joanna!</Say>
</Response>
You can also use an Amazon Polly Neural voice for languages that have Neural voices available.
<Response>
<Say voice="Polly.Joanna-Neural">Hello I am Joanna!</Say>
</Response>
You can learn more about these attributes in the <Say> API Docs page.
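If you build TwiML in application code, the voice and language overrides described above can be set per call. The following is a minimal sketch that uses only the Python standard library rather than a Twilio helper library; the function name say_twiml is hypothetical and is not part of any Twilio API.

```python
import xml.etree.ElementTree as ET


def say_twiml(text, voice=None, language=None):
    """Return a TwiML string with a single <Say> verb.

    `voice` and `language` map to the attributes of the same name on <Say>.
    When they are omitted, the account defaults from the Console apply.
    """
    response = ET.Element("Response")
    attrs = {}
    if voice:
        attrs["voice"] = voice
    if language:
        attrs["language"] = language
    say = ET.SubElement(response, "Say", attrs)
    say.text = text  # ElementTree escapes &, < and > in the text
    return ET.tostring(response, encoding="unicode")


# Override the account default with a specific Polly voice.
print(say_twiml("Hello I am Joanna!", voice="Polly.Joanna"))
```

Leaving both arguments unset produces a bare <Say>, which falls back to the defaults configured on the Text-to-Speech Console page.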
Amazon Polly Text-to-Speech
Amazon Polly is one of the leading providers of lifelike text-to-speech and offers voices across many languages and locales. It supports SSML, which lets developers control many aspects of the synthesized speech. Its Neural TTS (NTTS) system produces higher-quality, more natural and human-like voices than its standard voices.
Polly Standard and Neural Voices
The following table lists the Polly Standard and Neural voices that can be used with the voice attribute on <Say>.
Polly Voice | Gender
Arabic (arb) |
Polly.Zeina | Female
Arabic (Gulf) (ar-AE) |
Polly.Hala-Neural | Female
Catalan (ca-ES) |
Polly.Arlet-Neural | Female
Chinese (Cantonese) (yue-CN) |
Polly.Hiujin-Neural | Female
Chinese (Mandarin) (cmn-CN) |
Polly.Zhiyu | Female
Polly.Zhiyu-Neural | Female
Danish (da-DK) |
Polly.Mads | Male
Polly.Naja | Female
Dutch (nl-NL) |
Polly.Lotte | Female
Polly.Ruben | Male
Polly.Laura-Neural | Female
English (Australian) (en-AU) |
Polly.Nicole | Female
Polly.Russell | Male
Polly.Olivia-Neural | Female
English (British) (en-GB) |
Polly.Amy | Female
Polly.Brian | Male
Polly.Emma | Female
Polly.Amy-Neural | Female
Polly.Emma-Neural | Female
Polly.Brian-Neural | Male
Polly.Arthur-Neural | Male
English (Indian) (en-IN) |
Polly.Aditi | Female
Polly.Raveena | Female
Polly.Kajal-Neural | Female
English (New Zealand) (en-NZ) |
Polly.Aria-Neural | Female
English (US) (en-US) |
Polly.Ivy | Female
Polly.Joanna | Female
Polly.Joey | Male
Polly.Justin | Male
Polly.Kendra | Female
Polly.Kimberly | Female
Polly.Matthew | Male
Polly.Salli | Female
Polly.Ivy-Neural | Female
Polly.Joanna-Neural* | Female
Polly.Kendra-Neural | Female
Polly.Kevin-Neural | Male (child)
Polly.Kimberly-Neural | Female
Polly.Salli-Neural | Female
Polly.Joey-Neural | Male
Polly.Justin-Neural | Male
Polly.Matthew-Neural* | Male
English (South African) (en-ZA) |
Polly.Ayanda-Neural | Female
English (Welsh) (en-GB-WLS) |
Polly.Geraint | Male
Finnish (fi-FI) |
Polly.Suvi-Neural | Female
French (fr-FR) |
Polly.Céline/Polly.Celine | Female
Polly.Léa/Polly.Lea | Female
Polly.Mathieu | Male
Polly.Lea-Neural | Female
French (Canadian) (fr-CA) |
Polly.Chantal | Female
Polly.Gabrielle-Neural | Female
Polly.Liam-Neural | Male
German (de-DE) |
Polly.Hans | Male
Polly.Marlene | Female
Polly.Vicki | Female
Polly.Vicki-Neural | Female
Polly.Daniel-Neural | Male
German (Austrian) (de-AT) |
Polly.Hannah-Neural | Female
Hindi (hi-IN) |
Polly.Aditi | Female
Polly.Kajal-Neural | Female
Icelandic (is-IS) |
Polly.Dóra/Polly.Dora | Female
Polly.Karl | Male
Italian (it-IT) |
Polly.Bianca | Female
Polly.Carla | Female
Polly.Giorgio | Male
Polly.Bianca-Neural | Female
Japanese (ja-JP) |
Polly.Mizuki | Female
Polly.Takumi | Male
Polly.Takumi-Neural | Male
Korean (ko-KR) |
Polly.Seoyeon | Female
Polly.Seoyeon-Neural | Female
Norwegian (nb-NO) |
Polly.Liv | Female
Polly.Ida-Neural | Female
Polish (pl-PL) |
Polly.Jacek | Male
Polly.Jan | Male
Polly.Ewa | Female
Polly.Maja | Female
Polly.Ola-Neural | Female
Portuguese (Brazilian) (pt-BR) |
Polly.Camila | Female
Polly.Ricardo | Male
Polly.Vitória/Polly.Vitoria | Female
Polly.Camila-Neural | Female
Polly.Vitoria-Neural | Female
Portuguese (European) (pt-PT) |
Polly.Cristiano | Male
Polly.Inês/Polly.Ines | Female
Polly.Ines-Neural | Female
Romanian (ro-RO) |
Polly.Carmen | Female
Russian (ru-RU) |
Polly.Maxim | Male
Polly.Tatyana | Female
Spanish (Castilian) (es-ES) |
Polly.Conchita | Female
Polly.Enrique | Male
Polly.Lucia | Female
Polly.Lucia-Neural | Female
Spanish (Mexican) (es-MX) |
Polly.Mia | Female
Polly.Mia-Neural | Female
US Spanish (es-US) |
Polly.Lupe | Female
Polly.Miguel | Male
Polly.Penélope/Polly.Penelope | Female
Polly.Lupe-Neural | Female
Polly.Pedro-Neural | Male
Swedish (sv-SE) |
Polly.Astrid | Female
Polly.Elin-Neural | Female
Turkish (tr-TR) |
Polly.Filiz | Female
Welsh (cy-GB) |
Polly.Gwyneth | Female
SSML with Amazon Polly
Speech Synthesis Markup Language (SSML) is a W3C specification that allows developers to use XML-based markup language for assisting the generation of synthesized speech. Twilio has partnered with Amazon Polly so that you can use SSML within <Say> to control the synthesized speech.
As per the SSML spec, the root element for SSML is <speak>; however, when you are using SSML with <Say>, you can skip <speak> and insert the rest of the SSML inside <Say>. For example:
<Response>
<Say voice="Polly.Joanna">
<prosody rate="fast">
Speech Synthesis Markup Language (SSML) is a W3C specification that allows developers to use XML-based markup language for assisting the generation of synthesized speech.
</prosody>
</Say>
</Response>
Below are a few SSML tags and examples of how you can use them with <Say>.
<prosody> controls the volume, rate, and pitch of synthesized speech.
<Response>
<Say voice="Polly.Joanna">Prosody can be used to change the way words sound. The following words are
<prosody volume="x-loud"> quite a bit louder than the rest of this passage.
</prosody> Each morning when I wake up, <prosody rate="x-slow">I speak slowly and deliberately until I have my coffee.</prosody> I can also change the pitch of my voice using prosody. Do you like <prosody pitch="+5%"> speech with a pitch higher,</prosody> or <prosody pitch="-10%"> is a lower pitch preferable?</prosody></Say>
</Response>
Neural voices support the volume and rate attributes, but don’t support the pitch attribute. Learn more about using SSML with Amazon Polly in the Amazon Polly Developer Guide.
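Because Neural voices don’t support the pitch attribute, it can be useful to catch that combination before serving TwiML. The sketch below is one possible check, assuming Neural voices are identified by the -Neural suffix used in the voice table above; the function name pitch_on_neural is hypothetical.

```python
import xml.etree.ElementTree as ET


def pitch_on_neural(twiml):
    """Return the Neural voices whose <Say> contains <prosody pitch="...">."""
    root = ET.fromstring(twiml)
    flagged = []
    for say in root.iter("Say"):
        voice = say.get("voice", "")
        # Assumption: Neural voices are named with a "-Neural" suffix.
        if not voice.endswith("-Neural"):
            continue
        if any(p.get("pitch") is not None for p in say.iter("prosody")):
            flagged.append(voice)
    return flagged


twiml = (
    '<Response><Say voice="Polly.Joanna-Neural">'
    'Do you like <prosody pitch="+5%">a higher pitch?</prosody>'
    "</Say></Response>"
)
print(pitch_on_neural(twiml))
```

The check only inspects the voice attribute on <Say>; it does not account for a Neural voice configured as the account default in the Console.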
<say-as> lets you indicate the type of text contained within the element and specify the level of detail for rendering it.
For example, if you repeat a phone number without using <say-as>, the following <Say> will play “John’s phone number is ... four billion one hundred fifty five million five hundred fifty one thousand two hundred twelve”.
<Response>
<Say>John’s phone number is, 4155551212</Say>
</Response>
To synthesize the text so that the phone number is read back correctly, you can rewrite the <Say> as follows, so that you hear “John’s phone number is ... four one five ... five five five ... one two one two”.
<Response>
<Say voice="Polly.Joanna">John’s phone number is, <say-as interpret-as="telephone">4155551212</say-as></Say>
</Response>
Generating SSML via Helper Libraries
You can generate TwiML with SSML within the <Say> verb using one of our helper libraries for C#, Java, Node.js, PHP, Python, or Ruby.
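To illustrate what generating this markup programmatically involves, here is a minimal sketch that builds the <say-as> phone number example with the Python standard library only. It does not use a Twilio helper library, and the function name say_as_telephone is hypothetical.

```python
import xml.etree.ElementTree as ET


def say_as_telephone(prefix, number, voice="Polly.Joanna"):
    """Return TwiML that reads `number` back as a telephone number."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", {"voice": voice})
    say.text = prefix  # plain text spoken before the number
    say_as = ET.SubElement(say, "say-as", {"interpret-as": "telephone"})
    say_as.text = number
    return ET.tostring(response, encoding="unicode")


print(say_as_telephone("John's phone number is, ", "4155551212"))
```

Building the document as a tree rather than by string concatenation means characters such as & and < in the spoken text are escaped automatically.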
Using SSML in Studio Flows
Studio supports embedding SSML directly in the Text to Say field of Say/Play and Gather Input on Call widgets.
Amazon Polly SSML Support
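While the W3C specification covers many capabilities, Amazon Polly currently supports only the following SSML tags.
Action | SSML Tag
Adding a pause | <break>
Emphasizing words | <emphasis>
Specifying another language for specific words | <lang>
Adding a pause between paragraphs | <p>
Using phonetic pronunciation | <phoneme>
Controlling volume, speaking rate, and pitch | <prosody>
Adding a pause between sentences | <s>
Controlling how special types of words are spoken | <say-as>
Pronouncing acronyms and abbreviations | <sub>
Improving pronunciation by specifying parts of speech | <w>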
Limits with Amazon Polly Text-to-Speech
The following limits apply when using Amazon Polly Voices.
- There is a 3,000 character limit on text that <Say> can process with Polly Voices.
- Amazon-specific SSML tags such as <amazon:auto-breath> are not currently supported.
- Lexicons are not supported.
- Neural voices are only available in specific locales.
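One way to work within the 3,000 character limit is to split long text into several chunks and emit one <Say> per chunk. The sketch below shows a possible approach for plain text; the function name split_for_say is hypothetical, and the sketch assumes the limit is counted in characters of the text itself, since the limits above do not say how SSML markup is counted.

```python
POLLY_SAY_CHAR_LIMIT = 3000  # limit stated above for <Say> with Polly Voices


def split_for_say(text, limit=POLLY_SAY_CHAR_LIMIT):
    """Split plain text into chunks of at most `limit` characters.

    Chunks break on whitespace, and runs of whitespace collapse to a
    single space. A single word longer than `limit` is hard-split.
    """
    chunks, current = [], ""
    for word in text.split():
        while len(word) > limit:
            # Flush what we have, then cut the oversized word into pieces.
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


long_text = "word " * 2000  # 10,000 characters
print([len(chunk) for chunk in split_for_say(long_text)])
```

This sketch is not SSML-aware: splitting text that already contains SSML tags could cut an element in half, so it is only suitable for plain text.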
Pricing
Amazon Polly Neural Pricing
Amazon Polly Neural pricing starts at $0.0032 per 100 characters, with the following volume discounts.
Characters Min | Characters Max | *Price per 100 Characters
0 | 5,000,000 | $0.0032
5,000,001 | 50,000,000 | $0.0029
50,000,001 | 100,000,000 | $0.0027
100,000,001 | | $0.0025
Amazon Polly Pricing
Amazon Polly Standard pricing starts at $0.0008 per 100 characters, with the following volume discounts.
Characters Min | Characters Max | *Price per 100 Characters
0 | 5,000,000 | $0.00080
5,000,001 | 50,000,000 | $0.00072
50,000,001 | 100,000,000 | $0.00068
100,000,001 | | $0.00064
* Usage is rounded at the end of the call and priced in blocks of 100 characters. For example, if 546 characters are used on a call, then you’re charged $0.004 for the use of Polly Voices on that call.
Commit to a monthly volume and receive a significant discount beyond standard volume discounts. Contact the Twilio sales team to learn more.
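The volume tiers above can be expressed as a simple lookup. The following sketch returns the listed price per 100 characters for a given character volume. The function name rate_per_100 is hypothetical, and the sketch assumes the tier is selected by cumulative character count; it does not model the period over which volume is measured, per-call rounding, or committed-volume discounts, none of which are specified above.

```python
from decimal import Decimal

# (tier minimum in characters, price per 100 characters), from the tables above
STANDARD_TIERS = [
    (0, Decimal("0.00080")),
    (5_000_001, Decimal("0.00072")),
    (50_000_001, Decimal("0.00068")),
    (100_000_001, Decimal("0.00064")),
]
NEURAL_TIERS = [
    (0, Decimal("0.0032")),
    (5_000_001, Decimal("0.0029")),
    (50_000_001, Decimal("0.0027")),
    (100_000_001, Decimal("0.0025")),
]


def rate_per_100(characters_used, neural=False):
    """Return the listed price per 100 characters at the given volume."""
    tiers = NEURAL_TIERS if neural else STANDARD_TIERS
    rate = tiers[0][1]
    for minimum, price in tiers:
        # Tiers are sorted, so the last minimum reached wins.
        if characters_used >= minimum:
            rate = price
    return rate


print(rate_per_100(5_000_001))
print(rate_per_100(5_000_001, neural=True))
```

Decimal is used instead of float so that the small per-block prices compare and add up exactly.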