TwiML™ Voice: <Say>
The <Say> verb converts text to speech that is read back to the caller. <Say> is useful for development or for saying dynamic text that is difficult to pre-record. The verb offers different options for voices, each with its own supported set of languages and genders, so you can configure your TwiML depending on your preferred gender and language combination.
Verb Attributes
The <Say> verb supports different attributes, depending on the voice value you set.
voice
The <Say> verb allows you to specify the voice to use for the text. The voices man and woman support English, Spanish, French, German, and Italian. The voice alice speaks even more languages, with support for several different locales, in a female voice. You can also use one of over 50 Amazon Polly voices. Please visit the Text-to-Speech docs page to learn more.
Attribute Name | Allowed Values | Default Value |
---|---|---|
voice | man, woman, alice, or any of the Amazon Polly voices | man (for Basic Provider); Salli (for Amazon Polly Provider). See the text-to-speech console to configure. |
loop | integer >= 0 | 1 |
language | see below | see below |
voice = man or woman
When you set voice to man or woman, you may use the following values for the language attribute:
Attribute Name | Allowed Values | Default Value |
---|---|---|
language | en, en-gb, es, fr, it, de | en |
Use one or more of these attributes in a <Say> verb like so:
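A minimal sketch, assuming the woman voice with British English; the spoken text is illustrative only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- voice="woman" with language="en-gb" selects a female voice with a British accent -->
    <Say voice="woman" language="en-gb">Thank you for calling. Please hold.</Say>
</Response>
```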
voice = alice
When you set the voice to alice, you may use the following values for the language attribute:
Attribute Name | Default Value |
---|---|
language | en-US |

Allowed Values | Language, locale |
---|---|
da-DK | Danish, Denmark |
de-DE | German, Germany |
en-AU | English, Australia |
en-CA | English, Canada |
en-GB | English, UK |
en-IN | English, India |
en-US | English, United States |
ca-ES | Catalan, Spain |
es-ES | Spanish, Spain |
es-MX | Spanish, Mexico |
fi-FI | Finnish, Finland |
fr-CA | French, Canada |
fr-FR | French, France |
it-IT | Italian, Italy |
ja-JP | Japanese, Japan |
ko-KR | Korean, Korea |
nb-NO | Norwegian, Norway |
nl-NL | Dutch, Netherlands |
pl-PL | Polish, Poland |
pt-BR | Portuguese, Brazil |
pt-PT | Portuguese, Portugal |
ru-RU | Russian, Russia |
sv-SE | Swedish, Sweden |
zh-CN | Chinese (Mandarin) |
zh-HK | Chinese (Cantonese) |
zh-TW | Chinese (Taiwanese Mandarin) |
Use one or more of these attributes in a <Say> verb like so:
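A minimal sketch, assuming the alice voice with the fr-CA locale from the table above; the spoken text is illustrative only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- language must be one of the locale codes listed for alice, e.g. fr-CA -->
    <Say voice="alice" language="fr-CA">Bonjour, merci de votre appel.</Say>
</Response>
```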
language
The language attribute allows you to specify a language and locale, with the associated accent and pronunciations. Twilio supports different languages depending on the voice you choose.
The man and woman voices work with the following locales: English with an American accent (en), English with a British accent (en-gb), Spanish (es), French (fr), Italian (it), and German (de). The default is English with an American accent (en).
The alice voice supports 26 language and locale combinations. See the table above for a description of all the languages and locales supported by Alice.
Note: if you specify a language and locale that only Alice speaks, but you don't specify a voice, you will get Alice by default. For example, the following TwiML will default to Alice:
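A minimal sketch of such a document, assuming pt-BR as the Alice-only locale; the spoken text is illustrative only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- No voice attribute: pt-BR is not offered by man or woman, so Alice is used -->
    <Say language="pt-BR">Bom dia.</Say>
</Response>
```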
loop
The loop attribute specifies how many times you'd like the text repeated. The default is once. Specifying 0 will cause the <Say> verb to loop until either the call is hung up or 1,000 iterations are performed.
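As a rough illustration, the following sketch repeats a message three times using the default voice; the spoken text is an assumption made for the example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- loop="3" reads the text three times; loop="0" would repeat until hangup or 1,000 iterations -->
    <Say loop="3">This is a test of the announcement system.</Say>
</Response>
```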
Nouns
The noun of a TwiML verb is the content nested within the verb; it's what the verb acts upon. The only noun for the <Say> verb is the plain text that will be played.
Noun | Description |
---|---|
plain text | The text Twilio reads to the caller. Basic TTS Voices are limited to 4,096 UTF-8 single byte characters. Polly Voices are limited to 3,000 UTF-8 single byte characters, not including SSML tags. |
Nesting Rules
You can't nest any verbs within <Say>. However, you can nest <Say> within the <Gather> verb.
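A minimal sketch of <Say> nested inside <Gather>, so the prompt is read while Twilio waits for a keypress. The action URL /handle-key is a hypothetical endpoint, and the prompt text is illustrative only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- /handle-key is a placeholder; point action at your own webhook -->
    <Gather input="dtmf" numDigits="1" action="/handle-key">
        <Say>Press 1 for sales, or 2 for support.</Say>
    </Gather>
</Response>
```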
Examples
Example 1: Hello World
When a call is directed to the following TwiML document, the caller hears "hello world" spoken once in a male voice.
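A minimal document matching this description might look as follows. It sets no attributes, so the default voice and a single repetition apply:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- No voice, language, or loop attribute: defaults are used -->
    <Say>Hello World</Say>
</Response>
```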
Example 2: Hello, Hello
This TwiML document says "Hello" twice in Brazilian Portuguese:
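A sketch consistent with this description, assuming the alice voice (the only basic voice listed with pt-BR) and "Olá" as the Portuguese greeting:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- loop="2" repeats the greeting twice -->
    <Say voice="alice" language="pt-BR" loop="2">Olá</Say>
</Response>
```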
Hints and Advanced Uses
- Basic voices can process at most 4,096 characters of text in a single <Say>; Amazon Polly voices are limited to 3,000 characters, not including SSML tags.
- When translating text to speech, the <Say> verb will make assumptions about how to pronounce numbers, dates, times, amounts of money, and other abbreviations. Test these situations well.
- When saying numbers, "12345" will be spoken as "twelve thousand three hundred forty-five", whereas "1 2 3 4 5" will be spoken as "one two three four five."
- Punctuation such as commas and periods will be interpreted as natural pauses.
- <Say> is useful for saying dynamic text that would be difficult to pre-record. In cases where the contents of <Say> are static, you might consider recording a live person saying the phrase and using the <Play> verb instead.
- If you want to insert a long pause, try using the <Pause> verb. <Pause> should be placed outside <Say> tags, not nested inside them.
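The hints on digit spacing and pauses can be combined in one document. This is a sketch only: the code value and wording are made up for illustration, and the two-second pause length is an arbitrary choice:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- Spaces between digits make each digit be read individually -->
    <Say>Your verification code is 1 2 3 4 5.</Say>
    <!-- Pause sits between the Say verbs, not inside them -->
    <Pause length="2"/>
    <Say>Again, your code is 1 2 3 4 5.</Say>
</Response>
```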