The <Say> verb converts text to speech that is read back to the caller. <Say> is useful during development or for saying dynamic text that is difficult to pre-record.
The <Say> verb supports the following attributes that modify its behavior:
| Attribute Name | Allowed Values | Default Value |
|---|---|---|
| voice | man, woman | man |
| language | en, en-gb, es, fr, de, it | en |
| loop | integer >= 0 | 1 |
Use one or more of these attributes in a <Say> verb like so:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say voice="woman" language="fr">Chapeau!</Say>
</Response>
```
The 'voice' attribute allows you to choose a male or female voice to read text back. The default value is 'man'.
The 'language' attribute allows you to pick a voice with a specific language's accent and pronunciations. Twilio currently supports English with an American accent (en), English with a British accent (en-gb), Spanish (es), French (fr), Italian (it), and German (de). The default is English with an American accent (en).
The 'loop' attribute specifies how many times you'd like the text repeated. The
default is once. Specifying '0' will cause the <Say> verb to loop until the
call is hung up.
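As a sketch of the 'loop' attribute, the following document repeats a short message until the caller hangs up by setting loop="0". The message text is only an illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- loop="0" repeats the text until the call is hung up -->
    <Say loop="0">Please hold. Your call is important to us.</Say>
</Response>
```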
The "noun" of a TwiML verb is the stuff nested within the verb that's not
a verb itself; it's the stuff the verb acts upon. These are the nouns for
<Say>:
| Noun | Description |
|---|---|
| plain text | The text Twilio will read to the caller. Limited to 4KB (4,000 ASCII characters) |
You can't nest any verbs within <Say>. But you can nest <Say> within the following verbs:
When a call is directed to the following TwiML document, the caller will hear "hello world" spoken once in a male voice.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say>Hello World</Say>
</Response>
```
This TwiML document will cause a female voice with a British accent to say "Hello" to the caller, twice.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say voice="woman" language="en-gb" loop="2">Hello</Say>
</Response>
```
There is a 4KB limit on the text that <Say> can process.
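One possible way to work within this limit, assuming your text can be broken at natural sentence boundaries, is to split it across several consecutive <Say> verbs. The verbs in a <Response> are executed in order, so the caller hears the pieces back to back. The text below is placeholder content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- Each <Say> holds one chunk of a longer message, kept under the 4KB limit -->
    <Say>This is the first part of a long announcement.</Say>
    <Say>This is the second part, read immediately after the first.</Say>
</Response>
```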
When converting text to speech, the <Say> verb makes assumptions about how to pronounce numbers, dates, times, amounts of money, and abbreviations. Test these cases thoroughly.
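A simple way to test these cases is a scratch document that reads back representative values, so you can listen to how each one is pronounced before relying on it in production. The sample values below are illustrations only; how the speech engine actually reads them is exactly what this test is meant to reveal.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- Call this document and listen to each line to check pronunciation -->
    <Say>Your balance is $42.50.</Say>
    <Say>Your appointment is on 03/04/2011 at 10:30 AM.</Say>
    <Say>Please call us back at 415 555 0100.</Say>
</Response>
```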
When saying numbers, '12345' will be spoken as "twelve thousand three hundred forty five," whereas '1 2 3 4 5' will be spoken as "one two three four five."
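For example, to read an identifier such as an order number digit by digit rather than as a quantity, separate the digits with spaces. The numbers in this sketch are made-up values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- Read as a quantity: "twelve thousand three hundred forty five" -->
    <Say>Your total is 12345 points.</Say>
    <!-- Read digit by digit: "one two three four five" -->
    <Say>Your order number is 1 2 3 4 5.</Say>
</Response>
```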
Punctuation such as commas and periods will be interpreted as natural pauses by the speech engine.
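Punctuation can be used to pace a spoken menu. In this sketch, the commas and periods give the caller a brief pause between options; the actual pause length is chosen by the speech engine, not by the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- Commas and periods are interpreted as short, natural pauses -->
    <Say>For sales, press 1. For support, press 2.</Say>
</Response>
```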
<Say> is useful for saying dynamic text that would be difficult to pre-record. In cases where
the contents of <Say> are static, you might consider recording a live person saying the phrase
and using the <Play> verb instead.
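A common pattern, sketched here, is to play a pre-recorded greeting with <Play> and use <Say> only for the part that changes from call to call. The recording URL is hypothetical and stands in for an audio file you host yourself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <!-- Static greeting recorded by a person (hypothetical URL) -->
    <Play>https://example.com/recordings/greeting.mp3</Play>
    <!-- Dynamic text that would be hard to pre-record -->
    <Say>Your account balance is 42 dollars.</Say>
</Response>
```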
If you want to insert a long pause, use the <Pause>
verb. <Pause> should be placed outside <Say> tags, not nested inside them.
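For example, this document waits between two sentences by placing <Pause> between two <Say> verbs rather than inside either of them. The length attribute sets the pause duration in seconds; the value of 5 here is an arbitrary choice for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say>Please wait while we connect you.</Say>
    <!-- <Pause> sits between the <Say> verbs, not nested inside them -->
    <Pause length="5"/>
    <Say>Thank you for waiting.</Say>
</Response>
```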