
TwiML™ Voice: <Say>

The <Say> verb converts text to speech that is read back to the caller. <Say> is useful during development and for speaking dynamic text that is difficult to pre-record. <Say> offers several voices, and each one supports its own set of languages and genders, so configure your TwiML for the gender and language combination you prefer.

Verb Attributes

The <Say> verb supports the attributes below. The values allowed for language depend on the voice you set.

voice

The <Say> verb offers two separate voice engines. The first, with the voices man and woman, supports English, Spanish, French, German, and Italian in both genders. The second, alice, is a female voice that speaks more languages and supports several locales.

Attribute Name   Allowed Values       Default Value
voice            man, woman, alice    man (for limited languages);
                                      alice (for additional languages/locales)
loop             integer >= 0         1
language         see below            see below

voice = man or woman

When you set voice to man or woman you may use the following values for the language attribute:

Attribute Name   Allowed Values                Default Value
language         en, en-gb, es, fr, de, it     en

Use one or more of these attributes in a <Say> verb like so:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say voice="woman" language="fr">Chapeau!</Say>
</Response>
voice = alice

When you set voice to alice you may use the following values for the language attribute:

Attribute Name   Allowed Values   Language, Locale               Default Value
language         da-DK            Danish, Denmark                en-US
                 de-DE            German, Germany
                 en-AU            English, Australia
                 en-CA            English, Canada
                 en-GB            English, UK
                 en-IN            English, India
                 en-US            English, United States
                 ca-ES            Catalan, Spain
                 es-ES            Spanish, Spain
                 es-MX            Spanish, Mexico
                 fi-FI            Finnish, Finland
                 fr-CA            French, Canada
                 fr-FR            French, France
                 it-IT            Italian, Italy
                 ja-JP            Japanese, Japan
                 ko-KR            Korean, Korea
                 nb-NO            Norwegian, Norway
                 nl-NL            Dutch, Netherlands
                 pl-PL            Polish, Poland
                 pt-BR            Portuguese, Brazil
                 pt-PT            Portuguese, Portugal
                 ru-RU            Russian, Russia
                 sv-SE            Swedish, Sweden
                 zh-CN            Chinese (Mandarin)
                 zh-HK            Chinese (Cantonese)
                 zh-TW            Chinese (Taiwanese Mandarin)

Use one or more of these attributes in a <Say> verb like so:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say voice="alice" language="fr-FR">Chapeau!</Say>
</Response>
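To catch unsupported combinations before a call is placed, you could encode the two language tables above as a lookup. The following is a minimal Python sketch; the names MAN_WOMAN_LANGUAGES, ALICE_LANGUAGES, and is_supported are hypothetical, and the check is only a client-side convenience, not a substitute for Twilio's own validation. Matching is case-sensitive, mirroring the exact values in the tables (en-gb for man/woman, en-GB for alice).

```python
# Allowed values for the language attribute, per voice, copied from the tables above.
MAN_WOMAN_LANGUAGES = frozenset({"en", "en-gb", "es", "fr", "de", "it"})
ALICE_LANGUAGES = frozenset({
    "da-DK", "de-DE", "en-AU", "en-CA", "en-GB", "en-IN", "en-US",
    "ca-ES", "es-ES", "es-MX", "fi-FI", "fr-CA", "fr-FR", "it-IT",
    "ja-JP", "ko-KR", "nb-NO", "nl-NL", "pl-PL", "pt-BR", "pt-PT",
    "ru-RU", "sv-SE", "zh-CN", "zh-HK", "zh-TW",
})

# Hypothetical mapping from voice name to the languages it accepts.
ALLOWED_LANGUAGES = {
    "man": MAN_WOMAN_LANGUAGES,
    "woman": MAN_WOMAN_LANGUAGES,
    "alice": ALICE_LANGUAGES,
}


def is_supported(voice: str, language: str) -> bool:
    """Return True if this voice/language pair appears in the tables above.

    Hypothetical client-side check; matching is case-sensitive and an
    unknown voice supports no languages.
    """
    return language in ALLOWED_LANGUAGES.get(voice, frozenset())


if __name__ == "__main__":
    print(is_supported("woman", "fr"))     # True
    print(is_supported("alice", "fr-FR"))  # True
    print(is_supported("man", "sv-SE"))    # False: only Alice speaks Swedish
```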

language

The 'language' attribute specifies a language and locale, which determine the accent and pronunciation used. The languages available depend on the voice you choose. With voice set to man or woman, you can select English with an American accent (en), English with a British accent (en-gb), Spanish (es), French (fr), Italian (it), or German (de). The default is English with an American accent (en).

Alice, however, speaks many more languages. With voice set to alice, you can choose from 26 language and locale combinations. See the table above for a description of every language and locale Alice supports.

Note: if you specify a language and locale that only Alice speaks, but you don't specify a voice, you will get Alice by default. For example, the following TwiML will default to Alice:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say language="sv-SE">Hej!</Say>
</Response>
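The defaulting rules described here and in the attribute tables (man for the limited languages, alice for a language only Alice speaks, and en or en-US when language is omitted) can be modeled roughly as follows. This is a hypothetical Python illustration of the documented behavior, not Twilio's implementation; the function name resolve_voice_and_language is an assumption.

```python
# Language values copied from the two tables above.
MAN_WOMAN_LANGUAGES = frozenset({"en", "en-gb", "es", "fr", "de", "it"})
ALICE_LANGUAGES = frozenset({
    "da-DK", "de-DE", "en-AU", "en-CA", "en-GB", "en-IN", "en-US",
    "ca-ES", "es-ES", "es-MX", "fi-FI", "fr-CA", "fr-FR", "it-IT",
    "ja-JP", "ko-KR", "nb-NO", "nl-NL", "pl-PL", "pt-BR", "pt-PT",
    "ru-RU", "sv-SE", "zh-CN", "zh-HK", "zh-TW",
})
# Languages only Alice speaks. With case-sensitive matching the two sets do not
# overlap, so this equals ALICE_LANGUAGES; the subtraction just states the intent.
ALICE_ONLY = ALICE_LANGUAGES - MAN_WOMAN_LANGUAGES


def resolve_voice_and_language(voice=None, language=None):
    """Hypothetical model of the documented defaults; returns (voice, language)."""
    if voice is None:
        # No voice given: Alice if the language is one only she speaks, otherwise man.
        voice = "alice" if language in ALICE_ONLY else "man"
    if language is None:
        # Default language per voice, taken from the tables above.
        language = "en-US" if voice == "alice" else "en"
    return voice, language


if __name__ == "__main__":
    print(resolve_voice_and_language(language="sv-SE"))  # ('alice', 'sv-SE')
    print(resolve_voice_and_language())                   # ('man', 'en')
```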

loop

The 'loop' attribute specifies how many times the text is spoken. The default is 1, so the text is spoken once. Setting loop to '0' makes the <Say> verb repeat until the call is hung up.
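If you generate TwiML from code, loop is just another attribute on the <Say> element. The sketch below uses Python's standard-library xml.etree.ElementTree rather than any Twilio helper library; say_twiml is a hypothetical name. It rejects negative or non-integer loop values to match the "integer >= 0" rule in the attribute table, and ElementTree escapes characters such as & and < in the spoken text when serializing.

```python
import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def say_twiml(text, voice=None, language=None, loop=None):
    """Return a TwiML document with a single <Say> (hypothetical helper).

    Attributes left as None are omitted, so Twilio's defaults apply.
    """
    # loop must be an integer >= 0; 0 means repeat until the call is hung up.
    if loop is not None and (isinstance(loop, bool) or not isinstance(loop, int) or loop < 0):
        raise ValueError("loop must be an integer >= 0")
    attrs = {}
    if voice is not None:
        attrs["voice"] = voice
    if language is not None:
        attrs["language"] = language
    if loop is not None:
        attrs["loop"] = str(loop)
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", attrs)
    say.text = text  # ElementTree escapes &, < and > when serializing
    return XML_DECLARATION + ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Repeat a Brazilian Portuguese greeting until the caller hangs up.
    print(say_twiml("Bom dia.", voice="alice", language="pt-BR", loop=0))
```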

Nouns

The "noun" of a TwiML verb is the stuff nested within the verb that's not a verb itself; it's what the verb acts upon. These are the nouns for <Say>:

Noun         Description
plain text   The text Twilio reads to the caller. Limited to 4,096 Unicode characters.
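Text longer than 4,096 characters has to be split across several <Say> verbs. Below is a minimal Python sketch of one way to do that, breaking at spaces where possible; chunk_for_say is a hypothetical helper, and it assumes the limit counts Unicode code points, which is what Python's len() measures on a str.

```python
MAX_SAY_CHARS = 4096  # limit from the table above; assumed to count Unicode code points


def chunk_for_say(text, limit=MAX_SAY_CHARS):
    """Split text into pieces of at most `limit` characters, breaking at spaces when possible."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last space within the first limit + 1 characters, so the text before it fits.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: hard split mid-word
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


if __name__ == "__main__":
    long_text = "word " * 2000  # 10,000 characters before stripping
    pieces = chunk_for_say(long_text)
    print(len(pieces), [len(p) for p in pieces])
```

Each returned piece can then go into its own <Say> element, in order, inside the same <Response>.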

Nesting Rules

You can't nest any verbs within <Say>. But you can nest <Say> within the following verbs:

Examples

Example 1: Hello World

When a call is directed to the following TwiML document, the caller hears "hello world" spoken once in a male voice.

<?xml version="1.0" encoding="UTF-8" ?>
<Response>
     <Say>Hello World</Say>
</Response>

Example 2: Hello, Hello

This TwiML document says "Bom dia" ("Good morning") twice in Brazilian Portuguese:

<?xml version="1.0" encoding="UTF-8" ?>
<Response>
     <Say voice="alice" language="pt-BR" loop="2">Bom dia.</Say>
</Response>

Hints and Advanced Uses

  • There is a 4,096 Unicode character limit on the text that <Say> can process.

  • When converting text to speech, the <Say> verb makes assumptions about how to pronounce numbers, dates, times, amounts of money, and abbreviations. Test these cases carefully.

  • When saying numbers, '12345' is spoken as "twelve thousand three hundred forty-five," whereas '1 2 3 4 5' is spoken as "one two three four five."

  • Punctuation such as commas and periods will be interpreted as natural pauses by the speech engine.

  • <Say> is useful for saying dynamic text that would be difficult to pre-record. In cases where the contents of <Say> are static, you might consider recording a live person saying the phrase and using the <Play> verb instead.

  • If you want to insert a long pause, try using the <Pause> verb. <Pause> should be placed outside <Say> tags, not nested inside them.
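Two of the hints above, spacing out digits and placing <Pause> beside <Say> rather than inside it, can be combined when reading back a code. This Python sketch uses only xml.etree.ElementTree and is illustrative; spell_digits and confirmation_twiml are hypothetical names, and the <Pause> is emitted without attributes, so that verb's default length applies.

```python
import xml.etree.ElementTree as ET


def spell_digits(number: str) -> str:
    """Keep only the digits and separate them with spaces, e.g. '12345' -> '1 2 3 4 5'."""
    return " ".join(ch for ch in number if ch.isdigit())


def confirmation_twiml(code: str) -> str:
    """Hypothetical example: read a code digit by digit, pause, then say goodbye."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = f"Your code is {spell_digits(code)}."
    # <Pause> is a sibling of the <Say> elements, not nested inside one.
    ET.SubElement(response, "Pause")
    ET.SubElement(response, "Say").text = "Goodbye."
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(confirmation_twiml("12345"))
```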