
TwiML™ Voice: <Say>


The <Say> verb allows your application to programmatically speak dynamic text over a call or conference using Text To Speech (TTS) capabilities. <Say> offers different options for voices, each of which has its own supported set of languages, accents, and genders, so you can configure your TwiML according to your needs and preferences.

When Twilio executes <Say>, it synthesizes speech for the text between <Say>'s opening and closing tags.

The sample below uses the Twilio Node.js helper library to generate TwiML that makes Twilio play a synthesized voice saying "Hello!" on a call or conference.

<Say> using default values


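```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

// Build a TwiML response that speaks "Hello!" using the default settings.
const response = new VoiceResponse();
response.say('Hello!');

// Print the generated TwiML document.
console.log(response.toString());
```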

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Hello!</Say>
</Response>
```


<Say> attributes


The <Say> verb supports the following attributes that modify its behavior:

| Attribute | Accepted values | Default value |
|---|---|---|
| language | Any supported language/locale combination, e.g. en-GB | en-US (English with United States locale) |
| loop | Any positive integer or zero, e.g. 4 | 1 |
| voice | man, woman, or any Standard or Premium voice name listed on the Text To Speech page, e.g. Polly.Amy | man (the effective default can depend on your Account-level Text To Speech settings; see the voice section below) |

Note: Using an invalid combination of voice and language may result in an error and cause the <Say> instruction to fail. Review the Text To Speech page to make sure you use accepted values for the voice and language attributes.

language


<Say>'s language attribute allows you to specify the language and locale for the synthesized voice, e.g. en-US for English spoken with a United States accent.

Most of Twilio's supported language values consist of a lowercase language abbreviation and an uppercase locale abbreviation, e.g. fr-CA, where fr is the language (French) and CA is the locale (Canada).

See the list of available voices on the Text To Speech page for the available languages and locales. Use the value from the ID column as the value of your language attribute, e.g. language="en-GB".
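Because most values follow this language-LOCALE shape, a quick client-side check can catch typos before you generate TwiML. The sketch below uses a hypothetical helper, `parseLocale`, which is not part of the Twilio library. Its pattern (two or three lowercase letters, a hyphen, then two uppercase letters) is an assumption. A value that fails the check is not necessarily unsupported, and a value that passes is not necessarily supported. The ID column on the Text To Speech page remains the source of truth.

```javascript
// Hypothetical helper: split a language value such as "fr-CA" into its
// language part ("fr") and locale part ("CA"). The pattern below is an
// assumption covering the common shape only, not Twilio's full list.
const LOCALE_PATTERN = /^([a-z]{2,3})-([A-Z]{2})$/;

function parseLocale(value) {
  const match = LOCALE_PATTERN.exec(value);
  if (match === null) {
    return null; // not in the lowercase-language/uppercase-locale form
  }
  return { language: match[1], locale: match[2] };
}

console.log(parseLocale('fr-CA')); // { language: 'fr', locale: 'CA' }
console.log(parseLocale('french')); // null
```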

Note: If you use both the language and voice attributes, make sure the voice you choose is available for that language. Invalid combinations of voice and language (i.e. those that aren't shown in the available voices and languages list) may result in an error and cause the <Say> instruction to fail.

<Say> using language attribute


```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

// Speak the text in French with a France locale.
const response = new VoiceResponse();
response.say({ language: 'fr-FR' }, 'Bonjour!');

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say language="fr-FR">Bonjour!</Say>
</Response>
```
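One way to avoid invalid voice and language combinations is to check each pair before a call against a table copied from the Text To Speech page. The sketch below uses a hypothetical `SUPPORTED_LANGUAGES` map and a hypothetical `isKnownCombination` function. Neither is part of the Twilio library. The map contains only the Polly.Mathieu / fr-FR pair used on this page, so you would need to fill it in yourself.

```javascript
// Hypothetical lookup table: voice name -> languages it can speak.
// Only Polly.Mathieu / fr-FR comes from this page; copy the rest from the
// "Available voices and languages" list before relying on it.
const SUPPORTED_LANGUAGES = new Map([
  ['Polly.Mathieu', ['fr-FR']],
]);

// Returns true if the table lists the pair, false if the voice is known
// but the language is not listed, and undefined if the voice is unknown.
function isKnownCombination(voice, language) {
  if (!SUPPORTED_LANGUAGES.has(voice)) {
    return undefined;
  }
  return SUPPORTED_LANGUAGES.get(voice).includes(language);
}

console.log(isKnownCombination('Polly.Mathieu', 'fr-FR')); // true
console.log(isKnownCombination('Polly.Mathieu', 'en-US')); // false
```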

loop

<Say>'s loop attribute specifies how many times you'd like the text to be repeated. The default is once (1).

Specifying 0 causes the <Say> verb to loop until the call is hung up or 1,000 iterations have been performed.
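If your application reads loop values from configuration or user input, a small guard can reject values that <Say> does not accept and report the largest possible number of repetitions. The sketch below uses a hypothetical helper, `maxPlays`, which is not a Twilio API. It encodes the rules stated above and uses 1,000 as the upper bound for loop="0". The actual number of repetitions can be lower if the call ends first.

```javascript
// Hypothetical helper: validate a loop value and return the maximum number
// of times the text can be spoken. loop = 0 repeats until hang-up or
// 1,000 iterations, so 1,000 is used as its upper bound.
const MAX_ITERATIONS_FOR_ZERO = 1000;

function maxPlays(loop) {
  if (!Number.isInteger(loop) || loop < 0) {
    throw new RangeError(`loop must be zero or a positive integer, got ${loop}`);
  }
  return loop === 0 ? MAX_ITERATIONS_FOR_ZERO : loop;
}

console.log(maxPlays(2)); // 2
console.log(maxPlays(0)); // 1000
```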

<Say> using loop attribute


```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

// Repeat the text twice.
const response = new VoiceResponse();
response.say({ loop: 2 }, 'Hello!');

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say loop="2">Hello!</Say>
</Response>
```

voice

Info: Effective June 26, 2023, Alice voices are no longer supported for Text To Speech, and any request that uses them is redirected to an alternate voice. Update your Console configuration, Studio Flows, and backend application to remove any references to Alice voices. For more information, see the Changelog.

<Say>'s voice attribute allows you to specify the synthesized voice to use when speaking the text, e.g. man or Polly.Amy.

Twilio offers three levels of synthesized voices: Basic, Standard, and Premium.

  • Basic voices are offered at no cost. The possible voice attribute values for Basic voices are man and woman.
  • Standard and Premium voices are provided by Amazon Polly and Google. Visit the Text To Speech page to learn more.

All possible voice values can be found in the "Available voices and languages" section of the Text To Speech page under the Voice name column.

The default value for voice depends on your Account-level Text To Speech settings, which you can find in the Console under Develop > Voice > Settings > Text-to-speech. See the Text To Speech page for more information.

If you configured default settings in your Console, you can use the voice attribute to override the default voice for a specific <Say> instruction.
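The naming and override rules above can be sketched as two small helpers. Both are hypothetical: `voiceSource` and `effectiveVoice` are not Twilio APIs. The prefix checks are based on the voice names shown on this page (man, woman, Polly.Amy, Polly.Mathieu). The `Google.` prefix is an assumption, so confirm it against the Voice name column on the Text To Speech page.

```javascript
// Hypothetical helper: report where a voice attribute value comes from,
// based on the naming used on this page. The "Google." prefix is an
// assumption; check the Voice name column on the Text To Speech page.
function voiceSource(voice) {
  if (voice === 'man' || voice === 'woman') {
    return 'basic'; // Basic voices, offered at no cost
  }
  if (voice.toLowerCase() === 'alice') {
    return 'retired'; // Alice voices are no longer supported
  }
  if (voice.startsWith('Polly.')) {
    return 'amazon-polly'; // Standard or Premium voice from Amazon Polly
  }
  if (voice.startsWith('Google.')) {
    return 'google'; // Standard or Premium voice from Google (assumed prefix)
  }
  return 'unknown';
}

// Hypothetical helper: a voice set on <Say> overrides the
// Account-level default configured in the Console.
function effectiveVoice(sayVoice, accountDefault) {
  return sayVoice !== undefined ? sayVoice : accountDefault;
}

console.log(voiceSource('Polly.Amy')); // amazon-polly
console.log(effectiveVoice(undefined, 'woman')); // woman
```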


<Say> using voice attribute


```javascript
const VoiceResponse = require('twilio').twiml.VoiceResponse;

// Use the Amazon Polly voice Mathieu, which speaks French (France).
const response = new VoiceResponse();
response.say(
  { voice: 'Polly.Mathieu', language: 'fr-FR' },
  "Bonjour! Je m'appelle Mathieu."
);

console.log(response.toString());
```

Output

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="Polly.Mathieu" language="fr-FR">Bonjour! Je m'appelle Mathieu.</Say>
</Response>
```


Hints and advanced uses

  • Synthesized speech may pronounce numbers, dates, times, and amounts unnaturally or incorrectly. Always check that the generated speech sounds as you expect. You can also use SSML tags to adjust pronunciation.
  • Numbers written without spaces are pronounced as a whole number, e.g. <Say>12345</Say> is spoken as "twelve thousand, three hundred forty-five".
  • Numbers separated by spaces are pronounced as individual numbers, e.g. <Say>1 2 3 4 5</Say> is spoken as "one two three four five".
  • Commas and periods in the text are interpreted as natural pauses.
  • To insert a longer pause, use the <Pause> verb. Place <Pause> outside the <Say> tags, not nested inside them.
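When a number should be read digit by digit, such as an order number or a verification code, you can add the spaces before you pass the text to <Say>. The sketch below uses a hypothetical helper, `spellOutDigits`, which is not part of the Twilio library. It relies on the spacing behavior described above. Listen to the result, because pronunciation can vary by voice.

```javascript
// Hypothetical helper: put a space between the digits of every number in
// the text so <Say> reads them one by one. Other characters are unchanged.
function spellOutDigits(text) {
  return text.replace(/\d+/g, (digits) => digits.split('').join(' '));
}

console.log(spellOutDigits('12345')); // 1 2 3 4 5
console.log(spellOutDigits('Your code is 907.')); // Your code is 9 0 7.
```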

There is a character limit on the text that <Say> can process, which varies depending on the Text To Speech option used. See the "Limits" section of the Text To Speech page for more information.
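If your text may exceed the limit, one approach is to split it into several pieces and speak each piece with its own <Say>. The sketch below uses a hypothetical helper, `chunkText`, which is not a Twilio API. It prefers to break after sentence-ending punctuation. Any single sentence that is still too long is hard-split, which can cut a word in half. You must take the `maxChars` value from the Limits section. Note also that JavaScript's `length` counts UTF-16 code units, which may not match how Twilio counts characters.

```javascript
// Hypothetical helper: split text into pieces of at most maxChars,
// breaking after ".", "!" or "?" where possible. maxChars is an
// assumption; use the limit documented for your Text To Speech option.
function chunkText(text, maxChars) {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError('maxChars must be a positive integer');
  }
  const sentences = text
    .trim()
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.length > 0);

  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    // Hard-split a sentence that is longer than the limit on its own.
    for (let i = 0; i < sentence.length; i += maxChars) {
      const piece = sentence.slice(i, i + maxChars);
      const candidate = current === '' ? piece : `${current} ${piece}`;
      if (candidate.length <= maxChars) {
        current = candidate; // the piece still fits in the current chunk
      } else {
        chunks.push(current); // close the chunk and start a new one
        current = piece;
      }
    }
  }
  if (current !== '') {
    chunks.push(current);
  }
  return chunks;
}

console.log(chunkText('Hello there. How are you? Fine.', 30));
```

With the Node.js helper library, you could then pass each chunk to its own `response.say(chunk)` call.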

