TwiML™ Voice: <Say>


(warning)

Legal notice

<Say> and Text-to-Speech (TTS), including the <Say> TwiML verb and API, use artificial intelligence or machine learning technologies. By enabling or using any features or functionalities within Programmable Voice that Twilio identifies as using artificial intelligence or machine learning technology, you acknowledge and agree to certain terms. Your use of these features or functionalities is subject to the terms of the Predictive and Generative AI or ML Features Addendum.

Availability of voices

Some features and voices, including third-party voices, in <Say> and Text-to-Speech vary in availability. Some may be available as alpha, beta, not generally available, limited release, or preview (collectively "Beta"). The information contained in this document is subject to change. Some features aren't implemented and others may change before the product becomes Generally Available. The Twilio Service Level Agreement doesn't cover Beta releases.

Use of third-party voices

Third-party voices may change without prior notice. Although Twilio provides access to these third-party voices, the third-party vendors control and update them. These changes include, but are not limited to, new models that affect how voices sound, or the removal of voices from their offering with or without alternative or automatic redirections. For the most up-to-date technical information about third-party voice functionality, refer to the applicable vendor's product documentation.

The <Say> verb allows your application to programmatically speak dynamic text over a call or conference using Text To Speech (TTS) capabilities. <Say> offers different options for voices, each of which has its own supported set of languages, accents, and genders, so you can configure your TwiML according to your needs and preferences.

When Twilio executes <Say>, it synthesizes speech for the text between <Say>'s opening and closing tags.

Consider the following TwiML sample. It has Twilio play audio of a synthesized voice saying "Hello!" on a voice call.

<Say> using default values

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
response.say('Hello!');

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say>Hello!</Say>
</Response>
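
Because TwiML is plain XML, the same document can also be assembled by hand when the helper library is not available. The following is a minimal sketch under that assumption, not Twilio's implementation: `escapeXml` and `buildSayTwiml` are hypothetical names introduced only for illustration. It shows why the text between the tags needs XML escaping before it is placed inside <Say>.

```javascript
// Hypothetical helper: escape characters that are special in XML text
// and in double-quoted attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: build a <Response> that contains a single <Say>.
// `attributes` is an optional object such as { loop: 2 }.
function buildSayTwiml(text, attributes = {}) {
  const attrs = Object.entries(attributes)
    .map(([name, value]) => ` ${name}="${escapeXml(value)}"`)
    .join('');
  return '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Say${attrs}>${escapeXml(text)}</Say></Response>`;
}

console.log(buildSayTwiml('Hello!'));
console.log(buildSayTwiml('Tom & Jerry', { loop: 2 }));
```

The helper library performs this kind of escaping for you, so a hand-rolled builder like this is mainly useful for understanding what the generated TwiML contains.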

<Say> attributes

The <Say> verb supports the following attributes that modify its behavior:

Attribute | Accepted Values | Default Value
language | Any supported language/locale combination, e.g. en-GB | en-US (English with United States locale)
loop | Any positive integer or zero, e.g. 4 | 1
voice | man, woman, any of the Twilio-supported Amazon Polly voices (e.g. Polly.Amy), or any of the Twilio-supported Google voices (e.g. Google.en-GB-Standard-A) | man

(warning)

Warning

Using an invalid combination of voice and language may result in an error and cause the <Say> instruction to fail. Review the Text To Speech page to ensure correct configuration and use of accepted values for the voice and language attributes.
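
One way to guard against an invalid combination is to check it against a lookup table before generating TwiML. The sketch below is illustrative only: `VOICE_LANGUAGES` and `isKnownCombination` are hypothetical names, and the table holds just two entries taken from examples in this document. A real table would need to be filled in from the Text To Speech page.

```javascript
// Illustrative lookup only. Polly.Mathieu with fr-FR appears in an example
// in this document; the language for the Google voice is inferred from its
// name. Populate this table from the Text To Speech page for real use.
const VOICE_LANGUAGES = {
  'Polly.Mathieu': ['fr-FR'],
  'Google.en-GB-Standard-A': ['en-GB'],
};

// Returns true only when the voice is in the table and lists the language.
function isKnownCombination(voice, language) {
  const languages = VOICE_LANGUAGES[voice];
  return Array.isArray(languages) && languages.includes(language);
}

console.log(isKnownCombination('Polly.Mathieu', 'fr-FR')); // true
console.log(isKnownCombination('Polly.Mathieu', 'en-US')); // false
```

A `false` result here only means the pair is missing from the local table, so keep the table in sync with the published list of voices.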

language

<Say>'s language attribute allows you to specify the language and locale for the synthesized voice, e.g. en-US for English spoken in a United States accent.

Most of Twilio's supported language values are composed of a lowercase language abbreviation and an uppercase locale abbreviation. For example, in the value fr-CA, fr indicates the French language and CA indicates the locale, Canada.
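
The structure of such a value can be sketched as a small parser. `parseLanguageTag` is a hypothetical helper, and the pattern it uses (two or three lowercase letters, a hyphen, two uppercase letters) is an assumption based on the description above. Since only most values follow this shape, a `null` result means "does not match the common pattern", not "unsupported".

```javascript
// Hypothetical helper: split a value such as "fr-CA" into its two parts.
// Returns null when the value does not follow the common pattern.
function parseLanguageTag(tag) {
  const match = /^([a-z]{2,3})-([A-Z]{2})$/.exec(tag);
  if (!match) {
    return null;
  }
  return { language: match[1], locale: match[2] };
}

console.log(parseLanguageTag('fr-CA')); // { language: 'fr', locale: 'CA' }
console.log(parseLanguageTag('FR-ca')); // null
```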

To review available languages and locales, consult the list of available voices on the Text to Speech page. Use the value from the ID column as the value of your language attribute, like language="en-GB".

(information)

Info

If you use both the language and voice attributes, check that the voice value works with the language. Combinations of voice and language not shown as available may result in an error and <Say> instruction failure.

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
response.say({
    language: 'fr-FR'
}, 'Bonjour!');

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say language="fr-FR">Bonjour!</Say>
</Response>

loop

<Say>'s loop attribute specifies how many times Twilio repeats the text. The default is 1, which speaks the text once.

Specifying 0 will cause the <Say> verb to loop until either the call is hung up or 1,000 iterations are performed.

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
response.say({
    loop: 2
}, 'Hello!');

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say loop="2">Hello!</Say>
</Response>
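
The rules for loop values can be summarized in a short validation helper. This is a sketch rather than Twilio's own logic: `maxRepetitions` is a hypothetical name, and it only encodes what is stated above (a default of 1, zero or positive integers accepted, and a cap of 1,000 iterations when the value is 0).

```javascript
// From the text above: loop="0" repeats until hangup or 1,000 iterations.
const MAX_ITERATIONS_WHEN_ZERO = 1000;

// Hypothetical helper: validate a loop value and return the largest number
// of times the text could be spoken for that value.
function maxRepetitions(loop = 1) {
  if (!Number.isInteger(loop) || loop < 0) {
    throw new RangeError(`loop must be zero or a positive integer, got ${loop}`);
  }
  return loop === 0 ? MAX_ITERATIONS_WHEN_ZERO : loop;
}

console.log(maxRepetitions());  // 1
console.log(maxRepetitions(2)); // 2
console.log(maxRepetitions(0)); // 1000
```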

voice

<Say>'s voice attribute allows you to specify the synthesized voice to use when speaking the text, e.g. man or Polly.Amy.

Twilio offers three types of synthesized voices based on technology: Basic, Standard, and Neural.

  • Twilio provides Basic voices at no cost. These voices accept voice attribute values of man or woman.
  • Amazon Polly and Google provide Standard and Neural voices.

All possible voice values can be found in the "Available voices and languages" section of the Text To Speech page under the Voice name column.

The default value for voice depends on your Account-level Text-To-Speech settings. You can find these settings in your Console under Develop > Voice > Settings > Text-to-speech.

To override the default settings in your Console, set the voice attribute for a specific <Say> instruction.

const VoiceResponse = require('twilio').twiml.VoiceResponse;

const response = new VoiceResponse();
response.say({
    voice: 'Polly.Mathieu',
    language: 'fr-FR'
}, 'Bonjour! Je m\'appelle Mathieu.');

console.log(response.toString());

Output

<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say voice="Polly.Mathieu" language="fr-FR">Bonjour! Je m'appelle Mathieu.</Say>
</Response>

  • Synthesized speech may pronounce numbers, dates, times, and amounts in an unnatural or incorrect way. Always verify that the generated speech sounds as you expect. You can also use SSML tags to adjust the pronunciation.
  • Numbers written without spaces are pronounced as a whole number, e.g. <Say>12345</Say> is spoken as "twelve thousand, three hundred forty-five."
  • Numbers separated by spaces are pronounced as individual digits, e.g. <Say>1 2 3 4 5</Say> is spoken as "one two three four five."
  • Commas and periods in synthesized speech are interpreted as natural pauses.
  • If you want to insert a long pause, try using the <Pause> verb. <Pause> should be placed outside <Say> tags, not nested inside them.
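
To have a code or account number read digit by digit, the text can be preprocessed before it is passed to <Say>. The following is a minimal sketch that relies on the spacing behavior described in the list above. `spaceOutDigits` is a hypothetical helper, not part of the Twilio library.

```javascript
// Hypothetical helper: insert a space after every digit that is directly
// followed by another digit, so "12345" becomes "1 2 3 4 5".
function spaceOutDigits(text) {
  return text.replace(/(\d)(?=\d)/g, '$1 ');
}

console.log(spaceOutDigits('12345'));              // 1 2 3 4 5
console.log(spaceOutDigits('Your code is 9021.')); // Your code is 9 0 2 1.
```

This approach splits every run of digits, including years and amounts, so apply it only to the parts of the text that should be read as individual digits.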

There is a character limit on the text that <Say> can process, which varies depending on the Text To Speech option used. See the "Limits" section of the Text To Speech page for more information.
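
When a message might exceed the limit, one option is to split it into several pieces and emit one <Say> per piece. The sketch below assumes the limit is counted in characters and takes it as a parameter, because the actual value depends on the Text To Speech option in use. `splitForSay` is a hypothetical helper.

```javascript
// Hypothetical helper: split text into chunks of at most maxChars
// characters, preferring to break between sentences.
function splitForSay(text, maxChars) {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError('maxChars must be a positive integer');
  }
  // Break after ".", "!" or "?" when followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars) {
      current = candidate;
      continue;
    }
    if (current) {
      chunks.push(current);
    }
    current = sentence;
    // Hard-split a single sentence that is longer than the limit.
    while (current.length > maxChars) {
      chunks.push(current.slice(0, maxChars));
      current = current.slice(maxChars);
    }
  }
  if (current) {
    chunks.push(current);
  }
  return chunks;
}

console.log(splitForSay('One. Two. Three.', 10)); // [ 'One. Two.', 'Three.' ]
```

Each returned chunk can then be passed to its own <Say> instruction in the order given.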


Consult AI Nutrition Facts for Programmable Voice - Text-to-Speech (TTS).