This document describes Twilio's old 2008-08-01 API. Please use the latest version.

TwiML™ Voice: Say Verb

The <Say> verb converts text to speech that is read back to the caller. <Say> is useful for development or saying dynamic text that is difficult to pre-record.

Verb Attributes

The <Say> verb supports the following attributes that modify its behavior:

Attribute Name    Allowed Values     Default Value
voice             man, woman         man
language          en, es, fr, de     en
loop              integer >= 0       1


The voice attribute allows you to choose a male or female voice to read the text back. The default value is "man".


The language attribute allows you to pick a voice with a specific language's accent and pronunciations. The currently supported languages are "en" (English), "es" (Spanish), "fr" (French), and "de" (German). The default is "en".
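
As an illustration of the language attribute, here is a minimal sketch of a document that reads a Spanish greeting. The greeting text is a made-up example, and the snippet assumes the usual <Response> root element that TwiML documents are wrapped in.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<Response>
    <!-- language="es" selects Spanish accent and pronunciation (hypothetical greeting text) -->
    <!-- voice is omitted, so the default "man" applies -->
    <Say language="es">Hola, gracias por llamar.</Say>
</Response>
```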


The loop attribute specifies how many times the text is repeated. The default is 1, so the text is spoken once.
Specifying 0 causes the <Say> verb to loop until the call is hung up.
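
A minimal sketch of the loop="0" case described above, assuming the usual <Response> root element. The message text is a hypothetical example.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<Response>
    <!-- loop="0" repeats the message until the caller hangs up -->
    <Say loop="0">Please stay on the line.</Say>
</Response>
```

Because the loop ends only when the call does, any verb placed after this <Say> would presumably never be reached.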


The "noun" of a Twilio verb is the text body of the XML element: the thing the verb acts upon. In the case of <Say>, the noun is the text you wish spoken to the caller. There is a 4KB limit on the text that <Say> can process.

Nesting Rules

The <Say> verb can be nested in the following elements:

The following verbs can be nested within <Say>:

  • none


Example 1: Hello World

<?xml version="1.0" encoding="UTF-8" ?>
<Response>
     <Say>Hello World</Say>
</Response>

When a call is directed to this TwiML document, the caller hears "Hello World" spoken once in a male voice.

Example 2: Hello, Hello

<?xml version="1.0" encoding="UTF-8" ?>
<Response>
     <Say voice="woman" loop="2">Hello</Say>
</Response>

This TwiML document tells Twilio to say "Hello" to the caller twice in a row in a female voice.

Hints and Advanced Uses

  • When translating text to speech, the <Say> verb makes assumptions about how to pronounce numbers, dates, times, amounts of money, and abbreviations.

  • When saying numbers: 12345 will be spoken as "twelve thousand three hundred forty five". 1 2 3 4 5 will be spoken as "one two three four five".

  • Punctuation, such as commas and periods, is interpreted by the speech engine as natural pauses.

  • <Say> is useful for saying dynamic text that would be difficult to pre-record. In cases where the contents of <Say> are static, you might consider recording a live person saying the phrase and using the <Play> verb instead.

  • If you want to insert a longer pause, try using the <Pause> verb. <Pause> should be placed outside <Say> tags, not nested inside them.
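
The hints on number pronunciation and pauses can be sketched together in one document. This is an illustration under stated assumptions: the message text is invented, the <Response> root element is the usual TwiML wrapper, and the length attribute on <Pause> (in seconds) comes from the <Pause> verb's own documentation rather than from this page.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<Response>
    <!-- Digits written together are read as one number:
         "twelve thousand three hundred forty five" -->
    <Say>Your balance is 12345 points.</Say>
    <!-- The pause sits between the two Say verbs, not inside either one.
         length="2" (seconds) is an assumed attribute value. -->
    <Pause length="2"/>
    <!-- Digits separated by spaces are read one at a time:
         "one two three four five" -->
    <Say>Your confirmation code is 1 2 3 4 5.</Say>
</Response>
```

Spacing out the digits is a reasonable choice for codes and phone numbers, where the caller needs to hear each digit on its own.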