When a phone call comes in to one of your Twilio numbers, Twilio makes an HTTP request to the URL configured for that number. In your response to that request you can tell Twilio what to do on the call.
Twilio behaves just like a web browser when making HTTP requests to URLs.
Twilio does the right thing when your application responds with different MIME types:
| MIME type | Behavior |
|---|---|
| text/xml, application/xml, text/html | Twilio interprets the returned document as an XML Instruction Set (which we like to call TwiML). See the TwiML Interpreter section for details. This is the most commonly used response. |
| various audio types | Twilio plays the audio file to the caller, and then hangs up. |
| text/plain | Twilio reads the content of the text out loud to the caller, and then hangs up. |
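To make the table concrete, here is a minimal sketch of a voice URL endpoint written as a plain WSGI app with only the Python standard library. The function name `voice_webhook` and the `/plain` path are assumptions made for illustration and are not part of Twilio's API; the point is only that the Content-Type header you send decides how Twilio treats the body.

```python
TWIML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<Response><Say>Hello World</Say></Response>"
)


def voice_webhook(environ, start_response):
    """Hypothetical WSGI endpoint for a Twilio voice URL."""
    if environ.get("PATH_INFO") == "/plain":
        # text/plain: Twilio reads the text aloud, then hangs up.
        body = b"Thanks for calling. Goodbye."
        content_type = "text/plain"
    else:
        # text/xml: Twilio runs the body through the TwiML interpreter.
        body = TWIML.encode("utf-8")
        content_type = "text/xml"
    start_response("200 OK", [
        ("Content-Type", content_type),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally you could serve it with `wsgiref.simple_server.make_server("", 8000, voice_webhook).serve_forever()` and point a test request at port 8000.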
When your application responds to a Twilio request with XML, Twilio runs your document through the TwiML interpreter. To keep things simple, the TwiML interpreter only understands a few specially named XML elements. In TwiML parlance these are divided into three groups: the root <Response> element, "verbs" and "nouns". Each group is discussed below.
The interpreter starts at the top of your TwiML document and executes instructions ("verbs") in order from top to bottom. As an example, the following TwiML snippet reads "Hello World" to the caller before playing Cowbell.mp3 for the caller and then hanging up.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say>Hello World</Say>
    <Play>https://api.twilio.com/Cowbell.mp3</Play>
</Response>
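If your application generates TwiML rather than serving a static file, the same document can be assembled in code. The following is a sketch using Python's standard `xml.etree.ElementTree`; the helper name `hello_cowbell_twiml` is an assumption for illustration, and this is one possible approach rather than an official Twilio library.

```python
import xml.etree.ElementTree as ET


def hello_cowbell_twiml():
    """Build the Hello World / Cowbell document from the example above."""
    response = ET.Element("Response")
    # Verbs are appended in the order the interpreter should run them.
    ET.SubElement(response, "Say").text = "Hello World"
    ET.SubElement(response, "Play").text = "https://api.twilio.com/Cowbell.mp3"
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

Because child elements keep their insertion order, the order of the `SubElement` calls is the order in which the verbs execute.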
TwiML elements ("verbs" and "nouns") have case-sensitive names. For example, using <say> instead of <Say> will result in an error. Attribute names are also case sensitive and "camelCased". You can use XML comments freely; the interpreter ignores them.
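A case mistake like <say> is easy to catch before Twilio ever sees the document. Below is a small sketch of such a check; the function name `miscased_verbs` is hypothetical, and the verb set is limited to the six core verbs named in this document, so it only flags names that match one of those verbs apart from letter case.

```python
import xml.etree.ElementTree as ET

# The six core verbs named in this document; other verbs exist.
CORE_VERBS = {"Say", "Play", "Record", "Gather", "Dial", "Sms"}
_BY_LOWER = {name.lower(): name for name in CORE_VERBS}


def miscased_verbs(twiml):
    """Return (found, expected) pairs for verbs with the wrong letter case."""
    root = ET.fromstring(twiml)
    problems = []
    # Iterating the root yields its child elements in document order;
    # the default parser drops XML comments, so they are skipped here.
    for child in root:
        expected = _BY_LOWER.get(child.tag.lower())
        if expected is not None and child.tag != expected:
            problems.append((child.tag, expected))
    return problems
```

For instance, `miscased_verbs("<Response><say>Hi</say></Response>")` reports the pair `("say", "Say")`, while a correctly cased document yields an empty list.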
The root element of Twilio's XML Markup is the <Response> element. In any TwiML response to a Twilio request, all verb elements must be nested within this element. Any other structure is considered invalid.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say>Hello</Say>
</Response>
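The root-element rule can also be checked mechanically. This is a minimal sketch, with the hypothetical helper name `has_response_root`; it tests only that the document is well-formed XML with <Response> as its root, not that the verbs inside are valid.

```python
import xml.etree.ElementTree as ET


def has_response_root(twiml):
    """True if the document parses and its root element is <Response>."""
    try:
        root = ET.fromstring(twiml)
    except ET.ParseError:
        # Not well-formed XML, e.g. several top-level elements.
        return False
    # Tag comparison is case sensitive, so <response> does not qualify.
    return root.tag == "Response"
```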
Most XML elements in a TwiML document are TwiML verbs. Verb names are case sensitive, as are their attribute names. There are only six core TwiML Voice verbs and four secondary verbs, with detailed documentation on each. The six core verbs are:
<Say>: Read some text to the caller.
<Play>: Play an audio file to the caller.
<Record>: Record a call or part of a call.
<Gather>: Get the digits a caller presses.
<Dial>: Call another phone number or conference and connect the current caller.
<Sms>: Send an SMS message during a call.
Note that there are certain situations when the TwiML interpreter may not reach verbs in a TwiML document because control flow has passed to a different document. This usually happens when a verb's 'action' attribute is set. For example, if a <Say> verb is followed by an <Sms> and then another <Say>, the second <Say> is unreachable if the <Sms> verb's 'action' URL is set. In this case, call flow continues with the TwiML received in your response to the 'action' URL request.
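The <Say>, <Sms>, <Say> scenario can be traced with a toy model of the interpreter. This sketch rests on a deliberate simplification: it treats any verb carrying an 'action' attribute as handing control to that URL, which is the usual case described above but not a full model of every verb. The function name `reachable_verbs` and the example.com URL are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def reachable_verbs(twiml):
    """List the verb names the interpreter would reach in this document.

    Simplifying assumption: a verb with an 'action' attribute passes
    control to that URL, so nothing after it in this document runs.
    """
    reached = []
    for verb in ET.fromstring(twiml):
        reached.append(verb.tag)
        if verb.get("action"):
            break  # call flow continues with the TwiML from the action URL
    return reached


DOC = (
    "<Response>"
    "<Say>Sending you a text.</Say>"
    '<Sms action="https://example.com/after-sms">Hello from the call</Sms>'
    "<Say>This line is never spoken.</Say>"
    "</Response>"
)
```

Calling `reachable_verbs(DOC)` stops at the <Sms> verb, so the trailing <Say> never appears in the result. To speak that last line, it would need to be in the TwiML returned from the 'action' URL.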
A noun in TwiML is anything nested inside a verb that is not itself a verb. It's whatever the verb is acting on. This is usually just text, but sometimes, as in the case of <Dial> with its <Conference> nouns, there are nested XML elements that are nouns.
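To show the difference between a text noun and an element noun, here is a short sketch that builds a <Dial> verb whose noun is a nested <Conference> element. The helper name `dial_conference_twiml` and the room name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def dial_conference_twiml(room_name):
    """Build a <Dial> verb whose noun is a nested <Conference> element."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    # The noun here is an XML element, not plain text inside <Dial>.
    ET.SubElement(dial, "Conference").text = room_name
    return ET.tostring(response, encoding="unicode")
```

Compare this with <Say>Hello</Say>, where the noun is simply the text "Hello".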
Status callbacks do not control call flow, so TwiML does not need to be returned. If you do respond, use status code 204 No Content, or 200 OK with Content-Type: text/xml and an empty <Response/> in the body. Not responding properly will result in warnings in the Debugger.
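Both acceptable replies are sketched below as plain WSGI callables using only the Python standard library. The function names are hypothetical, and a real handler would typically also read the request parameters before acknowledging; that part is omitted here.

```python
EMPTY_TWIML = b'<?xml version="1.0" encoding="UTF-8"?><Response/>'


def status_callback_204(environ, start_response):
    """Acknowledge a status callback with no body at all."""
    start_response("204 No Content", [])
    return []


def status_callback_200(environ, start_response):
    """Acknowledge a status callback with an empty TwiML document."""
    start_response("200 OK", [
        ("Content-Type", "text/xml"),
        ("Content-Length", str(len(EMPTY_TWIML))),
    ])
    return [EMPTY_TWIML]
```

Either form tells Twilio the callback was received without attempting to steer the call.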