Twilio behaves just like a web browser, so there's nothing new to learn.
Twilio handles your application's response according to its MIME type.
See the <Play> documentation for supported MIME types.
When Twilio interprets a TwiML document, it starts at the top and executes verbs in the order they appear. For example, in the following snippet, "Hello World" is read to the caller before the Cowbell.mp3 file is played.
<?xml version="1.0" encoding="UTF-8" ?>
<Response>
    <Say>Hello World</Say>
    <Play>http://api.twilio.com/Cowbell.mp3</Play>
</Response>
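The same document can also be generated by application code rather than written by hand. The following is a minimal sketch using only Python's standard library. The helper names `build_twiml` and `verb_order` are hypothetical and not part of any Twilio library. The point it illustrates is that the order in which verbs are appended is the order in which they appear in the document, and therefore the order in which they are executed.

```python
import xml.etree.ElementTree as ET


def build_twiml(message: str, audio_url: str) -> str:
    """Return a TwiML document that speaks `message`, then plays `audio_url`.

    Hypothetical helper for illustration only.
    """
    response = ET.Element("Response")
    # Verbs are appended in the order they should run: <Say> first, then <Play>.
    ET.SubElement(response, "Say").text = message
    ET.SubElement(response, "Play").text = audio_url
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def verb_order(twiml: str) -> list:
    """List the verb names in document order (top to bottom)."""
    root = ET.fromstring(twiml.encode("utf-8"))
    return [child.tag for child in root]


twiml = build_twiml("Hello World", "http://api.twilio.com/Cowbell.mp3")
print(twiml)
print(verb_order(twiml))
```

In a real application, the returned string would be sent as the body of the HTTP response to Twilio's request. How the response is served (framework, headers) is outside the scope of this sketch.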
In some situations, elements in a TwiML document are never reached because control flow has passed to a different document. For example, if a <Say> verb is followed by a <Gather> and then another <Say>, the second <Say> is not reached if the <Gather> action is invoked. The following verbs may affect control flow: