
Media Streams Overview


Media Streams gives you access to the raw audio of your Programmable Voice calls by sending the live audio stream of the call to a destination of your choosing over WebSockets. This enables use cases such as real-time transcription, sentiment analysis, and voice authentication. Another application can also stream raw audio into Twilio Voice calls, enabling use cases such as conversational IVR or real-time conversations with an AI chatbot assistant.


Support for Twilio Regions

Media Streams is now available in the Ireland (IE1) and Australia (AU1) Regions.

To get started with Media Streams, you should first be familiar with making and receiving Voice calls with Twilio. If this is your first time building with Twilio Programmable Voice, complete one of the Voice quickstarts.

Before you start building with Media Streams, you need to decide whether unidirectional or bidirectional streams are best for your use case. The sections below explain the differences between each option and provide links to helpful docs and resources.


Unidirectional Media Streams


In a unidirectional Media Stream, your WebSocket application receives a Call's audio stream but can't send audio back to Twilio to play on the Call.

With unidirectional Media Streams, you can receive the inbound audio track (the audio that is incoming to Twilio), the outbound audio track (the audio that Twilio is generating on the call), or both tracks.

DTMF is not yet supported with unidirectional Media Streams.

You can start a unidirectional Media Stream using <Start><Stream> TwiML instructions or via REST API using the Stream resource.

If you use TwiML for the stream, Twilio executes <Start><Stream>, which initiates the audio stream with your WebSocket server, and then executes the next TwiML instruction you provide.
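As a rough illustration of that flow, the sketch below builds a TwiML document that starts a unidirectional stream and then continues with another verb. It is a minimal example under stated assumptions, not Twilio's reference implementation: the function name `start_stream_twiml`, the WebSocket URL `wss://example.com/media`, and the follow-up `<Say>` text are hypothetical, and the `track` attribute values are taken from the TwiML `<Stream>` reference, which you should check before relying on them.

```python
import xml.etree.ElementTree as ET


def start_stream_twiml(ws_url: str, track: str = "inbound_track",
                       say_text: str = "The stream has started.") -> str:
    """Build TwiML that forks call audio to ws_url, then runs a <Say> verb.

    ws_url and say_text are placeholder values chosen for this example.
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    # <Start><Stream> opens the stream to the WebSocket server.
    ET.SubElement(start, "Stream", {"url": ws_url, "track": track})
    # <Start><Stream> does not block, so this next verb runs right away.
    say = ET.SubElement(response, "Say")
    say.text = say_text
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(start_stream_twiml("wss://example.com/media", track="both_tracks"))
```

Your web application would return this XML string in response to Twilio's webhook request for the call.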

You can stop a unidirectional Media Stream using <Stop><Stream> TwiML instructions or via REST API using the Stream resource.

Resources for unidirectional Media Streams


Check out the following resources to help you build your unidirectional Media Streams application:


Bidirectional Media Streams


Bidirectional Media Streams are those in which your WebSocket application both receives audio from Twilio and can send audio to Twilio, which is then played on the Call. An example use case for bidirectional Media Streams is to facilitate a real-time voice conversation with an AI assistant.
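To make the two-way flow concrete, here is a minimal sketch of the message handling inside a WebSocket application that echoes received audio back to the Call. It assumes the JSON message shape described in the Media Streams WebSocket messages reference (an `event` field, a `streamSid`, and a base64-encoded `media.payload`); confirm those field names there before building on this. The function name `echo_reply` is hypothetical, and the WebSocket server itself is left out.

```python
import base64
import json
from typing import Optional


def echo_reply(raw_message: str) -> Optional[str]:
    """Return a JSON media message that plays the received audio back.

    Returns None for any message that is not a media message.
    Field names are assumptions based on the WebSocket messages reference.
    """
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None
    payload = message["media"]["payload"]
    # Fail early if the payload is not valid base64 audio data.
    base64.b64decode(payload, validate=True)
    reply = {
        "event": "media",
        "streamSid": message["streamSid"],
        "media": {"payload": payload},
    }
    return json.dumps(reply)
```

A real assistant would replace the echo with audio generated by your application, encoded the same way as the audio it received.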

With bidirectional Media Streams, you can only receive the inbound track.

DTMF is supported with bidirectional Media Streams only in the inbound direction, from Twilio toward your media server. Sending DTMF outbound from your media server toward Twilio is not supported.

To start a bidirectional Media Stream, use <Connect><Stream>. These TwiML instructions block: Twilio doesn't execute subsequent TwiML instructions until the WebSocket connection is disconnected.
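The blocking behavior can be seen in a small sketch that builds such a TwiML document. This is an illustrative example only: the function name `connect_stream_twiml`, the URL `wss://example.com/assistant`, and the trailing `<Say>` text are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def connect_stream_twiml(ws_url: str,
                         after_text: str = "The connection has ended.") -> str:
    """Build TwiML that connects the call to a bidirectional stream.

    ws_url and after_text are placeholder values chosen for this example.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": ws_url})
    # This verb is reached only once the WebSocket connection is closed.
    say = ET.SubElement(response, "Say")
    say.text = after_text
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(connect_stream_twiml("wss://example.com/assistant"))
```

Here the `<Say>` verb acts as a fallback that the caller hears only after your server closes the WebSocket connection.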

You cannot use the Stream resource to start a bidirectional Media Stream.

You can stop a bidirectional Media Stream by closing the WebSocket connection from your server or by ending the Call.

Resources for bidirectional Media Streams


Check out the following resources to help you build your bidirectional Media Streams application:


For unidirectional Streams, you can stream up to four tracks at a time on a Call.

For bidirectional Streams, you can have only one Stream per Call.

Each Media Stream is associated with one WebSocket connection.


Communicate with Twilio's media servers


Your Media Streams application must be able to communicate with Twilio.

Configure your firewall rules to allow secure WebSocket connections (TCP port 443) from Twilio to your WebSocket servers from any public IP address.

You must also configure your application to validate the X-Twilio-Signature header. This is how your application verifies that a Media Stream is coming from an authentic Twilio source. Learn more on the General Usage - Security page.
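The check could look something like the sketch below, which uses only the Python standard library. It assumes the signing scheme described on Twilio's Security page for requests without POST parameters: an HMAC-SHA1 of the request URL, keyed with your account's auth token and base64-encoded. The function names are hypothetical, and the exact URL string that Twilio signs for a WebSocket handshake should be confirmed on that page. In production, Twilio's helper library provides a `RequestValidator` class that is likely a better choice than hand-rolled code.

```python
import base64
import hashlib
import hmac


def compute_signature(auth_token: str, url: str) -> str:
    """Compute the expected signature for a URL (assumed scheme, see above)."""
    digest = hmac.new(auth_token.encode("utf-8"),
                      url.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_signature(auth_token: str, url: str, header_value: str) -> bool:
    """Compare the X-Twilio-Signature header value with the expected one."""
    expected = compute_signature(auth_token, url)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, header_value)
```

Your WebSocket server would call `is_valid_signature` during the handshake and reject the connection when it returns `False`.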


