This page is for reference only. We are no longer onboarding new customers to Programmable Video. Existing customers can continue to use the product until December 5, 2024.
We recommend migrating your application to the API provided by our preferred video partner, Zoom. We've prepared this migration guide to assist you in minimizing any service disruption.
Twilio Programmable Video is organized in two domains:
In P2P Rooms media is exchanged among clients. Hence, such media cannot be processed in the Twilio Cloud. On the other hand, in Group Rooms, participants publish their media to the Twilio Cloud, where an SFU (Selective Forwarding Unit) media server processes and redistributes it. Due to this, post-flight services are only available in Group Rooms.
The Video Recording Service is a post-flight service which persists the media information that is communicated among Group Room participants. In a Group Room, media is exchanged in the form of tracks. For example, if there are 7 participants publishing their microphone and webcam, there will be a total of 14 media tracks: 7 audio tracks and 7 video tracks. If later a screen-share is published, then the number of tracks will be 15. If Video Recording is activated for this Room, then 15 different recordings will be generated, one per media track.
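The arithmetic in the example above can be sketched as a small helper. The function name and its parameters are hypothetical and exist only for illustration; the sketch assumes every published audio and video track is recorded.

```python
def expected_recordings(participants: int,
                        tracks_per_participant: int = 2,
                        extra_tracks: int = 0) -> int:
    """Count the recordings produced when every published track is recorded.

    Hypothetical helper: one Recording per media track, as described above.
    """
    return participants * tracks_per_participant + extra_tracks


# 7 participants, each publishing a microphone and a webcam
print(expected_recordings(7))
# the same Room after one screen-share track is published
print(expected_recordings(7, extra_tracks=1))
```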
Often, users want to play back the recorded Rooms. However, most media players are not capable of playing back Twilio Video Recordings and will exhibit strange behavior, such as a lack of seekability or reporting that the file is corrupt. This is because Twilio Video Recordings have been designed for flexibility, reliability and compactness, which makes them incompatible with common media players. To play back recordings, you will need to use the Video Compositions Service.
The Video Compositions Service is a post-flight service which provides the capability to create playable files by mixing Video Recordings. To create a Composition, developers need to specify:
Twilio Programmable Video in-flight APIs are designed around three main object types: Rooms, Participants and Tracks. The following UML diagram illustrates how these relate to Recordings and Compositions:
In the general case, when a Track is recorded it generates one Recording object, which in turn has one media file you can download. Hence, you may see the Recording object as a file containing the Track media and some additional metadata. However, there are some corner cases to that rule:
It is important to remark that Twilio does not support Data Track recordings. Hence, not all Track subtypes can generate Recordings.
A Composition aggregates one or more Recordings (i.e. a Composition must include at least one Recording). However, a Recording can be included in zero or more Compositions, as nothing prevents the same Recording from being part of as many Compositions as you want.
A Composition is always associated with one Room, meaning that it is not possible to create Compositions mixing Recordings from multiple Rooms. However, for a given Room zero or more compositions can be created.
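The cardinality rules above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names mirror the concepts in the text, but the fields, the SID strings and the validation logic are hypothetical and are not Twilio's implementation. The corner cases of the Track-to-Recording rule are not modeled.

```python
from dataclasses import dataclass
from typing import List

# Data Tracks cannot be recorded, so only these kinds produce Recordings.
RECORDABLE_KINDS = {"audio", "video"}


@dataclass(frozen=True)
class Recording:
    sid: str       # hypothetical identifier
    room_sid: str  # the Room whose Track was recorded
    kind: str      # "audio" or "video"

    def __post_init__(self):
        if self.kind not in RECORDABLE_KINDS:
            raise ValueError(f"{self.kind!r} tracks cannot be recorded")


@dataclass
class Composition:
    room_sid: str
    recordings: List[Recording]

    def __post_init__(self):
        # A Composition aggregates one or more Recordings.
        if not self.recordings:
            raise ValueError("a Composition needs at least one Recording")
        # All Recordings must belong to the Composition's single Room.
        if any(r.room_sid != self.room_sid for r in self.recordings):
            raise ValueError("Recordings from other Rooms cannot be mixed")


audio = Recording("RT_audio", "RM_1", "audio")
video = Recording("RT_video", "RM_1", "video")

# The same Recording may take part in any number of Compositions.
full = Composition("RM_1", [audio, video])
audio_only = Composition("RM_1", [audio])
```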
Video Recordings can be enabled by setting the Recording Rules for the Room or by setting the record_participants_on_connect flag when creating the Group Room. When Video Recordings are enabled for a Room, all the published tracks are recorded. By defining Recording Rules it is possible to choose which of the published tracks should be recorded. By default, Video Recordings are disabled and no tracks are recorded.
Enabling Video Recordings in the Console
To enable Video Recording in the Console, navigate to your Room Settings and:
Enabling Video Recordings using the Rooms REST API
The Rooms REST API allows you to override the behavior specified in the Console on a per-room basis. This is done using the RecordParticipantsOnConnect POST parameter. If the room Type is not group-small this parameter is ignored; otherwise, the following holds:
If RecordParticipantsOnConnect is not set at POST time, the Console default is used.
If RecordParticipantsOnConnect is set to true, then recordings are enabled no matter what the Console specifies.
If RecordParticipantsOnConnect is set to false, then recordings are disabled no matter what the Console specifies.
You can check our official Room REST API documentation for further information on how to use this parameter.
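As a minimal sketch of the per-room override, the following builds (but does not send) a create-room request using only the Python standard library. The endpoint URL, the UniqueName parameter and the helper name are assumptions made for illustration and should be checked against the official Rooms REST API reference. Authentication is omitted.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: base URL of the Rooms resource; verify in the API reference.
ROOMS_URL = "https://video.twilio.com/v1/Rooms"


def build_create_room_request(unique_name: str,
                              record: Optional[bool] = None,
                              room_type: str = "group-small") -> Request:
    """Build a POST request that creates a Room (hypothetical helper).

    record=None leaves RecordParticipantsOnConnect unset, so the Console
    default applies; True or False overrides the Console setting.
    """
    params = {"UniqueName": unique_name, "Type": room_type}
    if record is not None:
        params["RecordParticipantsOnConnect"] = "true" if record else "false"
    body = urlencode(params).encode("ascii")
    return Request(ROOMS_URL, data=body, method="POST")


request = build_create_room_request("demo", record=True)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

A real call would also need the account credentials (for example, HTTP Basic authentication) before the request is sent.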
The Video Recordings Logs Console page makes it possible to manage your video recordings. On this page, you can:
Retrieve video recordings by SID
In the Video Recordings Logs every recording will be represented as a row containing:
Only recordings in the completed status will have an associated file that can be downloaded.
Clicking on the Date field will open the Recording Details Console page where you will be able to:
The Video Recording Logs in the Console allow you to perform basic administration and monitoring tasks. However, if you want to have full programmatic control over how your recordings are managed, you should use the Video Recordings REST API. As with most of Twilio's products, this API is based on the following communication model:
The Video Recording Rules REST API allows you to customize the recording settings for in-progress Rooms. Using this API, you can control aspects such as when to start, pause, and stop the recordings, which participants to record, and which tracks to record. The rules can be updated at any time during the life of the room.
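To give a feel for what such rules might look like, here is a hedged sketch that serializes a list of rules into a form parameter. The rule shape (a "type" of include or exclude combined with filters such as "all" or "kind") and the "Rules" parameter name are assumptions based on the description above; consult the Recording Rules reference for the exact schema.

```python
import json
from typing import Dict, List

# Assumed rule shapes, for illustration only.
START_ALL = [{"type": "include", "all": True}]       # record every track
STOP_ALL = [{"type": "exclude", "all": True}]        # stop or pause recording
AUDIO_ONLY = [{"type": "include", "kind": "audio"}]  # record audio tracks only


def recording_rules_params(rules: List[Dict]) -> Dict[str, str]:
    """Return the form parameters for a rules update (hypothetical helper)."""
    for rule in rules:
        if rule.get("type") not in ("include", "exclude"):
            raise ValueError("each rule needs type 'include' or 'exclude'")
    return {"Rules": json.dumps(rules)}


print(recording_rules_params(AUDIO_ONLY))
```

Because the rules can be updated at any time during the life of the room, an application could send START_ALL when a session begins and STOP_ALL during a break.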
Twilio Group Rooms provide PSTN interoperability. This means that any Twilio Voice call can be connected to a Group Room and it will behave as a participant that does not publish or subscribe to any video. In this scenario we have a Twilio Voice call that is, at the same time, a Twilio Video participant. When considering recordings, this situation generates some confusion among developers. The following should help to clarify:
Hence, when it comes to recordings, a PSTN participant has 4 options:
Twilio Video Compositions are a post-flight service developers can use to create playable files by mixing their Video Recordings. A given composition can only include recordings generated in the same source Group Room. This guarantees that all recordings in the composition have a common time reference and can be synchronized. However, not all recordings in the source Group Room need to be included in the composition. Developers can select the specific audio recordings to be included, which will be mixed through a linear adder, as well as the desired video recordings, which will be composed following a layout that developers also provide.
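Mixing audio through a linear adder means summing the time-aligned samples of the selected tracks. The sketch below illustrates the idea only; it assumes 16-bit signed PCM samples and clips the sum to that range, and neither assumption is taken from Twilio's actual mixer.

```python
from typing import List, Sequence


def mix_linear(tracks: Sequence[Sequence[int]],
               lo: int = -32768, hi: int = 32767) -> List[int]:
    """Sum time-aligned samples and clip to the 16-bit range (assumed)."""
    length = max((len(t) for t in tracks), default=0)
    mixed = []
    for i in range(length):
        # Tracks shorter than the longest one contribute silence.
        total = sum(t[i] for t in tracks if i < len(t))
        mixed.append(max(lo, min(hi, total)))
    return mixed


print(mix_linear([[100, 200, 300], [50, -50]]))
```

The common time reference mentioned above is what makes the sample-by-sample sum meaningful: index i must refer to the same instant in every track.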
Developers can compose a given Group Room in multiple ways to stress different aspects of the communication that might be relevant for different use-cases. For example, an e-learning session can be composed with a Picture-in-Picture layout showing the screen and webcam of the teacher so that students can playback the lesson. It can also be composed in a grid layout showing all the student webcams for evaluating the degree of attention of the students.
Note: The maximum total size of all Recordings selected for a Composition is 40 GB. To estimate the size of a Recording, check this table.
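The two e-learning layouts described above might be expressed as request parameters along the following lines. This is a sketch under assumptions: the parameter names (RoomSid, Format, VideoLayout, AudioSources), the layout keys and the track names teacher-screen and teacher-webcam are illustrative and must be verified against the Video Compositions REST API reference.

```python
import json
from typing import Dict, List, Sequence, Tuple
from urllib.parse import urlencode

# Assumed layout shapes, for illustration only.
GRID_LAYOUT = {"grid": {"video_sources": ["*"]}}
PIP_LAYOUT = {
    "main": {"z_pos": 1, "video_sources": ["teacher-screen"]},
    "pip": {"z_pos": 2, "x_pos": 1000, "y_pos": 30,
            "width": 240, "height": 180,
            "video_sources": ["teacher-webcam"]},
}


def composition_params(room_sid: str, video_layout: Dict,
                       audio_sources: Sequence[str] = ("*",),
                       fmt: str = "mp4") -> List[Tuple[str, str]]:
    """Build form parameters for a create-composition POST (hypothetical)."""
    params = [("RoomSid", room_sid), ("Format", fmt),
              ("VideoLayout", json.dumps(video_layout))]
    # One AudioSources entry per selected audio source.
    params += [("AudioSources", source) for source in audio_sources]
    return params


# The same Room composed twice, once per use case; RMxxxx is a placeholder.
lesson_body = urlencode(composition_params("RMxxxx", PIP_LAYOUT))
attention_body = urlencode(composition_params("RMxxxx", GRID_LAYOUT))
print(lesson_body)
```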
As with Recordings, the Video Compositions Logs Console page makes it possible to manage your video compositions. Using it, you can:
In the Video Compositions Logs every composition will be represented as a row containing:
Only compositions in the completed status will have an associated file that can be downloaded or played.
Clicking on the Date field will open the Composition Details Console page where you will be able to:
The Video Compositions REST API is also based on the above mentioned REST/Callbacks model:
parameters. Twilio will send webhooks to the URL specified in the former with the HTTP method given in the latter. Some common status callbacks related to compositions are:
To develop applications involving Video Recordings and Compositions developers must use the Video Recordings and Video Compositions REST APIs and listen to the appropriate HTTP callbacks that Twilio generates. Understanding the relationship among these APIs and callbacks may be challenging. To illustrate how this works, the following picture shows an example timeline of the different requests and callbacks that occur in a simple service. For simplicity, we have omitted all the events that are not directly related to recordings or compositions. Requests are represented at the bottom while callbacks are at the top.
In chronological order (from left to right) this picture shows the following:
Create Room (POST): This is the first step of the application. The Group Room can be created using a POST request fired using the Rooms REST API. In this case the Room callback URL and method can be set as POST parameters. Notice that in ad-hoc Rooms this step is omitted and the callback parameters are taken from the Console Room Settings. All the room related callbacks as well as all the recordings related callbacks will be published to that URL with the given HTTP method. Notice also that we are assuming this room is created with recordings enabled.
room-created: This callback is fired just after the room is created.
recording-started: When the first track is published by a participant, the first recording-started event is fired to indicate that a new recording has been created. This callback also provides the corresponding Recording SID. The newly created recording should be in state processing.
recording-started: Given that tracks are recorded individually, further track publications will fire further recording-started callbacks.
Fetch Recording (GET): Once we have received a recording-started event, we can use the Video Recordings REST API to fetch the recording metadata. In the above timeline, such a recording should be in state processing and the recording media file will not yet be available.
recording-completed: When a track is unpublished, our media server finishes the recording and makes the recording file available. When this happens, the recording state goes to completed. After that, the recording-completed callback is fired, indicating to the application that the recording media file can be downloaded.
Fetch Recording Media (GET): Once the recording-completed callback has been received, the application can safely fetch the associated media file. In that case, the GET request will return an HTTP redirection pointing to a self-signed temporary URL where the recording media file can be downloaded.
room-ended: when a room is completed all the published tracks are automatically unpublished and all the associated recordings are completed. A room can be completed in multiple ways. Note that the room's empty_room_timeout value can impact how long it takes for recordings to complete; this value determines the amount of time before a room ends after all participants have left the room. The default value is five minutes, which means that it will take at least five minutes after all participants have left a room before the room is ended and the recordings are completed. Check our Rooms documentation for further information.
recording-completed: depending on how a room is completed and on whether there are still published tracks, the recording-completed callback for the pending recordings may arrive before or after the room-ended event. As a general rule, your application should not assume any specific order for these events.
Create Composition (POST): Developers can fire the POST request for creating a composition at any time after the room-created event is received. The above timeline does it after the last recording-completed event is received, but developers can do it at any other time. As part of the POST parameters, developers can specify the callback URL as well as the method to be used by the composition webhooks. The POST request will return information about the newly created composition, including the Composition SID. A composition is typically created in the state enqueued, indicating that it is waiting for computing resources to become available before the media mixing operations start.
Fetch Composition (GET): after a composition has been created developers can fetch its associated metadata. However, the composition media file will not be available until the composition goes to the state completed.
composition-started: this callback is fired when the composition is taken out of the queue and allocated the appropriate computing resources to proceed. Notice that just before firing this event the composition state transitions to processing. Notice also that the total composition queue time is variable and depends on load conditions.
composition-progress (3): while the composition is processing, Twilio will fire periodic composition-progress callbacks that give a hint of how complete the processing is. The composition processing time depends on the source room duration and on the selected resolution and formats. As a worst-case rule of thumb, the total processing time should be under half of the duration of the room.
composition-available: when the media processing is completed the composition state transitions to completed and the associated media file is made available in our cloud. At that point, the composition-available event is fired.
Fetch Composition Media (GET): Once the composition-available callback has been received, the application can safely fetch the associated media file. In that case, the GET request will return an HTTP redirection pointing to a self-signed temporary URL where the composition media file can be downloaded.
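Since recording-completed and room-ended may arrive in either order, an application that follows the timeline above needs to track both before it creates a composition. The sketch below shows one way to do that. The callback field names StatusCallbackEvent and RecordingSid, the class name and the SIDs are assumptions for illustration; check the status callback reference for the actual payload.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RoomTracker:
    """Tracks recording and room callbacks for one Room (hypothetical)."""
    started: Set[str] = field(default_factory=set)
    completed: Set[str] = field(default_factory=set)
    room_ended: bool = False

    def handle(self, event: Dict[str, str]) -> bool:
        """Record one callback; return True once it is safe to compose."""
        name = event.get("StatusCallbackEvent")
        if name == "recording-started":
            self.started.add(event["RecordingSid"])
        elif name == "recording-completed":
            # Tolerate a completion whose start callback was never seen.
            self.started.add(event["RecordingSid"])
            self.completed.add(event["RecordingSid"])
        elif name == "room-ended":
            self.room_ended = True
        return self.ready()

    def ready(self) -> bool:
        # Ready when the room has ended and no recording is still pending.
        return (self.room_ended and bool(self.started)
                and self.started == self.completed)


tracker = RoomTracker()
tracker.handle({"StatusCallbackEvent": "recording-started", "RecordingSid": "RT1"})
tracker.handle({"StatusCallbackEvent": "room-ended"})
print(tracker.handle({"StatusCallbackEvent": "recording-completed", "RecordingSid": "RT1"}))
```

Waiting for every recording is a design choice that mirrors the timeline; as noted above, a composition can also be requested at any other time after the room is created.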
Developers often need to compose all their Rooms with the same layout. Doing this requires firing a create composition POST request for each Room. In those cases, it may be more efficient to use the Composition Hooks REST API. A Composition Hook is a template that describes how a composition should be created. When a Composition Hook is active in a given Twilio Account, all the Group Rooms completed in that account that generate at least one recording will be composed with the specified template. You can find full details on how to create and manage your Composition Hooks in our official reference documentation. Using Composition Hooks is similar to using the Compositions REST API directly. The main differences are illustrated in this figure:
Twilio Video Recordings and Compositions are stored in encrypted volumes and are only transferred to the Internet under strong cryptographic protection. However, many of our customers require further privacy guarantees to comply with their applicable legislation and policies. Due to this, Twilio has created the Video Recordings Settings and the Video Composition Settings. These capabilities make it possible to configure a Twilio Account to use special protection.
If you activate Encrypted Video Recordings in a Twilio account, all the Video Recordings media files generated in that account will be cryptographically protected with a public key provided by you. Hence, only you will be able to decrypt such recordings. Please, read our Encrypting your Stored Media guide for further information on how to use this feature.
If you activate External AWS (Amazon Web Services) S3 Video Storage in an account, all the Video Recordings media files generated in that account will be directly stored in an S3 bucket specified by you. Hence, Twilio will not store or keep the media files you create on your behalf. Please, read our Storing into AWS S3 guide for further information on how to use this feature.
To fully understand how Video Recordings and Compositions are managed inside Twilio, please observe the following architectural diagram:
As shown, there are two parallel information flows:
The Signaling/Metadata information flow
Video Recordings and Compositions are REST resources containing metadata that describes the associated media files. That includes information regarding times, states, formats, durations, etc. If Twilio APIs are used appropriately, that metadata should not contain any kind of PII. For tracking purposes, when a Recording or a Composition is deleted, the metadata is kept for 30 days.
The Media information flow
The media information starts at the Group Room where our media server receives the audio and video bytes. The Video Recording Service can then read those bytes and create the appropriate Video Recordings that are stored following the corresponding account configuration specified in the Video Recording Settings:
Twilio can read Recordings only when they are stored in the Recordings Media Repository. Due to this, Compositions are only possible on Recordings that are stored in that repository. When a Composition is created, the corresponding Recordings are read and mixed, and the resulting composed media file is generated and stored following the account configuration specified in the Composition Settings.
For completeness, this section lists the features that are not yet supported:
Twilio Video Recordings:
Twilio Video Compositions: