
Developing High Quality Video Applications


Warning

This page is for reference only. We are no longer onboarding new customers to Programmable Video. Existing customers can continue to use the product until December 5, 2024.
We recommend migrating your application to the API provided by our preferred video partner, Zoom. We've prepared this migration guide to assist you in minimizing any service disruption.


Overview

This guide provides advice for developing high-quality Twilio Video applications. For an optimal end-user experience, we highly recommend that you read the complete Twilio Programmable Video documentation and tailor our general recommendations provided here to your specific use-case.



Use this table as a fast guide to find the recommended settings for your application.

Choosing the column

Choosing the row

  • To choose the Room type, see this section. If in doubt, use Group Rooms.
  • To choose your mode (i.e. grid, collaboration or presentation), use this section. If in doubt, use collaboration.

| Room type | Desktop Browser | Mobile Browser | Mobile SDK |
|---|---|---|---|
| P2P Room | Recommended Settings | Recommended Settings | Recommended Settings |
| Group Room (grid) | Recommended Settings | Recommended Settings | Recommended Settings |
| Group Room (collaboration) | Recommended Settings | Recommended Settings | Recommended Settings |
| Group Room (presentation) | Recommended Settings | Recommended Settings | Recommended Settings |

What does Quality Mean?

Quality is an elusive concept that may have different meanings in different contexts. With Twilio Programmable Video, quality is a synonym of Quality of Experience, understood as how well a video application meets end-users' needs and addresses their expectations.

Videoconferencing is the most typical use-case of real-time video applications. It allows end-users to communicate "as they do face-to-face." Hence, end-users expect high fidelity (i.e. high resolution, high frame-rate, etc.) and low latency (i.e. real-time conversational interactions). However, the quality of experience may also be impacted by other aspects such as battery consumption and the availability of computing and networking resources. Some of the variables affecting quality work against one another. For example, if you increase the video resolution then the battery consumption and the networking costs will also increase.

Hence, before you start developing a high-quality video application, first ask yourself: what do end-users need and expect? Having a precise answer to that question will help you make the most appropriate decisions for quality optimization.


Concepts and Terminology

You may find the following concepts and definitions useful:

Resolution

Video tracks can be understood as sequences of still images each of which is encoded as a matrix of pixels. The resolution refers to the dimensions of such a matrix expressed as width x height. The following resolutions are common:

| Resolution | Dimensions (pixels) |
|---|---|
| FullHD (Full High Definition), aka 1080p | 1920x1080 [1] |
| HD (High Definition), aka 720p | 1280x720 |
| qHD (Quarter High Definition), aka 540p | 960x540 [1] |
| VGA (Video Graphics Array) | 640x480 |
| QCIF (Quarter Common Intermediate Format) | 176x144 |

Frame-rate

The frame-rate refers to the number of still images that the video stream includes per time unit. It is typically expressed in terms of fps (frames per second). Hence, an HD@30fps video will comprise a sequence of 30 HD still images per second.

Bitrate

The bitrate refers to the number of bits that a given video or audio stream consumes when being transported through a digital network. It is typically measured in bps (bits per second), often with a decimal prefix (e.g. Kbps, Mbps, etc).

P2P and Group Rooms

P2P and Group Rooms are the two main building blocks of the Twilio Programmable Video APIs. Please read our Understanding Video Rooms guide for further guidance.

Codecs: VP8, H.264, and VP8 Simulcast

A codec refers to a type of algorithm that encodes a video signal typically compressing it in the process. VP8 and H.264 are the two main codecs used for videoconferencing. VP8 Simulcast is a scalable version of the VP8 codec. For further information, you may read our Managing Codecs and Working with VP8 Simulcast developer guides.

Network Bandwidth Profile API

The Network Bandwidth Profile API (aka BW Profile API) is a Twilio Video API specifically designed for optimizing bandwidth utilization in Group Rooms. This is a critical API for creating high-quality Group Room applications.

Track Priority API

The Track Priority API allows developers to set the relative priority of Tracks in a video application. The Network Bandwidth Profile API uses Track priorities to assign bandwidth to tracks.

Dominant Speaker Detection API

The Dominant Speaker is the participant having the highest audio activity at a given time. Many videoconferencing applications enhance the Dominant Speaker (e.g. by representing it larger in the central area of the UI). Twilio's Dominant Speaker Detection API makes it possible for developers to be notified when the Dominant Speaker changes in a Group Room. Refer to our Detecting the Dominant Speaker developers guide for further guidance.

Network Quality API

The Network Quality API is a Video API specifically designed for monitoring the network quality on Group Rooms. Refer to Using the Network Quality API developer guide for further information.


Minimum Bandwidth Recommendations

Video Bitrates

The bandwidth requirement of video streams depends on the codec, resolution and frame rate. The following table describes the minimum bandwidth required for various codecs and resolutions. In all cases the frame rate is assumed to be 30 fps.

| Video Codec | Resolution (width x height) | Bitrate (kbps) |
|---|---|---|
| VP8 | 176x144 | 150 |
| VP8 | 640x480 | 400 |
| VP8 | 1280x720 | 650 |
| VP8 | 1920x1080 [1] | 1,200 |
| VP8 Simulcast | 176x144 | 150 |
| VP8 Simulcast | 640x480 | 550 |
| VP8 Simulcast | 1280x720 | 1,400 |
| VP8 Simulcast | 1920x1080 | 3,000 |
| H.264 [2] | 176x144 | 125 |
| H.264 [2] | 640x480 | 400 |
| H.264 [2] | 1280x720 | 600 |

Screen Share Bitrates

Screen share typically uses a frame rate of 5 fps. The following table describes the minimum bandwidth required for various codecs and resolutions. In all cases the frame rate is assumed to be 5 fps.

| Video Codec | Resolution (width x height) | Bitrate (kbps) |
|---|---|---|
| VP8 | 1280x720 | 85 |
| VP8 | 1920x1080 [1] | 175 |
| VP8 Simulcast | 1280x720 | 700 |
| VP8 Simulcast | 1920x1080 | 1,800 |
| H.264 [2] | 1280x720 | 90 |

[1]: Note that on some devices, frame dimensions may differ slightly due to limitations of some hardware video encoders requiring the dimensions to be a multiple of 16. For example, 960x540 may actually be 960x528.

[2]: Note that each device or browser has a different H.264 codec implementation and as such there will be some variance to the bitrates presented above.

Audio Bitrates

The default bitrate for the Opus codec is 32 kbps.
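
To get a rough feel for how these figures add up, the following sketch sums the minimum bitrates for a set of tracks. The helper name `estimateMinBandwidthKbps` and the track descriptor shape are hypothetical, and the numbers are copied from the 30 fps video table and the Opus default above. Treat the result as a lower bound, not a measurement.

```javascript
// Minimum video bitrates in kbps at 30 fps, copied from the table above.
const MIN_VIDEO_KBPS = {
  'VP8': { '176x144': 150, '640x480': 400, '1280x720': 650, '1920x1080': 1200 },
  'VP8 Simulcast': { '176x144': 150, '640x480': 550, '1280x720': 1400, '1920x1080': 3000 },
  'H.264': { '176x144': 125, '640x480': 400, '1280x720': 600 },
};

// Default Opus bitrate in kbps, as stated above.
const OPUS_DEFAULT_KBPS = 32;

// Hypothetical helper: adds up the minimum bitrate of every video track
// plus the default Opus bitrate for every audio track.
function estimateMinBandwidthKbps(videoTracks, audioTrackCount) {
  let total = audioTrackCount * OPUS_DEFAULT_KBPS;
  for (const { codec, resolution } of videoTracks) {
    const byResolution = MIN_VIDEO_KBPS[codec];
    if (!byResolution || !(resolution in byResolution)) {
      throw new Error(`No table entry for ${codec} at ${resolution}`);
    }
    total += byResolution[resolution];
  }
  return total;
}
```

For example, one VP8 track at 1280x720 plus one audio track needs at least 650 + 32 = 682 kbps.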


Before going deep into the technical details, here are some general, common-sense recommendations that you may find useful in your design process.

Subscribe only to what end-users need

Encoding, communicating and rendering video tracks is expensive. This is very noticeable in multiparty applications when the number of participants is large. For example, in a room with 20 participants, it is generally a bad idea to have all the participants rendering 20 high-resolution video tracks. That could contribute to network congestion and will overload the client CPU, making the quality of experience unacceptable. Instead, well-designed videoconferencing services tend to limit the number of rendered video tracks to the ones that are really required. For example, in an e-learning application, there is little value in every student rendering the video of all the other students all the time. It is more reasonable to do so only in special situations, such as when a specific participant is asking a question. In that case, developers should use the Network Bandwidth Profile API, which dynamically adjusts to the dominant speaker and to the rendered size of the participants who are displayed on screen. In addition, the Network Bandwidth Profile API can automatically switch off video tracks that are not visible on screen.
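
As an illustration, a Group Room connection that leaves track switch-off and render-size hints to the SDK might be configured as sketched below. The helper `buildGroupRoomOptions` is hypothetical. The option names (`bandwidthProfile`, `dominantSpeakerPriority`, `clientTrackSwitchOffControl`, `contentPreferencesMode`) are given as they appear in the twilio-video.js 2.x Bandwidth Profile documentation to the best of our knowledge, so check them against the SDK version you use.

```javascript
// Hypothetical helper that builds connect options for a Group Room.
function buildGroupRoomOptions(roomName) {
  return {
    name: roomName,
    // Ask the Room to report the dominant speaker.
    dominantSpeaker: true,
    bandwidthProfile: {
      video: {
        mode: 'collaboration',
        // Give the dominant speaker's video more bandwidth.
        dominantSpeakerPriority: 'high',
        // Let the SDK switch off tracks that are not visible on screen.
        clientTrackSwitchOffControl: 'auto',
        // Let the SDK request a resolution matching the rendered size.
        contentPreferencesMode: 'auto',
      },
    },
  };
}

// Usage (requires the twilio-video SDK and a valid Access Token):
// Twilio.Video.connect(token, buildGroupRoomOptions('my-room-name'));
```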

Make it simple for end-users to mute

Your application should provide mute capabilities to end-users so that they can disable the video or audio communication as they wish. This will avoid unnecessary traffic and background noise.
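
A mute button can be wired up along these lines. This is a minimal sketch assuming the twilio-video.js object model, where a LocalParticipant exposes `audioTracks` and `videoTracks` collections of publications and each local track offers `enable()` and `disable()`. The helper name `setMuted` is hypothetical.

```javascript
// Enables or disables every local track of one kind ('audio' or 'video').
// `localParticipant` is expected to look like room.localParticipant.
function setMuted(localParticipant, kind, muted) {
  const publications = kind === 'audio'
    ? localParticipant.audioTracks
    : localParticipant.videoTracks;
  publications.forEach((publication) => {
    if (muted) {
      publication.track.disable();
    } else {
      publication.track.enable();
    }
  });
}

// Usage, e.g. from a button click handler:
// setMuted(room.localParticipant, 'audio', true);
```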

Use VP8 Simulcast in multiparty Group Rooms

Multiparty Group Rooms participants should prefer VP8 Simulcast over other video codecs. The larger the number of participants in a room, the more important Simulcast is for providing the best possible quality of experience.
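
The codec choice can be expressed through the `preferredVideoCodecs` connect option of twilio-video.js. The sketch below enables Simulcast only for multiparty Group Rooms. The helper name, the `'group'` and `'p2p'` labels, and the "more than 2 participants" threshold are assumptions made for illustration.

```javascript
// Returns a preferredVideoCodecs value: VP8 everywhere, with Simulcast
// only in Group Rooms that are expected to hold more than 2 participants.
function preferredVideoCodecsFor(roomType, expectedParticipants) {
  const multipartyGroupRoom = roomType === 'group' && expectedParticipants > 2;
  return [{ codec: 'VP8', simulcast: multipartyGroupRoom }];
}

// Usage (requires the twilio-video SDK and a valid Access Token):
// Twilio.Video.connect(token, {
//   name: 'my-room-name',
//   preferredVideoCodecs: preferredVideoCodecsFor('group', 8),
// });
```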

Use a reasonable resolution and frame-rate

Frame-rate and resolution are the two main capture constraints that affect video fidelity. When the video source is a camera showing people or moving objects, typically the perceptual quality is better at higher frame-rate. However, for screen-sharing, the resolution is typically more relevant. You should try to set resolution and frame-rate to the minimum value required by your use-case. Over-dimensioning resolution and frame-rate will have a negative impact on the CPU and network consumption and may increase latency. In addition, remember that the resolution and frame-rate you specify as capture constraints are just hints for the client video engine. The actual resolution and frame-rate may decrease if CPU overuse is detected or if the network capacity is not enough for the required traffic.

Consider the render dimensions

When setting your video capture constraints for publishers, you must also consider the render size on the subscriber's side. If you know that a given video track will only be rendered in thumbnail size by all subscribers, then it does not make sense to capture it in high resolution at the publisher.
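
One way to apply this is to derive the capture constraints from the largest size at which any subscriber will render the track. The sketch below picks the smallest of the common resolutions listed earlier that still covers that size. The helper name `captureConstraintsFor` and the default of 24 fps are assumptions.

```javascript
// Common resolutions from the Concepts and Terminology section, smallest first.
const RESOLUTIONS = [
  { name: 'QCIF', width: 176, height: 144 },
  { name: 'VGA', width: 640, height: 480 },
  { name: 'qHD', width: 960, height: 540 },
  { name: 'HD', width: 1280, height: 720 },
  { name: 'FullHD', width: 1920, height: 1080 },
];

// Returns the smallest listed resolution that covers the largest render
// size, falling back to FullHD when nothing is large enough.
function captureConstraintsFor(maxRenderWidth, maxRenderHeight, frameRate = 24) {
  const fit = RESOLUTIONS.find(
    (r) => r.width >= maxRenderWidth && r.height >= maxRenderHeight
  ) || RESOLUTIONS[RESOLUTIONS.length - 1];
  return { width: fit.width, height: fit.height, frameRate };
}

// Usage: pass the result as the `video` option when connecting, e.g.
// Twilio.Video.connect(token, { video: captureConstraintsFor(160, 120) });
```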

Do not share resources

High-resolution video and audio consume significant CPU and bandwidth resources. If those resources are shared with other applications, the quality of experience will decrease. For the best possible experience, you should recommend that your end-users close any applications that may take CPU or bandwidth away from your video service while it is running.

Use the best connectivity you can find

Network connectivity is the most critical aspect affecting communication quality. Restricted bandwidth, high latency, and packet loss may severely degrade your end-users' experience. Hence, you should recommend that they use the best network access they can find: wired connectivity is commonly better than a wireless connection. Among wireless connections, corporate or cellular connectivity is typically better than public, open, shared WiFi networks.

Using maxVideoBitrate or maxAudioBitrate

Both parameters control the maximum upstream bandwidth of a Participant.

  • maxVideoBitrate specifies the maximum video bitrate a participant can publish to the Room. By default, no value is set and the maxVideoBitrate is unlimited. In that case, the bitrate is only limited by the Twilio client SDK using an algorithm that considers the available bandwidth and CPU resources. In general, we recommend trusting that algorithm and not setting maxVideoBitrate. However, on devices with restricted CPU or battery life we recommend setting maxVideoBitrate to a value between 500000 and 2000000 bps per track. Note that if a Participant is publishing N video tracks, then each video track will be limited to consuming maxVideoBitrate/N.
  • maxAudioBitrate specifies the maximum audio bitrate published by a Participant. It only takes effect when using Opus (i.e. it has no effect on PCM codecs). By default it is unset and Opus is configured with its default settings, consuming between 20Kbps and 40Kbps. Twilio's recommendation is to keep the default. However, when the audio is human speech, you may restrict maxAudioBitrate to 16Kbps to save bandwidth without any significant quality degradation. Do not restrict maxAudioBitrate if you intend to communicate music or other types of audio signal beyond human speech.

| Parameter | Recommendation | When to use it |
|---|---|---|
| maxVideoBitrate | Keep default (unset) | In mobile platforms keep it between 500000 and 2000000 bps per video track |
| maxAudioBitrate | Keep default (unset) | In speech communications keep it over 16000 bps per audio track |
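
The recommendations above could be encoded as a small helper like the one below. This is only a sketch: the function names, the `constrainedDevice` and `speechOnly` flags, and the default of 1000000 bps are assumptions. Only the 500000 to 2000000 bps range, the 16000 bps speech limit and the maxVideoBitrate/N split come from the text.

```javascript
// Returns the bitrate-related connect options suggested above.
// Leaving a key out keeps the SDK default (unset).
function bitrateLimitsFor({ constrainedDevice = false, speechOnly = false, videoBps = 1000000 } = {}) {
  if (videoBps < 500000 || videoBps > 2000000) {
    throw new RangeError('videoBps should be between 500000 and 2000000');
  }
  const limits = {};
  if (constrainedDevice) {
    limits.maxVideoBitrate = videoBps; // e.g. mobile devices
  }
  if (speechOnly) {
    limits.maxAudioBitrate = 16000; // do not set this for music
  }
  return limits;
}

// Share of maxVideoBitrate available to each of N published video tracks.
function perTrackVideoBps(maxVideoBitrate, videoTrackCount) {
  return maxVideoBitrate / videoTrackCount;
}

// Usage (requires the twilio-video SDK and a valid Access Token):
// Twilio.Video.connect(token, {
//   name: 'my-room-name',
//   ...bitrateLimitsFor({ constrainedDevice: true, speechOnly: true }),
// });
```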

Use GLL

On the Internet, latency and packet loss depend on geolocation. When the connection between a sender and a receiver spans the globe, latency and jitter are increased by the distance between the parties. Packet loss is also more likely, due to the number of routers in the connection path. Due to this, the Twilio infrastructure that serves your rooms should be as close as possible to your clients. Otherwise, quality may be affected:

  • In both P2P and Group Rooms the connectivity time may increase.
  • In Group Rooms the media latency and packet loss may increase, causing fidelity to drop.

To minimize these problems, Twilio makes it possible to specify the signaling and media regions for your Rooms. However, determining what's the closest region for a participant is not always trivial. For this reason, we recommend developers use GLL (Global Low Latency). When GLL is specified, Twilio will automatically choose the region that minimizes latency. See our Video Regions and Global Low Latency documentation for further insight.

Measure

Quality should be understood as a process. You should try to measure both your end-users' perception as well as the many different factors that may affect it including CPU consumption and network connectivity metrics. You may find Twilio's Network Quality API interesting for the latter. With that information, try to understand your end-users' pain points and design a strategy to minimize them. Periodically repeating the measure-analyze-implement cycle is the best way to guarantee you are offering the best possible quality of experience to your users.
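
For the network side of that cycle, the sketch below listens for Network Quality level changes and reports when the level drops to or below a threshold. It assumes the twilio-video.js behaviour in which the `networkQuality` connect option enables the API and participants emit a `networkQualityLevelChanged` event. The helper name `watchNetworkQuality` and the threshold are hypothetical.

```javascript
// Connect option that enables the Network Quality API with minimal verbosity.
const networkQualityOptions = { networkQuality: { local: 1, remote: 1 } };

// Calls onPoor(level) whenever the reported level is at or below `threshold`.
// Returns a function that stops listening.
function watchNetworkQuality(participant, threshold, onPoor) {
  const handler = (level) => {
    if (level <= threshold) {
      onPoor(level);
    }
  };
  participant.on('networkQualityLevelChanged', handler);
  return () => participant.removeListener('networkQualityLevelChanged', handler);
}

// Usage (requires the twilio-video SDK and a valid Access Token):
// const room = await Twilio.Video.connect(token, { name: 'my-room-name', ...networkQualityOptions });
// const stop = watchNetworkQuality(room.localParticipant, 2, (level) => {
//   console.log('Poor network quality level:', level);
// });
```

Logging these events alongside your own user-feedback data is one simple way to feed the measure-analyze-implement cycle.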


P2P or Group Rooms: Which Room Should I Use?

Selecting the most appropriate Room Type for your use-case is a critical step. For that, we strongly recommend following our Understanding Twilio Video Rooms Developers Guide. From the quality perspective, and without any consideration of features or compliance, the difference between P2P and Group Rooms can be summarized in the following table:

| | P2P Rooms | Group Rooms |
|---|---|---|
| Media connections | Client-to-client communication | Server-routed communication |
| Upstream bandwidth | Proportional to the number of participants | Constant with the number of participants |

You may find the following rules of thumb useful to assess the suitability of P2P Rooms for your use-case:

  • If you require high-quality video, then P2P Rooms are only recommended for 1-to-1 communications.
  • If you can tolerate low-quality video, then P2P Rooms can be used for rooms with up to 4 participants.
  • If your room has only audio, then P2P Rooms can go up to 10 participants.
  • In the rest of the cases, Group Rooms will probably offer you better quality.
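
These rules of thumb, together with the upstream bandwidth row of the table, can be sketched as follows. The function names and parameter shapes are hypothetical. The (N - 1) factor assumes that in a P2P Room each participant sends one copy of its media to every other participant, which is what "proportional to the number of participants" amounts to.

```javascript
// True when the rules of thumb above consider a P2P Room suitable.
function isP2PSuitable({ participants, audioOnly = false, highQualityVideo = true }) {
  if (audioOnly) return participants <= 10;
  if (highQualityVideo) return participants <= 2;
  return participants <= 4;
}

// Approximate upstream bitrate for one participant publishing a single
// stream of `perStreamKbps`. `roomType` is 'p2p' or 'group' (labels assumed).
function upstreamKbps(roomType, participants, perStreamKbps) {
  if (roomType === 'p2p') {
    return (participants - 1) * perStreamKbps; // one copy per remote peer
  }
  return perStreamKbps; // a single copy sent to the media server
}
```

For example, publishing a 650 kbps track in a 4-participant P2P Room requires roughly 3 x 650 = 1950 kbps upstream, versus 650 kbps in a Group Room.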

Enhancing Quality in P2P Rooms

Using the appropriate client-side settings is essential for optimizing P2P Room quality. The following recommendations may be useful for that purpose.

Desktop Browser in P2P Rooms: Recommended Settings

Show me the code


    Twilio.Video.connect('$TOKEN', {
      name: 'my-room-name',
      audio: true,
      maxAudioBitrate: 16000, // For music remove this line
      video: { height: 720, frameRate: 24, width: 1280 }
    });

Codec Settings

| Setting | Recommended value |
|---|---|
| Video codec | Use VP8 (default). H.264: only if needed for interoperability reasons. Never use Simulcast. |
| Audio codec | Use Opus (default). |

Video Capture Settings

| Setting | Recommended value |
|---|---|
| For webcam | Use HD@24fps. Consider VGA@30fps if you detect CPU overuse. |
| For screen | Use FullHD@15fps. Consider HD@15fps if you detect CPU overuse. |

Mobile Browser in P2P Rooms: Recommended Settings

Show me the code


    Twilio.Video.connect('$TOKEN', {
      name: 'my-room-name',
      audio: true,
      maxAudioBitrate: 16000, // For music remove this line
      video: { height: 480, frameRate: 24, width: 640 }
    });

Codec Settings

| Setting | Recommended value |
|---|---|
| Video codec | Use VP8 (default). H.264: only if needed for interoperability reasons. Never use Simulcast. |
| Audio codec | Use Opus (default). |

Video Capture Settings

| Setting | Recommended value |
|---|---|
| For webcam | Use VGA@24fps. Consider HD@24fps if codec has HW support. |
| For screen | Use HD@15fps. Consider FullHD@15fps if codec has HW support. |

Mobile SDKs in P2P Rooms: Recommended Settings

Show me the code

Android SDK


    // Capture at VGA resolution and at most 24 fps
    VideoConstraints videoConstraints =
        new VideoConstraints.Builder()
            .maxFps(24)
            .maxVideoDimensions(VideoDimensions.VGA_VIDEO_DIMENSIONS)
            .build();

    LocalVideoTrack localVideoTrack = LocalVideoTrack.create(context, true, videoCapturer, videoConstraints);

    // EncodingParameters(maxAudioBitrate, maxVideoBitrate)
    ConnectOptions connectOptions = new ConnectOptions.Builder(accessToken)
        .videoTracks(Collections.singletonList(localVideoTrack))
        .encodingParameters(new EncodingParameters(16, 0))
        .build();

iOS SDK


    let format = VideoFormat()
    format.dimensions = CMVideoDimensions(width: 640, height: 480)
    format.frameRate = 24

    camera.startCapture(device: device, format: format, completion: { (captureDevice, videoFormat, error) in
        // Any code needed to run after capture starts
    })

    localVideoTrack = LocalVideoTrack(source: camera, enabled: true, name: "Camera")

    let connectOptions = ConnectOptions(token: accessToken) { builder in
        if let localVideoTrack = localVideoTrack {
            builder.videoTracks = [localVideoTrack]
        }
        builder.encodingParameters = EncodingParameters(audioBitrate: 16, videoBitrate: 0)
    }

Codec Settings

| Setting | Recommended value |
|---|---|
| Video codec | Use VP8 (default). H.264: only if needed for interoperability reasons. Never use Simulcast. |
| Audio codec | Use Opus (default). |

Video Capture Settings

| Setting | Recommended value |
|---|---|
| For webcam | Use VGA@24fps. Consider HD@24fps if codec has HW support. |
| For screen | Use HD@15fps. Consider FullHD@15fps if codec has HW support. |

Enhancing Quality in Group Rooms

Group Room quality strongly depends on how the bandwidth is managed. To optimize quality, you must make sure that your video tracks are appropriately prioritized and that bandwidth is allocated in alignment with your use-case needs. This is done using the Track Priority API and the Network Bandwidth Profile API.

Track Priority API: Recommendations

Track priorities are used to determine the importance of tracks. They are used to allocate bandwidth and to decide which tracks should be switched off in case of congestion. Track priorities are use-case dependent and setting them correctly is essential for having optimal quality. The following general guidelines may be helpful for that objective:

Audio track priorities

  • From the perspective of the Network Bandwidth Profile API, audio tracks are always a higher priority than video tracks. Hence, you may think of audio as being in a special more important category.
  • Setting the priority of an audio track will have no effect in your application.

Video track priorities

  • If there is one participant or video track that is more important than the others, then it should be set to high priority so that, in the case of network congestion, this video track will be the last to be switched off.
  • Typically there should be only one video track with priority high. When screen-sharing, the screen should be the high priority track. If screen-share is absent, and Dominant Speaker Detection is activated, the dominant speaker video may be the high priority track.
  • You may need to dynamically adapt video track priorities. For example, dominantSpeakerPriority may need to go from high to low when a screen-share is activated.
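
The video guidelines above can be sketched as a small decision function. The track descriptor shape (`name`, `isScreenShare`, `isDominantSpeaker`) and both helper names are hypothetical. `applyPriorities` only assumes objects that expose a `setPriority(priority)` method, as local track publications do in twilio-video.js, and uses the priority values 'high' and 'standard'.

```javascript
// Decides which single track gets 'high' priority: the screen-share if
// there is one, otherwise the dominant speaker. Everything else is 'standard'.
function decidePriorities(tracks) {
  const screenShared = tracks.some((t) => t.isScreenShare);
  const priorities = new Map();
  let highAssigned = false;
  for (const t of tracks) {
    const isMain = screenShared ? t.isScreenShare : t.isDominantSpeaker;
    if (isMain && !highAssigned) {
      priorities.set(t.name, 'high');
      highAssigned = true; // keep a single high-priority track
    } else {
      priorities.set(t.name, 'standard');
    }
  }
  return priorities;
}

// Applies the decision to publications keyed by track name.
function applyPriorities(publicationsByName, priorities) {
  priorities.forEach((priority, name) => {
    const publication = publicationsByName.get(name);
    if (publication) {
      publication.setPriority(priority);
    }
  });
}
```

Re-running the decision whenever a screen-share starts or stops, or the dominant speaker changes, keeps the priorities in line with what is on screen.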

Network Bandwidth Profile API: Selecting the mode

Determining the Bandwidth Profile Mode

Bandwidth Profiles have three modes: collaboration, presentation, and grid. You can determine the mode that best fits your use-case with the following decision diagram:

Decision diagram for Network Bandwidth Profile mode selection.

Do I use Group Rooms?

  • If your application uses Twilio P2P Rooms, answer NO. Otherwise, answer YES.

Is it a multiparty service?

  • If your application is only used for 1-to-1 communications (i.e. there are never more than 2 Participants in the Room) answer NO. Otherwise, answer YES.

Is there a main video track?

  • If your application UI renders all video tracks with the same display size, answer NO. If your application has one (or several) video tracks that are enhanced in the UI (e.g. dominant speaker, screen-share, etc.) taking more display area answer YES.

Can I use VP8 Simulcast?

  • If a relevant fraction of your application end-users cannot use VP8 simulcast (e.g. because you have decided to use H.264, or because it's not supported, etc.) answer NO. Otherwise, answer YES.

Is the main track quality critical?

  • If you prefer the main video track quality to be preserved by all means, even at the cost of completely switching off other less relevant tracks when bandwidth is low (e.g. the screen-share in a presentation), answer YES. Otherwise, answer NO.
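
The five questions can be read as a short decision function. The sketch below infers the branching from the question descriptions and from the mode descriptions in the following sections, since the diagram itself is not reproduced in text form. The function name and the shape of the `answers` object are hypothetical.

```javascript
// Maps the YES/NO answers (as booleans) to a Bandwidth Profile mode.
// Returns null for P2P Rooms, where Bandwidth Profiles do not apply.
function chooseBandwidthProfileMode(answers) {
  const {
    usesGroupRooms,
    multiparty,
    hasMainTrack,
    canUseSimulcast,
    mainTrackCritical,
  } = answers;
  if (!usesGroupRooms) {
    return null;
  }
  // 1-to-1, no enhanced track, or no Simulcast: all lead to grid mode.
  if (!multiparty || !hasMainTrack || !canUseSimulcast) {
    return 'grid';
  }
  return mainTrackCritical ? 'presentation' : 'collaboration';
}
```

The returned string can then be used as the `mode` value of the video Bandwidth Profile when connecting to a Group Room.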

Developing Applications with grid mode

Applications use grid mode for one of the following reasons:

  • The application is 1-to-1.
  • The application is for multiparty communications but the UI layout does not enhance any video tracks over others (i.e. all tracks are rendered with the same size).
  • It's not possible to use Simulcast. Note that for large rooms (i.e. rooms with 5 or more participants), not using Simulcast will typically bring a significant degradation on video quality even in grid mode.
Typical GUI layout used for grid mode. Videos are displayed in a matrix where all video tracks have equal relevance.

Developing Applications with collaboration mode

Applications using collaboration mode typically share the following properties:

  • Interactions are multiparty (i.e. a large number of participants communicate).
  • The UI layout is designed to enhance one main video track (e.g. dominant speaker).
  • The rest of the video tracks are displayed in thumbnail size.
  • Keeping all tracks visible is more important than having higher quality in the main track.
Applications using collaboration mode typically enhance the dominant speaker and represent the rest of participants in thumbnail size.

Developing Applications with presentation mode

Applications using presentation mode typically share the following properties:

  • Interactions are one-to-many (i.e. one participant presents to a large audience).
  • The UI layout is designed to enhance one main video track (e.g. the presenter screen-share).
  • The rest of the video tracks may or may not be displayed as they are not so relevant.
  • Presenter quality is critical and more relevant than keeping viewers' tracks on.
Applications using presentation mode typically have a screen-share track whose quality must be maximized by all means. They may additionally display the presenter's webcam or other participants' webcams, but with lower priority.
