Working with VP8 Simulcast
Overview
This guide introduces the Simulcast technique and explains how you can use it to enhance the video quality of your Group Room applications.
Contents
- What's Simulcast
- Enabling Simulcast in your Twilio application
- Resolution and Simulcast layers
- Simulcast and Capture Settings on Mobile SDKs
- Pros and cons of Simulcast
- Limitations and known issues
What’s Simulcast
An SFU (Selective Forwarding Unit) is a media infrastructure component used to scale videoconferences. Twilio’s Group Rooms are based on an SFU that lets developers add a large number of Participants to a video Room by forwarding the audio, video, and data from each publisher to all of its subscribers. Because this forwarding takes place in Twilio’s cloud, a publisher sends each track only once, and its CPU and memory consumption does not grow as the number of Room Participants increases. The limitation of this architecture is that an SFU only forwards media: it can neither transcode nor modify the video. Hence, when some subscribers have limited downlink bandwidth, publishers must reduce their quality to match the worst of them so that no subscriber is congested. As shown in the following figure, this is suboptimal because it constrains the quality of Participants that could communicate at a much higher quality.
Simulcast is a standardized technique designed to solve this problem. With Simulcast, the publisher simultaneously sends several versions of the same video track, each encoded independently at a different resolution and frame rate. The SFU therefore has several qualities of the track available, so it can forward higher qualities to subscribers with more bandwidth and lower qualities to subscribers with less. In technical terms, Simulcast is a mechanism that provides scalability for non-scalable video codecs such as VP8.
Note that Simulcast involves the track publisher (which sends the different track qualities) and the SFU (which selects the optimal quality for each subscriber). Participants that act only as subscribers are not aware of Simulcast, because they just receive a standard VP8-encoded video. Hence, subscribers can neither enable nor disable Simulcast.
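To make the SFU's role concrete, here is a simplified, hypothetical model of per-subscriber layer selection: each subscriber receives the highest layer whose bitrate fits its estimated downlink bandwidth. This is not Twilio's actual algorithm, and the layer bitrates in the usage lines are made-up numbers for illustration only.

```javascript
/**
 * Hypothetical model of SFU layer selection (not Twilio's algorithm).
 * Picks the highest-quality layer whose bitrate fits the subscriber's
 * estimated downlink bandwidth.
 *
 * @param {{name: string, kbps: number}[]} layers - sorted lowest to highest quality
 * @param {number} downlinkKbps - estimated available bandwidth for one subscriber
 */
function selectLayer(layers, downlinkKbps) {
  // Fall back to the lowest layer when even that one does not fit.
  let selected = layers[0];
  for (const layer of layers) {
    if (layer.kbps <= downlinkKbps) {
      selected = layer; // later entries are higher quality
    }
  }
  return selected;
}

// Each subscriber gets its own layer, so a congested subscriber no longer
// forces the publisher to lower the quality for everyone.
function assignLayers(layers, subscribers) {
  const assignment = {};
  for (const [id, downlinkKbps] of Object.entries(subscribers)) {
    assignment[id] = selectLayer(layers, downlinkKbps).name;
  }
  return assignment;
}

// Made-up bitrates, for illustration only.
const exampleLayers = [
  { name: 'low', kbps: 150 },
  { name: 'medium', kbps: 500 },
  { name: 'high', kbps: 1500 },
];
console.log(assignLayers(exampleLayers, { alice: 3000, bob: 600, carol: 100 }));
// alice -> high, bob -> medium, carol -> low (fallback)
```

Without Simulcast, the single-quality publisher in this example would have to target carol's 100 kbps link, degrading the video for alice and bob as well.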
Enabling Simulcast in your Twilio application
Simulcast can be enabled in Group Room clients sending media to Twilio’s SFU. The following table illustrates Twilio’s current support for Simulcast:
Twilio Video SDK | Browser (or N/A) | VP8 Simulcast Support (only Group Rooms) |
---|---|---|
JavaScript | Chrome | Yes (SDK v1.7.0+) |
JavaScript | Firefox | No |
JavaScript | Safari | Yes (Safari 12.1+ with SDK 1.17.0+) |
Android | N/A | Yes (SDK v2.1.0+) |
iOS | N/A | Yes (SDK v2.1.0+) |
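If your application ships on several platforms, the table above can be encoded as a single check that decides whether to request Simulcast at all. The sketch below is a hypothetical helper, not part of any Twilio SDK; the `env` object shape and the plain numeric version parsing (no pre-release suffixes) are assumptions.

```javascript
// Compare dotted numeric versions ("1.17.0" vs "1.7.0") component by component.
// Assumes plain numeric versions; anything else compares as unsupported.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff); // NaN for malformed input
  }
  return 0;
}

const atLeast = (version, minimum) => compareVersions(version, minimum) >= 0;

/**
 * Hypothetical check mirroring the VP8 Simulcast compatibility table.
 * @param {{sdk: string, sdkVersion: string, browser?: string, browserVersion?: string}} env
 */
function supportsVp8Simulcast(env) {
  switch (env.sdk) {
    case 'javascript':
      if (env.browser === 'chrome') return atLeast(env.sdkVersion, '1.7.0');
      if (env.browser === 'safari') {
        return atLeast(env.sdkVersion, '1.17.0') &&
          atLeast(env.browserVersion || '0', '12.1');
      }
      return false; // Firefox and browsers not listed in the table
    case 'android':
    case 'ios':
      return atLeast(env.sdkVersion, '2.1.0');
    default:
      return false;
  }
}

console.log(supportsVp8Simulcast({ sdk: 'javascript', sdkVersion: '2.0.0', browser: 'firefox' })); // false
```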
Enabling Simulcast using the JavaScript SDK
By default, Simulcast is disabled. You can enable Simulcast on a per-Participant basis at Room connect-time. This is done using the ConnectOptions, as shown in the following code snippet:
```javascript
// Web JavaScript
// Remember that Simulcast only needs to be enabled in media publishers.
// See the compatibility table above for supported browsers and required SDK versions.
const { connect } = require('twilio-video');

// Run inside an async function; `token` is your Access Token.
const room = await connect(token, {
  preferredVideoCodecs: [{ codec: 'VP8', simulcast: true }]
});
```
Any Group Room Participant with VP8 Simulcast enabled publishes all its video tracks using VP8 Simulcast. Once this is done, Twilio’s video infrastructure leverages Simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.
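Because Simulcast only helps publishers in Group Rooms, it can be convenient to keep that rule in one place. The helper below is a hypothetical sketch: `simulcastConnectOptions`, `isGroupRoom`, and `publishesVideo` are illustrative names your application would define, not SDK APIs; only the `preferredVideoCodecs` value comes from the snippet above.

```javascript
/**
 * Hypothetical helper returning the codec part of the JavaScript SDK's
 * ConnectOptions. `isGroupRoom` and `publishesVideo` are app-level flags.
 */
function simulcastConnectOptions({ isGroupRoom, publishesVideo }) {
  // Only media publishers in Group Rooms benefit from Simulcast;
  // subscribers and Peer-to-Peer Rooms gain nothing from it.
  if (isGroupRoom && publishesVideo) {
    return { preferredVideoCodecs: [{ codec: 'VP8', simulcast: true }] };
  }
  // Otherwise leave codec preferences at the SDK defaults.
  return {};
}

// Usage with the SDK (not executed here):
//   const room = await connect(token, {
//     ...simulcastConnectOptions({ isGroupRoom: true, publishesVideo: true }),
//   });
console.log(JSON.stringify(simulcastConnectOptions({ isGroupRoom: false, publishesVideo: true }))); // {}
```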
Enabling Simulcast using the iOS SDK
By default, Simulcast is disabled. You can enable Simulcast on a per-Participant basis at Room connect-time. This is done using the ConnectOptions, as shown in the following code snippet:
```swift
// Swift code
// Remember that Simulcast only needs to be enabled in media publishers.
// See the compatibility table above for the required SDK versions.
let connectOptions = ConnectOptions(token: accessToken) { (builder) in
    builder.preferredVideoCodecs = [Vp8Codec(simulcast: true)]
}
```
Any Group Room Participant with VP8 Simulcast enabled publishes all its video tracks using VP8 Simulcast. Once this is done, Twilio’s video infrastructure leverages Simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.
Enabling Simulcast using the Android SDK
By default, Simulcast is disabled. You can enable Simulcast on a per-Participant basis at Room connect-time. This is done using the ConnectOptions, as shown in the following code snippet:
```java
// Java code
// Remember that Simulcast only needs to be enabled in media publishers.
// See the compatibility table above for the required SDK versions.
import com.twilio.video.ConnectOptions;
import com.twilio.video.Vp8Codec;
import java.util.Collections;

ConnectOptions connectOptions = new ConnectOptions.Builder(accessToken)
        .preferVideoCodecs(Collections.singletonList(new Vp8Codec(true)))
        .build();
```
Any Group Room Participant with VP8 Simulcast enabled publishes all its video tracks using VP8 Simulcast. Once this is done, Twilio’s video infrastructure leverages Simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.
Resolution and Simulcast layers
Twilio SDKs encode up to three spatial layers when Simulcast is enabled. The following table shows which layers are typically generated for a given capture resolution. Note that this is only an approximation and that the actual behavior may differ slightly. In the table, "disabled" means that the layer is not sent under those conditions (that is, that quality is not generated by the publisher and hence is not available at the SFU to be forwarded to subscribers).
Capture resolution | Layer 1 | Layer 2 | Layer 3 |
---|---|---|---|
352x288 | 352x288 | disabled | disabled |
480x360 | 240x180 | 480x360 | disabled |
640x480 | 320x240 | 640x480 | disabled |
640x480 (with crop) | 240x240 | 480x480 | disabled |
960x540 | 240x135 | 480x270 | 960x540 |
1024x768 | 256x192 | 512x384 | 1024x768 |
1024x768 (with crop) | 240x192 | 480x384 | 960x768 |
1280x720 | 320x180 | 640x360 | 1280x720 |
1280x720 (with crop) | 225x180 | 450x360 | 900x720 |
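The pattern in the table can be approximated in a few lines: the number of layers grows with the pixel count of the (possibly cropped) capture, and each lower layer halves the width and height of the one above it. This is a sketch under assumptions: the two pixel thresholds are values chosen so that the function reproduces every row of the table, not a documented SDK rule.

```javascript
// Assumed thresholds (pixels per frame), chosen to reproduce the table above.
const THREE_LAYER_MIN_PIXELS = 960 * 540;
const TWO_LAYER_MIN_PIXELS = 480 * 270;

/**
 * Approximate the simulcast spatial layers for a capture resolution.
 * Returns layers ordered from lowest (Layer 1) to highest.
 */
function simulcastLayers(width, height) {
  const pixels = width * height;
  const count = pixels >= THREE_LAYER_MIN_PIXELS ? 3
    : pixels >= TWO_LAYER_MIN_PIXELS ? 2
    : 1;
  const layers = [];
  for (let i = count - 1; i >= 0; i--) {
    const scale = 2 ** i; // 4, 2, 1 for three layers
    layers.push({ width: Math.floor(width / scale), height: Math.floor(height / scale) });
  }
  return layers;
}

console.log(simulcastLayers(1280, 720).map((l) => `${l.width}x${l.height}`).join(', '));
// 320x180, 640x360, 1280x720
```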
Simulcast and Capture Settings on Mobile SDKs
To optimize video quality while minimizing CPU usage and bandwidth, it is recommended to use VP8 simulcast with the capture settings suggested below on each mobile platform.
iOS
Capture Frame Rate
24 FPS. When simulcasting, this results in 3 temporal layers of 24 FPS, 12 FPS, and 6 FPS. Selecting 24 frames per second instead of the default of 30 reduces the CPU load on the VP8 software encoder.
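The temporal layer rates above follow a simple halving pattern. This sketch assumes, as the paragraph states for 24 FPS, that each temporal layer halves the frame rate of the one above it; the 30 FPS line just applies the same assumption, and `temporalLayerFrameRates` is a hypothetical helper.

```javascript
/**
 * Frame rates of the temporal layers derived from a capture frame rate,
 * assuming each layer halves the rate of the one above (f, f/2, f/4, ...).
 */
function temporalLayerFrameRates(captureFps, layerCount = 3) {
  const rates = [];
  for (let i = 0; i < layerCount; i++) {
    rates.push(captureFps / 2 ** i);
  }
  return rates;
}

console.log(temporalLayerFrameRates(24)); // [ 24, 12, 6 ]
console.log(temporalLayerFrameRates(30)); // [ 30, 15, 7.5 ]
```

Note that 24 FPS divides evenly into all three layers, whereas the default 30 FPS yields a fractional 7.5 FPS base layer.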
Capture Dimensions
- 1024x768 on most iPhones
- 1280x720 on iPhone X and models that do not have support for 1024x768
- 640x480 on iPhone 6s and earlier models
iOS devices support high-resolution capture formats with aspect ratios of 4:3 (1.33:1) and 16:9 (1.78:1). When simulcasting, it is often desirable to produce a square-ish ratio (1.25:1) that subscribers can view in landscape or portrait, and as smaller thumbnails. Cropping is performed at the source by requesting an output format. Besides changing the aspect ratio of the captured video, cropping also reduces the number of pixels that the software encoder must process. Capturing at 1280x720 or 1024x768 results in 3-layer simulcast with the layer structure shown in the table above. Capturing at 640x480 is recommended on older iPhones and results in 2-layer simulcast.
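The crop arithmetic behind the "(with crop)" rows of the layer table can be sketched as a center crop to a target aspect ratio. This illustrates the math only: in the SDK the crop is requested through an output format, as in the sample code below, and `cropToAspectRatio` is a hypothetical helper.

```javascript
/**
 * Center-crop a capture size to a target aspect ratio (width / height)
 * by trimming whichever dimension is too long.
 */
function cropToAspectRatio(width, height, targetRatio) {
  if (width / height > targetRatio) {
    // Too wide: keep the height and trim the width.
    return { width: Math.round(height * targetRatio), height };
  }
  // Too tall (or already matching): keep the width and trim the height.
  return { width, height: Math.round(width / targetRatio) };
}

// Prints the cropped sizes 960x768 and 900x720, matching the
// "(with crop)" rows of the layer table.
for (const [w, h] of [[1024, 768], [1280, 720]]) {
  const c = cropToAspectRatio(w, h, 1.25);
  const saved = 100 * (1 - (c.width * c.height) / (w * h));
  console.log(`${w}x${h} -> ${c.width}x${c.height} (${saved.toFixed(1)}% fewer pixels to encode)`);
}
```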
Other Considerations
If a Group Room is being used, it is recommended to strip the rotation tags using hardware acceleration, via the rotationTags option of CameraSourceOptions. It is also recommended to reduce the audio bitrate to a value tuned for speech content.
Sample Code
The above recommendations are implemented in this code snippet:
```swift
import AVFoundation
import TwilioVideo

struct CaptureDeviceUtils {
    // Produce 3 spatial layers ~ {960x768, 480x384, 240x192}. 1024x768 is captured on most phones.
    // Produce 3 spatial layers ~ {900x720, 450x360, 225x180}. 1280x720 is captured on iPhone X.
    static let kSimulcastVideoDimensions = CMVideoDimensions(width: 900, height: 720)
    static let kSimulcastVideoFrameRate = UInt(24)
    static let kSimulcastVideoBitrate = UInt(1800)

    /*
     * @brief Finds the smallest format that is at least as large as the requested size.
     *
     * @param device The AVCaptureDevice to query.
     * @param targetSize The minimum dimensions that are preferred.
     *
     * @return A format that satisfies the request.
     */
    static func selectFormatBySize(device: AVCaptureDevice, targetSize: CMVideoDimensions) -> VideoFormat {
        // Arranged from smallest to largest.
        let formats = CameraSource.supportedFormats(captureDevice: device)
        var selectedFormat = formats.firstObject as? VideoFormat

        for format in formats {
            guard let videoFormat = format as? VideoFormat else {
                continue
            }
            if videoFormat.pixelFormat != PixelFormat.formatYUV420BiPlanarFullRange {
                continue
            }
            let dimensions = videoFormat.dimensions
            // Cropping might be used if there is not an exact match.
            if dimensions.width >= targetSize.width && dimensions.height >= targetSize.height {
                selectedFormat = videoFormat
                break
            }
        }

        return selectedFormat!
    }
}

// The code below assumes it runs inside a view controller that adopts
// CameraSourceDelegate and declares `camera`, `localVideoTrack`,
// `accessToken`, and `logMessage(messageText:)`.
let options = CameraSourceOptions { (builder) in
    // Strip rotation tags using hardware acceleration
    builder.rotationTags = .remove
}
camera = CameraSource(options: options, delegate: self)

// Assume the front camera is available
let frontCamera = CameraSource.captureDevice(position: .front)!

if let camera = camera {
    localVideoTrack = LocalVideoTrack(source: camera, enabled: true, name: "Camera")

    // Discover a simulcast format for the front camera
    let format = CaptureDeviceUtils.selectFormatBySize(device: frontCamera,
                                                       targetSize: CaptureDeviceUtils.kSimulcastVideoDimensions)
    // Lower the frame rate to reduce CPU load, but still produce 3 temporal layers (f, f/2, f/4)
    format.frameRate = CaptureDeviceUtils.kSimulcastVideoFrameRate

    // Apply slight cropping to reduce CPU load and provide square-ish video
    let croppedFormat = VideoFormat()
    croppedFormat.dimensions = CaptureDeviceUtils.kSimulcastVideoDimensions
    camera.requestOutputFormat(croppedFormat)

    camera.startCapture(device: frontCamera, format: format) { (captureDevice, videoFormat, error) in
        if let error = error {
            self.logMessage(messageText: "Capture failed with error.\ncode = \((error as NSError).code) error = \(error.localizedDescription)")
        }
    }
}

let connectOptions = ConnectOptions(token: accessToken) { (builder) in
    if let localVideoTrack = localVideoTrack {
        builder.videoTracks = [localVideoTrack]
    }
    builder.isNetworkQualityEnabled = true
    builder.networkQualityConfiguration = NetworkQualityConfiguration(localVerbosity: .minimal, remoteVerbosity: .minimal)

    // Enable VP8 simulcast, and cap the video bitrate at 1.8 Mbps to reduce strain on the sender.
    // Reduce the audio bitrate for speech content.
    builder.encodingParameters = EncodingParameters(audioBitrate: 16,
                                                    videoBitrate: CaptureDeviceUtils.kSimulcastVideoBitrate)
    builder.preferredVideoCodecs = [Vp8Codec(simulcast: true)]
}
```
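The format-selection rule in `selectFormatBySize` is easy to get subtly wrong, so it can help to test it in isolation. Below is a hypothetical JavaScript port of that rule, decoupled from AVFoundation: the format objects and the `'YUV420BiPlanarFullRange'` label are stand-ins for Twilio's `VideoFormat` and `PixelFormat`, not real API values.

```javascript
/**
 * Port of the sample's selection rule: among formats sorted smallest to
 * largest, pick the first with the required pixel format that covers the
 * target in both dimensions. Falls back to the first (smallest) format,
 * as the Swift sample does.
 */
function selectFormatBySize(formats, target, requiredPixelFormat = 'YUV420BiPlanarFullRange') {
  let selected = formats[0];
  for (const format of formats) {
    if (format.pixelFormat !== requiredPixelFormat) continue;
    // The first covering format is the smallest one we can crop down from.
    if (format.width >= target.width && format.height >= target.height) {
      selected = format;
      break;
    }
  }
  return selected;
}

// Hypothetical format list, sorted smallest to largest.
const exampleFormats = [
  { width: 640, height: 480, pixelFormat: 'YUV420BiPlanarFullRange' },
  { width: 1024, height: 768, pixelFormat: 'YUV420BiPlanarFullRange' },
  { width: 1280, height: 720, pixelFormat: 'YUV420BiPlanarFullRange' },
];
const target = { width: 900, height: 720 }; // kSimulcastVideoDimensions
const chosen = selectFormatBySize(exampleFormats, target);
console.log(`${chosen.width}x${chosen.height}`); // 1024x768
```

On a device without a 1024x768 format (the iPhone X case in the sample's comments), the same rule falls through to 1280x720.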
Android
Capture Frame Rate
24 FPS. When simulcasting, this results in 3 temporal layers of 24 FPS, 12 FPS, and 6 FPS. Selecting 24 frames per second instead of the default of 30 reduces the CPU load on the VP8 encoder.
Capture Dimensions
- 1280x720 on Android devices that support VP8 hardware acceleration
- 1024x768 on more recent Android devices that do not support VP8 hardware acceleration
- 640x480 on older Android devices
Using 1280x720 or 1024x768 for video capture will result in 3-layer simulcast with the layer structure as shown in the table above. Using 640x480 for video capture will result in a 2-layer simulcast.
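The Android recommendations above reduce to a small decision. This is a hypothetical sketch: `vp8HardwareEncoder` and `isOlderDevice` are app-supplied flags (the first could come from `MediaCodecVideoEncoder.isVp8HwSupported()` as in the sample below; the second is your own device classification), and `expectedSpatialLayers` simply restates the layer counts given in the paragraph above.

```javascript
/**
 * Hypothetical helper encoding the Android capture recommendations.
 * Flags are supplied by the app; they are not SDK fields.
 */
function recommendedAndroidCapture({ vp8HardwareEncoder, isOlderDevice }) {
  if (vp8HardwareEncoder) {
    return { width: 1280, height: 720, expectedSpatialLayers: 3 };
  }
  if (isOlderDevice) {
    return { width: 640, height: 480, expectedSpatialLayers: 2 };
  }
  // Recent device without VP8 hardware acceleration
  return { width: 1024, height: 768, expectedSpatialLayers: 3 };
}

console.log(recommendedAndroidCapture({ vp8HardwareEncoder: false, isOlderDevice: false }));
// 1024x768 with 3 expected spatial layers
```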
Other Considerations
It is recommended to reduce the audio bitrate to a value tuned for speech content.
Sample Code
The above settings are specified using the VideoConstraints API, as shown in the code snippet below:
```java
import java.util.Collections;
import tvi.webrtc.MediaCodecVideoEncoder;

// Capture at 640x480 by default, or at 1280x720 if the device has a VP8 hardware encoder
VideoDimensions videoDimensions = VideoDimensions.VGA_VIDEO_DIMENSIONS;
if (MediaCodecVideoEncoder.isVp8HwSupported()) {
    videoDimensions = VideoDimensions.HD_720P_VIDEO_DIMENSIONS;
}

// Capture at 24 FPS to reduce encoder CPU load while still producing 3 temporal layers
VideoConstraints videoConstraints = new VideoConstraints.Builder()
        .maxFps(VideoConstraints.FPS_24)
        .maxVideoDimensions(videoDimensions)
        .build();
LocalVideoTrack localVideoTrack = LocalVideoTrack.create(context, true, videoCapturer, videoConstraints);

// Enable network quality information for local and remote participants
NetworkQualityConfiguration configuration = new NetworkQualityConfiguration(
        NetworkQualityVerbosity.NETWORK_QUALITY_VERBOSITY_MINIMAL,
        NetworkQualityVerbosity.NETWORK_QUALITY_VERBOSITY_MINIMAL);

ConnectOptions connectOptions = new ConnectOptions.Builder(accessToken)
        .enableNetworkQuality(true)
        .networkQualityConfiguration(configuration)
        .videoTracks(Collections.singletonList(localVideoTrack))
        // Cap the video bitrate at 1.8 Mbps to reduce strain on the sender,
        // and reduce the audio bitrate for speech content (both values in kbps)
        .encodingParameters(new EncodingParameters(16, 1800))
        // Enable VP8 simulcast
        .preferVideoCodecs(Collections.singletonList(new Vp8Codec(true)))
        .build();
```
Pros and cons of Simulcast
When enabling Simulcast in your Group Rooms application you enjoy the following advantages:
- VP8 subscribers enjoy differentiated quality adapted to their available bandwidth. This significantly improves the quality on Group Rooms with many heterogeneous Participants.
- VP8 subscribers are isolated from each other so that a subscriber with a degraded network link does not affect the reception quality of other subscribers.
On the other hand, Simulcast also has some drawbacks:
- Simulcast only improves video quality in Group Rooms with 3 or more Participants. With only 2 Participants, each track has a single subscriber, so the publisher can adapt its quality to that subscriber directly.
- Publishers' battery consumption is higher because they must encode multiple versions of the same video track.
- Publishers' upstream bandwidth consumption is higher (up to double in some cases) because they send multiple versions of the same video track. Note that this increase does not impact your Programmable Video costs, as Twilio does not charge for upstream (i.e., from the sender to Twilio’s cloud) bandwidth.
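The extra encoding work can be roughly estimated from the layer table by counting the pixels encoded per frame. The helper below is hypothetical, and pixel count is only a proxy: real CPU, battery, and bitrate costs are not proportional to it, so the ratio should not be read as a bandwidth estimate.

```javascript
/**
 * Pixels encoded per frame across all simulcast layers, relative to
 * encoding only the top (full-resolution) layer. A rough workload proxy.
 * @param {{width: number, height: number}[]} layers - lowest layer first
 */
function pixelWorkload(layers) {
  const total = layers.reduce((sum, l) => sum + l.width * l.height, 0);
  const top = layers[layers.length - 1];
  return total / (top.width * top.height);
}

// Layers taken from the "Resolution and Simulcast layers" table.
const hd720 = [
  { width: 320, height: 180 },
  { width: 640, height: 360 },
  { width: 1280, height: 720 },
];
const vga = [
  { width: 320, height: 240 },
  { width: 640, height: 480 },
];
console.log(pixelWorkload(hd720)); // 1.3125
console.log(pixelWorkload(vga)); // 1.25
```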
Limitations and known issues
- Simulcast should only be used in Group Rooms. Using it in Peer-to-Peer (P2P) Rooms does not improve quality and only degrades application performance.
- Simulcast is only supported for the VP8 video codec.
- The combination of Simulcast and oscillating bandwidth conditions at the publisher may produce suboptimal recording quality. If optimal recording quality is the primary objective of your application, you may prefer not to enable Simulcast.