Building a Video App: How to Monitor and Scale Your Project

September 27, 2022
Written by Lyssa Test
Reviewed by Paul Kamp

The transition to digital communication has brought video applications to the forefront of many businesses. Organizations are transforming how they interact with customers and how customers interact with one another. Twilio’s 2022 State of Customer Engagement report found that 70% of companies are currently investing in digital customer engagement and have plans to almost double their investment by 2025.

Business leaders considering the move to productizing video must navigate this transition attentively. To help, this article will cover the significant technical and implementation concerns your team needs to consider before kicking off your project. We’ll do a deep dive into the technology side, looking closer at:

  • Application Tech Stack
  • Infrastructure
  • Day Two Concerns
  • Performance and the Integration of New Technologies

This article is part two of three in our series. To revisit part one, which covers the planning, requirements, and project concerns to consider before starting your video project, read our post How to Create a Video App: What to Consider Before You Start.

Application Tech Stack

When deciding on video integration, first consider your application tech stack. Should you build your own platform, or use an API? We covered platform choices in part one of this series, but now let’s get a little more technical.

Making an informed decision here means considering the level of customization required for your video, the structure and skillset of your build and support team(s), and any constraints your organization has regarding the use of open-source software (OSS).

You should also consider whether your organization can manage your application at scale on-premises. Deploying and maintaining a media streaming system that runs smoothly is complex, requiring considerable investment. Infrastructure costs and operational support costs will skyrocket as you scale.

If you decide to build your own platform, you’ll need to understand WebRTC and ICE. Most video chat systems rely on WebRTC, a standard way of enabling video communications across platforms. WebRTC handles capturing media from a user's device and establishing peer-to-peer connectivity for streaming.

Connectivity relies on Interactive Connectivity Establishment (ICE), a standard that uses Session Traversal Utilities for NAT (STUN) to identify a client's public IP address and Traversal Using Relays around NAT (TURN), in which a server relays traffic between clients that cannot connect to each other directly.

A diagram showing how TURN and STUN servers allow client-to-client interaction.
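To make the STUN and TURN roles concrete, here is a minimal sketch of a server-side helper that builds the ICE server list a WebRTC client would receive. The hostnames, credentials, and the `build_ice_config` function name are placeholders invented for illustration, and port 3478 is assumed as the conventional STUN/TURN port.

```python
import json


def build_ice_config(stun_host, turn_host, turn_username, turn_credential):
    """Return a dict shaped like the configuration passed to a WebRTC peer connection."""
    return {
        "iceServers": [
            # STUN: lets a client discover its public IP address and port.
            {"urls": f"stun:{stun_host}:3478"},
            # TURN: relays media when no direct peer-to-peer path can be established.
            {
                "urls": f"turn:{turn_host}:3478",
                "username": turn_username,
                "credential": turn_credential,
            },
        ]
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    config = build_ice_config(
        "stun.example.com", "turn.example.com", "demo-user", "demo-secret"
    )
    print(json.dumps(config, indent=2))
```

In a self-hosted deployment, the TURN credentials would typically be short-lived and issued per session rather than hard-coded.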

In WebRTC, client communication is handled at the signaling layer, allowing events to flow between clients. Twilio’s Video API encapsulates all aspects of WebRTC and signaling, providing an API tailored to connecting to rooms and listening to events, rather than worrying about the low-level WebRTC and signaling concerns.

Authentication, across all video platforms, requires a server-side component that generates a token, which a client then presents when connecting to a session. Twilio uses an API Key Secret, which must be kept secure: an exposed secret would allow malicious users to connect to any video stream or escalate their privileges. In addition, using Twilio's subaccounts feature provides two levels of keys for more granular access control. Periodically rotating the parent key is another useful security measure.
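The general pattern can be sketched with a generic HMAC-signed token. This is an illustration of server-side token issuance only, not Twilio's access token format; the claim names (`sub`, `room`) and the functions `issue_token` and `verify_token` are assumptions made for the example. In practice you would use your provider's helper library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWT-style tokens."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def issue_token(secret, identity, room, ttl_seconds=3600, now=None):
    """Sign a short-lived token on the server; the secret never leaves the server."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": identity, "room": room, "iat": issued_at,
               "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(secret, token, now=None):
    """Return the payload if the signature is valid and unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        return None
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(_b64url(expected).encode(), signature_b64.encode()):
        return None
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    current = time.time() if now is None else now
    return payload if payload["exp"] > current else None


if __name__ == "__main__":
    token = issue_token("server-side-secret", "alice", "daily-standup", ttl_seconds=600)
    print(verify_token("server-side-secret", token))
```

Keeping the lifetime short limits the damage if a client token leaks, while rotating the signing secret invalidates every token issued under it.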

If your organization doesn’t need a self-hosted solution, then a SaaS offering like Twilio would likely fit better. Rather than offering a video application, Twilio provides the communication APIs—all the essential components required to integrate audio, video, and chat within existing products and systems. To implement video with Twilio, you call the APIs from your existing codebase, whether on-premises or in the cloud.

Infrastructure

When it comes to infrastructure, consider whether your application will be hosted fully on-premises or in the cloud and whether you’ll use a SaaS platform or a server-based system.

Fully On-Premises

For on-premises models, deployment concerns and capacity requirements will be your biggest challenges.

Deployment Planning

A platform like OpenVidu uses a Docker-based model for deployment. Its on-premises deployment document outlines necessary components, which include:

  • OpenVidu for signaling
  • Kurento media server for streaming
  • Coturn for ICE functionality

Of course, operating Docker on-premises at scale also requires a container orchestration platform (like OpenShift or Kubernetes). All of this together requires significant investment in capital and operational knowledge. For organizations that don’t already maintain such infrastructure, consider a solution like Twilio.

Capacity Planning

Coturn mediates nearly all streams and will need significant bandwidth, since each client can use several megabits per second in each direction. Fortunately, Coturn scales linearly: adding capacity simply means adding new instances, and the cost increase is predictable.

Processing capacity is a separate concern, however, with transcoding and storage of media streams handled by Kurento. An evaluation of Kurento’s performance by testRTC found an AWS instance (8 vCPU, 15 GB of RAM) could reliably support up to ten 1:1 sessions or three group sessions of four users each. Multiple instances of Kurento may be needed to support the scale of your business application while providing spare capacity as headroom.
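As a rough planning aid, the testRTC figures can be turned into an instance estimate. The sketch below assumes that load from 1:1 and group sessions adds linearly on an instance of the size tested, and the 25% headroom default is an arbitrary illustrative value; neither assumption comes from the evaluation itself.

```python
import math

# Per-instance limits taken from the testRTC figures quoted above.
ONE_TO_ONE_SESSIONS_PER_INSTANCE = 10
GROUP_SESSIONS_PER_INSTANCE = 3  # group sessions of four users each


def instances_needed(one_to_one_sessions, group_sessions, headroom=0.25):
    """Estimate media server instances for a peak load, plus spare capacity.

    Assumes (hypothetically) that the two session types share an instance
    in proportion to the fraction of its capacity each one consumes.
    """
    load = (one_to_one_sessions / ONE_TO_ONE_SESSIONS_PER_INSTANCE
            + group_sessions / GROUP_SESSIONS_PER_INSTANCE)
    return math.ceil(load * (1 + headroom))


if __name__ == "__main__":
    # 40 concurrent 1:1 sessions and 6 group sessions at peak.
    print(instances_needed(40, 6))  # prints 8
```

Load testing your own workload remains the only reliable way to confirm these numbers, since codecs, resolution, and recording settings all change the per-instance limit.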

Recording and sharing sessions means considering the cost and mode of storage. Your application might record and store individual streams, or it might combine all the streams into a grid for a single recording.

For example, consider a session with two users, both streaming at 720p. At a bitrate of roughly 1 Mbps per stream, each stream consumes about 7.5 MB of disk space per minute, so the session needs roughly 15 MB per minute. The stream quality, number of users, and average length of video sessions will all contribute to your calculation of required disk space. CPU resources for video post-processing and video file encryption should also be considered.
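The arithmetic behind such an estimate is simple enough to script. This sketch assumes a constant bitrate and ignores audio tracks and container overhead, so real recordings will come out somewhat larger; the function name is illustrative.

```python
def recording_megabytes(streams, bitrate_mbps, minutes):
    """Estimate disk usage in megabytes for recorded streams at a constant bitrate."""
    megabits = streams * bitrate_mbps * minutes * 60  # 60 seconds per minute
    return megabits / 8  # 8 bits per byte


if __name__ == "__main__":
    print(recording_megabytes(streams=1, bitrate_mbps=1.0, minutes=1))   # prints 7.5
    print(recording_megabytes(streams=2, bitrate_mbps=1.0, minutes=1))   # prints 15.0
    print(recording_megabytes(streams=2, bitrate_mbps=1.0, minutes=30))  # prints 450.0
```

Multiplying the per-session figure by expected sessions per day and by the retention period gives a first approximation of total storage.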

SaaS Platform and Cloud or On-Premises Application

If your organization doesn't have a private cloud or can't invest in building a scalable platform, then your best option is a SaaS solution from an organization that has already developed scalable solutions with easy integrations.

When considering a SaaS platform, the focus shifts instead to the platform's integration and features. You can refer to part one of this series for more in-depth coverage of the offerings, costs, and distinguishing features of the different SaaS providers.

In selecting a SaaS provider, you'll want to consider:

  • What kind of APIs and SDKs do they offer?
  • What browsers and devices are supported?
  • Does the platform support 1:1 sessions and/or group rooms?
  • How easy is it to integrate additional features like a shared whiteboard or group chat?
  • How does the platform handle recordings—storage location, access, encryption, etc.?

It's likely that no SaaS provider will fit your needs perfectly, so you'll want to weigh the tradeoffs and drawbacks:

  • An SDK may provide easy integration but have no recording functionality
  • There may be limitations in how recordings are stored or accessed
  • If the provider holds the recordings’ encryption keys, then it could theoretically access the videos internally

Twilio, on the other hand, does not provide a pre-built client. Instead, Twilio provides the building blocks to create video applications. This surfaces some notable differences, especially when using Twilio's Security or Enterprise edition:

  • Recordings and compositions are written directly to AWS S3, bypassing Twilio’s cloud storage. Currently, Twilio only has integration with AWS, so cloud storage must utilize AWS.
  • Encryption keys can be configured such that content is stored encrypted and cannot be decrypted by Twilio under any circumstances.

Video stream encryption may be especially important if you offer telehealth services, which require confidentiality. With Twilio, all communication streams traverse encrypted channels, and recordings are encrypted at rest. Twilio will sign a Business Associate Addendum (BAA) to ensure that you can build a HIPAA-compliant application.

Day Two Concerns

After your application goes live, your focus will shift to day two concerns. This includes observability, dashboards, and long-term storage considerations.

Observability

Application observability provides context for understanding errors and tracking down the root cause of issues. Your system collects metrics, logs, and traces—on the client-side, the server-side, and across network interactions between both.

Many systems use OpenTelemetry to capture instrumentation data across their distributed systems. Platforms like Datadog now offer native functionality to ingest information from OpenTelemetry. Twilio provides a Request ID header that can be added to OpenTelemetry traces, allowing for visibility into events that happened within Twilio’s systems.
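A minimal sketch of that correlation step follows. It assumes the request ID arrives in a response header named `Twilio-Request-Id` and uses a made-up attribute key, `twilio.request_id`; check the header name against the current Twilio documentation before relying on it.

```python
def span_attributes_from_response(headers, header_name="Twilio-Request-Id"):
    """Extract a provider request ID from HTTP response headers as trace attributes.

    The header name and the attribute key are assumptions for illustration.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    request_id = lowered.get(header_name.lower())
    return {"twilio.request_id": request_id} if request_id else {}


if __name__ == "__main__":
    # Hypothetical response headers from an API call.
    response_headers = {"Content-Type": "application/json",
                        "twilio-request-id": "RQ-example-123"}
    print(span_attributes_from_response(response_headers))
```

With the OpenTelemetry SDK installed, each returned key and value could be attached to the active span through its `set_attribute` method, so a slow or failed call can be matched to the provider's own records.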

Dashboards

Dashboards, like Twilio’s Video Insights, bring real-time visibility to your telemetry, tracking usage and generating alerts when the system encounters connection spikes, failures, or resource hogging. Dashboards also track user-generated alerts related to violations of acceptable use and community policies.

Twilio's webhooks can collect data on rooms created, participants joining or leaving, and recording failures. And Video Insights brings self-service tooling to the Twilio Console to provide analytics and aggregations for observing your application, discovering trends, and troubleshooting rooms and participants. Having visibility into the usage, quality, and performance of your video calls is crucial to building great customer experiences.
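To show how webhook data can feed a dashboard, here is a small sketch that tallies events from form-encoded callback bodies. The `StatusCallbackEvent` field and event names such as `room-created` and `recording-failed` are assumed to match the provider's callback payloads; verify them against the webhook documentation you are integrating with.

```python
from collections import Counter
from urllib.parse import parse_qs


def summarize_events(raw_bodies, event_field="StatusCallbackEvent"):
    """Count webhook events by type from form-encoded request bodies."""
    counts = Counter()
    for body in raw_bodies:
        fields = parse_qs(body)
        # parse_qs returns a list per key; take the first value if present.
        event = fields.get(event_field, ["unknown"])[0]
        counts[event] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical callback bodies as they might arrive at a webhook endpoint.
    bodies = [
        "StatusCallbackEvent=room-created&RoomName=demo",
        "StatusCallbackEvent=participant-connected&RoomName=demo",
        "StatusCallbackEvent=participant-connected&RoomName=demo",
        "StatusCallbackEvent=recording-failed&RoomName=demo",
    ]
    for event, count in summarize_events(bodies).most_common():
        print(event, count)
```

A production handler would also validate the request signature and forward the counts to a metrics backend rather than printing them.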

Long-Term Storage

In the long term, you will need to consider how recordings are archived or removed based on specific thresholds.

Certain host-based providers offer a storage limit per host. Twilio, on the other hand, charges a low, flat rate by GB stored per day. Currently, 1TB of video storage costs about $50 versus another provider’s $100/mo for 0.5TB. Using Twilio means deleting files less often for cost reasons.

Also important is the process of cleaning up recordings so that archived videos do not leak sensitive PII. If specific recordings must be archived, it is best to build automation that downloads those files and encrypts them with rotating keys, for example by using HashiCorp’s Vault.
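A retention policy like this can be expressed as a small planning function that decides what to archive and what to delete. The `Recording` fields, the `keep` flag, and the 30-day threshold are all hypothetical; the sketch only produces a plan and leaves downloading, encryption, and deletion to your own automation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Recording:
    recording_id: str
    created_at: datetime
    keep: bool = False  # True if the recording must be archived, not deleted


def plan_retention(recordings, now, max_age_days=30):
    """Split recordings older than the threshold into archive and delete lists."""
    cutoff = now - timedelta(days=max_age_days)
    to_archive, to_delete = [], []
    for rec in recordings:
        if rec.created_at >= cutoff:
            continue  # still inside the retention window; leave untouched
        (to_archive if rec.keep else to_delete).append(rec.recording_id)
    return to_archive, to_delete


if __name__ == "__main__":
    now = datetime(2022, 9, 27, tzinfo=timezone.utc)
    recordings = [
        Recording("rec-old-keep", datetime(2022, 7, 1, tzinfo=timezone.utc), keep=True),
        Recording("rec-old", datetime(2022, 8, 1, tzinfo=timezone.utc)),
        Recording("rec-new", datetime(2022, 9, 20, tzinfo=timezone.utc)),
    ]
    print(plan_retention(recordings, now))  # prints (['rec-old-keep'], ['rec-old'])
```

Separating the plan from the action makes it easy to run the policy in a dry-run mode before anything is removed.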

Performance and Integration of New Technologies

To build a more compelling video experience, some additional considerations merit your attention.

Performance and Quality

The optimal experience in high-quality video applications often comes at the intersection of resources available (screen size, CPU, bandwidth) and the video chat type. Features like Twilio’s Network Bandwidth Profile API, Dominant Speaker Detection API, Track Priority API, and Network Quality API ensure that the correct streams are prioritized and rendered, which helps to minimize end-user resource usage, like CPU, memory, and battery. Your application can create a meaningful experience by specifying the render dimensions, maximum bitrate, and maximum number of video tracks.
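The idea of prioritizing streams under a budget can be illustrated with a toy selection routine. This is not Twilio's allocation logic; it is a simplified sketch in which the dominant speaker is ranked first, remaining tracks are ordered by an assumed `high`, `standard`, or `low` priority label, and the bitrate budget is split evenly.

```python
PRIORITY_ORDER = {"high": 0, "standard": 1, "low": 2}


def select_tracks(tracks, max_tracks, max_bitrate_kbps, dominant_speaker=None):
    """Pick which video tracks to render and split the bitrate budget among them.

    Each track is a dict with hypothetical "participant" and "priority" keys.
    """
    ranked = sorted(
        tracks,
        # False sorts before True, so the dominant speaker comes first.
        key=lambda t: (t["participant"] != dominant_speaker,
                       PRIORITY_ORDER[t["priority"]]),
    )
    chosen = ranked[:max_tracks]
    if not chosen:
        return {}
    share = max_bitrate_kbps // len(chosen)  # even split, in whole kbps
    return {t["participant"]: share for t in chosen}


if __name__ == "__main__":
    tracks = [
        {"participant": "alice", "priority": "low"},
        {"participant": "bob", "priority": "high"},
        {"participant": "carol", "priority": "standard"},
    ]
    print(select_tracks(tracks, max_tracks=2, max_bitrate_kbps=2000,
                        dominant_speaker="alice"))
    # prints {'alice': 1000, 'bob': 1000}
```

Even in this simplified form, the trade-off is visible: capping the number of rendered tracks protects CPU and battery on the client, at the cost of hiding lower-priority participants.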

New Technologies

Several emerging technologies can further enhance the video experience.

One idea involves performing machine-learning-guided noise cancellation on the audio tracks of remote participants. In addition, you can perform audio post-processing on streams to ensure that the necessary audio is most prominent.

Also on the horizon is the integration of augmented reality. Twilio offers a DataTrack API that allows users to share low latency messages. An application could use this additional channel to create overlays of information onto a video stream.


With the acceleration of digital communication, integrating video chat is vital to keeping your apps relevant. The implementation concerns involved (your tech stack, infrastructure, day two concerns, and integration of new technologies) require careful consideration during planning so that your team can build the best video communication experience for your users.

Ready to move forward with your video app? Download our full Building a Video App: What to Consider Before You Start guide for everything you need to understand and make the best technical choices for your brand.

Or, continue on to the next section of our three-part blog series, Building a Video App: The Complete Buyer’s Checklist.