Voice over IP can be great, especially in places like call centers or offices with a managed network featuring properly applied quality of service, hefty bandwidth, and symmetric uplink/downlink allocations. In settings like these the improvement over traditional telephony speaks for itself: lower cost, elastic scalability, and call quality that is as good as or better than plain old telephone service (POTS).
The issue is those pesky unmanaged networks, where a customer service agent working from home, a sales rep making a call from a coffee shop, or a voice application user on a cellular network may have no visibility into the performance of the network they are on, and in most cases zero ability to change it.
The overwhelming majority of call quality issues we see at Twilio are due to the local network conditions at the VoIP user end. Based on analysis of our Voice Insights data, on an average day a VoIP call is 60 times more likely to experience quality degradation than a PSTN call.
In the world of VoIP, network metrics and audio quality are essentially synonymous, and our analysis of hundreds of billions of calls over more than ten years indicates network transport issues are the number one contributor to reports of audio quality degradation for VoIP calls.
In the old school world of POTS the only transport metric that matters is physical continuity of copper wires buried underground. In a VoIP telecom deployment when people talk about "choppiness" on a call what they are actually talking about is packet loss. When they talk about "noise" or "robotic speech" they are almost certainly talking about jitter. "Talking over each other" is a symptom of round-trip time (RTT) or latency.
We collect, monitor, and report on these metrics because without visibility into the underlying behavior of the network infrastructure it is not possible to detect, diagnose, or resolve the causes of quality issues.
So, what to do with this information once we've got it? Since we know that network performance is highly correlated with call quality, we can dip our toes in the network before placing or answering a VoIP call, and if we find the conditions undesirable, we can choose to connect the user using plain old telephone service instead.
Browser-based calling: Twilio.PreflightTest
For workflows where the VoIP call is originating in the browser, Twilio provides a preflight test API you can run before placing an outbound call, or run periodically throughout the day as a user waits for incoming calls. Depending on the results of the preflight test you can then decide to pass on attempting VoIP and instead connect your user using the PSTN.
The easiest way to decide whether or not to proceed is probably to look at the MOS score returned in the report. MOS stands for "mean opinion score" and is essentially a function that takes jitter, latency, and packet loss as inputs and spits out a number between 1 and 5. Five represents a perfect call and is an impossible value; you'll never see it, because latency can never be zero. Don't blame us, blame the speed of light - we're not wizards, you know. One represents a truly terrible call. In the industry, a MOS score above ~4.2 is generally considered good quality. Voice Insights will tag a call as having low MOS if the score is below 3.5, but that tagging threshold is really trying to capture "noticeable but tolerable" degradation. If you're looking for a quick and easy thumbs up/thumbs down do-not-proceed threshold, anything at or under 2 should be your no-go value. Here be dragons.
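The thresholds above can be folded into a small go/no-go helper. This is a sketch only: the decision function itself is hypothetical, and the commented-out wiring assumes the report shape exposed by recent versions of the Twilio Voice JS SDK (check the SDK docs for your version).

```javascript
// Hypothetical helper: map a preflight MOS value to a transport decision.
// Thresholds follow the article: ~4.2+ is good, below 3.5 gets tagged
// low-MOS by Voice Insights, and at-or-under 2 is our hard no-go for VoIP.
function chooseTransport(mosAverage) {
  if (mosAverage <= 2) return "pstn";        // here be dragons
  if (mosAverage < 3.5) return "voip-risky"; // noticeable but often tolerable
  return "voip";
}

// Wiring this up with the JS Voice SDK might look like the following
// (the report object's shape is an assumption):
//
//   const preflight = Device.runPreflight(token);
//   preflight.on("completed", (report) => {
//     const decision = chooseTransport(report.stats.mos.average);
//     // place the call over VoIP, or dial out via the PSTN instead
//   });
```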
Depending on your use case, or the geographic location of your users, MOS might be too broad and imprecise. For example, if you happen to be operating a call center in the Philippines, you may find that due to the physical distance from the registered edge location all of your Voice SDK connections have high RTT values that result in MOS scores that are lower than the subjective experience of the user on the call indicates.
Additionally, browsers like Chrome have dynamic adaptive jitter buffers that can mask the impact of jitter by introducing a little latency, and codecs like Opus have packet loss concealment algorithms that can smooth out the impact of missing packets. You may find that the tolerance threshold of your users is much less sensitive than what the metrics, events, and call tags indicate thanks to all of the heavy lifting the browser is doing.
In order to tune these thresholds it's critical to gather subjective feedback from your users. You can do so using the Feedback API in the SDK, which allows you to capture a quality score from 1-5 and provide an issue type, e.g. one-way audio.
In practice, it's best to just give your users a simple binary thumbs up / thumbs down, where a thumbs down is a 1 and a thumbs up is a 5. If your user gives the call a 5 then you know they tolerated that network condition regardless of what the jitter, latency, MOS, and packet loss values were. If they give a 1, then go a little deeper: ask them what type of issue they experienced and correlate that back to inform your decision making in the future. For example, if you find that your users don't seem to notice jitter values that make MOS take a nose dive, but they are highly sensitive to RTT values over 250 ms, then you can choose to ignore jitter and MOS and focus on RTT to decide whether or not to roll with the PSTN.
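A tuned decision rule like the one just described can be quite small. The function and field names below are illustrative, and the 250 ms limit is the example value from the scenario above, not a universal recommendation:

```javascript
// Hypothetical tuned decision: suppose feedback shows our users tolerate
// jitter-driven MOS dips but complain once RTT exceeds 250 ms, so RTT
// alone drives the VoIP-vs-PSTN choice. Field names are made up for
// this sketch.
function shouldFallBackToPstn(preflightStats) {
  const RTT_LIMIT_MS = 250; // tuned from thumbs-down feedback
  return preflightStats.rttAverageMs > RTT_LIMIT_MS;
}
```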
Today the Twilio Voice SDKs for Android and iOS don't come out of the box with a preflight test like the JS SDK does; however, we can still get a sense of whether or not a VoIP call might be a good idea by calling OS-level system APIs to check the type of network the user is on and get a peek at the signal strength.
Of course it's not as simple as saying "Oh, this user is on WiFi? Let's use VoIP!" or "This user is on a cellular network, we should use the PSTN." In 4G and especially 5G situations the cellular network quality may be better than what is offered by a WiFi access point slapped on top of a DSL connection.
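One way to capture that nuance is to weigh network type and signal strength together rather than either alone. The heuristic below is purely illustrative: the type strings, the 0-4 signal scale (the scale Android's `SignalStrength#getLevel` happens to use), and the cutoffs are all assumptions to be tuned against your own data.

```javascript
// Illustrative heuristic only: network type by itself is not enough,
// so combine it with signal strength (assumed scale: 0 = none, 4 = great).
function preferVoip(networkType, signalLevel) {
  switch (networkType) {
    case "wifi":
      return signalLevel >= 2; // weak WiFi over DSL can be worse than LTE
    case "5g":
    case "4g":
      return signalLevel >= 3; // strong modern cellular is often fine
    default:
      return false;            // 3G/2G/unknown: take the PSTN
  }
}
```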
Inbound vs. Outbound Calls on Mobile SDKs
One challenge mobile applications face is that the application must be active in order to make the API calls described above, which is tricky for inbound calls.
On Android, the high-priority FCM message received for an inbound call will stay alive for 40 seconds (the Twilio call timeout), which should give developers enough time to spawn a background thread to check the signal strength and decide whether or not to bail on VoIP and try the PSTN. Similarly, on iOS, VoIP push notifications delivered via PushKit are given runtime to process the push even if the app is operating in the background.
If you're looking at post-call data trying to figure out what went sideways, one easy way to know about sketchy network quality is to monitor whether or not the push notification even shows up on the end device. In these cases you'll see a last SIP response code of 487 in Voice Insights, which indicates that an attempt to reach a previously registered endpoint timed out; contrast this with a SIP 480 which means the user was not registered at the time the call was created.
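When sifting through post-call data, it can help to bucket those failure modes explicitly. This sketch assumes you have already pulled the last SIP response code for each call out of Voice Insights; the function and label names are invented for illustration:

```javascript
// Sketch: classify the last SIP response code from Voice Insights call
// data to separate "push never reached the registered device" failures
// (487) from "device was never registered" failures (480).
function classifySipFailure(lastSipResponseCode) {
  switch (lastSipResponseCode) {
    case 487:
      return "registered-but-unreachable"; // push likely lost to bad network
    case 480:
      return "not-registered";             // no registered endpoint at call creation
    default:
      return "other";
  }
}
```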
One thing to keep in mind with these preflight tests is that they are a sample in time, and that the conditions may shift, in either direction, after the test has been performed. Mobile users in particular, since they are, well, you know, mobile, can encounter big swings in performance from one step to the next. You know how it goes, you get in your car, everything sounds great, you drive away from your house and switch from your WiFi to the cellular service and… blam-o!
Even in browser-based calling sometimes a switch from a wired connection to WiFi or moving around the house with a laptop can introduce issues that weren't present at the time the preflight test was performed, or eliminate issues that were present.
So how do you respond to these in-flight changes? The best response is to inform your user.
We expose network and audio warnings in the Voice SDKs (Android | iOS | JS) to allow developers to identify and respond to changing quality conditions before their users notice by surfacing warnings in their applications and giving prescriptive instructions.
For example, you can implement handlers for the network-quality-warning-raised event group to warn users that their local network conditions might be impacting call quality, and pair each warning with a prescriptive action; e.g. "check headset connection", "move to an area with better WiFi", or "quality issues detected, try switching to a wired connection".
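A simple way to structure this is a lookup from warning name to user-facing message. The warning names below resemble those used by the JS Voice SDK but should be verified against the SDK version you ship, and the commented-out event wiring is an assumption:

```javascript
// Sketch: map SDK quality-warning names to prescriptive user messages.
// Warning names are assumptions -- verify against your SDK's docs.
const WARNING_MESSAGES = {
  "high-jitter": "Quality issues detected - try switching to a wired connection",
  "high-rtt": "Move to an area with better WiFi",
  "high-packet-loss": "Check your network connection",
  "low-mos": "Call quality is degraded on your current network",
};

function messageForWarning(warningName) {
  return (
    WARNING_MESSAGES[warningName] ||
    "Network conditions may be affecting call quality"
  );
}

// Hooking into the SDK might look like (event name is an assumption):
//   call.on("warning", (name) => showBanner(messageForWarning(name)));
```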
As mentioned above, another best practice is to create post-call surveys using feedback events asking users to rate the subjective quality of experience, then correlate those responses with other metrics and properties to identify commonalities in call behavior changes.
As you are gathering the results of these preflight tests and subjective feedback from your users you can tune your decision making process based on these inputs so your application gets better and better at understanding what kind of network conditions really bother your users.
One complicating factor is the vast difference in tolerance between users; some users are highly sensitive to latency or "talking over each other" but don't mind a little noise or choppiness, and some users balk at the first crackle of jitter. There are also noticeable regional differences, with users in different parts of the world having wildly different thresholds for acceptable quality.
One way you can get in front of this is to use the data from your preflight tests, subjective user feedback, and correlate with location data. This could enable you to very finely tune your decision to use VoIP or PSTN, potentially all the way down to being able to identify which neighborhoods in a given city have poor connections for a particular cellular carrier.
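An aggregation like that can start very simply: bucket feedback by carrier and region, then flag buckets with enough volume and a poor approval rate. The record shape, bucket key, and thresholds here are all made up for the sketch:

```javascript
// Sketch: correlate thumbs-up/thumbs-down feedback with coarse location
// and carrier to find pockets of poor connectivity. Record shape is a
// made-up example: { carrier, region, thumbsUp }.
function poorVoipRegions(records, minCalls = 20, maxApprovalRate = 0.7) {
  const buckets = new Map();
  for (const r of records) {
    const key = `${r.carrier}|${r.region}`;
    const b = buckets.get(key) || { calls: 0, thumbsUp: 0 };
    b.calls += 1;
    if (r.thumbsUp) b.thumbsUp += 1;
    buckets.set(key, b);
  }
  const flagged = [];
  for (const [key, b] of buckets) {
    // Only flag buckets with enough calls to be statistically meaningful.
    if (b.calls >= minCalls && b.thumbsUp / b.calls < maxApprovalRate) {
      flagged.push(key);
    }
  }
  return flagged;
}
```

Buckets flagged this way become candidates for defaulting to the PSTN, even before a preflight test runs.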
Now you know how to test the waters in the local network to help inform your application whether or not to try a VoIP call or to fall back to the good old trusty PSTN. Take a look at your Voice SDK applications and make sure you are taking advantage of the preflight tests, system APIs, network/audio quality warnings, and gathering subjective feedback from your users to make informed decisions about which network to use. We can't wait to see what you build!
Michael Carpenter (aka MC) is a telecom API lifer who has been making phones ring with software since 2001. As a Product Manager for Voice & Video Insights at Twilio, the Venn Diagram of his interests is the intersection of APIs, SIP, WebRTC, and mobile SDKs. He also knows a lot about Depeche Mode. Hit him up at email@example.com