Sensors in the Twilio Voice SDKs and on Twilio media gateways gather call metrics and events and send them to the Voice Insights platform for analysis and aggregation.
In the world of Voice-over-IP network metrics and audio quality are essentially synonymous, and our analysis of hundreds of billions of calls over more than ten years supports the theory that network transport issues are the number one contributor to reports of audio quality degradation for VoIP calls.
In the old school world of plain-old-telephone-services the only transport metric that mattered was physical continuity of copper wires buried underground, but in a VoIP telecom deployment when people talk about "choppiness" on a call what they are actually talking about is packet loss. When they talk about "noise" or "robotic speech" they are almost certainly talking about jitter. Many reports of "one-way audio" and "dropped calls" are really high post-dial delay.
We monitor and report on these metrics because without visibility into the underlying behavior of the network infrastructure it is not possible to detect, diagnose, or resolve the causes of quality issues.
In-stream audio issues like echo, or noise that is not related to jitter or packet loss, can't be detected by Voice Insights today but may be captured in any recordings of the calls made using Twilio. In our experience, most echo is due to a call participant having their speaker volume or microphone gain set high enough that the output of the speaker is picked up by the microphone and pumped back into the media stream.
Twilio's media gateways can detect missing RTP streams or streams which contain only silence. We will mark those calls has having contained silence in the Call Summary API response and in the Call Summary properties in Console. Note that silence detection is not the same thing as human speech detection, so a call could conceivably contain no one speaking but display enough background noise to result in Twilio failing to mark the call as having been silent.
We look at our external signaling edge and the direction of which party sent the SIP BYE.
We use the source of the SIP BYE at our signaling edge to determine who hung up. From a SIP perspective a call needs to have been answered before it can be ended via a SIP BYE, so calls that are
canceled we will not provide who hung up; however you can infer that any unanswered call was ended by the calling party.
For Voice SDK calls, who hung up might be missing due to multiple reasons: computer crashes, network issues, the browser being closed or crashing while the call was in progress, the client object being destroyed before the event could be sent to Twilio Insights, or any other behavior that results in Twilio not receiving the final events from the SDK unexpectedly. These will appear as "disconnected by: unknown" when viewing Insights data in Console and aggregate reports.
Dropped calls are challenging to detect programmatically because the overwhelming majority of calls that are reported as "dropped calls" look normal from an application and signaling perspective. The best way to know if a call was a dropped call is to ask your user. The Voice SDKs come with a feedback API that can be integrated directly into your application to capture the subjective user feedback. For non-SDK calls Twilio provides a separate Feedback API.
In the absence of direct input from your users, the Voice Insights Call Summary includes the final SIP response which allows you to locate calls that did not end normally; any calls that do not end with a SIP 200 OK are considered abnormal sessions.
For Voice SDK calls you can look at the disconnected by reason; if you see unknown that means that a computer crash, browser crash, network loss, or application error prevented the final event reporting to Twilio, and whatever condition caused this almost assuredly resulted in the call being dropped as well.
For Voice SDK calls we sample each second; for Carrier and SIP calls we sample the cumulative stats for the previous 10 seconds every 10 seconds.
Our quality thresholds are based of ITU-T standards for VoIP quality, and do indeed lean toward sensitivity. The goal behind highlighting issues at-or-below the edges of perceptibility is to allow detection and mitigation to occur before users notice; however, we also provide cumulative metrics in the call Summary, which allow you to craft your own internal threshold for performance.
Why are the network and audio warnings emitted by the Voice SDKs more sensitive than the tagging thresholds?
Almost all quality issues for Voice SDK calls are due to local network conditions, which are typically caused by things like misapplied or absent quality of service (QoS), asymmetric bandwidth allocations, or bandwidth limitations. We expose a more sensitive threshold of data for Voice SDK calls to allow developers to identify and respond to changing quality conditions before their users notice by surfacing warnings in their applications and giving prescriptive instructions; e.g. "quality issues detected, try moving to a different location".
Different destinations have different expectations for post-dial delay, and accordingly varying thresholds for acceptable PDD. In the USA, for example, PDD higher than 6 seconds is typically something that could be escalated to the carrier; however, in South Africa PDD of 10 seconds is common and considered acceptable, so tagging calls to South Africa with PDD > 6 seconds would result in us tagging basically all the calls in South Africa, reducing the utility of the data to uncover outliers in performance.
Due to GDPR Voice Insights data is only stored for 30 days, so the Dashboard is limited to 30 days worth of data. Since only 30 days worth of data is available, we can only show comparisons for up to 15 days. CSV export is limited to 2000 rows.
For SIP Interface and Elastic SIP trunking calls you can view your local SIP infrastructure logs and pcaps, and compare with the Twilio public pcap available in the call resource page in Console.
Voice SDK SIP errors are often due to unexpected application behavior. We recommend enabling debug logging in your apps and reproducing the issue to understand the cause.
SIP errors that originate from the carriers are currently not actionable by you directly. In case there are issues with a specific carrier are impacting your users, you can reach out to Twilio Support with the data to get more help.
Voice Insights data is retained for 30 days after the call is made.
Why is there a discrepancy between the numbers we see in the call logs and the Voice Insights Dashboard?
In order to provide the dynamic filtering and search functionality in the Voice Insights Dashboard the top-level aggregations don't operate on values directly but on their hashes, and thus they return approximate results. The call logs use linear counting which has excellent precision on low-cardinality sets but does not provide multi-dimensional searches and top-level filtering like the Voice Insights Dashboard does. In practice, we typically see differences of less than five on sets of calls with ~100K results.
Enable Voice Insights Advanced Features using the Voice Insights Settings page in Console to see call events and metrics.
Missing metrics/events/summaries can be caused by:
- local network configuration blocking publishing to the Insights events gateway
- expired tokens
- publishing delays on the Voice Insights backend
Advanced features data is available only while the feature is active on the account, any calls placed before the feature is enabled will not be flagged by the Voice Insights infrastructure, and the per-interval metrics and event stream will not be stored.
Twilio RTP latency shows the average and max Twilio-internal media stream traversal time in milliseconds. The Voice Insights platform analyzes the timestamps of when RTP packets are received at the ingress of Twilio's media gateway and compares that timestamp against the timestamp of the same packets at the egress on the other media edge.
The overwhelming number of calls marked as having high latency are calls with participants in distant geographic locations; e.g. a Voice SDK user in the Philippines dialing into a conference being mixed in Ireland. Calls placed using a conference call flow will have a small jitter buffer that can result in conference calls being more prone to be marked as having high latency. Optimizing the region selections for Voice SDK instances and conferences is the best way to mitigate the impact of internal RTP traversal time. Ultimately if you have users separated by large distances, some degree of Twilio-internal latency can’t be avoided.
No. The SDK-level events like constant audio warnings and low MOS warnings can be instrumented without needing to pay for advanced features. Availability of the events via Console is an advanced feature.
Implement handlers for network-quality-warning-raised group to warn users that their local network conditions might be impacting call quality—use the SDK to display warnings and prescriptive actions to users in the application; e.g. “check headset connection” or “move to an area with better WiFi”.
Implement handlers for audio-level-warning-raised events to show visual indication to users that their audio is not being detected by the application.
Create post-call surveys using feedback events asking users to rate the subjective quality of experience and correlate responses with other metrics and properties to identify commonalities in call behavior changes.
The “To” for a Voice SDK call is the Twilio application SID; the TwiML returned from the webhook configured in the application SID will create a child call, and that child call will contain the expected “To” details.
Why do we see jitter on the incoming stream of one side of this call, but no jitter on the outgoing stream of the other call?
Twilio’s conference mixers have a small jitter buffer that can reduce or eliminate jitter received on one call before passing it along to the other calls. The tradeoff for this is a proportional increase in latency.
High round trip time (RTT) indicates the measurement of how long it took packets to arrive at the Voice SDK application sensors from Twilio’s media gateway has breached 400 ms in 3 out of the last 5 samples.
High latency indicates that the average of the Twilio internal RTP traversal time was greater than 150 ms.
This call looks like there was no jitter or packet loss on the incoming stream to Twilio, but on the outgoing stream of the child call we see some jitter and packet loss, is Twilio introducing these?
It’s possible. RTP media streams are sent via User Datagram Protocol (UDP) which is a fire-and-forget protocol with no error correction. Jitter and packet loss for UDP transport is a fact of life in any network, and calls whose media streams traverse large distances (e.g. a Voice SDK call in Singapore terminating in a conference mixed in Brazil) are more likely to be impacted by inevitable-but-rare backbone-level jitter and packet loss.
Why does this Voice SDK call have low bytes sent/received warning events, but no packet loss warning?
WebRTC uses RTCP to indicate the number of packets sent from the gateway to the Voice SDK application. Based on the RTCP sender reports if the sensors in the SDK are expecting packets, but packets don't arrive, those expected-but-missing packets will be calculated as packet loss. If the RTCP sender report indicates that no packets are expected, the absence of packets will not be calculated as packet loss, but will be reported as low bytes sent/received.
Based on the transport metrics this call should sound terrible, but the users didn’t report any problems?
It’s challenging to infer the subjective user experience from transport-level metrics alone. In general, it’s safe to say packet loss >5% is going to result in noticeably choppy audio; average jitter >5 is going to result in robotic audio, and RTT > 1000 ms is going to result in people either talking over each other or long periods of silence between speech.
Transport-level metrics like jitter and packet loss are attempts at monitoring indicators that can contribute to call quality degradation; however, browsers like Chrome have dynamic adaptive jitter buffers that can mask the impact of jitter by introducing latency. Codecs like Opus have packet loss concealment algorithms that can smooth out the impact of missing packets. At the SDK level we don’t have a good sense for the jitter buffer or packet loss concealment activities at any given time, so we report on the underlying transport metrics in the absence of subjective feedback. SIP and PSTN carrier infrastructure may also be implementing jitter buffers or transcoding to reduce quality issues.
Not directly; the IP address reported to Insights would be that of the VPN IP, not the local device location. You can sometimes infer this from conversations with users; e.g. a user who is physically located in Germany but whose IP is showing up as being in Spain.
Not at this time. We are working on including signal strength and battery metrics in future versions of Insights for the mobile SDK, but the PLMN conditions are not available today.
How can we check the external network status? What are the tools used to check the external network for audio issues?
The metrics reported at the carrier gateway represent what Twilio received from the PSTN. Twilio’s Super Network is monitoring these connections in real-time and will raise quality degradations to carriers and file incidents as appropriate. Destination carriers often have status pages that can be checked for their local conditions as well, but they tend to be very conservative and slow to update.
Voice Insights is limited to inferring quality issues from proxy metrics like jitter and packet loss or silence detection. Voice Insights has visibility into what we sent from, and was received by, our media edges; degradation introduced on the way to the destination due to poor signal strength or a carrier issue would be transparent to Voice Insights today.