Voice Architecture and Best Practices for Independent Software Vendors (ISVs)

August 12, 2021
Written by
Reviewed by
Jay Parisi
Twilion


Intro 

Twilio is a trusted provider in the telecommunications industry, helping customers navigate the complexities of the ever-changing communications ecosystem. Twilio partners with a wide variety of Independent Software Vendors (ISVs) that leverage Twilio technology to build robust, scalable, and innovative communication infrastructures. ISVs serve hundreds, if not thousands, of customers, so building an elegant and scalable platform architecture to support them is critical. In this blog post, we aim to provide ISV product managers, architects, and developers with the design principles, product knowledge, and best practices needed to launch a successful ISV voice solution. By the end of this post, you should be equipped to start implementing a voice-enabled product designed specifically with your customers’ needs in mind.

Challenges for ISVs

Voice connectivity and call orchestration can be complex, and ISVs need to provide reliable, easy-to-integrate, and flexible voice solutions to their customers. ISVs should consider what type of voice solution makes the most sense for their target users:

  • A complete end-to-end voice product without custom integration
  • An open voice solution that allows users to integrate existing voice infrastructure with your product

It may also be pertinent to provide call statistics and aggregate metrics to your end users. Twilio offers a variety of methods for ISVs to consume call data and expose this to end customers through dashboards and call logs.

Lastly, number management and regulatory compliance need to be as simple as possible for your end users. Thankfully Twilio has a suite of APIs built specifically for ISVs to manage this on behalf of their customers.

Prerequisites

Table of Contents:

  • Common Voice Use Cases
  • Product Briefs - brief overview of Voice related products
  • Key Questions - for your team to ask yourself and your customers
  • Architectural Considerations
  • Example Architectures/Configurations
  • Best Practices - for voice-specific workloads
  • Voice Checklist

Common Voice Use Cases

Twilio supports thousands of ISVs in a variety of different voice-based use cases.

  • Contact center/IVR
    Talkpush uses Twilio SMS, voice calls, and web-based IVR as the backbone of their interviewing technology, enabling Talkpush to conduct thousands of interviews in a condensed format.
  • Lead generation
    Trulia leverages Twilio to connect prospective home buyers with real estate agents to get questions answered in real time using a bespoke click-to-call and call tracking solution.
  • Outbound dialer
    When Zendesk, the industry-leading SaaS helpdesk platform, wanted to embed voice calling in its service, it chose the global scale of Twilio. Zendesk now uses Twilio voice in over 40 countries around the world.
  • Field services
    Carsforsale.com chose Twilio to connect 22,000 dealerships with 100 million annual car shoppers. Dealerships use text, email, and voice to initiate conversations and build relationships with customers as they navigate the car buying and selling process.
  • Secondary business phone lines
    GoDaddy uses Twilio SIP trunking to power their Smartline solution, giving small business owners a second phone number to separate their business calls from their personal calls.

Product Briefs

In this section we will give a brief overview of common voice related solutions. Later in the blog we will work through different architecture implementations of the voice solutions.

  • Programmable Voice:
    An API that lets teams orchestrate a wide range of call flows. ISVs can embed this API into their solution to offer a variety of calling features without needing much telephony infrastructure.
  • Elastic SIP Trunking:
    An interface that lets teams with existing telephony infrastructure connect through Twilio’s elastic trunks. This suits ISVs that host or manage their own PBX, or whose customers need the ability to configure SIP trunks.
  • SIP Interface:
    SIP Interfaces allow ISVs to directly register their SIP devices to Twilio, connect existing SIP infrastructure using a private connection (such as a VPN or cross connect), and/or provide a way for SIP traffic to interact with programmable TwiML applications. These interfaces give ISVs a variety of options for integrating their customers’ existing telephony infrastructure with their solution.
  • BYOC Trunking:
    An interface that lets you Bring Your Own Carrier (BYOC), connecting other telephony providers to Twilio. ISVs work with many different customers, some of whom have existing telephony solutions in place, so ISVs need to be flexible enough to accommodate them.

Key Questions

Voice workloads can get complex, but by asking the right questions you can gauge the level of complexity and make a plan to address it. Below are questions that your team should consider as you think about your customers’ use cases:

  • Is your company able to gather and complete the necessary regulatory compliance requirements in each country?
  • Will you provision numbers on behalf of your company or will you need to port numbers from existing telephony provider(s)?
  • Will you need private connections to your customers for quality of service (QoS), or does using the public internet meet your requirements?
  • Will you be required to provide E911 / emergency calling?
  • Can you connect to your customers via the PSTN, SIP or WebRTC? If not, are you able to build a gateway or translation service?
  • Do you or your customers have existing PBX infrastructure you will need to integrate with?

Architecture Considerations

Now that we have a high level understanding of all the voice related interfaces, we will transition to more implementation details.

In this section we will discuss a variety of technical considerations that ISVs should think about.

Operational Considerations

Account structure

For ISVs partnering with Twilio, subaccounts are the best way to structure your accounts. Key benefits include:

  • Provision resources per tenant (your customer)
  • Segmented billing, connectivity credentials and analytics

Below is a high level architecture for Owl Inc., a fictional ISV customer. Owl Inc. has two projects: one for development (Dev) work and the other for production (Prod) traffic. The Prod account will contain the subaccounts that Owl Inc. provisions for each of its customers.

subaccounts

 

If you want to learn more about subaccounts check out this blog.

Now that we have our account structure set up, let's work through phone number management.

Phone Number Management

The first thing to decide is which countries you want to support voice workflows in.

Questions your team will need to think about are:

  • What countries do you want to offer voice calls in?
  • For each country, do you have the correct documentation necessary in order to procure a number?

For some countries a regulatory bundle is required before you are able to provision the number. 

  • Do numbers need to have both voice and messaging capabilities?
  • Do your customers have existing numbers they need ported over or can we provision a new number?
  • If porting, will they be outside of the US and Canada?
  • If porting, can we get the required documentation?  

Different countries may incur a porting fee. 

After you’ve answered these questions, you will need to determine the right caller ID. There are different types of numbers worldwide; however, unlike Sender IDs (for Messaging), they are all numeric.

Below are the various types of phone numbers and whether they are generally capable of voice workloads:

Number Type                                               Voice Capable?
Local                                                     Yes
National                                                  Yes
Mobile                                                    Yes
Short Code                                                No
Alphanumeric (pre-registered and/or non-pre-registered)   No

For a country-by-country breakdown, please review this list to confirm that Twilio offers a caller ID capable of placing voice calls in each country you plan to support.

Device Management

Now that you’ve given thought to your phone number strategy, you’ll want to consider the devices (clients) that you will expose to your customer base.

With Twilio there are a few different ways to connect calls:

  • Phones connected via the Public Switched Telephone Network (PSTN)
  • Devices connected via WebRTC, which Twilio supports through our platform SDKs:
    • JavaScript
    • iOS
    • Android
  • Devices connected via SIP. This can be done via a SIP domain or an Elastic SIP Trunk

Limits

As your team looks to support a variety of use cases, we want to make sure you are aware of limits on the voice products. Below are links to limits that are enforced within the voice stack:

Elastic SIP Trunk limits

SIP Domain limits

Voice CPS

Cost Considerations

A large part of any solution is understanding the cost - and voice workloads are no different. Depending on the voice solution there will be a difference in the cost of the call. We will work through a few examples later on in the blog to show how different call flows can impact the cost of a call. You can always see pricing below for voice (filter by country on each page):

Programmable Voice Pricing

Elastic SIP Trunking Pricing

Performance Considerations

API Latency: Global Low Latency (GLL) and Edge Locations

Make sure to use GLL or select the correct region when using the Twilio Voice SDKs (JavaScript, iOS, and/or Android). Within the SDKs, teams can choose among network regions, which determines the media server and signaling server that calls are routed through. If you are unsure which region to select, use GLL.

If you are operating a restricted network that requires allowlisting of media IPs, we recommend specifying the edge location in your application to avoid one-way audio or call setup failures.

For API signaling, WebSockets, and webhook StatusCallbacks, it is also important to specify Twilio’s API edge locations to optimize latency and performance, network failover, enterprise security, and quality of service.

For example, customers with infrastructure in Australia can make use of the Sydney edge location by using this base URL:

https://api.sydney.us1.twilio.com/2010-04-01

API Edge Locations both optimize routing to Regions and significantly speed up expensive operations, such as Transmission Control Protocol (TCP) and Transport Layer Security (TLS) connection setup and authentication.
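
To make the URL pattern above concrete, here is a minimal sketch that builds the REST API base URL for a given edge location. The function name and its defaults are our own illustration, not part of any Twilio library; the hostname pattern simply follows the Sydney example above.

```python
from typing import Optional


def twilio_api_base_url(edge: Optional[str] = None, region: str = "us1") -> str:
    """Build the REST API base URL, optionally pinned to an edge location.

    The pattern api.<edge>.<region>.twilio.com mirrors the Sydney example;
    the function itself is a hypothetical helper.
    """
    if edge:
        host = f"api.{edge}.{region}.twilio.com"
    else:
        # No edge requested: fall back to the default API hostname.
        host = "api.twilio.com"
    return f"https://{host}/2010-04-01"


print(twilio_api_base_url("sydney"))
# prints "https://api.sydney.us1.twilio.com/2010-04-01"
```

Keeping the edge in one helper like this makes it easy to vary per tenant, for example if one of your customers hosts its infrastructure in a different part of the world.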

Calls per Second (CPS)

CPS measures the rate at which Twilio places (terminates) new outbound calls, in calls per second. By default, each account or subaccount has 1 CPS. CPS can be raised via the Twilio Console. The setting is configured separately for Elastic SIP Trunks and Programmable Voice.

The image below shows where to update Programmable Voice CPS:

console

Trunk Level Setting

console

When should I increase CPS?

It's great that you can update CPS, but how do you know when you should increase it?

Before we look at how to determine whether to increase CPS, there are a few questions you should consider:

  • Do you have specific SLAs to your customers for making calls? Or is a delay between your REST API outbound call and calls terminating from Twilio acceptable?

For Programmable Voice calls, Twilio will queue calls and terminate them at the CPS rate. For Elastic SIP Trunking, any calls over the CPS will result in a 503 error.

  • What is your strategy to cover the increased Cost of Goods Sold (COGS) that comes with increasing CPS?
  • Do your customers have burst traffic or consistent traffic patterns? CPS is allocated for a full month.

For ISVs, the best way to determine whether your CPS is at an acceptable rate is to observe the QueueTime attribute of the Call resource. Your application can set a limit, and if QueueTime regularly exceeds that limit, it may be worth increasing CPS.
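
As a rough sketch of that check, the snippet below takes QueueTime values your application has already collected and flags when too many calls waited longer than your limit. We assume the values are in milliseconds; the function name, the limit, and the tolerated fraction are hypothetical choices you would tune to your own SLAs.

```python
from typing import Iterable


def should_raise_cps(
    queue_times_ms: Iterable[int],
    limit_ms: int,
    tolerated_fraction: float = 0.05,
) -> bool:
    """Return True when too many calls waited in the queue longer than limit_ms.

    queue_times_ms: QueueTime values collected from call records (assumed ms).
    tolerated_fraction: share of slow calls you accept before acting (assumption).
    """
    times = list(queue_times_ms)
    if not times:
        # No traffic observed, so there is nothing to base a decision on.
        return False
    over_limit = sum(1 for t in times if t > limit_ms)
    return over_limit / len(times) > tolerated_fraction


# Two of four calls waited more than 5 seconds, well above a 5% tolerance.
print(should_raise_cps([100, 200, 9000, 12000], limit_ms=5000))  # prints "True"
```

Looking at a fraction of calls rather than a single slow call keeps one burst from triggering a month-long CPS increase.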

 

 

CPS will be charged at the aggregate ISV level rather than subaccount level. Please keep this in mind as CPS rates increase exponentially.

Security Considerations

API keys

Each account / subaccount created on Twilio will get its own Account SID and Auth Token. If an Auth Token is leaked, revoking it and creating a new one can be very painful (although it can still be done via the Twilio Console).

An alternative strategy when giving applications permissions to send via a subaccount is to provision API keys. This way your team can programmatically revoke and mint new keys.

Here is an excellent blog on creating API keys and how to implement an API key rotation strategy.  

Validating Twilio Signature

Twilio includes an X-Twilio-Signature header in each request it sends to your infrastructure so that your team can verify the request genuinely came from Twilio. As a best practice, ensure you validate these requests.

 

 

If you use Twilio X-header signature validation, you will need to use your Twilio Auth Token; an API key cannot be used.
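
For illustration, here is a minimal standard-library sketch of the signature check for a form-encoded POST webhook: the full request URL is concatenated with each POST parameter name and value (sorted by name), signed with HMAC-SHA1 using the Auth Token, and base64-encoded. The function names are our own, and in production you would more likely rely on the validator shipped in Twilio's helper libraries.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the expected signature for a form-encoded POST webhook."""
    # Full URL followed by every parameter name and value, sorted by name.
    payload = url + "".join(f"{name}{params[name]}" for name in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(
    auth_token: str, url: str, params: Mapping[str, str], header_signature: str
) -> bool:
    """Compare the computed signature with the X-Twilio-Signature header value."""
    expected = compute_signature(auth_token, url, params)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, header_signature)
```

Because each subaccount has its own Auth Token, your webhook handler needs to look up the token for the AccountSid in the request before validating.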

There are a host of other security practices your team should consider, so please check out our docs on security to learn more.

Reliability Considerations

There are several strategies and configurations your team can utilize to ensure that your voice workloads are resilient and can gracefully handle failures.

Webhook connection override docs

StatusCallback Webhook

When making an outbound API call or returning TwiML, you can set a statusCallback webhook. Twilio will asynchronously POST the call status to your endpoint. Monitoring call status events is the first step to determining if something isn’t working properly. Specifically, if you see a failed final call status, it’s worth investigating why the call might be failing.
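
A minimal sketch of the handler side might look like the following. It parses the form-encoded body of a status callback and flags calls whose final status is failed. The CallSid and CallStatus parameter names come from the callback request; the function name and the shape of the returned dictionary are our own assumptions.

```python
from urllib.parse import parse_qs

# Statuses after which a call will not change state again.
FINAL_STATUSES = {"completed", "busy", "failed", "no-answer", "canceled"}


def summarize_status_callback(body: str) -> dict:
    """Extract the fields an ISV would typically log from a status callback body."""
    form = {key: values[0] for key, values in parse_qs(body).items()}
    status = form.get("CallStatus", "")
    return {
        "call_sid": form.get("CallSid"),
        "status": status,
        "is_final": status in FINAL_STATUSES,
        # A failed final status is the signal worth investigating.
        "needs_investigation": status == "failed",
    }


summary = summarize_status_callback("CallSid=CA123&CallStatus=failed")
print(summary["needs_investigation"])  # prints "True"
```

In a multi-tenant setup you would also record the AccountSid from the same request so failures can be attributed to the right customer.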

Voice Fallback URL

At the phone number level, you configure the primary incoming webhook. This endpoint is where Twilio sends an HTTP request and waits for a response (TwiML); if the webhook isn’t responding, the call will fail. To guard against this, you can configure a fallback URL that Twilio will request when the primary webhook doesn’t respond.

There are a few approaches for hosting the voice fallback URL that you can leverage with Twilio.

  • Create a Twilio Function: Twilio Functions is a serverless product where teams can write business logic for handling calls. As a fallback strategy, a Twilio Function can return a simple response telling the end-user that there is a problem and that they can reach out via another method. This doesn’t solve the problem of recovering the entire call flow, but it does give the end-user information so they aren’t left wondering why their calls aren’t going through.

For ISVs, check out the Twilio Function API docs.

  • Create a Studio Flow: Twilio Studio is a serverless state engine capable of hosting several different workloads. If you are hosting an IVR solution, you can keep a backup flow that runs in Twilio Studio.

For ISVs, check out the Studio API docs.
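
Whichever host you choose, the fallback itself can be very small. As a sketch, the snippet below builds the kind of simple TwiML response described above using only the standard library. The function name and the message wording are placeholders.

```python
import xml.etree.ElementTree as ET


def fallback_twiml(message: str) -> str:
    """Build a minimal TwiML document that reads a message to the caller."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = message
    return ET.tostring(response, encoding="unicode")


print(fallback_twiml("We are having trouble connecting your call."))
# prints "<Response><Say>We are having trouble connecting your call.</Say></Response>"
```

Keeping the fallback free of dependencies on your primary infrastructure is the point: it should still work when everything else is down.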

Debugger Event Webhook

The Debugger Event webhook is the first place to start when it comes to triaging errors and resolving them.

diagram

Anytime there is an error or warning, a webhook will be sent to the specified endpoint. ISVs should ensure that subaccounts are included. This is the payload that will be sent for each event:

PROPERTY            DESCRIPTION
Sid                 Unique identifier of this Debugger event.
AccountSid          Unique identifier of the account that generated the Debugger event.
ParentAccountSid    Unique identifier of the parent account. This parameter only exists if the above account is a subaccount.
Timestamp           Time of occurrence of the Debugger event.
Level               Severity of the Debugger event. Possible values are Error and Warning.
PayloadType         application/json
Payload             JSON data specific to the Debugger Event.
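
To show how these properties might be consumed, here is a small sketch that parses one event and notes whether it came from a subaccount, which is how an ISV can attribute errors to a tenant. We assume the webhook arrives as a form-encoded POST with Payload as a JSON string; the function name, the sample SIDs, and the sample payload contents are made up for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode


def parse_debugger_event(body: str) -> dict:
    """Parse a Debugger webhook body (assumed form-encoded) into a dictionary."""
    form = {key: values[0] for key, values in parse_qs(body).items()}
    payload = form.get("Payload")
    return {
        "sid": form.get("Sid"),
        "account_sid": form.get("AccountSid"),
        "parent_account_sid": form.get("ParentAccountSid"),
        # ParentAccountSid is only present when the event came from a subaccount.
        "is_subaccount": "ParentAccountSid" in form,
        "level": form.get("Level"),
        "payload": json.loads(payload) if payload else None,
    }


# Hypothetical sample event; every value here is a placeholder.
sample_body = urlencode({
    "Sid": "NO123",
    "AccountSid": "AC_tenant",
    "ParentAccountSid": "AC_parent",
    "Level": "Error",
    "PayloadType": "application/json",
    "Payload": json.dumps({"error_code": "11200"}),
})
event = parse_debugger_event(sample_body)
print(event["is_subaccount"], event["level"])  # prints "True Error"
```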

Example Architectures/Configurations

We’ve talked about a lot of concepts in a theoretical sense, so let's actually work through a few examples. In this section we will identify a few use cases and architect a solution for each.

In our examples below, we will use a fictional ISV company called Owl Inc. Owl Inc. provides software solutions to bicycle shops to engage their customers. Owl Inc. has three customers that use its software: Pedaling Parrots, Fast Finches and Racing Robins.

diagram

The first solution Owl Inc. wants to offer is the ability for its customers to send voice notifications. These notifications will be sent to a shop's customers when their bike is ready to be picked up or could use a tune-up. We will start with an Owl Inc. customer called Pedaling Parrots.

Below is a reference architecture with pricing:

Here is the order of operations along with a ladder diagram:

  1. Owl Inc. makes a POST request with TwiML to Twilio.
  2. Twilio places a call to the Pedaling Parrots customer (Bob).
  3. When the end-user picks up, they hear a message.
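
Step 1 can be sketched with the standard library as follows. The snippet only builds the HTTP request for the Calls endpoint (with To, From and Twiml parameters and HTTP Basic authentication) and does not send it. The function name, the placeholder credentials, and Bob's number are assumptions made for this illustration.

```python
import base64
import urllib.parse
import urllib.request


def build_call_request(
    account_sid: str, auth_token: str, to: str, from_: str, twiml: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an outbound call."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Calls.json"
    body = urllib.parse.urlencode({"To": to, "From": from_, "Twiml": twiml})
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Authorization": "Basic " + credentials.decode("ascii")},
    )


request = build_call_request(
    account_sid="ACXXXXXXXX",      # placeholder: the Pedaling Parrots subaccount SID
    auth_token="your_auth_token",  # placeholder credential
    to="+14155550100",             # placeholder number for Bob
    from_="+14151234567",          # the Pedaling Parrots number used in this post
    twiml="<Response><Say>Your bike is ready for pickup.</Say></Response>",
)
print(request.get_method(), request.full_url)
```

Sending the request through the tenant's subaccount credentials is what keeps billing and call logs segmented per customer.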

 

diagram

ISV_Blog

 

The outbound notification use case is a great one, but sometimes end-users don’t pick up the phone and / or try to call the number back.

Pedaling Parrots has asked Owl Inc. whether these inbound calls can be handled. Owl Inc. is able to create an experience where the calls are routed to an employee via its existing software.

 

diagram

Here is the order of operations along with a ladder diagram:

  1. A Pedaling Parrots customer calls the Pedaling Parrots number +14151234567.
  2. Twilio sends a POST request to Owl Inc. at the webhook configured for the incoming phone number.
  3. Owl Inc. finds the identity of the client (Cathy) to route the call to and returns the corresponding TwiML.
  4. Twilio places a call to that identity.
  5. The Pedaling Parrots employee answers via web browser.
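
Step 3 of this flow could look roughly like the sketch below: Owl Inc. looks up which client identity owns the called number and returns TwiML that dials that client. The routing table, the identity string "cathy", and the function name are hypothetical; a real implementation would read the mapping from Owl Inc.'s own database.

```python
import xml.etree.ElementTree as ET

# Hypothetical routing table: called number -> client identity.
ROUTES = {"+14151234567": "cathy"}


def inbound_twiml(called_number: str) -> str:
    """Return TwiML that dials the browser client mapped to the called number."""
    response = ET.Element("Response")
    identity = ROUTES.get(called_number)
    if identity is None:
        # Unknown number: tell the caller instead of failing silently.
        ET.SubElement(response, "Say").text = "This number is not in service."
    else:
        dial = ET.SubElement(response, "Dial")
        ET.SubElement(dial, "Client").text = identity
    return ET.tostring(response, encoding="unicode")


print(inbound_twiml("+14151234567"))
# prints "<Response><Dial><Client>cathy</Client></Dial></Response>"
```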

ISV_Blog

 

The solution works great for Pedaling Parrots! However, Owl Inc. has another customer, Fast Finches, with different requirements. Fast Finches has physical desk phones (hard phones) and wants to use them with Owl Inc. software. Owl Inc. can use a SIP domain to register the hard phones and offer the solution to Fast Finches.

diagram

Here is the order of operations along with a ladder diagram:

  1. A Fast Finches customer calls the Fast Finches number +14151239876.
  2. Twilio sends a POST request to Owl Inc. at the webhook configured for the incoming phone number.
  3. Owl Inc. finds the correct SIP address and returns the corresponding TwiML.
  4. Twilio places a call to the SIP endpoint.
  5. The Fast Finches employee answers via a SIP endpoint.
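
For the hard-phone case, the TwiML returned in step 3 dials a SIP address rather than a browser client. A minimal sketch follows; the function name and the example SIP URI are placeholders, not real Fast Finches configuration.

```python
import xml.etree.ElementTree as ET


def sip_dial_twiml(sip_uri: str) -> str:
    """Return TwiML that dials a registered SIP endpoint."""
    if not sip_uri.startswith(("sip:", "sips:")):
        raise ValueError("expected a SIP URI such as sip:user@host")
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    ET.SubElement(dial, "Sip").text = sip_uri
    return ET.tostring(response, encoding="unicode")


# Placeholder URI for a hard phone registered on a SIP domain.
print(sip_dial_twiml("sip:frontdesk@fastfinches.sip.twilio.com"))
```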

ISV_Blog

 

Note: Fast Finches may need to port its number to Twilio unless it can provide its SIP details to Owl Inc., in which case Owl Inc. could create a SIP domain for the SIP endpoint or send calls to the Fast Finches PBX, which can route each call to the correct user.

Let's meet up with the Pedaling Parrots again. They’ve seen a growth in business and are getting a lot of orders. Sometimes when end-users call in and speak with a Pedaling Parrots employee, they need to add in a representative from the bike manufacturer. The bike manufacturer has a phone system in place, so SIP can be used to set up the call. They’ve funneled this request back to Owl Inc. Owl Inc. can build out a voice solution using the Conference capability.

diagram

Here is the order of operations along with a ladder diagram:

  1. A Pedaling Parrots customer calls the Pedaling Parrots number +14151234567.
  2. Twilio sends a POST request to Owl Inc. at the webhook configured for the incoming phone number.
  3. Owl Inc. adds the end-user to a conference call named supportroom123.
  4. Owl Inc. finds the correct identity of the client, makes an outbound call, and adds the client to the conference call.
  5. Owl Inc. then makes a call to the bike manufacturer's SIP endpoint. When the bike manufacturer answers the call, they are added to the conference.
  6. Twilio places the call to the SIP endpoint.
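
The common thread in these steps is that every leg ends up in the same named conference. As a sketch, the helper below builds the TwiML that places a call leg into supportroom123; it could be returned to the inbound webhook for the caller and supplied as the TwiML for the outbound legs. The function name and the end_on_exit option are our own illustration.

```python
import xml.etree.ElementTree as ET


def conference_twiml(room_name: str, end_on_exit: bool = False) -> str:
    """Return TwiML that places the current call leg into a named conference."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    conference = ET.SubElement(dial, "Conference")
    conference.text = room_name
    if end_on_exit:
        # End the conference for everyone when this participant hangs up.
        conference.set("endConferenceOnExit", "true")
    return ET.tostring(response, encoding="unicode")


print(conference_twiml("supportroom123"))
# prints "<Response><Dial><Conference>supportroom123</Conference></Dial></Response>"
```

One possible design choice is to set end_on_exit only for the end-user's leg, so the room closes when the customer leaves; whether that fits depends on your support workflow.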

Conference_Flow

 

Owl Inc. Account Structure and Implementation

We’ve seen the individual call flows, but we haven’t seen a full account structure for Owl Inc. with all the tenants. Below is an architecture diagram with the necessary resources to enable its customers to make these call flows happen.

Basic connectivity - PSTN for Pedaling Parrots and SIP for Fast Finches

Below is the architecture that Owl Inc will implement for Pedaling Parrots and Fast Finches.

SIP Diagram

While this architecture is good, we want to ensure that Owl Inc.'s customers are getting the best call attestation. So let's see what the architecture looks like when Owl Inc. creates:

  1. A Trust Hub profile
  2. A SHAKEN/STIR profile
  3. A CNAM profile

voice_2

 

 

Great! We have a good architecture diagram; however, we haven't accounted for Racing Robins yet. Racing Robins has customers in Romania and has an existing telephony provider they want to bring to Twilio. The architecture below shows:

  1. A Bring Your Own Carrier (BYOC) trunk
  2. A regulatory bundle for Romania

voice3

 

For more information on regulatory bundles, please check out our docs:

Regulatory Compliance Workflow

Regulatory compliance REST Docs

Billing/stats management

As mentioned previously in this blog, providing call logs and aggregate metrics for customers is an important offering for ISVs. Twilio offers three ways of approaching this:

  1. receiving voice data in real time through webhooks
  2. retrieving call logs with the call resource API
  3. fetching analytics from the Voice Insights API.

It is also important to capture product usage data so you can accurately bill your customers. Twilio provides a usage API allowing ISVs to track usage on a subaccount-by-subaccount basis. This makes for quick billing, as a subaccount’s entire usage for a month can be pulled with one API call. Below is a diagram that lays out this architecture:

diagram
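
As a rough sketch of the billing pull, the snippet below builds the Usage Records URL for a subaccount's previous month and totals the price field across the returned records. The function names are ours, the subaccount SID is a placeholder, and we assume each record exposes its cost as a price string; check the Usage Records docs for the exact response shape before relying on it.

```python
from decimal import Decimal
from typing import Iterable, Mapping
from urllib.parse import urlencode


def last_month_usage_url(subaccount_sid: str, category: str = "calls") -> str:
    """Build the Usage Records URL for a subaccount's previous calendar month."""
    query = urlencode({"Category": category})
    return (
        "https://api.twilio.com/2010-04-01/Accounts/"
        f"{subaccount_sid}/Usage/Records/LastMonth.json?{query}"
    )


def total_price(records: Iterable[Mapping[str, str]]) -> Decimal:
    """Sum the price field across usage records (field name is an assumption)."""
    # Decimal avoids floating point drift when adding currency amounts.
    return sum((Decimal(str(r["price"])) for r in records), Decimal("0"))


print(last_month_usage_url("ACXXXXXXXX"))
print(total_price([{"price": "1.25"}, {"price": "0.75"}]))  # prints "2.00"
```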

Emergency Calling

For some customers, the ability to send calls to Public Safety Answering Points (PSAPs) is a requirement. As of this publication date, Twilio supports E911 and basic emergency calling for Elastic SIP Trunking, and E911 for SIP Domains; however, emergency calling is not supported for other programmable workloads, which includes calls made via WebRTC.

When using E911 with SIP domains there are two strategies for sending calls to Twilio. You must either provide a valid “From” number in the header or use the SIP domain level Emergency caller ID. This can have architectural implications.

Emergency Calling for SIP Interfaces

Below are two implementations of E911 with SIP domains.

  1. Set an Emergency Caller ID at the SIP Domain level
  2. Assign each SIP client a phone number

e911-4

e911-6

Given those two strategies, let's think through some trade-offs.

Strategy: Unique phone number per SIP client

  • Pro: Implementation is straightforward
  • Con: Increase in phone number costs

Strategy: Unique phone number per SIP domain

  • Pro: Ability to use an internal naming convention for SIP usernames
  • Con: In the event of a call disconnect with the PSAP, special routing is needed to reach the caller

Compliance

TCPA compliance

https://www.twilio.com/docs/glossary/what-is-telephone-consumer-protection-act-tcpa

Forbidden Twilio use cases

SHAKEN / STIR

SHAKEN/STIR is a protocol mandated by the FCC that seeks to reduce unwanted robocalls. As an ISV looking to extend the best experience to your customers and their end users, it's best to try and get an “A” Attestation. As part of Twilio implementing SHAKEN/STIR, there is a registration process that ISVs will need to go through. Here is a link to the onboarding guide and the voice documentation for accessing the attestation.

Later on in the blog we will model out the architecture of SHAKEN/STIR.

Best Practices

Network Testing

An end-user’s browser can impact their call experience. This is why Twilio has published an SDK to test the user's network and ensure it can handle calls.

Toll Fraud Prevention

International revenue sharing fraud (IRSF), also known as toll fraud, is a scheme where fraudsters artificially generate a high volume of international calls on expensive routes. We published an article on how to reduce toll fraud risk. Using Geo Permissions can help restrict calling to higher-risk areas.

Healthy Traffic

There are all types of use cases for voice workloads, however there are some guidelines to follow to ensure that your traffic remains healthy. The following represent best practices Twilio has observed:

  • Average call duration must be greater than 30 seconds
  • No more than 10% of your calls should have a call duration of less than or equal to 12 seconds
  • ASR (Answer-Seizure rate) must be greater than 70%
  • Higher attestation (via SHAKEN/STIR)
  • Not sending repetitive callerID
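
The three numeric guidelines above are easy to monitor per tenant. Here is a minimal sketch that computes them from call records. The CallRecord type and function name are hypothetical, and we assume that average duration and the short-call share are measured over answered calls, while ASR is answered calls divided by all attempts.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class CallRecord:
    answered: bool
    duration_seconds: int


def traffic_health(calls: Sequence[CallRecord]) -> Dict[str, bool]:
    """Check a batch of calls against the duration and ASR guidelines."""
    answered = [c for c in calls if c.answered]
    if not calls or not answered:
        # Without answered calls none of the guidelines can be met.
        return {"avg_duration_ok": False, "short_calls_ok": False, "asr_ok": False}
    avg_duration = sum(c.duration_seconds for c in answered) / len(answered)
    short_share = sum(1 for c in answered if c.duration_seconds <= 12) / len(answered)
    asr = len(answered) / len(calls)
    return {
        "avg_duration_ok": avg_duration > 30,   # average above 30 seconds
        "short_calls_ok": short_share <= 0.10,  # at most 10% at or under 12 seconds
        "asr_ok": asr > 0.70,                   # answer-seizure ratio above 70%
    }


healthy = [CallRecord(True, 60)] * 8 + [CallRecord(False, 0)] * 2
print(traffic_health(healthy))
# prints "{'avg_duration_ok': True, 'short_calls_ok': True, 'asr_ok': True}"
```

Running a check like this per subaccount can surface a single tenant with unhealthy traffic before it affects the rest of your customers.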

CNAM

Caller ID Name (CNAM) is a feature in the United States public telephone network that identifies an incoming caller by a personal or business name associated with the originating phone number. As of this publication date, CNAM is in Beta and can be set programmatically. Here is a link to the docs on the step by step process.


Voice Checklist

We went through a lot of information and we know it will take time to learn all of these different strategies. To keep things organized, we wanted to leave you with a high level checklist that your team can use when assessing different voice use cases. While this list isn’t inclusive of all the edge cases, it’s a helpful list to work through.


Checklist

 

Authors

Austin Zuber is a Solutions Engineer and helps customers design and implement scalable omnichannel communications infrastructure. You can reach him at azuber [at] twilio.com.

Josh Siverson is a Principal Solutions Engineer focused on helping ISV Partners build scalable architectures and business on Twilio. You can reach him at jsiverson [at] twilio.com.