Messaging Architecture for Independent Software Vendors (ISVs)

July 06, 2021
Written by
Reviewed by
Chris Piwinski
Contributor

Independent Software Vendors (ISVs) are an especially important category of customers for whom building a trusted and scalable messaging architecture is paramount. If you aren’t sure if you are an ISV, check out this blog post. If you are an ISV, you are serving hundreds to thousands of customers, and exponentially more end-users. There are many design patterns to consider when rolling out a messaging solution, and it’s easy to go down the wrong path. That’s where Twilio comes in: to help you avoid common pitfalls.

Twilio is a trusted provider in the telecommunications ecosystem. We help our customers navigate the complexities of the ever-changing communications landscape, and we’re here to empower you to build the best architecture for your messaging solutions. Designing the correct architecture is critical to ensure your infrastructure can scale, which ultimately allows you to support your customer base. From choosing the right sender type, to assessing design considerations, to error handling, and much more, we’ve got you covered.

This architecture-focused post will guide you through the steps to build out your messaging strategy as an ISV.

In this post we will discuss:

  • A brief overview of message senders and sender selection strategy
  • Messaging architecture guidance
  • Reliability considerations

Prerequisites

We recommend that you read:

Using subaccounts is a best practice for ISVs to segment your customers within your Twilio Account.

The foundation of building a robust messaging architecture starts with a scalable account structure in Twilio. As an ISV, you will want to separate usage and billing for your customers, among other things, and this is where subaccounts are a valuable resource. For the remainder of this post, we will stick to ISV customer implementations that leverage subaccounts. The diagram below provides a high-level example of an ISV and customer rollout using subaccounts.

Below is a high-level architecture for Owl Inc. Owl Inc. has two projects: one for development (Dev) work and the other for production (Prod) traffic. The Prod account will contain subaccounts that Owl Inc. provisions for each of its customers.

Account setup for theoretical Owl Bank
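To make the account structure concrete, here is a minimal sketch of how Owl Inc. might provision one subaccount per customer. It only builds the HTTP request against the Accounts endpoint of the REST API (the same API family used in the curl examples in this post) and leaves sending to the caller. The function name, the placeholder credentials and the use of Python's standard library instead of a Twilio helper library are our assumptions for illustration.

```python
import base64
import urllib.parse
import urllib.request

API_BASE = "https://api.twilio.com/2010-04-01"


def build_create_subaccount_request(
    parent_sid: str, auth_token: str, friendly_name: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that creates one subaccount.

    The request is authenticated as the parent (Prod) account, so the new
    subaccount is created underneath it.
    """
    body = urllib.parse.urlencode({"FriendlyName": friendly_name}).encode("utf-8")
    credentials = base64.b64encode(
        f"{parent_sid}:{auth_token}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url=f"{API_BASE}/Accounts.json",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Placeholder credentials; one request per customer of Owl Inc.
request = build_create_subaccount_request(
    "ACxxxxxxxxxxxxxxxx", "your_auth_token", "Falcon Flights"
)
print(request.get_method(), request.full_url)
# To actually create the subaccount: urllib.request.urlopen(request)
```

Storing the returned subaccount SID next to your own customer record makes it easy to map usage, errors and senders back to the right customer later on.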

Sender Selection and Strategy

Sender classification

There are two broad categories of senders: Alphanumeric and Numeric Senders.

While we won’t get into all the differences between the different sender types, a few key differences are:

Alphanumeric:

Below is a helpful diagram showing the taxonomy of Sender IDs.

Twilio Sender ID hierarchy

Notes on Pre-registered Alpha Sender vs. Dynamic Alpha Senders:

  • To ensure the best deliverability we recommend using a pre-registered sender. Though there is a more robust registration process, it will help to deliver messages within the provisioned country.
  • Here is a list of countries that support Alphanumeric senders. Please note that not all countries support pre-registration.

Sender Matrix

 

| | Short Codes | A2P 10DLC | Toll-Free SMS | Alphanumeric Sender ID (Non-registered) | Alphanumeric Sender ID (Registered) | Long Code (Sending Internationally) |
| --- | --- | --- | --- | --- | --- | --- |
| MPS | Country dependent | 3-180 based on Trust Score | 3+ | 10+ | 10+ | 10 |
| Message Volume | Unlimited | T-Mobile limits 1000-200k** | Unlimited | Unlimited | Unlimited | Unlimited |
| Registration Interface | Application Form | API | API***** | | Application | API |
| Voice | No | Yes | Yes | No | No | Yes, if within same country |
| DLRs | Country dependent | Carrier | Handset (US/CA) | Country dependent | Country dependent | Country dependent |
| Provisioning Time | 6-10 weeks | Minutes | Instant**** | Minutes | Country specific | Instant |
| Recurring Fees* | Yes. Check country specific pricing | Registration + Campaign fees*** | Depends on throughput | No | Yes, country specific | No |
| Additional Carrier Fees | Yes | Yes | Yes | No | No | No |

* Fees outside of phone number cost.
** Depends on Starter vs. Standard. Standard brands can send over 200k with a Special Business Review approval.
*** Difference in cost depending on Starter vs. Standard Brands.
**** Verified Toll Free number takes 5-7 business days.
***** Verification process requires application form.

Sender Decision Flow Chart

The chart below highlights a few key questions to ask your customers when they are looking to send messages. Answering these questions will help you select the right sender for your workload.

ISV Sender Decision Flow Chart

Why each question is important to ask:

  • Use Case: Countries have different regulations on what types of messages can be sent using which types of numbers (alphanumeric or numeric). In certain countries a Regulatory Bundle must be completed before provisioning a number. Broadly speaking, there are three types of messaging:
    • Conversational: A back-and-forth conversation that takes place via text, where an end-user initiates the conversation
    • Transactional: A message sent after a consumer gives their phone number to a business and asks to be contacted in the future
    • Promotional: A message that contains a sales or marketing promotion. Adding a call-to-action (e.g., a coupon code) to an informational text may place the message in the promotional category
  • One-way vs. Two-way Messaging: Not all phone numbers are capable of sending and receiving SMS messages. In countries where there are multiple Sender types there are usually trade-offs when it comes to choosing the right sender.
  • Country:
    • When possible, always select a local sender, especially for two-way messaging. This is best for deliverability and the end-user experience.
    • Sending from a non-local sender could result in the sender ID being overwritten, which impacts the user experience, especially if the use case is to support two-way messaging.
    • In countries where alpha pre-registration is required, this will also impact deliverability. It is best to use pre-registered alphanumeric senders, as carriers tend to favor this mechanism for sending.
    • In countries that do not require pre-registration, Twilio will overwrite to an effective local sender, such as a shared short code (SC), where necessary to deliver the message to the end user. Note that the local sender that Twilio sends from may change, so keep only one-way workloads on this sender.
    • Keep in mind that in many countries you will need to provide documentation as part of local regulations. For example, German local numbers require proof of identity and a local address.
  • High vs. Low Message Throughput: Understanding your customers' throughput requirements will help ensure you select the right sender. For example, for time-sensitive use cases like delivering One Time Passcodes (OTP), a short code is normally the best sender, whereas for a conversational use case a local 10-digit long code may be preferred.
  • Cost vs. Deliverability: Senders have trade-offs. Customers should prioritize deliverability; however, budget constraints or limited inventory may make it challenging to choose the ideal sender. Please speak with your Account Team to talk about these trade-offs.
  • Phone provisioning urgency: Certain senders, such as pre-registered Alphanumeric Senders and short codes, take longer to provision compared to long codes. Ensure you are aware of lead times when working with your customers.

Now that we understand the main questions to ask, let's walk through a quick example.

Owl Inc. is a fictional ISV that provides a customer engagement platform. Falcon Flights, a client of Owl Inc., wants to send out order status updates to their customers.

The diagram below shows the questions Owl Inc. should take into consideration as they engage in a discovery conversation with Falcon Flights.

United States 🇺🇸

US Sender diagram for fictional Owl Bank

US-based senders should reference our phone number guide.

Brazil 🇧🇷

Owl Inc. is expanding into other countries. They’d like to offer conversational use cases within Brazil. Let's walk through the decision tree for Brazil.

Brazil Sender diagram for fictional Owl Bank

Internationally based senders can reference our international phone number guide.

 

Please read our Best Practices for Scaling with Messaging Services documentation for more details.

I can’t find a number…

In certain countries, Twilio isn’t able to provide numbers for a variety of reasons, but there are alternatives you can consider:

Architecture Considerations

It is critical to understand a few key architectural considerations for messaging.

Messaging rate limits

Let’s understand the flow of messages that pass through Twilio. There are a few systems involved, and these systems affect the speed at which Twilio accepts and delivers messages.

Messaging flow diagram inside of Twilio
  1. Twilio’s API edge is the first checkpoint, and it determines how fast Twilio can receive messages. It is defined as the number of concurrent requests that Twilio can receive. Each Twilio account / subaccount has its own request limits, or the number of simultaneous requests you can make to Twilio.

    For example, let’s pretend that your Twilio account has a request concurrency limit of 50, and on average the Twilio API’s response latency is 250ms. We can calculate the total number of requests that can be made in one second as follows:

    Number of requests per second = (API concurrency * 1000 milliseconds) / API latency in milliseconds

    With the numbers above, that is (50 * 1000) / 250 = 200 requests per second.

    Twilio provides a Twilio-Request-Duration header on each API response, allowing you to evaluate the API’s processing time compared to your network latency.

    If you exceed this limit you will receive HTTP 429 error messages.

    Some best practices we recommend are:  

    • Implement exponential backoff to retry messages that fail to send.
    • Use a third-party message queue (such as RabbitMQ) with a Quality of Service (QoS) setting and a fixed number of consumers or threads that read from that queue and send messages to Twilio. Since concurrency is applied per subaccount, ISVs should implement these queues at the subaccount level as well.
    • Use a reverse proxy to throttle outbound HTTP requests from your application(s).
  2. Twilio egress to the downstream provider is measured in messages per second (MPS) or message segments per second. To understand more about a message segment check out our blog post called What the Heck is a Segment?.

    Note: ISVs that want to expose message length to their customers can embed a segment calculator.

    As explained earlier, each sender type has a different rate limit. Sender rate limits define how fast Twilio sends messages to downstream carrier partners.

    If you have selected a Short code (SC) with 100 MPS, Twilio will send messages to downstream providers at 100 segments per second. If you are sending at a rate higher than your available MPS, messages will be queued.

  3. Queuing occurs when messages are sent at a higher rate than the sender’s available MPS. Messages are always queued First In First Out (FIFO). All Twilio queues are 4 hours long, so the queue can hold messages for up to 4 hours before they expire.

    For example, if you have a short code sender with a default of 100 messages per second (MPS), you get a 4-hour queue that holds 100 * 4 * 60 * 60 = 1,440,000 message segments.

    When messages are queued, they are checked against the validity period (an attribute on the Message Resource) and the max queue size. If the message validity period is shorter than the time the message would wait in the queue, meaning Twilio cannot dequeue the message before the validity period expires, then the message is rejected with an HTTP 429 error. If no validity period is defined, then the max queue size is used, which corresponds to a validity period of 4 hours. In this situation, messages fail with the error ‘30001 - Queue overflow’.
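The arithmetic in the three steps above can be captured in a few helper functions, which is handy when you size workloads for a new customer. This is a minimal sketch: the function names are ours, and the validity check is only a local estimate based on the figures in this post, not Twilio's internal logic.

```python
def requests_per_second(concurrency: int, latency_ms: float) -> float:
    """API edge: requests/second = (concurrency * 1000 ms) / latency in ms."""
    return concurrency * 1000 / latency_ms


def queue_capacity_segments(mps: int, queue_hours: int = 4) -> int:
    """Sender queue: how many segments fit in a queue of queue_hours at a given MPS."""
    return mps * queue_hours * 60 * 60


def seconds_to_drain(queued_segments: int, mps: int) -> float:
    """Estimated wait for a segment that joins behind queued_segments others."""
    return queued_segments / mps


def fits_validity_period(queued_segments: int, mps: int, validity_seconds: int) -> bool:
    """Local estimate: can the message be dequeued before its validity period ends?"""
    return seconds_to_drain(queued_segments, mps) <= validity_seconds


print(requests_per_second(50, 250))            # 200.0
print(queue_capacity_segments(100))            # 1440000
print(seconds_to_drain(30_000, 100))           # 300.0
print(fits_validity_period(30_000, 100, 120))  # False
```

The last line shows why a short validity period matters for time-sensitive traffic: with 30,000 segments already waiting on a 100 MPS sender, a message that is only valid for two minutes would not make it out in time.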

There are two strategies for sending messages to Twilio:

1. Sending with a sender specified via the From parameter in a POST request:

curl -X POST https://api.twilio.com/2010-04-01/Accounts/$TWILIO_ACCOUNT_SID/Messages.json \
--data-urlencode "From=+15017122661" \
--data-urlencode "Body=ISVs are awesome!" \
--data-urlencode "To=+15558675310" \
-u $TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN

2. Sending with a Messaging Service specified in a POST request:

curl -X POST https://api.twilio.com/2010-04-01/Accounts/$TWILIO_ACCOUNT_SID/Messages.json \
--data-urlencode "MessagingServiceSid=MG9752274e9e519418a7406176694466fa" \
--data-urlencode "Body=ISVs are awesome!" \
--data-urlencode "To=+15017122661" \
-u $TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN

Messages sent with a defined sender type are queued directly to the sender queue and receive responses synchronously. If any errors arise, they will be either HTTP 429 or Twilio error ‘30001 - Queue overflow’. It is best to implement exponential backoff to retry failed messages.
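Here is a minimal sketch of exponential backoff for a synchronous send. It assumes you wrap your HTTP call in a function that returns the HTTP status code, and it retries only on HTTP 429. The function name, the default delays and the added random jitter are our choices for illustration, not values recommended by Twilio.

```python
import random
import time
from typing import Callable

RETRYABLE_STATUS = {429}  # assumption: only rate-limit responses are retried


def send_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns the last HTTP status code that send() produced.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUS:
            break
        # The delay ceiling doubles on every retry (0.5s, 1s, 2s, ...) up to
        # max_delay. Picking a random wait below the ceiling ("full jitter")
        # keeps many workers from retrying at the same instant.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, ceiling))
        status = send()
    return status


# Usage with a stand-in for the real HTTP call: two 429s, then success.
responses = iter([429, 429, 201])
print(send_with_backoff(lambda: next(responses), sleep=lambda seconds: None))  # 201
```

If the last status is still 429 after all attempts, put the message back on your own queue rather than dropping it.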

If a messaging service is used, the messages are first sent to the messaging service’s queue. These messages are dequeued at a rate higher than the available MPS of senders, which means the messaging service queues will always be empty. When the downstream senders’ queues are full, messages fail asynchronously with the Twilio error ‘30001 - Queue overflow’.

Implementing your own queue

Handling messaging rate limits can be architecturally challenging, and if you’re an ISV using subaccounts to manage your customers, you will want to consider building scalable designs such as a multi-queue worker system that distributes resources in order to adhere to concurrency limitations.

Implementing this architecture allows ISVs to:

  • Segment traffic resources
  • Prioritize high-priority messages over low-priority messages
  • Scale horizontally

Implementing a message queue in Owl Bank's fictional architecture
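A multi-queue worker system can be sketched in-process with Python's standard library. The sketch below keeps one priority queue per subaccount and a fixed number of worker threads per queue, so the number of in-flight requests for a subaccount never exceeds its worker count. The class name, the two priority levels and the in-memory queues are assumptions for illustration. A production system would more likely use a durable broker such as RabbitMQ.

```python
import itertools
import queue
import threading
from typing import Callable, Iterable

HIGH, LOW = 0, 1  # PriorityQueue hands out the smallest value first


class SubaccountDispatcher:
    """One priority queue and a fixed pool of workers per subaccount."""

    def __init__(
        self,
        subaccount_sids: Iterable[str],
        send: Callable[[str, str], None],
        workers_per_subaccount: int = 2,
    ) -> None:
        self._send = send
        self._workers = workers_per_subaccount
        self._queues = {sid: queue.PriorityQueue() for sid in subaccount_sids}
        self._sequence = itertools.count()  # tie-breaker: FIFO within a priority

    def enqueue(self, subaccount_sid: str, body: str, priority: int = LOW) -> None:
        self._queues[subaccount_sid].put((priority, next(self._sequence), body))

    def start(self) -> None:
        # The worker count is the concurrency cap for that subaccount.
        for sid, q in self._queues.items():
            for _ in range(self._workers):
                threading.Thread(target=self._work, args=(sid, q), daemon=True).start()

    def _work(self, sid: str, q: queue.PriorityQueue) -> None:
        while True:
            _, _, body = q.get()
            try:
                self._send(sid, body)
            except Exception as exc:  # a real worker would retry or dead-letter
                print(f"send failed for {sid}: {exc}")
            finally:
                q.task_done()

    def wait(self) -> None:
        for q in self._queues.values():
            q.join()


# Usage with placeholder subaccount SIDs and a stand-in for the API call.
sent = []
dispatcher = SubaccountDispatcher(
    ["ACaaa", "ACbbb"], lambda sid, body: sent.append((sid, body)),
    workers_per_subaccount=1,
)
dispatcher.enqueue("ACaaa", "Weekly promotion", LOW)
dispatcher.enqueue("ACaaa", "Your code is 123456", HIGH)
dispatcher.start()
dispatcher.wait()
print(sent)  # the HIGH message for ACaaa is sent before the LOW one
```

Because each subaccount has its own queue and workers, a burst from one customer does not delay another customer's traffic, and you can add workers or processes per subaccount to scale horizontally.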

Reliability Considerations

Now that we have a good understanding of the right sender types and architecture, let's think about reliability and how to make our infrastructure resilient.

This section will walk through a few ways to handle failures for your messaging workloads.

Status Callbacks

Status callbacks are specified on each message sent or on the Messaging Service. As the message goes through Twilio and the messaging ecosystem, Twilio will send status updates to the specified status callback URL.

If you see statuses such as undelivered and failed, start investigating what might be leading to these statuses.
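As a starting point, a status callback handler can parse the form-encoded request body and flag the statuses worth investigating. The sketch below assumes the callback carries the AccountSid, MessageSid, MessageStatus and (for failures) ErrorCode parameters. The function name and the shape of the returned dictionary are ours.

```python
from urllib.parse import parse_qs

FAILURE_STATUSES = {"undelivered", "failed"}


def parse_status_callback(raw_body: str) -> dict:
    """Turn a form-encoded status callback body into a small summary dict."""
    # parse_qs returns a list per key; each parameter appears once here.
    fields = {key: values[0] for key, values in parse_qs(raw_body).items()}
    status = fields.get("MessageStatus")
    return {
        "account_sid": fields.get("AccountSid"),  # which subaccount sent it
        "message_sid": fields.get("MessageSid"),
        "status": status,
        "error_code": fields.get("ErrorCode"),  # only present on failures
        "needs_investigation": status in FAILURE_STATUSES,
    }


# Placeholder SIDs for illustration.
body = "AccountSid=ACxxxx&MessageSid=SMxxxx&MessageStatus=undelivered&ErrorCode=30001"
print(parse_status_callback(body))
```

Grouping the flagged results by account_sid lets you see quickly whether a problem affects one customer or your whole platform.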

Debugger Event Webhook

The Debugger Event webhook is the first place to start when it comes to triaging errors and resolving them.

Twilio webhook trigger form

Any time there is an error or warning, a webhook will be sent to the specified endpoint. ISVs should ensure that events from subaccounts are included. This is the payload that will be sent for each event:

| PROPERTY | DESCRIPTION |
| --- | --- |
| Sid | Unique identifier of this Debugger event. |
| AccountSid | Unique identifier of the account that generated the Debugger event. |
| ParentAccountSid | Unique identifier of the parent account. This parameter only exists if the above account is a subaccount. |
| Timestamp | Time of occurrence of the Debugger event. |
| Level | Severity of the Debugger event. Possible values are Error and Warning. |
| PayloadType | application/json |
| Payload | JSON data specific to the Debugger Event. |
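Since ISVs receive events for every subaccount, it helps to normalize each event and record which customer it belongs to. The sketch below works on the properties listed above, already decoded into a dictionary by your web framework. The function name is ours, and the error_code key inside the sample Payload is only an illustrative placeholder.

```python
import json


def summarize_debugger_event(fields: dict) -> dict:
    """Normalize one Debugger event using the properties listed above."""
    payload = {}
    if fields.get("PayloadType") == "application/json" and fields.get("Payload"):
        payload = json.loads(fields["Payload"])
    return {
        "event_sid": fields["Sid"],
        "account_sid": fields["AccountSid"],
        # ParentAccountSid only exists when the event came from a subaccount.
        "is_subaccount": "ParentAccountSid" in fields,
        "level": fields["Level"],
        "timestamp": fields["Timestamp"],
        "payload": payload,
    }


# Placeholder values for illustration.
event = {
    "Sid": "NOxxxx",
    "AccountSid": "ACsubaccount",
    "ParentAccountSid": "ACparent",
    "Timestamp": "2021-07-06T12:00:00Z",
    "Level": "Error",
    "PayloadType": "application/json",
    "Payload": '{"error_code": "30001"}',
}
print(summarize_debugger_event(event))
```

From here you can route the summary to the owning customer's alerting channel using account_sid.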

Fallback URL

When a message is sent into Twilio’s infrastructure, we send a webhook to the endpoint provisioned as the incoming webhook. However, if your infrastructure is down, your application won’t get the POST request.

One strategy to mitigate this is to add a fallback URL at the Messaging Service or Phone number level.

Note: Make sure that your fallback URL is hosted in a different service than that of your primary URL. If both your primary and fallback URL are in the same service and that service is not responding, the fallback URL won’t help. In the case of network interruptions, consider webhook overrides to fine-tune webhook logic.

Twilio Functions + Sync

Twilio Functions is a serverless environment that allows builders to write application code and deploy without worrying about underlying infrastructure.

Twilio Sync is a serverless storage layer.

A fallback strategy consists of:

  1. Creating a Twilio Function
  2. POSTing inbound messages to a Sync Resource (List, Map or Document)
  3. Polling the Sync Resource when your infrastructure recovers, to collect information that was missed during the outage

While this strategy isn’t perfect, it does provide the benefit of persisting all the messages in a single source to update your records when your application comes back online. Additionally, within Twilio Functions you can write logic to respond back to the end-user.
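The recovery step (polling the Sync Resource once your application is back) could look roughly like the sketch below. It makes several assumptions: your Twilio Function stored each inbound webhook's parameters, including MessageSid, as the item's data; each page of results is a dictionary with an items list and a meta.next_page_url entry; and fetch_page is a hypothetical function you supply that performs the authenticated GET and returns the decoded JSON.

```python
from typing import Callable, Iterator, Optional, Set


def iter_stored_messages(
    fetch_page: Callable[[str], dict], first_url: str
) -> Iterator[dict]:
    """Yield the stored data of every item, following pagination links."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        for item in page.get("items", []):
            yield item["data"]
        url = page.get("meta", {}).get("next_page_url")


def replay_missed(
    fetch_page: Callable[[str], dict],
    first_url: str,
    already_seen: Set[str],
    handle: Callable[[dict], None],
) -> int:
    """Process every stored message not handled before; return the count."""
    replayed = 0
    for data in iter_stored_messages(fetch_page, first_url):
        sid = data.get("MessageSid")
        if sid in already_seen:
            continue  # your application received this one before the outage
        handle(data)
        already_seen.add(sid)
        replayed += 1
    return replayed


# Usage with in-memory pages standing in for the HTTP calls.
pages = {
    "page-1": {
        "items": [{"data": {"MessageSid": "SM1", "Body": "Hi"}}],
        "meta": {"next_page_url": "page-2"},
    },
    "page-2": {
        "items": [{"data": {"MessageSid": "SM2", "Body": "Are you there?"}}],
        "meta": {"next_page_url": None},
    },
}
print(replay_missed(pages.__getitem__, "page-1", {"SM1"}, print))
```

Tracking the SIDs you have already processed keeps the replay idempotent, so running it twice after an outage does not create duplicate records.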

Event Streams

Event Streams is an API that allows developers to aggregate Twilio events and send them to a specified destination. Destinations include Sink types (such as Amazon Kinesis) and a webhook.

Messaging Insights

Messaging Insights is a dashboard that aggregates account-level Messaging metrics. For ISVs, it’s a helpful view of parent account and subaccount data, making it easier to understand high-level trends and to monitor key metrics such as deliverability, opt-outs and errors.

A healthy error rate is in the 0-6% range, and a healthy opt-out rate is in the 0-0.3% range.
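If you export these counts per subaccount, a small check against the ranges above can drive your own alerting. This is a minimal sketch: the function name is ours, and it assumes both rates are calculated against the number of messages sent in the same period.

```python
MAX_ERROR_RATE = 0.06     # 6%
MAX_OPT_OUT_RATE = 0.003  # 0.3%


def messaging_health(sent: int, errors: int, opt_outs: int) -> dict:
    """Compare error and opt-out rates with the healthy ranges above."""
    if sent <= 0:
        raise ValueError("sent must be a positive number of messages")
    error_rate = errors / sent
    opt_out_rate = opt_outs / sent
    return {
        "error_rate": error_rate,
        "opt_out_rate": opt_out_rate,
        "error_rate_ok": error_rate <= MAX_ERROR_RATE,
        "opt_out_rate_ok": opt_out_rate <= MAX_OPT_OUT_RATE,
    }


# 4.5% errors is within range; 0.4% opt-outs is not.
print(messaging_health(sent=10_000, errors=450, opt_outs=40))
```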

Conclusion

That’s it! You’re now fully equipped with the information you need as an ISV to build and launch a supercharged messaging solution. As you implement the messaging recommendations, use this list to check off the major components related to SMS messaging.

SMS Checklist

We hope this tutorial was valuable for learning the ins and outs of messaging architecture. We can’t wait to see what you build.

Valerie is a Principal Solutions Engineer at Twilio, focused on enabling Platform ISVs to build innovative engagement solutions on Twilio. You can reach her at vlim[at]twilio.com.

Pathik Soni is a Principal Solutions Engineer helping Enterprise ISV partners reimagine their customer engagement experience using Twilio.

Josh Siverson is a Principal Solutions Engineer focused on helping ISV Partners build scalable architectures and business on Twilio. You can reach him at jsiverson [at] twilio.com.