You’ve crafted the perfect text message. The punch of a novel packed into a single SMS, worthy of the bard himself. Your campaign goes off without a hitch. Then, when you take a look at your costs, you see they’re four times what you expected, leaving you to wonder: what the heck is a segment, and why am I being charged for so many of them?
We’ll pull back the covers on SMS standards to give you an answer. Here’s what we’ll cover:
- Understanding what a segment is and how it affects your bill
- Encoding standards and headers you use to send messages
- Crafting the perfect message
- Subtle gotchas & pro-tips
Looking Back On The Nokia Brick Phone To Understand Message Segments
Think back to when you first started texting on your good ol’ indestructible Nokia brick. While hammering out messages on a T9 keyboard, you may have noticed a counter ticking down from 160 next to a 1. When that counter hit 0, you’d see that 1 that was sitting next to the 160 jump up to a 2.
This means you’d end up with two messages on your bill. The first number counted how many characters you had left in the current segment, and the second counted how many segments you had used.
What’s Changed About Segments Since Back In The Day
SMS standards have barely changed since the days of the brick phone. Messages are still sent in 140-byte chunks known as message segments.
When Twilio communicates with carriers to send out SMS messages, we send them one segment at a time. To figure out how many characters this affords you, we’re going to have to do a little math.
A Little Math, Much Clearer Insight Into Segments
Standard SMS encoding uses the GSM 03.38 character set, which takes 7 bits to encode a character. 140 bytes x 8 bits per byte, divided by 7 bits per character, gives us the 160-character message segment.
Message segments are how Twilio (and the SMS industry as a whole) counts messages.
This means that, in addition to your costs, you should also think in terms of segments when you’re analyzing SMS throughput. Throughput varies by the sending number you’re using, but in all cases it’s counted in message segments per second rather than total messages.
If getting your message out in a certain window is important to you, make sure you know how many segments you’re sending.
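To make the throughput point concrete, here is a rough sketch of how the segment count stretches a send window. The function name and the rate of one segment per second are illustrative assumptions, not Twilio figures; check the actual rate for your sending number.

```python
def seconds_to_send(recipients: int, segments_per_message: int,
                    segments_per_second: float) -> float:
    """Estimate how long a campaign takes when throughput is counted in segments."""
    total_segments = recipients * segments_per_message
    return total_segments / segments_per_second


# Hypothetical campaign: 10,000 recipients at an assumed 1 segment per second.
one_segment = seconds_to_send(10_000, 1, 1.0)
four_segments = seconds_to_send(10_000, 4, 1.0)
print(f"1 segment each:  {one_segment / 3600:.1f} hours")
print(f"4 segments each: {four_segments / 3600:.1f} hours")
```

The same recipient list takes four times as long to send once each message grows to four segments.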
How Does The Perfect Message Behave?
Going back to your perfect text message, you count up the characters, and something still seems off. You’ve only used 210 characters but it looks like each of these messages has more than two segments.
Part of the answer lies in the encoding. Notice that this message has UCS-2 listed as the encoding instead of GSM. To accommodate a message as lit as this one, Twilio has to use a different character set. You may have noticed, if you clicked on the GSM link above, that it didn’t contain any 🔥’s.
When you send messages with non-GSM characters such as emojis, we have to use a different type of encoding known as UCS-2. UCS-2 takes 16 bits to encode each character, so going back to the math we did above, we now have a limit of 70 characters (140 bytes * 8 bits per byte / 16 bits per character).
Besides emojis, you should also be careful with accented characters. GSM 03.38 includes some accented characters such as ñ, à, and ö, but does not include others such as á, í, or ú.
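One way to check which encoding a message will need is to test every character against the GSM 03.38 alphabet. The sketch below is an illustration, not Twilio's implementation; the function names are made up for this example, and the character tables are transcribed from the GSM 03.38 default alphabet and its extension table.

```python
# GSM 03.38 default alphabet (basic table), transcribed by hand.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table characters: still GSM, but each takes two 7-bit slots.
GSM_EXTENDED = set("\f^{}\\[~]|€")
GSM_ALL = GSM_BASIC | GSM_EXTENDED


def non_gsm_characters(text: str) -> list[str]:
    """Return the characters that would force the message out of GSM encoding."""
    return [ch for ch in text if ch not in GSM_ALL]


def detect_encoding(text: str) -> str:
    """Return "GSM-7" if every character is in the GSM alphabet, else "UCS-2"."""
    return "UCS-2" if non_gsm_characters(text) else "GSM-7"


print(detect_encoding("Hello, world!"))      # plain ASCII text
print(detect_encoding("ma\u00f1ana"))        # ñ is in the GSM alphabet
print(detect_encoding("adi\u00f3s"))         # ó is not
print(detect_encoding("lit \U0001F525"))     # emoji are never GSM
```

Running `non_gsm_characters` on a draft before sending shows exactly which characters are responsible for the switch to UCS-2.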
What Exactly Does A Data Header Do?
Still, it looks like with this 70 character limit, this message should only be three segments, not four. The last piece of the puzzle lies in concatenation. When you send multi-segment messages, Twilio uses User Data Headers to tell the destination how to reassemble them. This header takes up 6 bytes per segment, leaving only 67 characters per segment for UCS-2 encoded messages or 153 for GSM encoded messages.
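The segment arithmetic from this section can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the caller already knows the encoding, and the length is given in encoded characters. Note that emoji outside the Basic Multilingual Plane occupy two 16-bit units each, so Python's `len()` can undercount them.

```python
import math

SEGMENT_BYTES = 140  # size of one message segment
UDH_BYTES = 6        # User Data Header added to each segment of a multi-segment message


def chars_per_segment(bits_per_char: int, multipart: bool) -> int:
    """Characters that fit in one segment for a given encoding width."""
    payload_bytes = SEGMENT_BYTES - (UDH_BYTES if multipart else 0)
    return (payload_bytes * 8) // bits_per_char


def segment_count(length: int, bits_per_char: int) -> int:
    """Segments needed for `length` encoded characters (7 bits for GSM, 16 for UCS-2)."""
    if length <= chars_per_segment(bits_per_char, multipart=False):
        return 1
    return math.ceil(length / chars_per_segment(bits_per_char, multipart=True))


print(chars_per_segment(7, False), chars_per_segment(7, True))    # 160 153
print(chars_per_segment(16, False), chars_per_segment(16, True))  # 70 67
print(segment_count(210, 16))  # 4: the 210-character UCS-2 message
print(segment_count(210, 7))   # 2: the same length in GSM
```

The 210-character example lands on four segments in UCS-2 because 210 / 67 is just over three, and a partial segment is billed as a whole one.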
Maybe it turns out the fire emojis aren’t worth it after all. However, when you trim the same message down and resend it, it still doesn’t seem to work out quite right:
This message contains two of the “gotchas” that commonly cause encoding issues: smart quotes and non-GSM spaces. Take a look at this message that appears almost identical:
There are only three characters that have been switched: the spaces between sentences were changed from en spaces to regular spaces (U+2002 to U+0020), and the “smart quote” after Shakespeare was replaced with a standard apostrophe, ' instead of ’ (U+2019 to U+0027). Smart quotes are usually a result of text editors being too darn helpful. Non-GSM spaces are usually a result of copying and pasting. Be extra careful with those, as they’re often converted to conventional spaces for display. The twilio.com console is one of those places, meaning that message bodies that contain non-GSM spaces in the API will be shown with regular U+0020 spaces in the console.
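If you want to catch these two gotchas yourself before sending, a small replacement table goes a long way. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the table covers only a handful of common smart quotes and non-GSM spaces, not every character that might sneak in.

```python
# Illustrative replacements only; extend the table for your own content.
REPLACEMENTS = {
    "\u2018": "'",   # left single smart quote
    "\u2019": "'",   # right single smart quote / typographic apostrophe
    "\u201c": '"',   # left double smart quote
    "\u201d": '"',   # right double smart quote
    "\u00a0": " ",   # no-break space
    "\u2002": " ",   # en space
    "\u2003": " ",   # em space
    "\u2009": " ",   # thin space
}
_TABLE = str.maketrans(REPLACEMENTS)


def normalize_for_gsm(text: str) -> str:
    """Swap common smart quotes and non-GSM spaces for GSM-friendly equivalents."""
    return text.translate(_TABLE)


draft = "Shakespeare\u2019s best.\u2002Reply STOP to opt out."
print(normalize_for_gsm(draft))
```

Each replacement is one character for one character, so the message length stays the same while the encoding can drop back from UCS-2 to GSM.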
UPDATE: Twilio's Copilot service now includes a Smart Encoding feature, which automatically translates Unicode characters into GSM7/ASCII equivalents for you, provided that such alternatives exist and the message contains no characters that can’t be converted, e.g. emoji or language-specific characters.
It’s always important to be cognizant of the character set before you send messages. You can also use this app to check specific messages ahead of time: http://chadselph.github.io/smssplit/.
After reading this blog post, you’re fully equipped to decide whether emojis are worth it, eliminate smart quotes and non-GSM spaces before sending messages, and accurately count up segments before you run a campaign instead of after.