Calculating Character Count of RCS Messages

September 03, 2025
Written by

With the recent launch of RCS to General Availability here at Twilio, it’s important to understand how long your messages are. The length of a message affects how it is categorized, processed, and billed on the Twilio platform across all channels (SMS, MMS, WhatsApp, and RCS). In this technical article, we review message length, the differences between RCS and SMS message encoding, and why both matter when building out your customer engagement.


To understand message length and segments on a deeper level, read What The Heck Is A Segment?

 

Why does message length matter in RCS?

In most destination countries, RCS Business Messaging has two message types: basic and single.

  • Basic: A text-only message of up to 160 UTF-8 bytes.
  • Single: Any text-only message of 161 UTF-8 bytes or more, and any rich message type, including media.

 

In the US, carriers have rolled out a different pricing model with two message types: rich and rich media.

  • Rich: Text-only messages billed in segments of 160 UTF-8 bytes, similar to SMS. These may include rich content (e.g., quick replies) but no media.
  • Rich Media: Messages that include media or certain rich features in the template. These have no per-segment cost.

 

It seems simple enough. You count the number of characters in the message and get the segment cost… right?

 

RCS Encoding Nitty Gritty

The key point to focus on is UTF-8, a variable-length Unicode encoding that uses 8-bit code units. Just because RCS measures length in UTF-8 doesn’t mean that every character occupies only 8 bits, or 1 byte. Take an emoji as an example: 😎

This emoji has the code point U+1F60E, which UTF-8 encodes as 4 bytes: F0 9F 98 8E (4 code units).

A code point is a unique number assigned to a specific character within a character set, in this case Unicode (UTF-8 stands for Unicode Transformation Format, 8-bit). An encoding maps each code point to one or more code units, which for UTF-8 are 8-bit bytes. It is the code units, not the code points, that count toward your message length in a given encoding.
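A quick way to see the difference between code points and code units is Python’s built-in `ord()` and `str.encode()`. The sketch below lists each code point in a string alongside its UTF-8 bytes. The helper name `utf8_breakdown` is ours, for illustration only:

```python
def utf8_breakdown(text):
    """Return (code point, UTF-8 bytes as hex, byte count) for each code point in text."""
    rows = []
    for ch in text:  # iterating a Python str yields code points, not bytes
        encoded = ch.encode("utf-8")
        rows.append((f"U+{ord(ch):04X}", encoded.hex(" ").upper(), len(encoded)))
    return rows


for row in utf8_breakdown("😎"):
    print(row)
# ('U+1F60E', 'F0 9F 98 8E', 4)
```

`bytes.hex()` with a separator argument requires Python 3.8 or later.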

We can also look at non‑Latin characters such as 太陽, which are represented with 6 bytes total in UTF‑8.

  • 太: U+592A - E5 A4 AA (3)
  • 陽: U+967D - E9 99 BD (3)

Let’s compare that to the Latin character “A”:

  • A: U+0041 - 41 (1)

The letter “A” only takes 1 byte to represent in UTF-8.
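This is also why counting characters with `len()` is not enough. In Python, the length of a `str` counts code points, while the length of its UTF-8 encoding counts bytes, which are the code units that matter for RCS. A minimal sketch (the function name `lengths` is hypothetical):

```python
def lengths(s):
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(s), len(s.encode("utf-8"))


for sample in ["A", "太陽", "😎"]:
    code_points, utf8_bytes = lengths(sample)
    print(f"{sample!r}: {code_points} code point(s), {utf8_bytes} UTF-8 byte(s)")
```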

 

Complex Emojis

Finally, let’s look at a single emoji that takes 35 bytes to represent in UTF-8, and break it down to understand emoji modifiers and how they affect character count.

🧑🏾‍❤️‍💋‍🧑🏻

In this emoji, 10 code points are used to represent the entire grapheme cluster, which is just a fancy way to say “multiple code points combined to represent one ‘character’ to the end user.”

  • U+1F9D1 PERSON - F0 9F A7 91 (4)
  • U+1F3FE EMOJI MODIFIER FITZPATRICK TYPE‑5 - F0 9F 8F BE (4)
  • U+200D ZERO WIDTH JOINER (ZWJ) - E2 80 8D (3)
  • U+2764 HEAVY BLACK HEART - E2 9D A4 (3)
  • U+FE0F VARIATION SELECTOR‑16 - EF B8 8F (3)
  • U+200D ZERO WIDTH JOINER - E2 80 8D (3)
  • U+1F48B KISS MARK - F0 9F 92 8B (4)
  • U+200D ZERO WIDTH JOINER - E2 80 8D (3)
  • U+1F9D1 PERSON - F0 9F A7 91 (4)
  • U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE‑1–2 - F0 9F 8F BB (4)

Total: 4+4+3+3+3+3+4+3+4+4 = 35 bytes in UTF-8.
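To double-check the 35-byte total, the sketch below rebuilds the emoji from its ten code points. It uses Python escape sequences so the invisible ZWJ and VS16 characters are explicit, then sums the UTF-8 bytes. The comment labels follow the list above:

```python
# The emoji above, spelled out code point by code point
KISS = (
    "\U0001F9D1"  # PERSON
    "\U0001F3FE"  # EMOJI MODIFIER FITZPATRICK TYPE-5
    "\u200D"      # ZERO WIDTH JOINER
    "\u2764"      # HEAVY BLACK HEART
    "\uFE0F"      # VARIATION SELECTOR-16
    "\u200D"      # ZERO WIDTH JOINER
    "\U0001F48B"  # KISS MARK
    "\u200D"      # ZERO WIDTH JOINER
    "\U0001F9D1"  # PERSON
    "\U0001F3FB"  # EMOJI MODIFIER FITZPATRICK TYPE-1-2
)

# UTF-8 byte count of each individual code point
per_code_point = [len(ch.encode("utf-8")) for ch in KISS]
print(len(KISS), per_code_point, sum(per_code_point))
# 10 [4, 4, 3, 3, 3, 3, 4, 3, 4, 4] 35
```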

 

A zero width joiner (ZWJ) is an invisible character that tells the device to render the adjacent emojis as a single combined emoji, so the whole sequence appears as one grapheme cluster. If the device does not support a particular ZWJ sequence, the end user sees the component emojis separately.

A variation selector-16 (VS16) forces emoji presentation for characters that have both text and emoji forms, e.g. U+2764, which would display as just ❤ without the VS16.

Lastly, there are skin tone modifiers. These are separate code points (U+1F3FB through U+1F3FF) that follow and modify certain emojis, and each one is encoded independently.
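These joiners and modifiers can be spotted programmatically, which makes it easier to see where extra bytes come from. The following is a minimal sketch that labels ZWJs, VS16, and the five skin tone modifiers in a string. The function name `describe_modifiers` is hypothetical:

```python
ZWJ = 0x200D
VS16 = 0xFE0F
SKIN_TONES = range(0x1F3FB, 0x1F400)  # U+1F3FB..U+1F3FF, the Fitzpatrick modifiers


def describe_modifiers(text):
    """Yield (code point, label) for each ZWJ, VS16, or skin tone code point in text."""
    for ch in text:
        cp = ord(ch)
        if cp == ZWJ:
            label = "ZWJ"
        elif cp == VS16:
            label = "VS16"
        elif cp in SKIN_TONES:
            label = "skin tone"
        else:
            continue  # ordinary code point, not a joiner or modifier
        yield f"U+{cp:04X}", label


sample = "\U0001F9D1\U0001F3FE\u200D\u2764\uFE0F\u200D\U0001F48B\u200D\U0001F9D1\U0001F3FB"
print(list(describe_modifiers(sample)))
```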

 

As the example above shows, this emoji (or grapheme cluster) combines several emojis and modifiers to display one complex symbol to the end user. Keep that in mind when sending emojis: simple, unmodified emojis take up fewer bytes in your RCS messages.
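To put numbers on that advice, this sketch compares the UTF-8 size of a plain emoji with a modified form. The pairs are illustrative choices. The "kiss" pair contrasts the single-code-point U+1F48F KISS (💏) with the 10-code-point sequence from the example:

```python
def utf8_len(s):
    """Length of s in UTF-8 bytes."""
    return len(s.encode("utf-8"))


pairs = {
    "person": ("\U0001F9D1", "\U0001F9D1\U0001F3FE"),  # without vs. with a skin tone modifier
    "heart": ("\u2764", "\u2764\uFE0F"),               # without vs. with VS16
    "kiss": (
        "\U0001F48F",  # U+1F48F KISS, a single code point
        "\U0001F9D1\U0001F3FE\u200D\u2764\uFE0F\u200D\U0001F48B\u200D\U0001F9D1\U0001F3FB",
    ),
}

for name, (plain, modified) in pairs.items():
    print(f"{name}: {utf8_len(plain)} -> {utf8_len(modified)} bytes")
```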

 

Why should I care about RCS message length?

Since text-based RCS messaging limits the length of basic messages, and the US adds a per-segment cost, knowing how long your actual message will be is critical to understanding billing.

Suppose you send an RCS message with the following body:

“RCS on Twilio is GA! 🚀 Upgrade SMS to rich, branded chats with images, carousels, buttons, and verified senders ✅ Enjoy read receipts, analytics, and global reach 🌍 Simple APIs + Studio flows. Build wow moments today with Twilio RCS 🎉”

This body is 245 bytes long in UTF-8. Sent to a UK mobile number, it would be considered a single message, because it exceeds the 160-byte limit for a basic message. Sent to a US mobile number, it would be considered a 2-segment rich message, because 245 bytes spans two 160-byte segments.
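The rules described earlier can be sketched as a small function. This is a simplified model under stated assumptions. It only handles text-only bodies with no media or rich features. It takes the 160-byte thresholds from this article. The names `classify_text_rcs` and the `"US"` destination flag are hypothetical. Actual billing is determined by Twilio and the carriers, not by this code:

```python
import math

BYTES_PER_UNIT = 160  # basic-message limit and US segment size in UTF-8 bytes, per this article


def classify_text_rcs(body, destination):
    """Hypothetical helper: classify a text-only RCS body (assumes no media or rich features)."""
    size = len(body.encode("utf-8"))
    if destination == "US":
        # US model: text-only "rich" messages are counted in 160-byte segments
        return size, f"rich, {math.ceil(size / BYTES_PER_UNIT)} segment(s)"
    # Most other countries: basic up to 160 bytes, single above that
    return size, "basic" if size <= BYTES_PER_UNIT else "single"


body = (
    "RCS on Twilio is GA! \U0001F680 Upgrade SMS to rich, branded chats with images, "
    "carousels, buttons, and verified senders \u2705 Enjoy read receipts, analytics, "
    "and global reach \U0001F30D Simple APIs + Studio flows. "
    "Build wow moments today with Twilio RCS \U0001F389"
)
print(classify_text_rcs(body, "UK"))  # (245, 'single')
print(classify_text_rcs(body, "US"))  # (245, 'rich, 2 segment(s)')
```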

 

Wait a second, how is this different from SMS?

In SMS, the default encoding is GSM-7, which uses only 7 bits per character. If a message contains any character outside the GSM-7 character set, the message is encoded with UCS-2 instead, which uses 16-bit code units. The primary difference is that GSM-7 and UCS-2 use a fixed code unit size (7 or 16 bits), whereas UTF-8 is variable width and, as shown above, uses 1 to 4 bytes per code point. Let’s take the same examples and compare them in the encodings used by SMS.

 

😎: 2 code units in UCS-2 (32 bits)

  • U+1F60E - D83D DE0E (2 code units)

 

太陽: 2 code units in UCS-2 (32 bits)

  • U+592A - 592A (1)
  • U+967D - 967D (1)

 

A: 1 code unit in GSM-7 (7 bits)

  • U+0041 - 41 (1)

 

🧑🏾‍❤️‍💋‍🧑🏻: 15 code units in UCS-2 (240 bits)

  • U+1F9D1 - D83E DDD1 (2)
  • U+1F3FE - D83C DFFE (2)
  • U+200D - 200D (1)
  • U+2764 - 2764 (1)
  • U+FE0F - FE0F (1)
  • U+200D - 200D (1)
  • U+1F48B - D83D DC8B (2)
  • U+200D - 200D (1)
  • U+1F9D1 - D83E DDD1 (2)
  • U+1F3FB - D83C DFFB (2)
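These UCS-2 counts can be reproduced in Python as well. Astral characters are sent as surrogate pairs, which is exactly how UTF-16 encodes them. Encoding to big-endian UTF-16 (which adds no byte-order mark) and splitting the result into 16-bit units gives the same code units listed above. The helper name `utf16_units` is ours. The sketch does not model GSM-7:

```python
def utf16_units(text):
    """Return the 16-bit code units (uppercase hex) that encode text in UTF-16."""
    data = text.encode("utf-16-be")  # big-endian, no BOM
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]


print(utf16_units("\U0001F60E"))    # ['D83D', 'DE0E']
print(utf16_units("\u592a\u967d"))  # ['592A', '967D']

kiss = "\U0001F9D1\U0001F3FE\u200D\u2764\uFE0F\u200D\U0001F48B\u200D\U0001F9D1\U0001F3FB"
print(len(utf16_units(kiss)))       # 15
```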

 

Limitations of GSM-7 and UCS-2

GSM-7 is a very limited character set. It is used because each SMS segment can carry at most 140 bytes (1,120 bits), and at only 7 bits per character that fits 160 characters. A longer message must be split into multiple segments, and each of those segments carries a header that occupies the space of 7 characters, so messages of 161 characters or more are split into segments of 153 characters. Twilio can also help here: Smart Encoding automatically replaces common non-GSM-7 characters with GSM-7 equivalents.
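The 160/153 arithmetic can be written as a short function. This is a sketch that assumes the message length is already measured in code units of a fixed width: GSM-7 septets or UCS-2 16-bit units. It ignores other details of real SMS stacks, such as GSM-7 extension characters like `€` that take two septets. The 70/67 UCS-2 limits follow from the same 140-byte budget at 16 bits per unit. The name `sms_segments` is hypothetical. For authoritative counts, use Twilio’s Messaging Segment Calculator:

```python
import math

SEGMENT_BITS = 140 * 8  # each SMS segment carries at most 140 bytes of user data
UDH_BITS = 6 * 8        # header added to every segment of a multi-segment message


def sms_segments(units, bits_per_unit):
    """Estimate segments for `units` fixed-width code units (assumption: 7 or 16 bits each)."""
    single_capacity = SEGMENT_BITS // bits_per_unit               # 160 for GSM-7, 70 for UCS-2
    multi_capacity = (SEGMENT_BITS - UDH_BITS) // bits_per_unit   # 153 for GSM-7, 67 for UCS-2
    if units <= single_capacity:
        return 1
    return math.ceil(units / multi_capacity)
```

For example, `sms_segments(161, 7)` returns 2, and the 15 UCS-2 code units of the complex emoji fit in one segment: `sms_segments(15, 16)` returns 1.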

UCS-2 is also limited by modern standards: it only covers what is known as the Basic Multilingual Plane (BMP), U+0000 through U+FFFF. The astral, or supplementary, planes occupy U+10000 through U+10FFFF. Representing these characters with 16-bit code units requires a surrogate pair: two code units taken from reserved ranges within the BMP, U+D800 through U+DBFF (high) and U+DC00 through U+DFFF (low). Strictly speaking, UCS-2 doesn’t support surrogate pairs. Its successor, UTF-16, does, and encodes every other BMP character exactly as UCS-2 does. For SMS, UCS-2 remains largely a name: astral characters are sent as surrogate pairs, and each pair still costs two 16-bit code units. A code unit from these reserved ranges means nothing on its own, but a high and a low surrogate combined represent one astral-plane character, e.g. U+1F9D1 uses the surrogate pair D83E DDD1 to display 🧑.
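The surrogate pair calculation is simple arithmetic. First subtract 0x10000 to get a 20-bit value. Then add its top 10 bits to 0xD800 for the high surrogate, and its bottom 10 bits to 0xDC00 for the low surrogate. Here is a minimal sketch with hypothetical helper names:

```python
def to_surrogate_pair(cp):
    """Split an astral code point (U+10000..U+10FFFF) into high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only astral-plane code points need a surrogate pair")
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F9D1)
print(f"{high:04X} {low:04X}")  # D83E DDD1
```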



In Summary

  • Code points are abstract numbers assigned to characters.

  • Code units are the storage units an encoding uses to represent code points, and the encoding defines their bit length (7 bits for GSM-7, 8 bits for UTF-8, 16 bits for UCS-2). Code units are what count toward your overall message length.

  • Grapheme clusters are what users perceive as single characters; they may consist of one or more code points (with or without ZWJs).

  • Zero width joiners (ZWJs) join adjacent emojis into a single combined emoji within the grapheme cluster.

  • Variation selectors (such as VS16) force emoji presentation for characters that have both a text and an emoji form.

  • Skin tone modifiers change certain emojis and are counted as independent code points.

 

You can find more information about SMS segments here:

Messaging Character Limits

What is GSM-7 Character Encoding?

What is UCS-2 Character Encoding?

Messaging Segment Calculator