Twilio Changelog | Oct. 07, 2025

ConversationRelay now supports SSML tags to fine-tune speech

ConversationRelay now supports SSML tags, which let you specify the pronunciation of a word or acronym, insert pauses, or speed up or slow down spoken text.

You can now pass SSML tags in the token field of a text token message to fine-tune synthesized speech. The supported SSML tags depend on the session's active ttsProvider.

If the ttsProvider is Google or Amazon, see the Speech Synthesis Markup Language (SSML) documentation for the list of supported tags.
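As an illustration of the pause and speed use cases, here is a minimal Python sketch that builds a text token message containing the standard SSML <break> and <prosody> tags. The helper names (pause, paced, text_token_message) are hypothetical and not part of any Twilio SDK. The message fields mirror the example in this changelog entry. Whether a given Google or Amazon voice accepts these tags, and whether inline tags without a <speak> wrapper are sufficient, are assumptions to verify against the provider documentation.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def pause(duration: str) -> str:
    """Return an SSML <break> tag such as <break time="500ms"/>."""
    return f"<break time={quoteattr(duration)}/>"


def paced(text: str, rate: str) -> str:
    """Wrap text in an SSML <prosody> tag that sets the speaking rate."""
    return f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def text_token_message(token: str, last: bool = False) -> str:
    """Serialize a text token message (field names taken from the changelog example)."""
    payload = {
        "type": "text",
        "token": token,
        "last": last,
        "interruptible": False,
        "preemptible": False,
    }
    # ensure_ascii=False keeps non-ASCII characters readable in the JSON output.
    return json.dumps(payload, ensure_ascii=False)


# Assumed usage: a short pause, then a slowly spoken code.
token = "Your code is" + pause("500ms") + " " + paced("4 2 7 9", "slow") + "."
message = text_token_message(token, last=True)
print(message)
```

The resulting string would be sent over the ConversationRelay WebSocket in the same way as any other text token message.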

If the ttsProvider is ElevenLabs, the language must be en-us, and only the SSML <phoneme> tag for pronunciation is supported. See the ElevenLabs documentation for guidance on defining phoneme tags. For example:

{ "type": "text", "token": "Hello from <phoneme alphabet=\"ipa\" ph=\"ˈtwɪlioʊ\">Twilio</phoneme>.", "last": false, "interruptible": false, "preemptible": false }
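The message above can also be generated programmatically. The following Python sketch builds the same payload and adds a client-side check for the ElevenLabs constraints described in this entry. The function names (phoneme, check_elevenlabs_token) are hypothetical, and the regular-expression check is an assumption about how an application might validate its own input, not behavior implemented by ConversationRelay.

```python
import json
import re
from xml.sax.saxutils import escape, quoteattr


def phoneme(word: str, ipa: str) -> str:
    """Wrap a word in an SSML <phoneme> tag with an IPA pronunciation."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def check_elevenlabs_token(token: str, language: str) -> None:
    """Raise ValueError if the token breaks the ElevenLabs rules in this changelog.

    Hypothetical client-side guard: the language must be en-us and
    <phoneme> is the only SSML tag allowed.
    """
    if language.lower() != "en-us":
        raise ValueError("ElevenLabs SSML requires language en-us")
    # Collect the names of all opening and closing tags in the token.
    tag_names = set(re.findall(r"</?([A-Za-z][\w:-]*)", token))
    unsupported = tag_names - {"phoneme"}
    if unsupported:
        raise ValueError(f"Unsupported SSML tags: {sorted(unsupported)}")


token = "Hello from " + phoneme("Twilio", "ˈtwɪlioʊ") + "."
check_elevenlabs_token(token, "en-us")

message = json.dumps(
    {
        "type": "text",
        "token": token,
        "last": False,
        "interruptible": False,
        "preemptible": False,
    },
    ensure_ascii=False,  # keep the IPA characters unescaped
)
print(message)
```

Using json.dumps handles the escaping of the double quotes inside the tag attributes, which is easy to get wrong when writing the JSON by hand.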

For more information, please refer to the docs:

Voice API GA

ConversationRelay now supports SSML tags to fine-tune speech
