Integrate ElevenLabs Voices with Twilio's ConversationRelay
Twilio’s ConversationRelay service allows you to integrate Speech-to-Text (STT), Text-to-Speech (TTS) and AI Large Language Models (LLMs) with your Twilio Voice Applications. It handles the complexities of synchronous voice calls so that you can focus on processing conversational AI logic in the backend. With this setup you can create AI chatbots or personalized assistants that can respond to customer queries through natural voice interactions without having to connect with an agent.
Recently, ConversationRelay has been updated to include ElevenLabs as a Text-to-Speech provider. With over 1000 voices to choose from, you can deliver and customize expressive, human-like voices that feel natural and engaging. You can find a list of ElevenLabs voices available with ConversationRelay here.
In this tutorial, you’ll learn how to integrate and customize ElevenLabs voices with ConversationRelay in Node.js.
Prerequisites
- A Twilio account - Sign up with Twilio for free here
- A Twilio number - Read our docs here on how to obtain a Twilio number
- Node.js installation - Download Node.js here
- ngrok installation - Download ngrok here
- Git installation - Download Git here
- OpenAI account - Sign up here
Set up ConversationRelay Service
Clone Quickstart Repo
To speed things up, we’ll be building off a ConversationRelay demo application that you can find here: Voice Assistant with Twilio and OpenAI (Node.js) GitHub repository.
If you're curious about the implementation of this demo and how it was built, I highly recommend you check out the tutorial that this demo is based on: Integrate OpenAI with Twilio Voice Using ConversationRelay blog post.
To clone this repo, open your terminal/shell, navigate to your preferred directory and paste the following command:
Run Ngrok Tunnel
Your ConversationRelay demo application will spin up a WebSocket server using Fastify which will be used to communicate between your application and Twilio. However, this server is only locally hosted on your computer which means it's not publicly accessible. ngrok will be used to connect your Fastify server to the internet by generating a public URL that will tunnel all requests directly to your local server on your computer.
Open a new tab in your terminal and run ngrok http 8080 to spin up a tunnel to port 8080, which is where your server will be hosted.
After executing the command, ngrok will display a Forwarding URL in your terminal. Copy the Forwarding URL as you’ll need it for the next section.
Import Environment Variables
Open the /cr-demo project directory in VS Code (or your preferred IDE) and navigate to the .env.example file. Rename this file (by right-clicking and selecting Rename) to .env. Replace the OPENAI_API_KEY placeholder with your actual API key and replace the NGROK_URL placeholder with the Forwarding URL from the previous section. After pasting the URL, remove the https:// prefix, as it isn’t needed in your application.
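The scheme is presumably dropped because the application builds its own wss:// address from the bare host; that is an assumption about the demo, not something this tutorial confirms. As a minimal sketch, the hypothetical helpers below (toBareHost and toWebSocketUrl are illustrative names, and example.ngrok.app is a placeholder domain) show how a pasted Forwarding URL could be normalized and turned into the WebSocket URL for the /ws route:

```javascript
// Hypothetical helper (not part of the demo repo): reduce an ngrok
// Forwarding URL to the bare host that the .env file expects.
function toBareHost(forwardingUrl) {
  // Drop a leading scheme such as https:// and any trailing slashes.
  return forwardingUrl
    .trim()
    .replace(/^[a-z]+:\/\//i, "")
    .replace(/\/+$/, "");
}

// Hypothetical helper: derive the secure WebSocket URL for the /ws route.
function toWebSocketUrl(hostOrUrl) {
  return `wss://${toBareHost(hostOrUrl)}/ws`;
}

console.log(toBareHost("https://example.ngrok.app/"));
console.log(toWebSocketUrl("example.ngrok.app"));
```

Normalizing the value in code means the application tolerates either form in .env, which can help if the https:// prefix is left in by accident.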
Now that your environment variables are set, let's add an ElevenLabs voice to your ConversationRelay application.
Integrating ElevenLabs
Within your project directory, navigate to the server.js file. This file handles the core communication setup between your phone call, Twilio, and the WebSocket it will set up.
This file defines a /twiml route that returns TwiML instructions for Twilio to initiate a WebSocket connection via the /ws route. Through this WebSocket, messages are exchanged between Twilio and your app which will enable both receiving input and sending text responses for Text-to-Speech.
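To make the message exchange more concrete, here is a rough sketch of a handler for messages arriving over the WebSocket. It is not the demo's actual code: handleRelayMessage and generateReply are hypothetical names, and the message shapes (a "setup" message when the call connects, a "prompt" message carrying the caller's transcribed speech in voicePrompt, and an outgoing "text" message with token and last fields) reflect my understanding of the ConversationRelay protocol and should be checked against the Twilio docs.

```javascript
// Sketch of a handler for messages arriving on the /ws route.
// `generateReply` is a hypothetical stand-in for the LLM call.
async function handleRelayMessage(raw, generateReply) {
  const message = JSON.parse(raw);
  switch (message.type) {
    case "setup":
      // Assumed to arrive once when the call connects; nothing to speak yet.
      return null;
    case "prompt": {
      // `voicePrompt` is assumed to carry the caller's transcribed speech.
      const reply = await generateReply(message.voicePrompt);
      // A "text" message asks ConversationRelay to speak the token via TTS.
      return JSON.stringify({ type: "text", token: reply, last: true });
    }
    default:
      // Any other message types are ignored in this sketch.
      return null;
  }
}

// Usage with a stubbed reply generator instead of a real LLM:
handleRelayMessage(
  JSON.stringify({ type: "prompt", voicePrompt: "Hello" }),
  async (text) => `You said: ${text}`
).then((out) => console.log(out));
```

Keeping the handler a pure function of the incoming message makes it easy to exercise without a live call; in a real server its return value would be sent back over the socket.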
Locate lines 27 to 32 where you’ll see the following TwiML instructions that will connect the phone call to the ConversationRelay service:
Line 4 in the above code snippet is where you’ll adjust the TwiML to add an ElevenLabs voice to your service. Before we add ElevenLabs as a TTS provider for your service, let's choose a voice and customize it.
Customizing your ElevenLabs Voice
With ElevenLabs, you have access to over 1000 voices from their library. In addition to providing realistic and natural-sounding voice synthesis, they also allow you to customize voice settings, including the model, speed, similarity and stability of the voices.
You’ll be customizing these voices in the voice attribute of the ConversationRelay TwiML noun by adding a hyphen to the end of the attribute followed by an underscore-separated string with values for speed, stability, and similarity respectively. This may sound confusing but you’ll see how the final string will look at the end of this section.
Voice
You can find a list of voices to use here: Available ElevenLabs Voices.
For this tutorial we’ll be using Amelia – a young British English woman's voice that's expressive and enthusiastic. However, you can choose whichever voice you’d like for your service from the linked list above. Once you’ve found a voice, take note of its voice ID. Amelia’s voice ID is ZF6FPAbjXT4488VcRRnw.
Audio Model
The default audio model used for the voices is the Flash 2.5 model. It is ElevenLabs' fastest speech model, offering high-quality speech with ultra-low latency.
For this tutorial, I'll be using the Turbo 2.5 model, which is similar to the Flash 2.5 model but trades a bit of latency for higher-quality voice generation.
To add this non-default model to your voice settings, you’ll add a hyphen followed by the model ID after the voice ID. The model ID for the Turbo 2.5 model is turbo_v2_5.
Speed
Adjusting the speed setting allows you to control how fast or slow the generated speech sounds. The default speed is set to 1.0. The lowest speed you can set is 0.7 and the highest speed you can set is 1.2.
I tend to find that the speech in automated calls often feels too slow, so for this tutorial I'll set the speed to 1.2.
Stability
The stability setting allows you to control how stable the voice is and the randomness of the generation. Lowering the stability makes the voice more emotional and dramatic and increasing the stability will make the voice more monotone and serious.
For this tutorial, I’ll be using the default stability value of 1.0.
Similarity
Changing the similarity setting will change how closely the AI mimics the original voice when attempting to replicate it. I’ll also be using the default value of 1.0 for this setting.
Text Normalization
You can enable text normalization using the elevenlabsTextNormalization TwiML attribute. By default this setting is off, but when you set this attribute to "on", ElevenLabs will take the extra step of normalizing the text for you. Although it will increase the latency a bit, I'll be setting this attribute to "on" to ensure the quality of pronunciations.
Applying the Voice Settings
Now that we’ve chosen the settings, let's adjust the code to add our customized ElevenLabs voice. To recap, the following are the values we will be choosing for our voice settings:
- Voice ID: ZF6FPAbjXT4488VcRRnw (Amelia)
- Model: turbo_v2_5 (Turbo 2.5)
- Speed: 1.2
- Stability: 1.0
- Similarity: 1.0
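To apply these settings, append a hyphen and the model ID to the voice ID, then add another hyphen followed by an underscore-separated string with values for speed, stability, and similarity, in that order. The final string, which follows the pattern <voice ID>-<model ID>-<speed>_<stability>_<similarity>, will be the value of the voice attribute.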
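If you expect to tweak these settings often, the voice string can be assembled in code rather than by hand. The following is only a sketch: buildVoiceAttribute and formatSetting are hypothetical helpers, turbo_v2_5 is assumed to be the model ID for Turbo 2.5, and the only range check is the 0.7 to 1.2 speed limit mentioned in the Speed section.

```javascript
// Speed limits as stated in the Speed section of this tutorial.
const MIN_SPEED = 0.7;
const MAX_SPEED = 1.2;

// Render whole numbers with one decimal place (1 -> "1.0") to match the
// tutorial's notation; other values are left as written (1.2 -> "1.2").
function formatSetting(value) {
  return Number.isInteger(value) ? value.toFixed(1) : String(value);
}

// Hypothetical helper: assemble the value of the voice attribute.
function buildVoiceAttribute({
  voiceId,
  model, // optional; leaving it out keeps the default model
  speed = 1.0,
  stability = 1.0,
  similarity = 1.0,
}) {
  if (speed < MIN_SPEED || speed > MAX_SPEED) {
    throw new RangeError(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}`);
  }
  const settings = [speed, stability, similarity].map(formatSetting).join("_");
  // Join the segments with hyphens, skipping the model when it is not given.
  return [voiceId, model, settings].filter(Boolean).join("-");
}

console.log(
  buildVoiceAttribute({
    voiceId: "ZF6FPAbjXT4488VcRRnw", // Amelia
    model: "turbo_v2_5", // assumed model ID for Turbo 2.5
    speed: 1.2,
  })
);
// prints "ZF6FPAbjXT4488VcRRnw-turbo_v2_5-1.2_1.0_1.0"
```

Validating the speed up front surfaces an out-of-range value when the server starts rather than in the middle of a call.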
To turn on text normalization, set the elevenlabsTextNormalization TwiML attribute to "on".
Finally, you’ll need to set the ttsProvider attribute to "ElevenLabs".
Add the above attributes to the ConversationRelay noun in your code (line 30).
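For reference, here is a sketch of how the resulting TwiML could be generated with all three attributes in place. It is an approximation rather than the demo's exact code: buildTwiml and escapeXmlAttr are hypothetical helpers, the WebSocket URL and greeting text are placeholders, the welcomeGreeting attribute is an assumption about what the demo already sets, and the voice string assumes turbo_v2_5 as the Turbo 2.5 model ID.

```javascript
// Escape characters that are not allowed inside XML attribute values.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build TwiML that connects a call to ConversationRelay
// with an ElevenLabs voice and text normalization turned on.
function buildTwiml({ wsUrl, welcomeGreeting, voice }) {
  const attrs = {
    url: wsUrl,
    welcomeGreeting, // assumed to be set by the demo already
    ttsProvider: "ElevenLabs",
    voice,
    elevenlabsTextNormalization: "on",
  };
  const rendered = Object.entries(attrs)
    .map(([name, value]) => `${name}="${escapeXmlAttr(value)}"`)
    .join(" ");
  return `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <ConversationRelay ${rendered} />
  </Connect>
</Response>`;
}

console.log(
  buildTwiml({
    wsUrl: "wss://example.ngrok.app/ws", // placeholder domain
    welcomeGreeting: "Hi! How can I help you today?", // placeholder text
    voice: "ZF6FPAbjXT4488VcRRnw-turbo_v2_5-1.2_1.0_1.0",
  })
);
```

Escaping the attribute values keeps the XML well-formed even if the greeting later contains quotes or ampersands.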
Save the file and your ConversationRelay service should be ready to test!
Connect your Application to Twilio
Your ngrok tunnel should already be running so navigate back to where it's running on your terminal and copy the Forwarding address (this time with the https://).
You’ll now hook up your application to your Twilio number using this forwarding URL; this URL will forward all requests to your Node.js application.
Navigate to the Active Numbers section of your Twilio Console. You can head there by clicking Phone Numbers > Manage > Active numbers from the left tab on your Console.
Now, click on the Twilio number you’d like to use for your ConversationRelay service. Within the Voice Configuration section, select Webhook for the “A call comes in” dropdown and then, in the next textbox, enter the forwarding URL given by ngrok followed by "/twiml" (for example, https://<your-ngrok-domain>/twiml). Lastly, ensure the HTTP dropdown has HTTP GET selected.
Once added, scroll down and click Save configuration.
Run and Test the Application
Navigate back to your terminal and leave the ngrok tunnel running. Open a new tab (ensure you’re in the cr-demo directory) and run npm install to install the dependencies your application needs (you can find a list of the packages in the package.json file).
Your application is now ready to run! Execute the following command to start your service:
Now call your Twilio number to hear how your chosen ElevenLabs voice sounds! Feel free to adjust the settings on the voice attribute to refine the voice, then restart your node server to hear the updated voice.
Conclusion
By integrating ElevenLabs with Twilio’s ConversationRelay, you can create rich, human-like voices that bring your voice applications to life. Now that you’re ready to explore and experiment with your voice application, take it a step further with our other integrations with ConversationRelay:
- ConversationRelay Application and Architecture for Voice AI Applications Built on AWS
- Integrate Twilio ConversationRelay with Twilio Flex for Contextual Escalations
Dhruv Patel is a Developer on Twilio’s Developer Voices team. You can find Dhruv working in a coffee shop with a glass of cold brew, or reach him at dhrpatel [at] twilio.com.