Voice Bot Integration with Twilio Video
Imagine a customer finishing a support session in a Video Room – and instead of the session simply ending, a voice bot silently joins the conversation to collect spoken feedback and trigger next steps. In this setup, the voice bot isn’t a physical user or visible participant. It’s a Twilio Phone Number making an outbound voice API call that joins the Video Room via SIP.
Think of this as a “ghost leg”: a virtual participant connected behind the scenes, designed to listen using Twilio’s speech-to-text <Gather> verb. The bot can capture a keyword, such as “done” or “unsatisfied,” and based on that input, trigger actions like ending the Video Room, sending a notification, or routing data to your CRM.
In this blog post, you’ll learn how to use a Twilio Phone Number to dial a voice bot into a Video Room. We’ll use Twilio’s Programmable Video and Programmable Voice APIs, plus built-in SIP support, along with TwiML for call instructions and Twilio’s serverless Functions to host our logic.
Prerequisites
In order to follow along, you’ll need:
- A Twilio account. If you don’t yet have one, sign up here for a free trial: https://www.twilio.com/try-twilio
- You’ll also need to purchase a phone number. See this post for more details.
- If you haven't set up the Video Room yet, refer to the JavaScript SDK Demo App guide for setting up a Room to test: https://github.com/twilio/twilio-video-app-react
- Familiarity with Twilio Video, TwiML (Twilio Markup Language), and Functions
Once you have the prerequisites completed and a running Video application, we’re ready to begin the tutorial. Let’s get started!
Step 1: Create Two TwiML Bins
We’ll start by creating two TwiML responses using TwiML Bins. To create a TwiML Bin, navigate to TwiML Bins in your Twilio Console, and hit the blue Plus (+) or Create button to continue.
TwiML Bin 1 - Connect to the video room
This TwiML connects the Voice Bot (the incoming voice call) to the Video Room with the unique room name “my-video-room”:
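The snippet for this Bin is not reproduced here, so what follows is a minimal sketch rather than the author's exact markup. It assumes the `<Connect>` verb with a `<Room>` noun, which is how TwiML bridges a voice call into a Video Room. The XML is wrapped in a JavaScript template literal only so it can be printed or reused; the names `ROOM_NAME` and `connectToRoomTwiml` are illustrative.

```javascript
// Sketch only: the room name comes from the post; variable names are illustrative.
const ROOM_NAME = "my-video-room";

// TwiML that bridges the incoming voice call into the Video Room.
// Paste the XML itself (without the JavaScript wrapper) into TwiML Bin 1.
const connectToRoomTwiml = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Room>${ROOM_NAME}</Room>
  </Connect>
</Response>`;

console.log(connectToRoomTwiml);
```

After saving the Bin, attach it to your Twilio phone number as the handler for incoming calls, so that any call to that number lands in the Room.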
TwiML Bin 2 - Collect feedback via the Voice Bot
This TwiML prompts the user to say a word, pauses briefly and sends the result to a Twilio Function:
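The snippet for this Bin is also not reproduced here, so here is a hedged sketch of what it could look like. It assumes speech input on `<Gather>`, the hint word “disconnect”, automatic end-of-speech detection, a 2-second `<Pause>`, and an illustrative prompt. Only the `Your_Function_URL_From_The_Next_Step` placeholder comes from the post; the other values are assumptions you can tune.

```javascript
// Placeholder from the post: replace it with your Function URL once it exists.
const FUNCTION_URL = "Your_Function_URL_From_The_Next_Step";

// Sketch of TwiML Bin 2. The prompt wording, hint word, and pause length are assumptions.
// <Gather> transcribes the speech and posts the result to the action URL.
const collectFeedbackTwiml = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Gather input="speech" hints="disconnect" speechTimeout="auto" action="${FUNCTION_URL}" method="POST">
    <Say>Please say disconnect to end the room.</Say>
    <Pause length="2"/>
  </Gather>
</Response>`;

console.log(collectFeedbackTwiml);
```

As with the first Bin, only the XML goes into the TwiML Bin; the JavaScript wrapper is just there to make the placeholder substitution explicit.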


Here, be sure to change Your_Function_URL_From_The_Next_Step to the … after you write the Function in the next step. You’ll find it in the pulldown menu on the right of the Function edit screen.


Step 2: Create a Twilio Function to act on input
Nice work! You now have two TwiML Bins handling the logic for incoming phone calls and collecting feedback. Now we’re going to move on to building a Function that handles our “hang up” logic.
Create a new Twilio Function (you can find a longer tutorial here, as well) that listens for the <Gather> input and ends the Video Room if the trigger word is detected:
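The Function body is not reproduced here, so the following is a sketch under stated assumptions, not the author's exact code. It assumes the trigger word is “disconnect”, the Room is named “my-video-room”, and `<Gather>` delivers the transcript in the `SpeechResult` parameter. The helper `heardTriggerWord` and the constants are hypothetical names; `context.getTwilioClient()` and the global `Twilio.twiml.VoiceResponse` are provided by the Functions runtime.

```javascript
// Assumed values: adjust them to match your own TwiML Bins and Room.
const TRIGGER_WORD = "disconnect";
const VIDEO_ROOM_NAME = "my-video-room";

// True when the transcribed speech contains the trigger word (case-insensitive).
function heardTriggerWord(speechResult) {
  return String(speechResult || "").toLowerCase().includes(TRIGGER_WORD);
}

async function handler(context, event, callback) {
  // VoiceResponse comes from the global Twilio object in the Functions runtime.
  const twiml = new Twilio.twiml.VoiceResponse();
  try {
    if (heardTriggerWord(event.SpeechResult)) {
      // Setting the status to "completed" ends the Room for every participant.
      const client = context.getTwilioClient();
      await client.video.v1.rooms(VIDEO_ROOM_NAME).update({ status: "completed" });
    } else {
      twiml.say("Thanks for your feedback.");
    }
    // Either way, end the bot's call leg.
    twiml.hangup();
    return callback(null, twiml);
  } catch (err) {
    return callback(err);
  }
}

exports.handler = handler;
```

Keeping the keyword check in its own small function makes it easy to extend later, for example to react to other words such as “done” or “unsatisfied”.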
Step 3: Make an API call to initiate the Voice Bot
Use a curl command (or, if you prefer, any backend logic) to make an API call that dials your Twilio number and kicks off the interaction.
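The curl command is not reproduced here, so here is a hedged JavaScript sketch of the same request against the Programmable Voice Calls endpoint. It assumes `To` is the Twilio number that has TwiML Bin 1 attached, `From` is a number or verified caller ID on your account, and `Url` points at TwiML Bin 2. The function name `buildCallRequest` and every value in the usage comment are placeholders.

```javascript
// Sketch: builds the POST request that creates an outbound call.
// All argument values are supplied by you; nothing here is sent until you call fetch.
function buildCallRequest({ accountSid, authToken, to, from, twimlUrl }) {
  const url = `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Calls.json`;
  // The Calls endpoint expects form-encoded To, From, and Url parameters.
  const body = new URLSearchParams({ To: to, From: from, Url: twimlUrl });
  // HTTP Basic auth with the Account SID as username and the Auth Token as password.
  const auth = Buffer.from(`${accountSid}:${authToken}`).toString("base64");
  return {
    url,
    options: {
      method: "POST",
      headers: {
        Authorization: `Basic ${auth}`,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body: body.toString(),
    },
  };
}

// Usage (Node 18+ ships a global fetch); uncomment once real values are set:
// const { url, options } = buildCallRequest({
//   accountSid: process.env.TWILIO_ACCOUNT_SID,
//   authToken: process.env.TWILIO_AUTH_TOKEN,
//   to: "+15550001111",   // placeholder: the Twilio number with TwiML Bin 1 attached
//   from: "+15550002222", // placeholder: a number or verified caller ID on your account
//   twimlUrl: "https://example.com/twiml-bin-2", // placeholder: URL of TwiML Bin 2
// });
// fetch(url, options).then((res) => res.json()).then(console.log);
```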
This call triggers TwiML Bin 2, which initiates the feedback prompt.
Step 4: Flow in action
- A user is in a Twilio Video Room with room name “my-video-room”.
- An outbound Voice API call is made to a Twilio number that has TwiML Bin 1 (Connect to the video room) associated with it, connecting the voice bot to this Room.
- Once the bot is in the Room, TwiML Bin 2 (Collect Feedback via Voice Bot) will execute, asking the user to say the word “disconnect” to end the Room.
- If the word “disconnect” is spoken, the Twilio Function is triggered and the Video Room is closed.
Try it now: you should be able to connect to the video room, call your bot into the room (the “ghost leg”), provide feedback, and then ask the bot to close the room by saying “disconnect”. Pretty neat, right?
Conclusion
With Twilio’s Programmable APIs, combining Video and Voice unlocks powerful opportunities to create intelligent, automated, and deeply engaging customer interactions. In this tutorial, I walked through how a voice call can be used to trigger a bot to join a Video Room, collect participant feedback using Twilio’s <Gather> TwiML verb, and then take action – such as ending the Video Room – based on that input.
This approach demonstrates how you can build more responsive and streamlined user journeys without relying on complex front-end workflows or additional manual intervention. By leveraging a Voice Bot to manage feedback and drive actions in real time, developers can deliver video experiences that adapt dynamically to customer input.
And now that you’ve seen one way to enhance your callers’ experience by inviting a survey-taking bot into a room, see what may be next: my colleague Paul developed an application that has an LLM power a video avatar, so you can video call an AI Agent.
Khushbu Shaikh is a dedicated Technical Lead, Principal Technical Account Manager, serving as an invaluable asset to the Personalized Support team. With a wealth of experience, Khushbu not only excels in managing multiple customer accounts and driving impactful solutions, but also plays a key leadership role in guiding and supporting her team. For any inquiries or assistance, Khushbu can be reached at kshaikh [at] twilio.com.