ConversationRelay and the AI Agent Builder Dify: A Multimodal Example with Voice
Think about those great customer experiences you can remember – they often go beyond a single channel. Your customers want to solve their issues quickly and efficiently, but prefer different channels for different tasks, notifications, and follow-ups.
In this post, we’ll show you how to use the Agentic Workflow Builder Dify to craft a multi-channel AI experience. You’ll use Twilio Voice and ConversationRelay to build an airline concierge that listens and responds over voice, along with Twilio SMS where your AI Agent can send you confirmations.
Sound like something you’d like to experience? Us, too – let’s build it.
Architecture of a multimodal experience built with Dify
For the primary experience you’re building, you’ll call into a virtual agent hosted with Dify. You’ll be able to talk to an airline concierge and ask questions and book travel, then receive confirmation over SMS.
Here’s a diagram of the flow you’ll be building:
![Sequence diagram showing interactions between User, Twilio Messaging, Server, and Dify for message and voice management. sequenceDiagram
participant U AS User
participant C AS ConversationRelay
participant M AS Twilio Messaging
participant S AS Server
participant F AS Dify
U->>C: [Voice] Calls in
activate U
C->>S: [HTTP] /twiml
activate C
S->>C: [Http Response] <Connect>...</Connect>
deactivate C
C->>S: [WS] Connection
S->>F: [HTTP] Streaming
F-->>S: [Http Response] Streaming back
S-->>C: [Http Response] Stream back
S-->>U: [Http Response] Read back messages
Note left of F: User can text at any point<br/>or Dify can send a text
F->>M: [HTTP] Send SMS to User
M->>U: [Text] Receive SMS
U->>M: [Text] User replies
M->>S: [HTTP] Webhook to process SMS
Note right of S: Map user phone number to<br/>Dify session id
S->>F: [HTTP] Update context of Flow
deactivate U](/content/dam/twilio-com/global/en/blog/importer-images/a-c/conversationrelay-multi-agent-dify-voice-messaging/media1/_jcr_content/renditions/compressed-original.webp)
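The diagram notes that the server maps each user's phone number to a Dify session id, so that a voice call and a later SMS reply land in the same conversation. Here is a minimal sketch of that idea, assuming an in-memory map. The SessionStore class and its method names are hypothetical and are not taken from the sample repo, which may handle this differently.

```typescript
// Hypothetical in-memory store: caller phone number -> Dify conversation id.
// Illustration only; a real deployment would likely use Redis or a database
// so that sessions survive a server restart.
class SessionStore {
  private readonly byPhone = new Map<string, string>();

  // Strip formatting so "+1 (555) 010-0000" and "+15550100000" share one key.
  private static normalize(phone: string): string {
    const digits = phone.replace(/[^\d]/g, "");
    return `+${digits}`;
  }

  // Returns the stored conversation id, or undefined for a first-time caller.
  get(phone: string): string | undefined {
    return this.byPhone.get(SessionStore.normalize(phone));
  }

  // Remember which conversation this phone number belongs to.
  set(phone: string, conversationId: string): void {
    this.byPhone.set(SessionStore.normalize(phone), conversationId);
  }
}
```

With something like this in place, the SMS webhook can look up the sender's number and pass the stored id along, so the agent keeps the context from the voice call.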
And here’s a video of what you’ll be building:
Prerequisites
Before you can build our sample app, you’ll need to sign up for accounts and have a few things ready.
- A Twilio account (you can sign up here for free)
- A Twilio phone number with Voice capabilities
- (Optional) A Twilio phone number with Messaging capabilities
- A Dify account
- An API key for your preferred LLM
- Node.js (20+)
- Ngrok (guide: https://ngrok.com/docs/getting-started/)
- Fly.io (if you want to host remotely)
Build a multimodal experience with Dify
After you build the multimodal customer experience with Dify, you’ll be able to make a voice call (handled by Twilio Voice and ConversationRelay) and receive follow-ups over SMS.
First, we’ll log into Dify and build our agent experience. Then, we’ll clone the repo locally and do some set up steps. After that, we’ll test locally with ngrok, then optionally deploy to Fly.io. Let’s get going!
Build agents on Dify
It’s time to head to Dify! Follow these steps to build your agent. After you run through these, we’ll start working on the local server.


1. Visit https://dify.ai/ and sign up for an account, if you haven’t yet.
2. Log in and create a new Chatflow and configure it as you see fit.
3. Click on your avatar in the top right corner, then select Settings, then Model Provider. Set up the model(s) you want to use (I used gpt-4o for this demo) and provide any needed API keys.
4. Go back to the home page and select Tools from the top menu bar. Search for Twilio, and select Send Message. Set up the credentials for your Twilio account (you can find your Account SID and Auth Token in your Twilio Console).
5. Go back to the Dify Studio from the top menu. Click Create from blank and select Chatflow. Give it a name and description.
6. Delete the LLM component. Add a node using +, then select Agent. Connect Start -> Agent -> Answer.
7. Select Start and add a custom input field. Choose Short Text, set the name to "from", and mark it as required.
8. Select Agent:
- For Agentic Strategy, select Agent -> ReAct.
- For Model, choose your favorite model from the dropdown.
- Add Twilio - Send Message to the list of available tools.
- Add a friendly instruction for the system prompt. (You can find our example below the list).
- Inside the Query add the following:
Query:{{start.sys.query}}
CustomerNumber:{{start.x.from}}
ConversationId: {{start.sys.conversation_id}}
- Enable memory
9. Now Publish your changes.
10. Select API Access from the left panel, and click on API Key in the top right corner. Create a new service key there – you will save it to your local .env file as DIFY_API_KEY.
Example system prompt
Feel free to use our system prompt for your test!
Okay, great! Now it’s time to do some local work. With your DIFY_API_KEY, let’s head to the terminal.
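To make the role of that key concrete, here is a rough sketch of how a server could build a request to Dify's chat-messages endpoint for the Chatflow above. The helper name buildDifyChatRequest and the default base URL (Dify Cloud) are assumptions for illustration, and the sample repo may structure this differently. The inputs object carries the required "from" field you defined on the Start node.

```typescript
// Plain description of an HTTP request, so it can be inspected before sending.
interface DifyChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper (not from the sample repo): builds a streaming
// chat-messages request for a Dify Chatflow.
function buildDifyChatRequest(
  apiKey: string, // the DIFY_API_KEY created under API Access
  query: string, // what the caller said, or the SMS body
  from: string, // caller's phone number, passed as the "from" input
  conversationId?: string, // omit for the first turn of a conversation
  baseUrl = "https://api.dify.ai/v1", // assumption: Dify Cloud, not self-hosted
): DifyChatRequest {
  return {
    url: `${baseUrl}/chat-messages`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      inputs: { from },
      query,
      response_mode: "streaming",
      conversation_id: conversationId ?? "",
      user: from, // any stable per-user identifier works here
    }),
  };
}
```

The returned object can be passed to fetch as fetch(req.url, req). Keeping the builder separate from the network call makes it easy to check the payload without calling Dify.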
Set up your local environment
Let’s get everything set up locally. Open a terminal and follow these steps.
1. Clone our repo:
git clone https://github.com/twilio-samples/cr-dify
2. Enter the directory:
cd cr-dify
3. Install the dependencies:
npm install
4. Copy the example file to an .env file:
cp .env.example .env
Then, in your favorite text editor or IDE, set up the variables in the .env file (including the DIFY_API_KEY from the previous step).
5. Again in your IDE, modify handlers/twiml.ts to change the default greeting message or make other changes to ConversationRelay. (All available attributes can be found here.)
6. Start the dev server:
npm run dev
7. In another tab, expose your local server to the internet:
ngrok http 8080
8. Visit your Twilio Console Buy a Number page to purchase a number with Voice capabilities, if you haven’t yet.
9. Then, configure the Incoming Webhook (select the number you want to use on your Available Numbers page) to point to https://ngrokdomain.com/twiml, swapping ngrokdomain.com for the ngrok domain you got from step 7, above.
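For context on what the /twiml webhook returns, here is a minimal sketch of a TwiML response that connects the call to ConversationRelay. The buildTwiml helper and the /ws WebSocket path are assumptions for illustration. The actual handlers/twiml.ts in the repo may use different names, paths, and attributes.

```typescript
// Escape characters that are not allowed inside XML attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: builds the TwiML that hands the call to
// ConversationRelay, which then opens a WebSocket back to your server.
function buildTwiml(domain: string, welcomeGreeting: string): string {
  const wsUrl = `wss://${domain}/ws`; // assumption: the server listens on /ws
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <ConversationRelay url="${escapeXml(wsUrl)}" welcomeGreeting="${escapeXml(welcomeGreeting)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

For example, buildTwiml("ngrokdomain.com", "Hi! How can I help you fly today?") yields a response whose url attribute points at wss://ngrokdomain.com/ws. The greeting is what the caller hears first.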
Now – fingers crossed – go ahead and call your number.
Once you hear a greeting (nice!), you can test by asking the agent for flights between two cities, for example, "Can you tell me your flight options leaving from Boston heading to Tampa on Thursday?", then attempting to book one of the options, and waiting for a result. You can also ask the Agent to send you a list of flight information as SMS, then confirm via text or voice.
Got a confirmation? Hang up – congrats! You now have a multimodal AI agent experience running over both Voice and SMS, which you can quickly edit using Dify.
(Optional) Deploy to the cloud with Fly.io
Ngrok is an amazing tool for local development, but once you close your laptop, your phone number will no longer work.
Here, we have instructions on how to use Fly.io to host your application. It supports long-running WebSocket connections and has a free tier (note that when you sign up, you have to add your credit card, but you can deploy free servers for testing purposes).
This repo includes the configuration for easily deploying the application to Fly.io. All the configuration is located in the fly.toml file.
1. Sign up for Fly.io, then install and set up the CLI.
2. Create a new Fly App:
fly apps create cr-dify
3. Grab the domain name of your newly created Fly app and update the DOMAIN value in your local .env.
4. Set up the secrets:
fly secrets import < .env
5. Deploy the application:
fly deploy
6. Scale down instances to 1 (if you want to keep the free tier):
fly scale count 1
7. View the logs:
fly logs
You can now set the Incoming Webhook of your Twilio number to point to https://DOMAIN/twiml and repeat the tests – even with your laptop closed!
Setting up a multimodal solution with Dify, ConversationRelay and Twilio Voice
You now have a multimodal AI pattern you can reuse: ConversationRelay and Twilio Voice handle the audio, while your Dify agent drives reasoning and actions (including those awesome out-of-band SMS follow-ups through Twilio SMS).
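The handoff between the two halves of that pattern can be sketched as a small translation step. Dify streams its answer as server-sent events, and ConversationRelay expects text tokens over the WebSocket. The drainSse function below is a hypothetical illustration, not the repo's code. It assumes Dify events named message, agent_message, and message_end, and other event types are ignored.

```typescript
// Shape of the text message ConversationRelay accepts over the WebSocket.
interface RelayTextToken {
  type: "text";
  token: string;
  last: boolean;
}

// Hypothetical helper: turns the complete "data: {...}" lines in an SSE
// buffer into ConversationRelay tokens, and returns the unparsed tail so the
// caller can prepend it to the next network chunk.
function drainSse(buffer: string): { tokens: RelayTextToken[]; rest: string } {
  const tokens: RelayTextToken[] = [];
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? ""; // the last piece may be an incomplete line
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    let event: { event?: string; answer?: string };
    try {
      event = JSON.parse(trimmed.slice(5));
    } catch {
      continue; // skip malformed payloads rather than dropping the call
    }
    if ((event.event === "message" || event.event === "agent_message") && event.answer) {
      tokens.push({ type: "text", token: event.answer, last: false });
    } else if (event.event === "message_end") {
      tokens.push({ type: "text", token: "", last: true }); // marks end of turn
    }
  }
  return { tokens, rest };
}
```

Each token would then be sent with something like ws.send(JSON.stringify(token)), which lets ConversationRelay start speaking before the agent has finished its full answer.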
You’ve now built with Dify, tested locally with ngrok, and possibly finished by deploying to the cloud with Fly.io. You’ve confirmed that you can speak naturally while your AI agent provides options, asks clarifying questions, responds via voice, and shares confirmations over SMS. Now all that’s left is for you to make the experience yours. Happy building!
About Twilio Forward
Twilio Forward focuses on Horizon-3 initiatives that drive step-change innovation, empower builders, and unlock Twilio’s next era of growth. As an incubation lab, we explore bold new ideas, from the most advanced, almost unimaginable technologies to emerging solutions that address today’s real-world challenges. Our mission is to push boundaries, reimagine what’s possible, and build what comes next.
Kousha Talebian is a Principal Engineer from Vancouver, BC, working on the Emerging Technology and Innovation team. You can reach him at ktalebian [at] twilio.com . Outside of work, Kousha enjoys running with his dog and experimenting with various cuisines from around the world.
Rikki Singh is a product and engineering leader based in Bay Area, California. At Twilio, she leads the Emerging Technology and Innovation group Twilio Forward. Outside of work, Rikki enjoys hiking and camping with her husband and toddler.
Paul Kamp is the Technical Editor-in-Chief of the Twilio Blog. You can reach him at pkamp [at] twilio.com