ConversationRelay and the AI Agent Builder Dify: A Multimodal Example with Voice

September 29, 2025
Written by Kousha Talebian and Rikki Singh
Reviewed by Paul Kamp

Think about those great customer experiences you can remember – they often go beyond a single channel. Your customers want to solve their issues quickly and efficiently, but prefer different channels for different tasks, notifications, and follow-ups.

In this post, we’ll show you how to use the Agentic Workflow Builder Dify to craft a multi-channel AI experience. You’ll use Twilio Voice and ConversationRelay to build an airline concierge that listens and responds over voice, along with Twilio SMS where your AI Agent can send you confirmations.

Sound like something you’d like to experience? Us, too – let’s build it.

Architecture of a multimodal experience built with Dify

For the primary experience you’re building, you’ll call into a virtual agent hosted with Dify. You’ll be able to talk to an airline concierge and ask questions and book travel, then receive confirmation over SMS.

Here’s a diagram of the flow you’ll be building:

```mermaid
sequenceDiagram
    participant U AS User
    participant C AS ConversationRelay
    participant M AS Twilio Messaging
    participant S AS Server
    participant F AS Dify


    U->>C: [Voice] Calls in
    activate U
    C->>S: [HTTP] /twiml
    activate C
    S->>C: [HTTP Response] <Connect>...</Connect>
    deactivate C
    C->>S: [WS] Connection
    S->>F: [HTTP] Streaming
    F-->>S: [HTTP Response] Streaming back
    S-->>C: [WS] Stream back
    C-->>U: [Voice] Read back messages
    Note left of F: User can text at any point<br/>or Dify can send a text
    F->>M: [HTTP] Send SMS to User
    M->>U: [Text] Receive SMS
    U->>M: [Text] User replies
    M->>S: [HTTP] Webhook to process SMS
    Note right of S: Map user phone number to<br/>Dify session id
    S->>F: [HTTP] Update context of Flow
    deactivate U
```
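The diagram's note about mapping the user's phone number to a Dify session is what ties the voice and SMS channels together. The following is a minimal in-memory sketch of that mapping in TypeScript. The class and method names are hypothetical and do not come from the cr-dify repo. A production server running more than one instance would likely need a shared store instead.

```typescript
// Hypothetical in-memory store mapping a caller's phone number
// to the Dify conversation_id created during the voice call.
class SessionStore {
  private byPhone = new Map<string, string>();

  // Record (or overwrite) the Dify conversation for a phone number.
  set(phone: string, conversationId: string): void {
    this.byPhone.set(normalize(phone), conversationId);
  }

  // Look up the conversation so an inbound SMS can continue the same chat.
  get(phone: string): string | undefined {
    return this.byPhone.get(normalize(phone));
  }

  // Forget the mapping, e.g. when the user is done.
  delete(phone: string): boolean {
    return this.byPhone.delete(normalize(phone));
  }
}

// Strip spaces, dashes, and parentheses so "+1 (555) 010-0000"
// and "+15550100000" resolve to the same key.
function normalize(phone: string): string {
  return phone.replace(/[\s\-()]/g, "");
}
```

Normalizing the key means a number formatted differently by the voice and messaging webhooks still resolves to the same conversation.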
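On the WebSocket leg of the diagram, ConversationRelay sends the server JSON messages. These include a setup message when the call connects and a prompt message whose voicePrompt field holds the caller's transcribed speech. The server replies with text messages whose token field is spoken aloud, with last set to true on the final piece. The sketch below models only the fields it uses. The function names are hypothetical, so check the ConversationRelay documentation for the full message formats.

```typescript
// Inbound messages from ConversationRelay that this sketch handles.
// Only the fields used here are modeled; real messages carry more.
type InboundMessage =
  | { type: "setup"; callSid: string; from: string }
  | { type: "prompt"; voicePrompt: string; last?: boolean }
  | { type: "interrupt" };

// Outbound message asking ConversationRelay to speak a chunk of text.
interface TextToken {
  type: "text";
  token: string;
  last: boolean;
}

// Parse one WebSocket frame; returns undefined for message types we ignore.
function parseInbound(raw: string): InboundMessage | undefined {
  const msg = JSON.parse(raw) as { type?: string };
  switch (msg.type) {
    case "setup":
    case "prompt":
    case "interrupt":
      return msg as InboundMessage;
    default:
      return undefined;
  }
}

// Wrap a streamed chunk of the agent's answer as a ConversationRelay text token.
function toTextToken(chunk: string, last: boolean): string {
  const out: TextToken = { type: "text", token: chunk, last };
  return JSON.stringify(out);
}
```

Streaming each chunk from Dify as its own token, instead of waiting for the full answer, lets the caller start hearing the response sooner.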

And here’s a video of what you’ll be building:

Prerequisites

Before you can build our sample app, you’ll need to sign up for accounts and have a few things ready.

Build a multimodal experience with Dify

After you build the multimodal customer experience with Dify, you’ll be able to make a voice call (handled by Twilio Voice and ConversationRelay) and receive follow-ups over SMS.

First, we’ll log in to Dify and build our agent experience. Then, we’ll clone the repo locally and complete a few setup steps. After that, we’ll test locally with ngrok, then optionally deploy to Fly.io. Let’s get going!

Build agents on Dify

It’s time to head to Dify! Follow these steps to build your agent. After you run through these, we’ll start working on the local server.

Figure: the Dify Chatflow, going from the Start node to an Agent node (using a GPT-4 model and a tool) and then to the Answer node.

1. Visit https://dify.ai/ and sign up for an account, if you haven’t yet.

2. Log in and create a new Chatflow and configure it as you see fit.

3. Click on your avatar in the top right corner, then select Settings, then Model Provider. Set up the model(s) you want to use (I used gpt-4o for this demo) and provide any needed API keys.

4. Go back to the home page and select Tools from the top menu bar. Search for Twilio, and select Send Message. Set up the credentials for your Twilio account (you can find your Account SID and Auth Token in your Twilio Console).

5. Go back to the Dify Studio from the top menu. Click Create from blank and select Chatflow. Give it a name and description.

6. Delete the LLM component. Add a node using +, then select Agent. Connect Start -> Agent -> Answer.

7. Select Start and add a custom input field. Choose Short Text, set the variable name to "from", and mark it as required.

8. Select Agent:

  • For Agentic Strategy, select Agent -> ReAct.
  • For Model, choose your favorite model from the dropdown.
  • Add Twilio - Send Message to the list of available tools.
  • Add a friendly instruction for the system prompt. (You can find our example below the list).
  • Inside the Query field, add the following:
      Query: {{start.sys.query}}
      CustomerNumber: {{start.x.from}}
      ConversationId: {{start.sys.conversation_id}}
  • Enable memory.

9. Now Publish your changes.

10. Select API Access from the left panel, and click on API Key in the top right corner. Create a new service key there – you will save it to your local .env file as DIFY_API_KEY.
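Once the Chatflow is published, the server reaches it through Dify’s chat-messages API. The sketch below makes several assumptions:

- It uses the hosted endpoint https://api.dify.ai/v1/chat-messages.
- It authenticates with a Bearer token taken from DIFY_API_KEY.
- It uses streaming mode. In that mode Dify returns Server-Sent Events whose data lines carry JSON with an answer field.

The from input matches the required Start field from step 7. The helper names are hypothetical. The event names your Chatflow emits may also differ, so verify them against the Dify API reference.

```typescript
// Body for POST /v1/chat-messages (assumed Dify API shape).
interface ChatRequest {
  inputs: { from: string };
  query: string;
  response_mode: "streaming" | "blocking";
  user: string;
  conversation_id?: string;
}

// Build the request for one turn. Passing the previous conversation_id
// keeps the agent's memory across turns and across channels.
function buildChatRequest(from: string, query: string, conversationId?: string): ChatRequest {
  const body: ChatRequest = {
    inputs: { from },
    query,
    response_mode: "streaming",
    user: from, // a stable end-user identifier; the caller's number works here
  };
  if (conversationId) body.conversation_id = conversationId;
  return body;
}

// Extract answer text and the conversation_id from complete SSE lines such as:
//   data: {"event":"message","answer":"Hi","conversation_id":"..."}
// A real stream reader must buffer partial lines between network reads.
function parseSseChunk(text: string): { answer: string; conversationId?: string } {
  let answer = "";
  let conversationId: string | undefined;
  for (const line of text.split("\n")) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice(5).trim();
    if (!payload) continue;
    const evt = JSON.parse(payload) as { event?: string; answer?: string; conversation_id?: string };
    if ((evt.event === "message" || evt.event === "agent_message") && evt.answer) {
      answer += evt.answer;
    }
    if (evt.conversation_id) conversationId = evt.conversation_id;
  }
  return { answer, conversationId };
}

// Send the request; fetch is built into Node 18 and later.
async function sendToDify(apiKey: string, body: ChatRequest): Promise<Response> {
  return fetch("https://api.dify.ai/v1/chat-messages", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}
```

Saving the conversation_id from the first response is what lets a later SMS continue the same conversation.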

Example system prompt

Feel free to use our system prompt for your test!

You are a friendly and knowledgeable concierge for a fictitious airline called Owl Air.
When a caller speaks to you:
1) Your goal is to quickly understand their request, provide all relevant details, and guide them to a confirmed booking if they wish.
2) Always give complete, informative responses — never say vague filler like “Let me check that for you” or “I’ll get back to you”. Instead, give the answer in full, including any relevant times, dates, prices, and options you have found.
3) If information is unclear or missing, ask precise follow-up questions to clarify so you can give an exact, accurate answer.
4) Speak in a warm, conversational tone, like a high-end travel concierge who knows everything about Owl Air’s flights.
5) When providing flight options, always provide a few options - NEVER provide a single flight option.
6) Once the customer confirms they want a flight, book the flight for them and provide them the confirmation number.
7) Never break character — you are always the Owl Air concierge, available 24/7 to help book flights.
Note: Flight information should always include a flight ID, price, and time.

Okay, great! Now it’s time to do some local work. With your DIFY_API_KEY, let’s head to the terminal.

Set up your local environment

Let’s get everything set up locally. Open a terminal, then follow these steps.

  1. Clone our repo using git clone https://github.com/twilio-samples/cr-dify.
  2. Run cd cr-dify to enter the directory.
  3. Run npm install to install the dependencies.
  4. Then, copy the example file to a .env file with cp .env.example .env. In your favorite text editor or IDE, set the variables in the .env file (including the DIFY_API_KEY from the previous section).
  5. Again in your IDE, modify handlers/twiml.ts to change the default greeting message or make other changes to ConversationRelay. (All available attributes are listed in Twilio’s ConversationRelay TwiML documentation.)
  6. Run npm run dev to start the dev server.
  7. In another terminal tab, expose your local server to the internet using ngrok http 8080.
  8. Visit your Twilio Console Buy a Number page to purchase a number with Voice capabilities, if you haven’t yet.
  9. Then, select the number you want to use from your Active Numbers page and configure its Incoming Webhook to point to https://ngrokdomain.com/twiml, replacing ngrokdomain.com with the ngrok domain you got in step 7, above.
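The /twiml endpoint answers the call with TwiML that hands the audio to ConversationRelay. The following sketch shows what a handler like handlers/twiml.ts might return. The WebSocket path /ws, the function names, and the greeting text are assumptions for illustration, so check the repo for its actual values. ConversationRelay’s url attribute takes a wss:// URL, and welcomeGreeting is spoken when the call connects.

```typescript
// Escape characters that are special inside XML attribute values.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the TwiML returned from /twiml. `domain` is the public host
// (your ngrok or Fly.io domain) without a protocol prefix.
// The "/ws" path is a hypothetical WebSocket route.
function buildConversationRelayTwiml(domain: string, greeting: string): string {
  const url = `wss://${domain}/ws`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <ConversationRelay url="${escapeXmlAttr(url)}" welcomeGreeting="${escapeXmlAttr(greeting)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

Escaping the greeting matters if you edit it to include quotes or ampersands. Without escaping, those characters would produce invalid TwiML and the call would fail.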

Now – fingers crossed – go ahead and call your number.

Once you hear a greeting (nice!), test the agent by asking for flights between two cities, for example, "Can you tell me your flight options leaving from Boston heading to Tampa on Thursday?" Then try booking one of the options and wait for the result. You can also ask the agent to text you the flight information over SMS, then confirm by text or voice.
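When you reply by text, Twilio Messaging posts a form-encoded webhook to your server. Its From and Body parameters hold the sender’s number and the message text. The following sketch shows how a server could turn that webhook into the next Dify turn by looking up the conversation_id stored for the phone number. The function name and the empty TwiML reply are assumptions, not the repo’s code. In production you should also validate the X-Twilio-Signature header.

```typescript
// Result of handling an inbound SMS webhook (hypothetical shape).
interface SmsTurn {
  from: string;
  query: string;
  conversationId?: string;
}

// Parse Twilio's form-encoded webhook body and attach the Dify
// conversation previously stored for this phone number, if any.
function smsToDifyTurn(rawBody: string, sessions: Map<string, string>): SmsTurn | undefined {
  const params = new URLSearchParams(rawBody);
  const from = params.get("From");
  const body = params.get("Body");
  if (!from || !body) return undefined; // not a usable message
  return { from, query: body.trim(), conversationId: sessions.get(from) };
}

// An empty <Response> tells Twilio not to send an automatic reply.
// This sketch assumes any answer goes out later through the agent's
// Twilio Send Message tool instead.
const EMPTY_TWIML = '<?xml version="1.0" encoding="UTF-8"?><Response></Response>';
```

Twilio URL-encodes the leading + of the phone number as %2B, so URLSearchParams decodes From back to the E.164 form.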

Got a confirmation? Hang up, and congrats! You now have a multimodal AI agent experience running over both Voice and SMS, which you can quickly edit using Dify.

(Optional) Deploy to the cloud with Fly.io

ngrok is an amazing tool for local development, but once you close your laptop, your phone number will stop working.

Here’s how to host your application on Fly.io, which supports long-running WebSocket connections and has a free tier (note that you have to add a credit card when you sign up, but you can deploy free servers for testing purposes).

This repo includes the configuration for easily deploying the application to Fly.io. All the configuration is located in the fly.toml file.

  1. Sign up for Fly.io and install/setup the CLI.
  2. Create a new Fly App: fly apps create cr-dify
  3. Grab the domain name of your newly created Fly app and update the local .env with the DOMAIN value.
  4. Set up the secrets: fly secrets import < .env
  5. Deploy the application: fly deploy
  6. Scale down instances to 1 (if you want to keep the free tier) using fly scale count 1
  7. View the logs: fly logs

You can now set the Incoming Webhook of your Twilio number to point to https://DOMAIN/twiml and repeat the tests – even with your laptop closed!
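The same code runs behind ngrok and on Fly.io, so it helps to derive every public URL from the single DOMAIN value in .env. The following sketch tolerates a DOMAIN value with or without a protocol prefix or trailing slash. It uses a hypothetical Fly.io host, cr-dify.fly.dev, as an example. The WebSocket route /ws is assumed and is not the repo’s documented path.

```typescript
// Public endpoints derived from one DOMAIN setting.
interface PublicUrls {
  voiceWebhook: string; // paste into the number's Incoming Webhook
  relayWebSocket: string; // used as ConversationRelay's url attribute
}

// Accepts "cr-dify.fly.dev" or "https://cr-dify.fly.dev/" and normalizes
// it to a bare host before building the URLs. "/ws" is hypothetical.
function publicUrls(domain: string): PublicUrls {
  const host = domain.trim().replace(/^[a-z]+:\/\//i, "").replace(/\/+$/, "");
  if (!host) throw new Error("DOMAIN is empty; set it in .env");
  return {
    voiceWebhook: `https://${host}/twiml`,
    relayWebSocket: `wss://${host}/ws`,
  };
}
```

Failing fast on an empty DOMAIN surfaces a missing secret at startup, instead of during the first call.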

Building a multimodal solution with Dify, ConversationRelay and Twilio Voice

You now have a multimodal AI pattern you can reuse: ConversationRelay and Twilio Voice handle the audio, while your Dify agent drives reasoning and actions (including those awesome out-of-band SMS follow-ups through Twilio SMS).

You’ve now built with Dify, tested locally with ngrok, and possibly finished by deploying to the cloud with Fly.io. You’ve confirmed that you can speak naturally while your AI agent provides options, asks clarifying questions, responds via voice, and shares confirmations over SMS. Now all that’s left is for you to make the experience yours. Happy building!


About Twilio Forward

Twilio Forward pursues Horizon-3 initiatives aimed at driving step-change innovation that empowers builders and unlocks Twilio’s next era of growth. As an incubation lab, we explore bold new ideas, from the most advanced, almost unimaginable technologies to emerging solutions that address today’s real-world challenges. Our mission is to push boundaries, reimagine what’s possible, and build what comes next.


Kousha Talebian is a Principal Engineer from Vancouver, BC, working on the Emerging Technology and Innovation team. You can reach him at ktalebian [at] twilio.com. Outside of work, Kousha enjoys running with his dog and experimenting with various cuisines from around the world.

Rikki Singh is a product and engineering leader based in the Bay Area, California. At Twilio, she leads the Emerging Technology and Innovation group Twilio Forward. Outside of work, Rikki enjoys hiking and camping with her husband and toddler.

Paul Kamp is the Technical Editor-in-Chief of the Twilio Blog. You can reach him at pkamp [at] twilio.com.