Integrate Google Gemini with Twilio Voice Using ConversationRelay and Python

April 20, 2026

Imagine having real-time, human-like conversations with an AI model over the phone. With Twilio ConversationRelay, you can make that a reality. It connects your voice call with any Large Language Model (LLM) via a fast, event-driven WebSocket, allowing seamless communication.

In this post, I'll show you how to set up Google Gemini with Twilio Voice using ConversationRelay. By the end, you'll have a simple Python server running, ready to let you dial into a Twilio number and talk with Gemini. It's a perfect starting point for building more interactive, feature-rich voice applications.

Prerequisites

To follow along with this tutorial, you'll need:

  • Python 3.9 or newer installed
  • A Twilio account and a voice-capable Twilio phone number
  • A Google Gemini API key
  • Ngrok (or a similar tunneling tool) installed

Set up the project

Let's start by creating a new directory for your project:

mkdir twilio-cr-gemini-python
cd twilio-cr-gemini-python

Next, create a virtual environment and activate it:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Now, install the required dependencies using pip:

pip install google-genai python-dotenv uvicorn fastapi websockets

You'll use FastAPI as your framework to quickly spin up a server for both the WebSocket and the Twilio route.

All of the code for this quickstart is available on GitHub.

Configure environment variables

You'll need to securely store your Google Gemini API key. Create a .env file in your project folder and add the following:

GOOGLE_API_KEY="YOUR_GEMINI_API_KEY"
NGROK_URL="your-ngrok-subdomain.ngrok-free.app"

Be sure to replace the placeholders with your actual API key and your Ngrok forwarding URL (you'll set this up later in the post).

Create the server

Now, create a new Python file called main.py. This is where you'll write the main logic of the application.

1. Import necessary libraries and set up the environment

The first step in setting up the server is importing the necessary libraries. These libraries will help you build the server, handle WebSocket connections, interact with Google Gemini, and load environment variables.

import os
import json
import uvicorn
from google import genai
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import Response
from dotenv import load_dotenv
  • FastAPI: This is the main web framework used for building the server. It allows you to easily define routes and handle HTTP requests. It also enables WebSocket communication between the server and Twilio's ConversationRelay.
  • uvicorn: A fast ASGI server that runs the FastAPI application.
  • genai: The Google Gen AI SDK allows you to interact with Google's Gemini models to generate responses.
  • dotenv: This module helps load environment variables (like your Google API key and Ngrok URL) from a .env file.

After importing the libraries, add the following line to load environment variables from the .env file:

load_dotenv()

This ensures sensitive information (like API keys) is not hardcoded into your code.

2. Constants and configuration

Define some constants to use throughout the code to customize the behavior of the application.

PORT = int(os.getenv("PORT", "8080"))
DOMAIN = os.getenv("NGROK_URL")
if not DOMAIN:
    raise ValueError("NGROK_URL environment variable not set.")
WS_URL = f"wss://{DOMAIN}/ws"
WELCOME_GREETING = "Hi! I am a voice assistant powered by Twilio and Google Gemini. Ask me anything!"
SYSTEM_PROMPT = """You are a helpful and friendly voice assistant. This conversation is happening over a phone call, so your responses will be spoken aloud. 
Please adhere to the following rules:
1. Provide clear, concise, and direct answers.
2. Spell out all numbers (e.g., say 'one thousand two hundred' instead of 1200).
3. Do not use any special characters like asterisks, bullet points, or emojis.
4. Keep the conversation natural and engaging."""
sessions = {}
  • PORT: The port for the server to listen on (either from an environment variable or default to 8080).
  • DOMAIN: The Ngrok URL that will be used to establish a WebSocket connection with Twilio. The code validates it immediately and raises an error if it's not set.
  • WS_URL: The full WebSocket URL for the endpoint where Twilio will connect.
  • WELCOME_GREETING: The greeting that will be spoken when the call connects.
  • SYSTEM_PROMPT: This is a special message sent to the Gemini model to instruct it on how to behave. It's important because it helps shape the tone and format of the AI's responses (e.g., spelling out numbers, avoiding emojis). Since this is a voice conversation, the rules help ensure clean, speakable output.
  • sessions: A dictionary to store active chat session objects for each call (using the callSid as the key). Note: This implementation only cleans up on clean disconnect – in production, consider adding session timeout handling.
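As the note above says, sessions are only removed on a clean disconnect. One hedged sketch of timeout-based cleanup, where each entry stores the chat session together with a last-activity timestamp (the SESSION_TTL value and the helper names are illustrative, not part of the tutorial code):

```python
import time

SESSION_TTL = 15 * 60  # illustrative: expire sessions idle for 15 minutes

# Store (chat_session, last_active_timestamp) instead of the bare session.
sessions = {}

def touch_session(call_sid, chat_session):
    """Record activity for a call so it is not reaped as stale."""
    sessions[call_sid] = (chat_session, time.monotonic())

def reap_stale_sessions(now=None):
    """Drop sessions idle longer than SESSION_TTL; return the reaped call SIDs."""
    now = time.monotonic() if now is None else now
    stale = [sid for sid, (_, ts) in sessions.items() if now - ts > SESSION_TTL]
    for sid in stale:
        del sessions[sid]
    return stale
```

You would call touch_session whenever a prompt arrives for a call, and run reap_stale_sessions periodically, for example from an asyncio background task.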

3. Initializing FastAPI and the Gemini client

Initialize the FastAPI application and set up the Gemini client using the API key stored in the .env file.

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY environment variable not set.")
client = genai.Client(api_key=GOOGLE_API_KEY)
app = FastAPI()
  • GOOGLE_API_KEY: The code loads the API key from environment variables and validates it's present before proceeding.
  • client: Initialize the Google Gen AI client using genai.Client(). This is the new SDK pattern – if you've used the older google-generativeai package before, note that this is the newer google-genai SDK.
  • FastAPI app: This creates the web server that will handle HTTP requests and WebSocket connections.

4. AI response function

Now create a function that will interact with the Gemini API and get the AI's response.

def gemini_response(chat_session, user_prompt):
    """Get a response from the Gemini API."""
    response = chat_session.send_message(user_prompt)
    return response.text
  • gemini_response(): This function takes a Gemini chat session object and the user's prompt, sends the message, and returns the text response. The chat session object automatically maintains conversation history, so Gemini has full context of the conversation with each new message.
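Network calls to the Gemini API can fail transiently, and the function above will surface any exception straight to the WebSocket loop. A minimal retry sketch around the same call (the broad except clause and the backoff values are assumptions; in practice you would narrow the handler to the SDK's specific error types):

```python
import time

def gemini_response_with_retry(chat_session, user_prompt, retries=2, delay=1.0):
    """Retry the chat call a few times before giving up."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return chat_session.send_message(user_prompt).text
        except Exception as exc:  # assumption: narrow this to the SDK's error types
            last_error = exc
            if attempt < retries:
                time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise last_error
```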

5. TwiML response for Twilio

The /twiml endpoint is designed to provide Twilio with instructions on how to connect a voice call to your WebSocket application. Twilio uses TwiML (Twilio Markup Language) to determine how to handle voice interactions.

@app.post("/twiml")
async def twiml_endpoint():
    """Endpoint that returns TwiML for Twilio to connect to the WebSocket"""
    xml_response = f"""<?xml version="1.0" encoding="UTF-8"?>
    <Response>
    <Connect>
    <ConversationRelay url="{WS_URL}" welcomeGreeting="{WELCOME_GREETING}" ttsProvider="ElevenLabs" voice="FGY2WhTYpPnrIDTdsKH5" />
    </Connect>
    </Response>"""
    return Response(content=xml_response, media_type="text/xml")
  • The TwiML response instructs Twilio to connect to the WebSocket server (via the WS_URL) and to greet the caller with the WELCOME_GREETING message.
  • In this example, the code uses ElevenLabs as the text-to-speech provider with a specific voice ID (FGY2WhTYpPnrIDTdsKH5 — this is an example voice ID; replace it with your own if using ElevenLabs). This is an optional customization — you can remove the ttsProvider and voice attributes entirely to use Twilio's default TTS, or swap "ElevenLabs" for "Amazon" or "Google" if you prefer a different provider.
  • Refer to the ConversationRelay Docs for more information on all the supported attributes like language, ttsProvider, and voice.
  • Note: This endpoint doesn't validate Twilio request signatures. For production use, see Twilio's request validation documentation.
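Twilio's documented signing scheme (HMAC-SHA1 over the full request URL plus the alphabetically sorted POST parameters, Base64-encoded, sent in the X-Twilio-Signature header) can be checked with the standard library alone. A hedged sketch of that check; the official Twilio helper library's RequestValidator implements the same scheme and is the safer choice in production:

```python
import base64
import hashlib
import hmac

def is_valid_twilio_signature(auth_token, url, params, signature):
    """Recompute Twilio's request signature and compare it in constant time."""
    payload = url + "".join(k + v for k, v in sorted(params.items()))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, signature)
```

In the FastAPI endpoint you would read the header with request.headers.get("X-Twilio-Signature"), pass the form parameters as a dict, and return a 403 on mismatch.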

6. WebSocket connection handling

The /ws WebSocket endpoint handles real-time communication between Twilio and the server. It receives messages from Twilio, processes them, and sends responses back.

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    """WebSocket endpoint for real-time communication"""
    await websocket.accept()
    call_sid = None
    try:
        while True:
            data = await websocket.receive_text()
            message = json.loads(data)
            if message["type"] == "setup":
                call_sid = message["callSid"]
                print(f"Setup for call: {call_sid}")
                sessions[call_sid] = client.chats.create(
                    model="gemini-2.5-flash",
                    config={"system_instruction": SYSTEM_PROMPT}
                )
            elif message["type"] == "prompt":
                if not call_sid or call_sid not in sessions:
                    print(f"Error: Received prompt for unknown call_sid {call_sid}")
                    continue
                user_prompt = message["voicePrompt"]
                print(f"Processing prompt: {user_prompt}")
                chat_session = sessions[call_sid]
                response_text = gemini_response(chat_session, user_prompt)
                await websocket.send_text(
                    json.dumps({
                        "type": "text",
                        "token": response_text,
                        "last": True
                    })
                )
                print(f"Sent response: {response_text}")
            elif message["type"] == "interrupt":
                print(f"Handling interruption for call {call_sid}.")
            else:
                print(f"Unknown message type received: {message['type']}")
    except WebSocketDisconnect:
        print(f"WebSocket connection closed for call {call_sid}")
        if call_sid in sessions:
            sessions.pop(call_sid)
            print(f"Cleared session for call {call_sid}")
  • The while loop listens for incoming messages from Twilio. The server processes different message types:
  • setup: Initializes a new Gemini chat session for this call. The code uses client.chats.create() with the gemini-2.5-flash model and passes the system prompt via the config parameter. This creates a stateful chat object that maintains conversation history automatically.
  • prompt: Processes the user's voice input (transcribed to text by Twilio) and sends it to Gemini via the gemini_response() function. The response text is sent back to Twilio, which converts it to speech.
  • interrupt: Handles any interruptions during the conversation (e.g., when the caller speaks over the AI).
  • Unknown message: Logs any unrecognized message types.
  • sessions: Active Gemini chat sessions are stored in the sessions dictionary, indexed by the call_sid. When the WebSocket disconnects, the session is cleaned up.
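ConversationRelay also accepts partial tokens, which lets the caller start hearing speech before the full reply has been generated. The handler above sends one message with "last": true; a hedged sketch of the framing for streamed chunks (pairing it with the SDK's streaming call is an assumption left to you):

```python
import json

def token_messages(chunks):
    """Yield ConversationRelay text messages: one per chunk, then a final marker."""
    for chunk in chunks:
        if chunk:
            yield json.dumps({"type": "text", "token": chunk, "last": False})
    # An empty token with last=True tells ConversationRelay the reply is complete.
    yield json.dumps({"type": "text", "token": "", "last": True})
```

Inside the prompt branch you could then iterate over the model's streamed chunks and await websocket.send_text(msg) for each framed message.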

7. Run the FastAPI server

Finally, run the FastAPI server using Uvicorn:

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=PORT)

uvicorn.run(): This starts the server, listening on the specified port (default is 8080). The server is now ready to handle both HTTP requests (for TwiML) and WebSocket connections (for real-time conversation).

Run the server

To run the server, first, open a terminal and start an Ngrok tunnel:

ngrok http 8080

Start the Ngrok tunnel before anything else, because you'll need the Ngrok URL in two places: the Twilio Console and your .env file.

Take note of your Ngrok URL (e.g., https://1234abcd.ngrok-free.app) and add the domain part to your .env file:

NGROK_URL="1234abcd.ngrok-free.app"
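Note that NGROK_URL must be just the domain: if you paste the full https:// URL by mistake, the wss:// URL built earlier will be malformed. A small defensive helper (illustrative, not part of the tutorial code) that accepts either form:

```python
from urllib.parse import urlparse

def normalize_domain(value):
    """Accept either a bare domain or a full URL and return just the host."""
    value = value.strip()
    if "://" in value:
        return urlparse(value).netloc
    return value.rstrip("/")
```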

Now you can start the server with:

python main.py

Configure Twilio

Go to the Twilio console and select your Twilio phone number. In the Configure tab, under Voice Configuration settings, set the webhook URL to your Ngrok URL (including /twiml), like so:

https://1234abcd.ngrok-free.app/twiml

Test the integration

With everything set up, dial your Twilio phone number from your mobile phone. You should hear the greeting message, and you can start asking questions to the AI.

What's next?

This Python app sets up a FastAPI server that connects Twilio's voice capabilities with ConversationRelay, allowing real-time communication with an AI assistant powered by Google Gemini. The server takes care of both HTTP requests (to give Twilio its instructions) and WebSocket connections (for live communication with the AI). As you dial in, ConversationRelay orchestrates transcription handling, communications with the LLM, and text-to-speech based on the LLM's response, to power a fluid conversation between you and a Gemini-powered AI assistant.

Now that you have a working real-time voice assistant, you can explore other customizations like swapping TTS providers, adding Function Calling to let Gemini check the weather or book appointments during the call, or experimenting with different Gemini models.

If you are looking for more examples and demos with ConversationRelay, check out these blog posts from other Twilions:

Prefer to watch? Here's my walkthrough of the Twilio and Gemini Flash integration:




Rishab Kumar is a Developer Evangelist at Twilio and a cloud enthusiast. Get in touch with Rishab on Twitter @rishabincloud and follow his YouTube channel on cloud, DevOps, and DevRel adventures at youtube.com/@rishabincloud