Build Real-Time Voice Translation with Python, FastAPI, and Twilio ConversationRelay
Language barriers in real-time communication have always presented a fascinating technical challenge: How do you build a system that can seamlessly translate conversations between two people speaking different languages during a live phone call?
A previous serverless demo tackled this using AWS’s infrastructure, but this new project takes a different approach with FastAPI and Python. Using Twilio's ConversationRelay, OpenAI's translation capabilities, and modern web technologies, this demo explores how real-time, cross-lingual communication can be made efficient and scalable.
This demo project tackles these challenges using a modern tech stack to create a working proof-of-concept.
Demo app overview
This demo app enables real-time, bidirectional voice translation between two phone participants. Each person can speak and hear in their own language. The system is built on a scalable FastAPI backend and uses Twilio ConversationRelay to manage calls and media processing.
Key Features
- Bidirectional Translation: Both parties can speak and hear in their preferred languages.
- Configurable Languages: Set the phone numbers and the source and target languages through the web UI.
- Scalable Architecture: Uses FastAPI and asynchronous processing for high performance.
Technical architecture


Core components in the application
To better understand how this application works under the hood, here’s a breakdown of the key components and endpoints that power real-time voice translation. These elements coordinate call sessions, handle WebSocket communication, process voice input, and manage translation flows—working together to deliver a seamless multilingual conversation experience.
- Translation Session Management
  - The TranslationSession class manages call pairs and WebSocket connections
  - Tracks source and target call SIDs, phone numbers, and languages
  - Maintains WebSocket connections for both callers
- WebSocket Endpoints
  - /ws/source/{session_id}: Handles the source caller WebSocket
  - /ws/target/{session_id}: Handles the target callee WebSocket
  - Real-time bidirectional voice processing
- Voice Webhooks
  - /voice/source/{session_id}: Outbound call webhook for the source language speaker
  - /voice/target/{session_id}: Outbound call webhook for the target language speaker
- Translation Engine
  - Streaming translation using LiteLLM and OpenAI GPT-4
  - Configurable source and target languages
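To make the session component more concrete, here is a minimal sketch of what a TranslationSession record could look like. It is not the demo's actual class: the field names, the in-memory SESSIONS registry, and the create_session and peer_of helpers are assumptions made for illustration, based only on the responsibilities listed above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class TranslationSession:
    """Hypothetical record pairing the two call legs of one translation session."""

    source_number: str
    target_number: str
    source_language: str
    target_language: str
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Call SIDs are filled in once Twilio has created each outbound call.
    source_call_sid: Optional[str] = None
    target_call_sid: Optional[str] = None
    # Any is used so the sketch runs without FastAPI installed; in the app
    # these would hold the WebSocket connection for each leg.
    source_ws: Optional[Any] = None
    target_ws: Optional[Any] = None

    def peer_of(self, role: str) -> Tuple[Optional[Any], str]:
        """Return the other leg's socket and the language to translate into."""
        if role == "source":
            return self.target_ws, self.target_language
        if role == "target":
            return self.source_ws, self.source_language
        raise ValueError(f"unknown role: {role!r}")


# Assumed in-memory registry keyed by session ID (the {session_id} in the URLs).
SESSIONS: Dict[str, TranslationSession] = {}


def create_session(source_number: str, target_number: str,
                   source_language: str, target_language: str) -> TranslationSession:
    """Create a session and register it so webhooks and sockets can find it."""
    session = TranslationSession(source_number, target_number,
                                 source_language, target_language)
    SESSIONS[session.session_id] = session
    return session
```

Keeping both sockets on one record means a handler for either WebSocket endpoint can look up the session by ID and find where translated text should go next.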
How it works
The system creates a bridge between caller and callee through the following flow:
- Session Management: Each translation session initiates two outbound calls—one to the source (caller) and one to the target (callee).
- ConvRelay Configuration: Outbound call webhooks are configured to set up ConversationRelay sockets for media streaming.
- Bi-Directional flow:
  - On the source-to-target path:
    - ConvRelay Socket 1 receives transcribed text from the Source Caller (via internal STT).
    - The text is translated and sent to ConvRelay Socket 2, where it is converted back to speech (via TTS) and played to the Target Callee.
  - On the target-to-source path:
    - ConvRelay Socket 2 captures the Target Callee's speech, transcribes it to text (via internal STT), and sends it to ConvRelay Socket 1.
    - The text is translated and synthesized (via TTS) before being played to the Source Caller.
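The relay step in this flow can be sketched as a single function that takes one message from one socket, translates it, and forwards it to the other. This is an illustration under stated assumptions, not the demo's code. It assumes ConversationRelay delivers transcribed speech as a JSON message of type "prompt" with the text in voicePrompt, and speaks any "text" message it receives. The relay_prompt and fake_translate names are hypothetical, and fake_translate stands in for the LLM call so the sketch runs without API keys.

```python
import asyncio
import json
from typing import Awaitable, Callable, List, Optional

# (text, target_language) -> translated text; in the app this would call the LLM.
Translator = Callable[[str, str], Awaitable[str]]
Sender = Callable[[str], Awaitable[None]]


async def relay_prompt(raw_message: str, peer_send: Sender,
                       target_language: str, translate: Translator) -> Optional[str]:
    """Translate one transcribed utterance and forward it to the other call leg."""
    message = json.loads(raw_message)
    # Only transcribed speech is relayed; setup, interrupt, and other
    # message types are ignored in this sketch.
    if message.get("type") != "prompt":
        return None
    text = message.get("voicePrompt", "").strip()
    if not text:
        return None
    translated = await translate(text, target_language)
    # A "text" message asks the peer's ConversationRelay socket to speak the token.
    await peer_send(json.dumps({"type": "text", "token": translated, "last": True}))
    return translated


async def fake_translate(text: str, target_language: str) -> str:
    """Placeholder translator that only tags the text with the target language."""
    return f"[{target_language}] {text}"


async def demo() -> List[dict]:
    """Feed two messages through the relay and collect what the peer receives."""
    outbox: List[dict] = []

    async def peer_send(payload: str) -> None:
        outbox.append(json.loads(payload))

    prompt = json.dumps({"type": "prompt", "voicePrompt": "Hello", "last": True})
    setup = json.dumps({"type": "setup", "callSid": "CA_PLACEHOLDER"})
    await relay_prompt(prompt, peer_send, "es-ES", fake_translate)
    await relay_prompt(setup, peer_send, "es-ES", fake_translate)
    return outbox


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The same function serves both directions: the source socket handler passes the target socket's send method and the target language, and the target socket handler does the reverse.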
Get started with the demo
Prerequisites
- Python 3.8+
- uv for Python package management
- API keys for the LLM providers you’d like to test (OpenAI is used in this demo)
- ngrok for exposing your test server to Twilio
- A Twilio account and a registered phone number with voice capabilities
Run the demo locally
Now I’ll walk you through cloning the repo, getting everything set up, then running and calling the translation app.
1. Clone the Git repository:
2. Install Dependencies with: uv sync
3. Configure Environment Variables: Copy .env.sample to .env (`cp .env.sample .env`) and add your Twilio and OpenAI credentials in your favorite text editor or IDE. Also add the Twilio number that the calls will be placed from, in E.164 format.
You can find your Twilio credentials in your Twilio Console.
4. Run the application:
5. Run ngrok: Expose the application running on your local machine to the internet.
6. Access the demo app by using the link provided by ngrok in step 5.
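Before running the app, it can help to confirm that the values from step 3 are present and that the phone number really is in E.164 format. The following is a minimal sketch of such a check. The variable names (TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER, OPENAI_API_KEY) are assumptions, so use the names in the repository's .env.sample instead. The sketch also reads from the process environment rather than parsing the .env file itself.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping

# E.164: a plus sign, a non-zero leading digit, and at most 15 digits in total.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")

# Assumed variable names; check .env.sample for the ones the demo expects.
REQUIRED = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN",
            "TWILIO_PHONE_NUMBER", "OPENAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    twilio_account_sid: str
    twilio_auth_token: str
    twilio_phone_number: str
    openai_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required settings and fail early with a clear message."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    number = env["TWILIO_PHONE_NUMBER"]
    if not E164_PATTERN.fullmatch(number):
        raise ValueError(f"TWILIO_PHONE_NUMBER is not in E.164 format: {number!r}")
    return Settings(
        twilio_account_sid=env["TWILIO_ACCOUNT_SID"],
        twilio_auth_token=env["TWILIO_AUTH_TOKEN"],
        twilio_phone_number=number,
        openai_api_key=env["OPENAI_API_KEY"],
    )
```

Failing at startup is usually easier to debug than a rejected outbound call later in the test.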
Test the demo
Ready to see the magic in action? You’ll need two phone numbers—one for each participant. Think of it like setting up a cross-language conversation between two people who don't speak the same language!
Choose a language for each person, and the system will automatically place two outbound calls using your Twilio number (configured in your .env file). You can also select an ElevenLabs voice to personalize the experience further. Once connected, you’ll hear real-time translation kick in, just like having a personal interpreter on the line.
It’s a great way to experience what you’ve just built, and it feels pretty cool seeing it all come together!
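The language and voice you pick in the web UI presumably end up in the TwiML that each voice webhook returns to Twilio. As a rough sketch of that idea, the helper below builds a Connect/ConversationRelay response with the standard library. The function name, the example URL, and the VOICE_ID placeholder are hypothetical. The url, language, ttsProvider, and voice attributes follow Twilio's ConversationRelay documentation as best understood here, so verify them against the current docs before relying on them.

```python
from typing import Optional
from xml.etree import ElementTree as ET


def conversation_relay_twiml(ws_url: str, language: str,
                             voice: Optional[str] = None,
                             tts_provider: str = "ElevenLabs") -> str:
    """Build TwiML that connects a call leg to a ConversationRelay WebSocket."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    attrs = {"url": ws_url, "language": language, "ttsProvider": tts_provider}
    if voice:
        # Only set a voice when one was chosen; otherwise the default applies.
        attrs["voice"] = voice
    ET.SubElement(connect, "ConversationRelay", attrs)
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values: the host would be your ngrok domain and the path
    # one of the /ws/source/{session_id} or /ws/target/{session_id} endpoints.
    print(conversation_relay_twiml(
        "wss://example.ngrok.app/ws/target/abc123", "es-ES", voice="VOICE_ID"))
```

Each leg gets its own TwiML with its own language, which is what lets the two participants speak and hear different languages on the same session.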


Conclusion
This demo showcases how you can use FastAPI, Python, Twilio’s ConversationRelay, and large language models (LLMs) together to address complex real-time communication challenges. While there's still work to be done to make it production-ready, this foundation highlights the powerful potential of using technology to break down language barriers.
Additional Resources
- Twilio ConversationRelay
- Build a real-time Voice AI Assistant with Twilio’s ConversationRelay, LiteLLM, and Python
- Integrate Claude, Anthropic’s AI Assistant, with Twilio Voice using ConversationRelay
Hao Wang is a Solution Architect at Twilio, dedicated to empowering customers to maximize the potential of Twilio’s products. With a strong passion for emerging technologies and Voice AI, Hao is always exploring innovative ways to drive impactful solutions.