This application demonstrates how to use Twilio and OpenAI's Realtime API for bidirectional voice language translation between a caller and a contact center agent.
The AI assistant intercepts voice audio from one party, translates it, and speaks the translation in the other party's preferred language. OpenAI's Realtime API keeps latency low enough to support a natural two-way voice conversation.
See here for a video demo of the real-time translation app in action.
Below is a high-level architecture diagram of how this application works:
[Figure: Realtime Translation Diagram]
This application uses Twilio products in conjunction with OpenAI's Realtime API, orchestrated by this middleware application.
Two separate Voice calls are initiated, both proxied by this middleware service. The caller is asked to choose their preferred language, then the conversation is queued for the next available agent in Twilio Flex. Once the agent is connected, the middleware intercepts the audio from both parties via Media Streams and forwards it to OpenAI's Realtime API for translation. The translated audio is then forwarded to the other party.
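The audio hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the sample's actual code: the type and function names (Party, otherParty, toRealtimeAppend, toTwilioMedia) are hypothetical, and the OpenAI event names are assumptions based on one version of the Realtime API and may differ in others. It also assumes the Realtime session is configured for the same audio format that Media Streams delivers, so payloads can be passed through without transcoding.

```typescript
// Hypothetical sketch: names and structure are illustrative only.

type Party = "caller" | "agent";

// A Twilio Media Streams "media" message carrying one base64 audio frame.
interface TwilioMediaMessage {
  event: "media";
  streamSid: string;
  media: { payload: string };
}

// Client event that appends audio to a Realtime session's input buffer
// (event name assumed; check the API version you target).
interface RealtimeAppendEvent {
  type: "input_audio_buffer.append";
  audio: string;
}

// Server event carrying a chunk of translated audio
// (event name assumed; it varies between API versions).
interface RealtimeAudioDelta {
  type: "response.audio.delta";
  delta: string;
}

// Translated audio from one party is always played to the other party.
function otherParty(source: Party): Party {
  return source === "caller" ? "agent" : "caller";
}

// Wrap an inbound Twilio audio frame as a Realtime input event.
// Assumes matching audio formats, so the payload passes through unchanged.
function toRealtimeAppend(msg: TwilioMediaMessage): RealtimeAppendEvent {
  return { type: "input_audio_buffer.append", audio: msg.media.payload };
}

// Wrap a translated audio chunk as a Twilio "media" message
// addressed to the other party's stream.
function toTwilioMedia(
  delta: RealtimeAudioDelta,
  targetStreamSid: string
): TwilioMediaMessage {
  return {
    event: "media",
    streamSid: targetStreamSid,
    media: { payload: delta.delta },
  };
}
```

In a full middleware, each party's Media Streams WebSocket would feed toRealtimeAppend, and each translated chunk would be sent with toTwilioMedia to the stream of otherParty(source). The WebSocket wiring, language selection, and session setup are omitted here.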
The code for this sample is available to view on GitHub.
You will need a Twilio Account SID and Auth Token to run this code.
Follow the setup instructions in the README to get the sample up and running.