Building an Outbound Voice Agent with Twilio and Deepgram
Building an outbound voice agent on Twilio means solving problems that an inbound voice agent doesn't have. You have to originate the call, detect whether a human or voicemail picked up, inject context so the agent knows who it's talking to and why, and route the outcome back into your system of record when the call ends.
My team at Deepgram shipped a reference implementation for outbound telephony voice agents on Twilio that handles all of this. Your system triggers a call via a REST API, an AI agent runs a scripted conversation with the recipient, and a structured call outcome lands in your CRM when it's done. Voicemail gets a personalized message instead.
Here's the architecture and what it handles that simpler implementations skip.
How the outbound voice agent architecture works
If you've read the companion post on inbound voice agents, the core audio bridge here is the same: a Python application built on Starlette with a VoiceAgentSession class that holds two WebSocket connections (one to Twilio, one to Deepgram's Voice Agent API) and shuttles audio between them. Deepgram's Voice Agent API combines speech-to-text, LLM reasoning, and text-to-speech into one real-time loop. You send audio in and get audio back, instead of stitching together separate services.
Flux, Deepgram's speech-to-text model for voice agents, handles turn-taking natively so the agent knows when the person on the line is done speaking. Barge-in (the person interrupting the agent mid-sentence) is wired into Twilio's clear event, which discards any agent audio still queued for playback, so they can cut in at any time.
What's different from inbound is everything that happens before the audio bridge kicks in.
The example scenario is a homeowners insurance lead follow-up. A customer submits a quote request on your website. Later, your system triggers an outbound call to verify their details, gather a few more data points (roof age, recent claims), and book a consultation with a licensed agent. If the call goes to voicemail, the agent leaves a personalized message instead.
To kick off that call, an external system (your CRM, a CLI script, a webhook) hits the server's POST /make-call endpoint with the recipient's phone number and lead context.
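A request along these lines would do it. Treat this as a sketch: the JSON field names (`to`, `lead`, `name`, `property_address`), the host, and the phone number are placeholders, not the repo's actual schema, and authentication is left out because how the endpoint secret is passed isn't shown here.

```python
import json
import urllib.request


def build_make_call_request(base_url: str, to_number: str, lead: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /make-call request."""
    # Hypothetical payload shape: the phone number plus free-form lead context.
    body = json.dumps({"to": to_number, "lead": lead}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/make-call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_make_call_request(
    "https://example.invalid",
    "+15555550100",
    {"name": "Jane Doe", "property_address": "742 Evergreen Terrace"},
)
# urllib.request.urlopen(request) would send it to a running server.
```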
The server takes it from there: it calls the Twilio REST API with inline TwiML containing <Connect><Stream> pointing back at its own WebSocket, Twilio dials the recipient, and once the call connects, the VoiceAgentSession bridge takes over. It's a pattern Twilio customers build all the time: CRM-triggered outbound voice with structured data flowing back into the system of record.
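The call-creation step can be sketched with nothing but the standard library: build the inline TwiML, then assemble the form parameters that Twilio's Calls resource accepts. The function names and URLs below are hypothetical. The parameter names (`Twiml`, `MachineDetection`, `AsyncAmd`, `AsyncAmdStatusCallback`) come from Twilio's REST API, but which of them the repo actually sets is an assumption.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Return TwiML that connects the call audio to a WebSocket."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": websocket_url})
    return ET.tostring(response, encoding="unicode")


def build_call_params(to_number: str, from_number: str,
                      websocket_url: str, amd_callback_url: str) -> dict:
    """Form parameters for creating an outbound call with async AMD."""
    return {
        "To": to_number,
        "From": from_number,
        "Twiml": build_stream_twiml(websocket_url),
        "MachineDetection": "Enable",
        "AsyncAmd": "true",  # do not block the call while detection runs
        "AsyncAmdStatusCallback": amd_callback_url,
    }
```

With Twilio's Python helper library, the same values would be passed as keyword arguments to `client.calls.create(...)`.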
Answering machine detection, CRM integration, and other outbound concerns the repo handles
Answering machine detection (AMD) with Twilio's async API. When the call connects, the server doesn't know whether a human or voicemail picked up. It buffers incoming audio while Twilio's async answering machine detection runs (roughly 2 to 4 seconds), and when the result POSTs back to /amd-result, the session branches:
- Human detected: connect to the Deepgram Voice Agent API, flush the buffered audio, start the conversation.
- Voicemail detected: deliver a personalized message using Deepgram's Aura-2 text-to-speech model and hang up.
The repo also handles late detection. If AMD flags voicemail after the agent has already started talking to what it thought was a human, the session tears down the Deepgram connection mid-call and switches to voicemail delivery. That edge case is the kind of thing that only surfaces when you're doing outbound at volume, and it's already solved here.
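The buffering, branching, and late-detection behavior described above amounts to a small state machine. The sketch below is illustrative only. `AmdGate` and the action strings are hypothetical, and it records the actions it would take instead of opening real connections. The `machine_*` strings are values Twilio reports in `AnsweredBy`. Treating every other value as human is an assumption of this sketch, not necessarily the repo's policy.

```python
from dataclasses import dataclass, field

MACHINE_RESULTS = {
    "machine_start", "machine_end_beep",
    "machine_end_silence", "machine_end_other",
}


@dataclass
class AmdGate:
    state: str = "waiting"  # waiting -> agent or voicemail
    buffered: list[bytes] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)

    def on_audio(self, chunk: bytes) -> None:
        if self.state == "waiting":
            self.buffered.append(chunk)  # hold audio until AMD reports
        elif self.state == "agent":
            self.actions.append("forward_audio")
        # In the voicemail state, incoming audio is ignored.

    def on_amd_result(self, answered_by: str) -> None:
        is_machine = answered_by in MACHINE_RESULTS
        if self.state == "waiting" and not is_machine:
            self.state = "agent"
            self.actions.append(f"connect_agent_and_flush:{len(self.buffered)}")
            self.buffered.clear()
        elif self.state in ("waiting", "agent") and is_machine:
            if self.state == "agent":
                # Late detection: the agent was already talking.
                self.actions.append("close_agent")
            self.state = "voicemail"
            self.buffered.clear()
            self.actions.append("speak_voicemail_and_hang_up")
```

Keeping the decision logic separate from the WebSocket plumbing like this makes the late-detection path easy to unit test.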
Lead context injection into the voice agent prompt. The agent's system prompt is built dynamically from the lead data in the POST /make-call request. The agent knows the recipient's name, property details, and what they requested before the conversation starts. That's what lets the opening line be "I'm following up on your homeowners insurance quote for 742 Evergreen Terrace" instead of "Hi, this is an automated call."
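As a rough illustration of building the prompt from lead data, the function below interpolates a lead record into the instructions. The field names and the wording of the prompt are made up for this sketch. The repo's real prompt is presumably longer and more carefully worded.

```python
def build_system_prompt(lead: dict) -> str:
    """Assemble agent instructions from a lead record (hypothetical fields)."""
    lines = [
        "You are calling to follow up on a homeowners insurance quote request.",
        f"Recipient name: {lead['name']}",
        f"Property address: {lead['property_address']}",
        "Verify these details, ask about roof age and recent claims, "
        "and offer to book a consultation with a licensed agent.",
    ]
    return "\n".join(lines)


prompt = build_system_prompt(
    {"name": "Jane Doe", "property_address": "742 Evergreen Terrace"}
)
```

Because the prompt is assembled per call, the same server can run very different conversations just by changing what the triggering system sends.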
Structured call outcomes posted back to the CRM. This is the detail that turns a voice agent from a demo into a workflow. The Deepgram Voice Agent API supports function calling, which means the LLM powering the agent can trigger specific actions during the conversation (checking appointment availability, booking a slot, etc.). An update_lead function runs at the end of every call, capturing disposition, verified info, new info gathered, and a natural-language summary as structured JSON. In the reference implementation it logs to the console. In production it would POST to your CRM, webhook, or database. The LLM decides when the call is done and what to report, and the server routes it to your backend.
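A function like update_lead usually has two halves: a schema the LLM sees and a handler the server runs. The sketch below assumes a JSON-Schema-style definition and hypothetical field names (`disposition`, `verified_info`, `new_info`, `summary`) drawn from the description above. The exact envelope Deepgram uses to declare functions and deliver calls should be taken from the Voice Agent API docs, not from this example.

```python
import json

# Hypothetical function definition the agent would be configured with.
UPDATE_LEAD_FUNCTION = {
    "name": "update_lead",
    "description": "Record the outcome of the call once the conversation is finished.",
    "parameters": {
        "type": "object",
        "properties": {
            "disposition": {"type": "string"},
            "verified_info": {"type": "object"},
            "new_info": {"type": "object"},
            "summary": {"type": "string"},
        },
        "required": ["disposition", "summary"],
    },
}


def handle_update_lead(arguments_json: str) -> dict:
    """Validate the LLM's arguments and return the structured outcome."""
    arguments = json.loads(arguments_json)
    required = UPDATE_LEAD_FUNCTION["parameters"]["required"]
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"update_lead missing fields: {missing}")
    outcome = {
        "disposition": arguments["disposition"],
        "verified_info": arguments.get("verified_info", {}),
        "new_info": arguments.get("new_info", {}),
        "summary": arguments["summary"],
    }
    print(json.dumps(outcome))  # stand-in for a POST to a CRM or webhook
    return outcome
```

Swapping the `print` for an HTTP call or a database write is the only change needed to route outcomes into a real backend.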
Silence monitoring with Deepgram's InjectAgentMessage. A silence monitor tracks whether the recipient is responding. After a stretch of dead air, the agent prompts the recipient with "Are you still there?" using Deepgram's InjectAgentMessage API. This is a server-side command that tells the voice agent to speak a specific line you provide, rather than having the LLM generate something on its own. If there's still no response, the server eventually ends the call. It's a small detail, but it matters when you're doing outbound at volume and paying for dead air.
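The timing logic behind a silence monitor can be sketched independently of any network code. Everything here is hypothetical, including the class name, the two thresholds, and the callbacks, and the clock is passed in explicitly so the behavior is easy to test. In a real session, `on_prompt` would send the InjectAgentMessage command and `on_hang_up` would end the Twilio call.

```python
from typing import Callable


class SilenceMonitor:
    def __init__(self, prompt_after: float, hang_up_after: float,
                 on_prompt: Callable[[], None],
                 on_hang_up: Callable[[], None]) -> None:
        self.prompt_after = prompt_after    # seconds of silence before prompting
        self.hang_up_after = hang_up_after  # seconds of silence before ending
        self.on_prompt = on_prompt
        self.on_hang_up = on_hang_up
        self.last_activity = 0.0
        self.prompted = False
        self.ended = False

    def on_activity(self, now: float) -> None:
        """Call whenever the recipient speaks."""
        self.last_activity = now
        self.prompted = False

    def tick(self, now: float) -> None:
        """Call periodically; fires each callback at most once per silence."""
        if self.ended:
            return
        silent_for = now - self.last_activity
        if silent_for >= self.hang_up_after:
            self.ended = True
            self.on_hang_up()
        elif silent_for >= self.prompt_after and not self.prompted:
            self.prompted = True
            self.on_prompt()
```

Driving `tick` from a periodic asyncio task with `time.monotonic()` as the clock would be one way to hook this into the session.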
Getting started with the outbound voice agent
The repo includes a setup wizard that configures Twilio and deploys to Fly.io (a platform for running app servers close to your users). You'll need Python 3.12+, a Twilio account, and a Deepgram API key (free, $200 in credits, no credit card required).
The wizard walks you through picking a Twilio number, generating an endpoint secret for the /make-call endpoint, and deploying to Fly.io. Once it's up, trigger a test call by sending a request to the /make-call endpoint.
Your phone rings. The voice agent runs through the insurance lead follow-up conversation. Check the server logs for the full transcript and the structured call outcome JSON.
Fork it, strip out the insurance scenario, and drop in your own prompts, functions, and CRM integration. MIT-licensed.
If you're looking for the inbound counterpart, where the caller dials your number and talks to an AI receptionist, see Building an Inbound Voice Agent with Twilio and Deepgram.