How to run Automated AMD Tests and Fine-Tune Twilio AMD for Accurate Voice Automation
Accurately detecting whether a call is answered by a human or a machine (voicemail) is fundamental to building effective outbound communication workflows. Twilio Answering Machine Detection (AMD) offers flexible parameters for recognizing humans versus machines, but tuning these for your unique business scenarios requires testing and iteration.
In this post, you'll learn how to automate AMD testing, fine-tune parameters for your use case, and analyze real call recordings to maximize AMD reliability using Python. We’ll leverage open-source tools and Twilio’s Programmable Voice APIs to create a robust test loop. Whether you’re building outbound sales, reminders, or callbacks, this approach will help you improve customer experience and operational efficiency.
Specifically, you’ll learn how to:
- Set up a reproducible AMD test harness with Twilio and Python
- Prepare and label stereo call recordings for analysis
- Run automated AMD test campaigns, record results, and visualize detection
- Iteratively tune Twilio AMD parameters for maximum accuracy
Let’s get started!
Prerequisites
You’ll need:
- A Twilio account, if you don’t have one yet (you can sign up for free here)
- Two Twilio phone numbers: one for outbound calls (From), one for receiving and playing back recordings (To). See how to search for and buy phone numbers here.
- Your Twilio Account SID and Auth Token, from the Twilio Console
- ngrok (or another tunneling tool) to expose your local webhooks (download here)
- Python 3 and virtualenv
- A batch of dual-channel (stereo) .wav call recordings, with the caller and callee on separate channels. We’ll show you how to generate these during the tutorial steps.
- ffmpeg, if it isn’t installed yet (you can download it here)
1. Clone and install the Twilio AMD Optimization Toolkit
First, get the open-source AMD Optimization toolkit. This contains scripts for splitting audio channels, visualizing speech/silence, launching AMD-enabled test calls, and analyzing webhook results.
2. Prepare and label your dual-channel recordings
Why dual-channel?
To fine-tune AMD, you need to understand exactly what the callee hears. Dual-channel (stereo) recordings let you isolate the callee audio, removing agent speech, DTMF, or background noise.
Steps:
- Place your stereo .wav files in the recordings/ directory.
- Manually label each call: Is it a human or a machine (voicemail)? This enables accurate ground-truth comparison later.
3. Configure your environment
Create a .env file at the repo root with:
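The exact variable names the toolkit reads aren’t reproduced here; a typical shape (the names below are assumptions, and the values are placeholders) is:

```
TWILIO_ACCOUNT_SID=ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_FROM_NUMBER=+15555550100
TWILIO_TO_NUMBER=+15555550101
```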
Then reload your environment so the new variables are picked up.
4. Split the stereo recordings into separate channels
Run the toolkit’s channel-splitting script; you’ll get new .wav files in channel_audio/left/ and channel_audio/right/.
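To make the splitting step concrete, here is a stdlib-only sketch of what channel splitting does for a 16-bit stereo .wav (the function name and file paths are illustrative, not the toolkit’s API):

```python
import struct
import wave

def split_stereo(path, out_left, out_right):
    """Write each channel of a 16-bit stereo .wav to its own mono file."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 2 and w.getsampwidth() == 2
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    # Stereo PCM interleaves samples: L, R, L, R, ...
    for out_path, channel in ((out_left, samples[0::2]), (out_right, samples[1::2])):
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(struct.pack("<%dh" % len(channel), *channel))
```

In practice you’ll run the toolkit’s own script (or ffmpeg) rather than this sketch; it’s shown only to clarify what the left/right output files contain.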
5. (Optional) Visualize speech, silence, and detection events
To better understand how each part of your audio recording affects AMD detection, you can use the toolkit’s visualization script to plot the audio waveforms and annotated events; it also accepts options to pick a different channel or output folder.
The generated images in channel_analysis/ highlight:
- Initial Silence: The quiet period before the first utterance.
- Utterance Durations: Periods of detected speech.
- Silence Gaps: The pauses between utterances throughout the greeting.
- Final Silence: The pause at the end before the call is answered or a machine beep.
- The total duration of the voicemail or greeting.
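A stdlib-only sketch of how these spans can be measured from a mono 16-bit .wav (the toolkit’s script produces annotated images; this just illustrates the underlying idea, and the 20 ms frame size and RMS threshold of 500 are illustrative defaults):

```python
import struct
import wave

def speech_segments(path, frame_ms=20, threshold=500):
    """Return (start_s, end_s) spans where frame RMS energy exceeds threshold.

    Assumes a mono, 16-bit PCM .wav such as a split callee channel.
    """
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        samples = struct.unpack("<%dh" % w.getnframes(), w.readframes(w.getnframes()))
    step = rate * frame_ms // 1000
    segments, start = [], None
    for i in range(0, len(samples), step):
        frame = samples[i:i + step]
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms >= threshold and start is None:
            start = i / rate                    # speech begins
        elif rms < threshold and start is not None:
            segments.append((start, i / rate))  # speech ends
            start = None
    if start is not None:
        segments.append((start, len(samples) / rate))
    return segments
```

From the returned spans, the initial silence is `segments[0][0]`, the gaps are the differences between consecutive spans, and the final silence is the total duration minus `segments[-1][1]`.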
By observing these characteristics for each call, you can make informed decisions when tuning AMD parameters (like MachineDetectionTimeout, MachineDetectionSpeechThreshold, MachineDetectionSpeechEndThreshold, and MachineDetectionSilenceTimeout). For example:
- If most greetings have a short initial silence but long pauses mid-greeting, you may want to increase MachineDetectionSpeechEndThreshold to avoid premature classification.
- Calls with long final silences may require a longer MachineDetectionSilenceTimeout to allow for complete greeting playback.
These visual cues are a fast and reliable way to tailor AMD settings to your actual call data, boosting detection accuracy across your customer scenarios.
6. Set up your AMD test configuration
Edit amd_config.json to specify the AMD test settings.
- Update the AMD parameters for your scenario (MachineDetectionTimeout, MachineDetectionSpeechThreshold, etc.)
- Insert your ngrok URL in the StatusCallback and AsyncAmdStatusCallback fields.
Twilio's AMD docs describe these parameters in detail.
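The toolkit’s exact schema isn’t reproduced here, but a plausible amd_config.json might look like the following. The AMD field names match Twilio’s Calls API parameters and the numeric values are Twilio’s documented defaults; the callback URLs are placeholders:

```json
{
  "MachineDetection": "DetectMessageEnd",
  "MachineDetectionTimeout": 30,
  "MachineDetectionSpeechThreshold": 2400,
  "MachineDetectionSpeechEndThreshold": 1200,
  "MachineDetectionSilenceTimeout": 5000,
  "AsyncAmd": true,
  "AsyncAmdStatusCallback": "https://your_domain.ngrok.app/amd-status",
  "StatusCallback": "https://your_domain.ngrok.app/call-status"
}
```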
7. Configure your “To” phone number to play recordings
Before launching any automated AMD tests, it’s important to ensure that your "To" phone number (the recipient in your test setup) knows how to serve the correct audio for each inbound call during a test run.
How to set up the webhook
- Open the Twilio Console and navigate to Phone Numbers
- Select Your To Number
- Configure the Voice Webhook to https://your_domain.ngrok.app/incoming-call
Replace your_domain.ngrok.app with your actual public ngrok URL, as generated by your tunnel in Step 10.
8. Prepare a list of recordings for testing
Add one entry per test recording to recording_urls.csv. Each row corresponds to a labelled, callee-channel audio file.
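The toolkit’s exact column layout isn’t reproduced here; a plausible shape (the column names are assumptions) pairs each file with its ground-truth label:

```
recording_url,label
https://your_domain.ngrok.app/audio/greeting_01_left.wav,machine
https://your_domain.ngrok.app/audio/live_answer_02_left.wav,human
```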
9. Run the Flask server to accept webhooks
Launch a terminal and run:
If you wish to test the right channel, change the folder as:
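As a minimal sketch of what this server does (the route paths mirror the webhook configured in Step 7; how the caller script names the recording per call is an assumption, modeled here as a `recording` query/form parameter):

```python
from flask import Flask, request, send_from_directory

app = Flask(__name__)
AUDIO_DIR = "channel_audio/left"  # change to "channel_audio/right" to test the right channel

@app.route("/audio/<path:filename>")
def audio(filename):
    # Serve the isolated channel audio that the TwiML <Play> verb fetches.
    return send_from_directory(AUDIO_DIR, filename)

@app.route("/incoming-call", methods=["GET", "POST"])
def incoming_call():
    # Respond to the inbound test call with TwiML that plays the recording.
    filename = request.values.get("recording", "sample.wav")
    twiml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Play>{}audio/{}</Play></Response>".format(request.url_root, filename)
    )
    return twiml, 200, {"Content-Type": "text/xml"}

if __name__ == "__main__":
    app.run(port=5000)
```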
10. Expose your server publicly with ngrok
In a separate terminal, start an ngrok tunnel to your Flask server’s port. Take the forwarding URL (e.g., https://your_subdomain.ngrok.app) and use it as your webhook base.
11. Run automated AMD calls
With the server running and URLs configured, run:
This will initiate calls using your test scenarios, play the isolated audio, and capture all Twilio AMD detection outcomes via webhook.
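The parameters each test call carries can be sketched as follows. The parameter names are real Twilio Calls API fields; the helper name, the callback paths, and the amd_config keys are assumptions about how the toolkit wires things together:

```python
def build_call_params(from_number, to_number, amd_config, callback_base):
    """Assemble the REST payload for one AMD-enabled test call."""
    params = {
        "From": from_number,
        "To": to_number,
        "Url": callback_base + "/incoming-call",
        # DetectMessageEnd waits for the greeting to finish before reporting.
        "MachineDetection": "DetectMessageEnd",
        "AsyncAmd": "true",
        "AsyncAmdStatusCallback": callback_base + "/amd-status",
        "StatusCallback": callback_base + "/call-status",
    }
    # Overlay the tunable thresholds from amd_config.json,
    # e.g. {"MachineDetectionTimeout": "30"}.
    params.update(amd_config)
    return params
```

This dictionary is what ultimately gets POSTed to the Calls endpoint (for example, via the Twilio Python helper library’s `client.calls.create`).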
12. Review the results and iterate
Each call logs the webhook results to the call_results.csv file in reports/. Compare detected outcomes against your human/machine ground truth. Use this feedback cycle to tune timeouts and thresholds in amd_config.json, then rerun your tests to verify improved accuracy.
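Scoring the run can be as simple as the sketch below. It assumes call_results.csv carries a hand-labelled `label` column (human/machine) alongside Twilio’s `answered_by` verdict; the real toolkit’s column names may differ:

```python
import csv

def score_results(path):
    """Return the fraction of calls where AMD matched the ground-truth label."""
    correct = total = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Collapse machine_start / machine_end_* verdicts into "machine".
            verdict = row["answered_by"]
            predicted = "machine" if verdict.startswith("machine") else verdict
            correct += predicted == row["label"]
            total += 1
    return correct / total if total else 0.0
```

Tracking this score across runs makes it easy to see whether a parameter change actually improved accuracy.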
Best practices for tuning
What you test comes down to your business requirements, but in our experience a few practices consistently pay off:
- Use a wide variety of greetings and real-world environments.
- Test incrementally: adjust one parameter at a time.
- Always validate with fresh, labelled call data.
Just as with software testing, continuously add cases and iterate: if you hit a real-world scenario where detection accuracy drops, add that scenario to your tests.
When you are happy with your tuning, use Twilio’s Trust and Engagement Insights to monitor the real-world health and engagement of your communications.
Wrap-up
With this workflow, you can automate AMD testing, visualize detection accuracy, and make data-driven parameter adjustments for your IVR, appointment reminder, or outbound voice campaigns. Well-tuned AMD improves answer rates, saves agent time, and maximizes customer engagement.
Got questions or need help? Reach out to Twilio Support or check out the AMD Optimization Toolkit code for more.
Happy testing and tuning!
Further Resources
- Twilio Answering Machine Detection documentation
- Answering Machine Detection FAQ & Best Practices
- AMD Optimization Toolkit on GitHub
- Detecting iOS 26 Call Screening and Leaving Voicemail
Rosina Garcia Bru is a Product Manager at Twilio Programmable Voice who is passionate about creating seamless communication experiences. She can be reached at rosgarcia [at] twilio.com.
Fernando Vieira Machado is a Solutions Architect at Twilio, with over 15 years of experience designing and delivering impactful customer engagement solutions across LATAM.