How to run Automated AMD Tests and Fine-Tune Twilio AMD for Accurate Voice Automation
Accurately detecting whether a call is answered by a human or a machine (voicemail) is fundamental to building effective outbound communication workflows. Twilio Answering Machine Detection (AMD) offers flexible parameters for recognizing humans versus machines, but tuning these for your unique business scenarios requires testing and iteration.
In this post, you'll learn how to automate AMD testing, fine-tune parameters for your use case, and analyze real call recordings to maximize AMD reliability using Python. We’ll leverage open-source tools and Twilio’s Programmable Voice APIs to create a robust test loop. Whether you’re building outbound sales, reminders, or callbacks, this approach will help you improve customer experience and operational efficiency.
Specifically, you’ll learn how to:
- Set up a reproducible AMD test harness with Twilio and Python
- Prepare and label stereo call recordings for analysis
- Run automated AMD test campaigns, record results, and visualize detection
- Iteratively tune Twilio AMD parameters for maximum accuracy
Let’s get started!
Prerequisites
You’ll need:
- A Twilio account, if you don’t have one yet (you can sign up for free here)
- Two Twilio phone numbers: one for outbound calls (From), one for receiving and playing back recordings (To). See how to search for and buy phone numbers here.
- Your Twilio Account SID and Auth Token, from the Twilio Console
- ngrok (or another tunneling tool) to expose your local webhooks (download here)
- Python 3 and virtualenv
- A batch of dual-channel (stereo) .wav call recordings, with the caller and callee on separate channels. We’ll show you how to generate these during the tutorial steps.
- ffmpeg, if it isn’t installed yet (you can download it here)
1. Clone and install the Twilio AMD Optimization Toolkit
First, get the open-source AMD Optimization toolkit. This contains scripts for splitting audio channels, visualizing speech/silence, launching AMD-enabled test calls, and analyzing webhook results.
2. Prepare and label your dual-channel recordings
Why dual-channel?
To fine-tune AMD, you need to understand exactly what the callee hears. Dual-channel (stereo) recordings let you isolate the callee audio, removing agent speech, DTMF, or background noise.
Steps:
- Place your stereo .wav files in the recordings/ directory.
- Manually label each call: Is it a human or a machine (voicemail)? This enables accurate ground-truth comparison later.
3. Configure your environment
Create a .env file at the repo root with:
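The exact variable names the toolkit reads aren’t reproduced here; a typical shape (the names below are assumptions, and the values are placeholders) is:

```
TWILIO_ACCOUNT_SID=ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_FROM_NUMBER=+15555550100
TWILIO_TO_NUMBER=+15555550101
```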
Then reload your environment so the new variables are picked up.
4. Split the stereo recordings into separate channels
Run the toolkit’s channel-splitting script; you’ll get new .wav files in channel_audio/left/ and channel_audio/right/.
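To make the splitting step concrete, here is a stdlib-only sketch of what channel splitting does for a 16-bit stereo .wav (the function name and file paths are illustrative, not the toolkit’s API):

```python
import struct
import wave

def split_stereo(path, out_left, out_right):
    """Write each channel of a 16-bit stereo .wav to its own mono file."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 2 and w.getsampwidth() == 2
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    # Stereo PCM interleaves samples: L, R, L, R, ...
    for out_path, channel in ((out_left, samples[0::2]), (out_right, samples[1::2])):
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(rate)
            out.writeframes(struct.pack("<%dh" % len(channel), *channel))
```

In practice you’ll run the toolkit’s own script (or ffmpeg) rather than this sketch; it’s shown only to clarify what the left/right output files contain.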
5. (Optional) Visualize speech, silence, and detection events
To better understand how each part of your audio recording affects AMD detection, you can use the toolkit’s visualization script to plot the audio waveforms and annotated events; it also accepts options to pick a different channel or output folder.
The generated images in channel_analysis/ highlight:
- Initial Silence: The quiet period before the first utterance.
- Utterance Durations: Periods of detected speech.
- Silence Gaps: The pauses between utterances throughout the greeting.
- Final Silence: The pause at the end before the call is answered or a machine beep.
- The total duration of the voicemail or greeting.
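A stdlib-only sketch of how these spans can be measured from a mono 16-bit .wav (the toolkit’s script produces annotated images; this just illustrates the underlying idea, and the 20 ms frame size and RMS threshold of 500 are illustrative defaults):

```python
import struct
import wave

def speech_segments(path, frame_ms=20, threshold=500):
    """Return (start_s, end_s) spans where frame RMS energy exceeds threshold.

    Assumes a mono, 16-bit PCM .wav such as a split callee channel.
    """
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        samples = struct.unpack("<%dh" % w.getnframes(), w.readframes(w.getnframes()))
    step = rate * frame_ms // 1000
    segments, start = [], None
    for i in range(0, len(samples), step):
        frame = samples[i:i + step]
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms >= threshold and start is None:
            start = i / rate                    # speech begins
        elif rms < threshold and start is not None:
            segments.append((start, i / rate))  # speech ends
            start = None
    if start is not None:
        segments.append((start, len(samples) / rate))
    return segments
```

From the returned spans, the initial silence is `segments[0][0]`, the gaps are the differences between consecutive spans, and the final silence is the total duration minus `segments[-1][1]`.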
By observing these characteristics for each call, you can make informed decisions when tuning AMD parameters (like MachineDetectionTimeout, MachineDetectionSpeechThreshold, MachineDetectionSpeechEndThreshold, and MachineDetectionSilenceTimeout). For example:
- If most greetings have a short initial silence but long pauses mid-greeting, you may want to increase MachineDetectionSpeechEndThreshold to avoid premature classification.
- Calls with long final silences may require a longer MachineDetectionSilenceTimeout to allow for complete greeting playback.
These visual cues are a fast and reliable way to tailor AMD settings to your actual call data, boosting detection accuracy across your customer scenarios.
6. Set up your AMD test configuration
Edit amd_config.json to specify the AMD test settings.
- Update the AMD parameters for your scenario (MachineDetectionTimeout, MachineDetectionSpeechThreshold, etc.)
- Insert your ngrok URL in the StatusCallback and AsyncAmdStatusCallback fields.
Twilio's AMD docs describe these parameters in detail.
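The toolkit’s exact schema isn’t reproduced here, but a plausible amd_config.json might look like the following. The AMD field names match Twilio’s Calls API parameters and the numeric values are Twilio’s documented defaults; the callback URLs are placeholders:

```json
{
  "MachineDetection": "DetectMessageEnd",
  "MachineDetectionTimeout": 30,
  "MachineDetectionSpeechThreshold": 2400,
  "MachineDetectionSpeechEndThreshold": 1200,
  "MachineDetectionSilenceTimeout": 5000,
  "AsyncAmd": true,
  "AsyncAmdStatusCallback": "https://your_domain.ngrok.app/amd-status",
  "StatusCallback": "https://your_domain.ngrok.app/call-status"
}
```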
7. Configure your “To” phone number to play recordings
Before launching any automated AMD tests, it’s important to ensure that your "To" phone number (the recipient in your test setup) knows how to serve the correct audio for each inbound call during a test run.
How to set up the webhook
- Open the Twilio Console and navigate to Phone Numbers
- Select Your To Number
- Configure the Voice Webhook to https://your_domain.ngrok.app/incoming-call
Replace your_domain.ngrok.app with your actual public ngrok URL, as generated by your tunnel in Step 10.
8. Prepare a list of recordings for testing
Add one entry per test recording to recording_urls.csv. Each row corresponds to a labelled, callee-channel audio file.
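The toolkit’s exact column layout isn’t reproduced here; a plausible shape (the column names are assumptions) pairs each file with its ground-truth label:

```
recording_url,label
https://your_domain.ngrok.app/audio/greeting_01_left.wav,machine
https://your_domain.ngrok.app/audio/live_answer_02_left.wav,human
```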
9. Run the Flask server to accept webhooks
Launch a terminal and run:
If you wish to test the right channel, change the folder as:
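As a minimal sketch of what this server does (the route paths mirror the webhook configured in Step 7; how the caller script names the recording per call is an assumption, modeled here as a `recording` query/form parameter):

```python
from flask import Flask, request, send_from_directory

app = Flask(__name__)
AUDIO_DIR = "channel_audio/left"  # change to "channel_audio/right" to test the right channel

@app.route("/audio/<path:filename>")
def audio(filename):
    # Serve the isolated channel audio that the TwiML <Play> verb fetches.
    return send_from_directory(AUDIO_DIR, filename)

@app.route("/incoming-call", methods=["GET", "POST"])
def incoming_call():
    # Respond to the inbound test call with TwiML that plays the recording.
    filename = request.values.get("recording", "sample.wav")
    twiml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Play>{}audio/{}</Play></Response>".format(request.url_root, filename)
    )
    return twiml, 200, {"Content-Type": "text/xml"}

if __name__ == "__main__":
    app.run(port=5000)
```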
10. Expose your server publicly with ngrok
In a separate terminal, start an ngrok tunnel to your Flask server’s port. Take the forwarding URL (e.g., https://your_subdomain.ngrok.app) and use it as your webhook base.
11. Run automated AMD calls
With the server running and URLs configured, run:
This will initiate calls using your test scenarios, play the isolated audio, and capture all Twilio AMD detection outcomes via webhook.
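The parameters each test call carries can be sketched as follows. The parameter names are real Twilio Calls API fields; the helper name, the callback paths, and the amd_config keys are assumptions about how the toolkit wires things together:

```python
def build_call_params(from_number, to_number, amd_config, callback_base):
    """Assemble the REST payload for one AMD-enabled test call."""
    params = {
        "From": from_number,
        "To": to_number,
        "Url": callback_base + "/incoming-call",
        # DetectMessageEnd waits for the greeting to finish before reporting.
        "MachineDetection": "DetectMessageEnd",
        "AsyncAmd": "true",
        "AsyncAmdStatusCallback": callback_base + "/amd-status",
        "StatusCallback": callback_base + "/call-status",
    }
    # Overlay the tunable thresholds from amd_config.json,
    # e.g. {"MachineDetectionTimeout": "30"}.
    params.update(amd_config)
    return params
```

This dictionary is what ultimately gets POSTed to the Calls endpoint (for example, via the Twilio Python helper library’s `client.calls.create`).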
12. Review the results and iterate
Each call logs the webhook results to the call_results.csv file in reports/. Compare detected outcomes against your human/machine ground truth. Use this feedback cycle to tune timeouts and thresholds in amd_config.json, then rerun your tests to verify improved accuracy.
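Scoring the run can be as simple as the sketch below. It assumes call_results.csv carries a hand-labelled `label` column (human/machine) alongside Twilio’s `answered_by` verdict; the real toolkit’s column names may differ:

```python
import csv

def score_results(path):
    """Return the fraction of calls where AMD matched the ground-truth label."""
    correct = total = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Collapse machine_start / machine_end_* verdicts into "machine".
            verdict = row["answered_by"]
            predicted = "machine" if verdict.startswith("machine") else verdict
            correct += predicted == row["label"]
            total += 1
    return correct / total if total else 0.0
```

Tracking this score across runs makes it easy to see whether a parameter change actually improved accuracy.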
Best practices for tuning
What you test comes down to your business requirements, but in our experience a few practices consistently pay off:
- Use a wide variety of greetings and real-world environments.
- Test incrementally: adjust one parameter at a time.
- Always validate with fresh, labelled call data.
Just as with software testing, continuously add cases and iterate: if you hit a real-world scenario where detection accuracy drops, add that scenario to your tests.
When you are happy with your tuning, use Twilio’s Trust and Engagement Insights to monitor the real-world health and engagement of your communications.
Wrap-up
With this workflow, you can automate AMD testing, visualize detection accuracy, and make data-driven parameter adjustments for your IVR, appointment reminder, or outbound voice campaigns. Well-tuned AMD improves answer rates, saves agent time, and maximizes customer engagement.
Got questions or need help? Reach out to Twilio Support or check out the AMD Optimization Toolkit code for more.
Happy testing and tuning!
Further Resources
- Twilio Answering Machine Detection documentation
- Answering Machine Detection FAQ & Best Practices
- AMD Optimization Toolkit on GitHub
- Detecting iOS 26 Call Screening and Leaving Voicemail
Rosina Garcia Bru is a Product Manager at Twilio Programmable Voice who is passionate about creating seamless communication experiences. She can be reached at rosgarcia [at] twilio.com.
Fernando Vieira Machado is a Solutions Architect at Twilio, with over 15 years of experience designing and delivering impactful customer engagement solutions across LATAM.