Answering Machine Detection (AMD) lets you determine who or what answered an outgoing call and tailor your call flow accordingly. With AMD you can determine whether a human, an answering machine, or a fax machine has picked up an outbound Voice API call.
Twilio's AMD operates in two modes, depending on whether you need to leave a voicemail as part of your call flow. The default AMD solution tries to balance recognition speed and accuracy. You may also tune the performance of the engine based on your use case via optional API parameters.
Please see the following API parameters for configuring AMD on your outbound call.
|Parameter Name||Allowed Values||Default Value|
Specify Enable if you would like Twilio to return an AnsweredBy value as soon as it identifies the called party. This is useful if you want to take a specific action (e.g. connect to an agent, play a message) for a human but hang up on a machine.
If you would like to leave a voicemail on an answering machine, specify DetectMessageEnd. In that case, Twilio will return AnsweredBy immediately when a human is detected, but for an answering machine, AnsweredBy is returned only once the end of the greeting is reached, usually indicated by a beep.
MachineDetection parameters are provided,
MachineDetection will be ignored.
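As a minimal sketch of how the MachineDetection parameter might be attached to an outbound call request, the helper below builds the form parameters using only the Python standard library. To, From, and Url are assumed here to be the usual parameters for creating a call; the function name build_call_params, the phone numbers, and the URL are placeholders invented for illustration.

```python
from urllib.parse import urlencode

# The two MachineDetection modes described above.
ALLOWED_MODES = ("Enable", "DetectMessageEnd")


def build_call_params(to, from_, url, machine_detection=None):
    """Return form parameters for an outbound call request (hypothetical helper)."""
    params = {"To": to, "From": from_, "Url": url}
    if machine_detection is not None:
        if machine_detection not in ALLOWED_MODES:
            raise ValueError(f"MachineDetection must be one of {ALLOWED_MODES}")
        params["MachineDetection"] = machine_detection
    return params


# Placeholder numbers and URL; replace with your own values.
params = build_call_params(
    to="+15558675310",
    from_="+15017122661",
    url="https://example.com/voice",
    machine_detection="DetectMessageEnd",
)
body = urlencode(params)  # form-encoded body for the POST request
print(body)
```

If you use Twilio's Python helper library instead, the same option is expected to be passed as a machine_detection keyword argument when creating the call.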
The default AMD solution is based on an algorithm that isolates human speech audio and measures periods between speech and silence in the greeting, and then uses this data to determine the answering party. Since not all humans and not all voicemail greetings follow similar patterns in answering calls, AMD will not always return the right answer. The AMD engine may, for example, interpret a very short two-second voicemail greeting as a human picking up.
With this imperfection in mind, we have provided four optional API tuning parameters that allow customers to tune the performance of the AMD engine.
|Parameter Name||Allowed Values||Default Value|
The number of seconds that Twilio should attempt to perform answering machine detection before timing out and returning AnsweredBy as
Increasing this value will provide the engine more time to make a determination. This can be useful when DetectMessageEnd is provided in the MachineDetection parameter and there is an expectation of long answering machine greetings that can exceed 30 seconds.
Decreasing this value will reduce the amount of time the engine has to make a determination. This can be useful particularly when the Enable option is provided in the MachineDetection parameter and you want to limit the time for initial detection.
The number of milliseconds that is used as the measuring stick for the length of the speech activity. Durations lower than this value will be interpreted as a human, and longer than this value as a machine.
Increasing this value will reduce the chance of a False Machine (detected machine, actually human) for a long human greeting (e.g. a business greeting) but increase the time it takes to detect a machine.
Decreasing this value will reduce the chances of a False Human (detected human, actually machine) for short voicemail greetings. (Note that the value of this parameter may need to be reduced by more than 1000ms to detect very short voicemail greetings. A reduction of that size can result in increased False Machine detections. Adjusting MachineDetectionSpeechEndThreshold is likely the better approach for short voicemails.) Decreasing MachineDetectionSpeechThreshold will also reduce the time it takes to detect a machine.
The number of milliseconds of silence after speech activity at which point the speech activity is considered complete.
Increasing this value is typically used to better address short voicemail greetings. For short voicemails, there is typically 1000ms-2000ms of audio followed by 1200ms-2400ms of silence and then additional audio before the beep. Increasing the MachineDetectionSpeechEndThreshold to ~2500ms will treat the 1200ms-2400ms of silence as a gap in the greeting rather than the end of the greeting, and will result in a machine detection. The downsides of such a change include:
- Increasing the delay for human detection by the amount you increase this parameter (e.g. change of 1200ms -> 2500ms increases human detection delay by 1300ms)
- In cases where a human has two utterances separated by a period of silence (e.g. a “Hello”, then 2000ms of silence and another “Hello”), it may be interpreted as a machine
Decreasing this value will result in faster human detection. The consequence is that it can lead to increased False Human (detected human, actually machine) detections because a silence gap in a voicemail greeting (not necessarily just in short voicemail scenarios) can be incorrectly interpreted as the end of speech.
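To make the interaction between these thresholds concrete, here is a toy model in Python. It is not Twilio's actual detection algorithm; it only mirrors the rules described above, under the assumption that a greeting can be represented as alternating "speech" and "silence" segments and that silence gaps shorter than the speech end threshold count toward the length of the speech activity. The function name classify_greeting and the default values of 2400ms, 1200ms, and 5000ms are assumptions chosen for illustration.

```python
def classify_greeting(
    segments,
    speech_threshold_ms=2400,      # assumed default, for illustration only
    speech_end_threshold_ms=1200,  # assumed default, for illustration only
    silence_timeout_ms=5000,       # assumed default, for illustration only
):
    """Toy classifier over (kind, duration_ms) segments; kind is 'speech' or 'silence'."""
    activity_ms = 0          # length of the current speech activity, incl. bridged gaps
    initial_silence_ms = 0   # silence heard before any speech
    speech_started = False

    for kind, duration_ms in segments:
        if kind not in ("speech", "silence"):
            raise ValueError(f"unknown segment kind: {kind!r}")
        if kind == "silence" and not speech_started:
            initial_silence_ms += duration_ms
            if initial_silence_ms >= silence_timeout_ms:
                return "unknown"
            continue
        if kind == "silence" and duration_ms >= speech_end_threshold_ms:
            # Speech activity ended before exceeding the speech threshold.
            return "human"
        # Speech, or a silence gap too short to end the speech activity.
        speech_started = True
        activity_ms += duration_ms
        if activity_ms > speech_threshold_ms:
            return "machine"
    return "human" if speech_started else "unknown"


# Short voicemail: 1500ms of audio, 2000ms of silence, then more audio.
short_voicemail = [("speech", 1500), ("silence", 2000), ("speech", 3000)]
# Human saying "Hello" twice with a 2000ms pause in between.
two_hellos = [("speech", 500), ("silence", 2000), ("speech", 500)]

print(classify_greeting(short_voicemail))                                # human (False Human)
print(classify_greeting(short_voicemail, speech_end_threshold_ms=2500))  # machine
print(classify_greeting(two_hellos, speech_end_threshold_ms=2500))       # machine (False Machine)
```

In this model, raising the speech end threshold to 2500ms fixes the short voicemail case but misclassifies the two-utterance human, which is the trade-off described above.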
The number of milliseconds of initial silence after which an unknown AnsweredBy result will be returned.
Increasing this value will result in waiting for a longer period of initial silence before returning an ‘unknown’ AMD result.
Decreasing this value will result in waiting for a shorter period of initial silence before returning an ‘unknown’ AMD result.
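The sketch below shows one way the optional tuning parameters could be collected before being merged into an outbound call request. MachineDetectionSpeechThreshold and MachineDetectionSpeechEndThreshold are named in the text above; MachineDetectionTimeout and MachineDetectionSilenceTimeout are assumed to be the names of the other two parameters, so check them against the API reference. The helper tuning_params is hypothetical.

```python
def tuning_params(
    timeout_s=None,
    speech_threshold_ms=None,
    speech_end_threshold_ms=None,
    silence_timeout_ms=None,
):
    """Return only the tuning parameters that were explicitly set (hypothetical helper)."""
    candidates = {
        "MachineDetectionTimeout": timeout_s,                       # assumed name
        "MachineDetectionSpeechThreshold": speech_threshold_ms,
        "MachineDetectionSpeechEndThreshold": speech_end_threshold_ms,
        "MachineDetectionSilenceTimeout": silence_timeout_ms,       # assumed name
    }
    # Omitted parameters fall back to the service defaults.
    return {name: str(value) for name, value in candidates.items() if value is not None}


# Example: tune for short voicemail greetings, as discussed above.
request_params = {"MachineDetection": "Enable"}
request_params.update(tuning_params(speech_end_threshold_ms=2500))
print(request_params)
```

Leaving unset parameters out of the request, rather than sending guessed defaults, keeps the call on whatever defaults the service applies.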
AMD results are returned in the AnsweredBy parameter of the webhook issued to the URL you provide in the outbound call request.
|AnsweredBy||The result of answering machine detection.
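A webhook handler can branch on AnsweredBy to pick a call flow. The sketch below parses a form-encoded webhook body and returns TwiML as a string. The specific values it checks (human, machine_start, and values beginning with machine_end) are assumptions about what AnsweredBy can contain, since they are not listed in this section; the function names and spoken messages are placeholders.

```python
from urllib.parse import parse_qs

XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def twiml_for_answered_by(answered_by):
    """Choose a TwiML response for an AnsweredBy value (values are assumptions)."""
    if answered_by == "human":
        return XML_HEADER + "<Response><Say>Hello, connecting you now.</Say></Response>"
    if answered_by.startswith("machine_end"):
        # End of greeting reached (DetectMessageEnd mode): leave a voicemail.
        return XML_HEADER + "<Response><Say>Sorry we missed you. Please call us back.</Say></Response>"
    # Machine detected early, fax, unknown, or anything unexpected: hang up.
    return XML_HEADER + "<Response><Hangup/></Response>"


def handle_webhook(form_body):
    """Extract AnsweredBy from a form-encoded webhook body and build the reply."""
    fields = parse_qs(form_body)
    answered_by = fields.get("AnsweredBy", ["unknown"])[0]
    return twiml_for_answered_by(answered_by)


print(handle_webhook("CallSid=CA123&AnsweredBy=human"))
```

Hanging up on an unknown result is a design choice made for this sketch; an application that would rather risk talking to a machine than hang up on a person could route unknown to the human branch instead.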
Answering Machine Detection is charged at $0.0075 per call where it is enabled and the called party picks up. Busy or Failed calls may engage our AMD system but will not be charged.
The life cycle of a call using AMD is below. The user experience for a recipient of a call using AMD suffers if there is a delay between the time they pick up the phone and the first packet of audio they hear. Twilio has optimized its AMD system to classify calls quickly, but it is also important that you optimize your application to respond quickly.
To minimize delay, benchmark your application to ensure that webhooks from Twilio are processed and responded to in a timely manner. In test applications running in EC2 we can get this time under 150ms; TwiML served from TwiMLBins typically comes in under 100ms.
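One simple way to start benchmarking is to time the webhook handler itself. The sketch below wraps a handler with a timing decorator built on time.perf_counter. It measures only in-process handling time, not network latency, and the names timed and voice_webhook are hypothetical.

```python
import functools
import time


def timed(handler):
    """Wrap a handler so it also returns its own processing time in milliseconds."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = handler(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return result, elapsed_ms
    return wrapper


@timed
def voice_webhook(form_fields):
    """Placeholder webhook handler that always hangs up."""
    return "<Response><Hangup/></Response>"


twiml, elapsed_ms = voice_webhook({"AnsweredBy": "machine_start"})
print(f"handler took {elapsed_ms:.2f} ms")
```

Logging this value per request gives a baseline to compare against the figures above, keeping in mind that the time Twilio observes also includes network round trips.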
If you are using <Play> verbs, we recommend hosting your media in AWS S3 in US East 1. No matter where you host your media files, always ensure that you're setting appropriate Cache Control headers. Twilio uses a caching proxy in its webhook pipeline and will cache media files that have cache headers. Serving media out of Twilio's cache can take 10ms or less. Keep in mind that we run a fleet of caching proxies, so it may take 10 or so requests before all of the proxies have a copy of your file in cache.
To help you benchmark your server's response time to Twilio, we expose the request duration in milliseconds for every request in the request inspector. You can view these durations by clicking into the call detail page in the console.