Answering Machine Detection (AMD) is a feature that detects whether a call was picked up by a human or by an answering machine. The analysis has two main outcomes: the determination that a machine answered (answering machine, fax, SIT tones) or the determination that a human answered. There are three common use cases:
- Developers can detect that an answering machine/voicemail picked up the call and then use <Say> text-to-speech or <Play> to leave complete, intact messages in the voicemail box; this is common with notifications and appointment reminders.
- It's very expensive to have agents sit on the phone and listen to phones ring and wait to talk to a person, and that time is wasted if the call is answered by a machine. What's more common is for outbound dialers to use AMD to detect when a human answers and only then invoke the agent, either by connecting the callee to the agent directly or by adding the callee to a conference where the agent is already waiting.
- Last, do both of the above: use the results to leave a message in the voicemail box if a machine answers, or connect to an agent if a person answers, dynamically optimizing the call flow.
Elastic SIP trunking calls cannot utilize AMD because they bypass the Programmable Voice infrastructure. <Dial><Client>, <Dial><Conference>, and <Dial><Queue> calls cannot utilize AMD because the destination is a Twilio internal service.
There are two aspects of AMD that are important to understand. The first is tone detection; i.e. the ability to detect machine beeps, fax tones, busy tones, in-band DTMF, etc.
The second is voice activity detection, or human speech detection; i.e. the ability to separate human speech from background noise, isolate it from barking dogs, etc. AMD analyzes the timing, pattern, and frequency information to make its determination.
Twilio's AMD analyzes the inbound media leg returned from the called party and, depending on the configuration parameters provided, will return the results of its analysis or notify your application when a voicemail beep has been detected.
Out of the box with default settings we see accurate detection to US destinations >90% of the time, though there are a lot of caveats here, especially considering that Twilio provides multiple parameters to control performance characteristics of the detection.
DetectMessageEnd accuracy is close to 100% because the tones emitted by voicemail boxes and answering machines are distinctly different in frequency and amplitude from human speech patterns. However, it is possible to set the timeout too short, in which case the detection never happens because the AMD engine has stopped listening.
As with accuracy, there is a big caveat to keep in mind: AMD can be configured in a way that results in slower detection. On average Twilio's AMD will return results within ~4 seconds after the call is answered. Once the determination has been made, Twilio will make a request to the provided webhook with the AnsweredBy results. This network transit to the server and the reciprocal response from the application are the most common culprits when investigating "slow" AMD times; the best practice is to specify the edge for webhook egress to eliminate as much latency as possible. If you are able to run the application in the same AWS region that your webhook egresses from, you can eliminate significant latency.
It's important to note that there are trade-offs between speed and accuracy. If you set the timeout to an extremely short duration you may not be giving the system enough time to gather sufficient data to make a determination, which will negatively impact accuracy.
Twilio's AMD told me a human answered and it was a machine/a machine answered and it was a human/that the message end was detected when it was not!
Twilio's AMD is tuned for accuracy over speed, but even with the parameters provided, it's possible to return a false positive or false negative for detection. A single failure here or there is unavoidable and to be expected, but you can track AMD detection results using Voice Insights Call Summary records via the Annotation API. Doing so may help you identify commonalities between inaccurate detection results; e.g. you may discover that a majority of your false positives are associated with a single destination carrier in a specific region, or that a handful of agents in a contact center are disproportionately represented in the inaccurate results, indicating a process adherence or data integrity problem. In any case, once the commonalities between failed detections are identified, you can A/B test different configurations based on those commonalities; e.g. if you know that calls going to Comcast landlines in the US need a slightly longer timeout than other landline carriers, you can adjust the parameters when you know the destination is a Comcast landline.
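As a rough illustration of looking for commonalities, the sketch below groups detection outcomes by an attribute and computes the share of wrong results per group. The record layout (`carrier`, `answered_by`, `actual`) and the sample values are hypothetical; in practice the data would come from your own annotations of call outcomes.

```python
from collections import defaultdict


def error_rate_by(records, key):
    """Return the fraction of incorrect AMD results for each value of `key`."""
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        # A detection is wrong when AMD's answer differs from the annotated truth.
        if rec["answered_by"] != rec["actual"]:
            wrong[group] += 1
    return {group: wrong[group] / totals[group] for group in totals}


# Hypothetical annotated records, for illustration only.
records = [
    {"carrier": "carrier_a", "answered_by": "human", "actual": "human"},
    {"carrier": "carrier_a", "answered_by": "human", "actual": "machine"},
    {"carrier": "carrier_b", "answered_by": "machine", "actual": "machine"},
    {"carrier": "carrier_b", "answered_by": "machine", "actual": "machine"},
]

rates = error_rate_by(records, "carrier")
```

The same function can be pointed at any other attribute you record (region, agent, line type) to see where inaccurate results cluster before you A/B test a different configuration.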
MachineDetection=Enable is useful for use cases that reduce agent idle time. Twilio will return AnsweredBy as soon as a determination is made.
MachineDetection=DetectMessageEnd is geared toward the "leave a message in the voicemail box" use case. If a machine is detected, Twilio will wait until we hear a beep to return AnsweredBy.
Normally, when using the /Calls API, AMD occurs before Twilio fetches TwiML instructions. The /Calls API also provides an AsyncAMD boolean that allows TwiML instructions to execute and the call to progress normally while AMD occurs in the background.
AMD on the /Participants API and on <Dial><Number> or <Dial><Sip> is asynchronous by default and cannot be configured to behave otherwise.
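A minimal sketch of the form parameters for an asynchronous AMD request to the /Calls API follows. It assumes the REST parameter names `AsyncAmd`, `AsyncAmdStatusCallback`, and `AsyncAmdStatusCallbackMethod` (the text above refers to the option as AsyncAMD), so check them against the current API reference. The phone numbers and URLs are placeholders, and actually sending the request is left out.

```python
def build_async_amd_params(to, from_, twiml_url, amd_callback_url, mode="Enable"):
    """Build form parameters for an outbound call with background AMD."""
    if mode not in ("Enable", "DetectMessageEnd"):
        raise ValueError("mode must be 'Enable' or 'DetectMessageEnd'")
    return {
        "To": to,
        "From": from_,
        "Url": twiml_url,  # TwiML starts executing right away
        "MachineDetection": mode,
        "AsyncAmd": "true",  # AMD runs in the background
        "AsyncAmdStatusCallback": amd_callback_url,  # receives AnsweredBy later
        "AsyncAmdStatusCallbackMethod": "POST",
    }


params = build_async_amd_params(
    "+15558675310",  # placeholder destination
    "+15017122661",  # placeholder caller ID
    "https://example.com/twiml",
    "https://example.com/amd-result",
)
```

With this shape, the call flow does not wait on detection; the AMD result arrives separately at the callback URL.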
Humans answering phones as individuals, either at residences or on their mobile; e.g. "Hello?" or "Hi, this is Michael." These greetings are typically pretty short, <1800ms.
Businesses answer phones like "Thanks for calling Duct Tape Warehouse. This is Howard." These greetings are typically longer, ~1800-3000ms.
Answering machines answer with longer messages commonly punctuated with beeps which contain audio frequencies outside of normal speech range; "Hi you've reached the Fletchers. We're not here right now please leave a message after the beep. [BEEP]". These greetings are typically longer, >3000ms.
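The timing ranges above can be expressed as a toy classifier. This is only an illustration of the greeting-length heuristic described here, not Twilio's detection algorithm, which also uses tone, pattern, and frequency information. The function name and labels are made up for the example.

```python
def likely_answerer(greeting_ms):
    """Guess who answered based only on the length of the initial greeting."""
    if greeting_ms < 1800:
        return "individual"  # e.g. "Hello?"
    if greeting_ms <= 3000:
        return "business"  # e.g. a company name plus the speaker's name
    return "machine"  # long greeting, usually followed by a beep


guesses = [likely_answerer(ms) for ms in (900, 2400, 5200)]
```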
Use Programmable Voice's call recording capabilities and capture recordings from ringing. You can then open the recording file in an audio editor like Audacity or GarageBand and measure the precise timings: the gap between answer and initial audio, how long the initial utterance lasts, and how long the silence lasts before the AMD determination is made. Use those values to adjust performance, but see our warning below about hyper-tuning performance to a single destination.
It's not possible to completely eliminate unknowns, as those are calls where the thresholds and timeouts provided to the AMD algorithm have not given the engine enough information to make a decision. This is most commonly due to people/machines answering with silence that lasts longer than the provided speech end or machine detection timeouts. The more aggressive you are in trying to get responses faster, the more answered_by: unknown results you will receive.
- Hyper-tuning performance for a single device or a handful of devices can result in unexpected behavior once applications are deployed widely. We see cases where someone spends a bunch of time making sure the AMD application is detecting their personal cell phone quickly and with high accuracy without understanding that not everyone else in the world has the same carrier, same voicemail message length, etc. Testing should be done with a large number of carriers, on a diverse set of devices, with varied message responses.
- MachineDetectionTimeout is only relevant for MachineDetection=DetectMessageEnd, and shortening this value does not speed up detection for MachineDetection=Enable.
- Failing to provide a sufficient timeout for MachineDetection=DetectMessageEnd. Our default is 30 seconds, but almost all of the changes to this configuration option we see are to decrease this value, not increase it. If you are only trying to land messages in residential voicemail boxes, 30 seconds is probably sufficient for the majority of cases. If you are trying to land messages in business voicemail boxes, 30 seconds is frequently not enough time. Also, some residences have bizarrely long messages, so expect some outliers.
Use MachineDetection=DetectMessageEnd, make sure you provide ample time for the beep to occur (some people have very long answering machine messages), and return TwiML that utilizes <Play> (or <Say>) to deliver the message.
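The voicemail setup described here might look roughly like the sketch below: one helper builds the call parameters with DetectMessageEnd and a longer timeout, and another builds the TwiML that delivers the message. The helper names, the 45-second timeout, the phone numbers, and the URL are illustrative assumptions; sending the request is left out.

```python
from xml.sax.saxutils import escape


def build_voicemail_call_params(to, from_, twiml_url, timeout_seconds=30):
    """Form parameters for a call that waits for the voicemail beep."""
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return {
        "To": to,
        "From": from_,
        "Url": twiml_url,
        "MachineDetection": "DetectMessageEnd",
        # Leave ample time for long greetings before the beep.
        "MachineDetectionTimeout": str(timeout_seconds),
    }


def message_twiml(message):
    """TwiML that speaks the message; the text is XML-escaped first."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Say>" + escape(message) + "</Say></Response>"
    )


params = build_voicemail_call_params(
    "+15558675310", "+15017122661", "https://example.com/message", timeout_seconds=45
)
twiml = message_twiml("Your appointment is tomorrow at 3 & please arrive early.")
```

A `<Play>` element pointing at a pre-recorded audio file could replace `<Say>` in the same position.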
I want to minimize idle time for my agents and only engage them when a real person is on the line, how do I do that?
Use MachineDetection=Enable. If you are calling individuals, residences, or mobile phones, customers have had good results with setting MachineDetectionSpeechThreshold to 1800-2000 and MachineDetectionSpeechEndThreshold to 1400-1500.
If you are calling businesses, you will want to set MachineDetectionSpeechThreshold somewhere between 1800 and 3000, with MachineDetectionSpeechEndThreshold set to 1400-1500.
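One way to keep these starting points in code is a small preset table keyed by audience. The specific values below are single picks from within the ranges mentioned above, not recommendations, and the `AMD_PRESETS` and `amd_params_for` names are hypothetical.

```python
# Starting points chosen from the ranges above; tune against your own traffic.
AMD_PRESETS = {
    "individual": {
        "MachineDetectionSpeechThreshold": 2000,
        "MachineDetectionSpeechEndThreshold": 1500,
    },
    "business": {
        "MachineDetectionSpeechThreshold": 3000,
        "MachineDetectionSpeechEndThreshold": 1500,
    },
}


def amd_params_for(audience):
    """Return AMD form parameters for the given audience type."""
    preset = AMD_PRESETS[audience]  # raises KeyError for unknown audiences
    params = {"MachineDetection": "Enable"}
    params.update({name: str(value) for name, value in preset.items()})
    return params


business_params = amd_params_for("business")
```

These parameters would be merged with the usual To, From, and Url values when creating the call.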
I want to leave a message if no one answers, but if someone does answer, I want to connect them to an agent. How do I do that?
To mitigate the impact of false positives, your default behavior should be to leave a message. Tune your application to use the same MachineDetection=Enable parameters above, but have a handler that is listening for the AnsweredBy parameter. In the event that AnsweredBy = human is received by your application, modify the call via API to point to a new TwiML instruction that connects the called party to a waiting agent.
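A minimal sketch of such a handler is shown below. It only decides what to do with an AnsweredBy value: for a human it returns the parameters for a call update that points the call at new TwiML, and otherwise it returns None so the default message keeps playing. The function name, the conference name, and the choice to connect through a conference are assumptions for illustration; applying the update (for example with the Python helper library's `client.calls(call_sid).update(twiml=...)`) is left to the surrounding application.

```python
from xml.sax.saxutils import escape


def on_amd_result(answered_by, conference_name):
    """Return call-update parameters when a human answered, else None."""
    if answered_by != "human":
        # Machine, fax, or unknown: keep the default "leave a message" flow.
        return None
    twiml = (
        "<Response><Dial><Conference>"
        + escape(conference_name)
        + "</Conference></Dial></Response>"
    )
    return {"Twiml": twiml}


update = on_amd_result("human", "agent-room-1")  # hypothetical conference name
no_update = on_amd_result("machine_start", "agent-room-1")
```

Defaulting to the message and redirecting only on a positive human result means a missed or late detection still leaves the callee with a complete message.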