Voice Intelligence - Key Concepts
Public Beta
Voice Intelligence is currently available as a public beta release. Some features are not yet implemented and others may be changed before the product is declared as Generally Available. Beta products are not covered by a Twilio SLA.
Learn more about beta product support.
Language Operators
Language Operators, often referred to simply as "operators", turn transcripts into structured information using a variety of techniques, including machine learning. There are currently two categories of operator that can be used with Voice Intelligence.
- Pre-built operators — These operators are created, trained, and maintained by the Twilio team. They are trained across a wide swath of customer data and typically map to pieces of information that are agnostic to use-case or industry. Pre-built operators cannot be modified or made more specific.
- Custom operators — These operators are created and maintained by our customers. They are specific to an individual customer’s use case and data. Custom Operators are literal operators — for example, they are keyword or phrase based — and can be used to spot phrases or classify transcripts.
To find out more about the Pre-built operators that Twilio currently makes available, and for examples of the actions they perform, please review Pre-built Language Operators.
Operator actions
Operators perform a specific action on a conversation or on a sentence within a conversation. The table below lists the types of actions that an operator can currently perform.
Operator Action | Status | Description | Example |
Classify | Available | Classify a conversation into a predefined category | Classify if the call was transferred to another agent |
Phrase matching | Available | Determine if an event occurred or if a piece of data or a phrase was mentioned during a conversation | Spot whether or not an agent told a customer that their call is being recorded |
Redact | Available | Find and redact a value mentioned during a conversation | Redact a social security number that was mentioned during a call |
Custom Operators
Custom Operators are text-parsing based operators and support Classify and Phrase Matching.
To add a Custom Language Operator to your Service, follow the steps below:
- Navigate to the Language Operators tab on your Service.
- Click Create Custom Operator.
- Add the name of the operator and select Phrase Matching or Classify.
- Create Phrase Sets. Each Phrase Set can contain multiple words or phrases to extract from the transcript. For each phrase, you can select:
A. Exact Match: Find exact words or phrases in the transcript.
B. Fuzzy Match: Find words or phrases in the transcript using machine learning techniques, even if the match is less than 100%.
- Once all the Phrase Sets are created, add the new Custom Language Operator to your Service. The next time a Transcript is created, the new Custom Language Operator results will be on the OperatorResults of the Transcript.
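To make the difference between Exact Match and Fuzzy Match more concrete, here is a minimal Python sketch. It is an illustration only, not Twilio's implementation: the fuzzy matching used by Voice Intelligence relies on machine learning techniques that are not described here, so this sketch substitutes the character-similarity ratio from Python's standard `difflib` module. The function names (`exact_match`, `fuzzy_match`) and the `threshold` value of 0.8 are hypothetical choices made for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def exact_match(phrase: str, transcript: str) -> bool:
    """Return True if the phrase appears word-for-word in the transcript."""
    p, t = normalize(phrase), normalize(transcript)
    n = len(p)
    return n > 0 and any(t[i:i + n] == p for i in range(len(t) - n + 1))


def fuzzy_match(phrase: str, transcript: str, threshold: float = 0.8) -> bool:
    """Return True if some window of the transcript is similar to the phrase.

    Similarity is difflib's character ratio (a stand-in for the ML-based
    matching in the product); `threshold` is an assumed cut-off.
    """
    p, t = normalize(phrase), normalize(transcript)
    n = len(p)
    if n == 0:
        return False
    target = " ".join(p)
    # Slide a window of the same word count as the phrase over the transcript.
    windows = [" ".join(t[i:i + n]) for i in range(max(len(t) - n + 1, 1))]
    return any(
        SequenceMatcher(None, target, window).ratio() >= threshold
        for window in windows
    )


if __name__ == "__main__":
    sentence = "Just so you know, this call is being recoded for quality purposes."
    print(exact_match("call is being recorded", sentence))
    print(fuzzy_match("call is being recorded", sentence))
```

In this sketch, a transcription slip such as "recoded" in place of "recorded" defeats the exact match but still passes the fuzzy match, which is the kind of case where choosing Fuzzy Match for a phrase may be useful.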
Non-Speech Tags
When there is audio that doesn’t correspond to speech, or isn’t recognized by Twilio's speech recognition engine, it is labeled with a Non-Speech Tag. The following tags are currently in use.
Non-Speech Tag | Description |
[applause] | Included if a participant claps on a call |
[dtmf] | Included when a participant provides input via DTMF (Dual-Tone Multi-Frequency). This tag is only included when DTMF is embedded in the audio of the recording. Out-of-band DTMF is not captured |
[foreign] | Included when the speech recognition engine does not recognize the audio as being part of a supported language |
[hes] | Included when a participant says a hesitation marker like umm, uhh, or hmm |
[inaudible] | Included when there is unclear audio that cannot be recognized by the speech recognition engine |
[laugh] | Included when a participant laughs on a call |
[music] | Included when music is detected on a call. This marker typically shows up with hold music |
[noise] | Included when there are noises that are not recognized as speech |
[ring] | Included when there is ringing on a call. This typically shows up when a call is recorded with the record-from-ringing parameter or when a bridged leg plays ringback |
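
If you post-process transcript text yourself, you may want to count or remove these tags before further analysis. The Python sketch below shows one way to do that. It assumes the tags appear inline in the transcript text in exactly the bracketed form listed above; the helper names (`count_tags`, `strip_tags`) are hypothetical and are not part of any Twilio library.

```python
import re
from collections import Counter

# Tag names taken from the table above.
NON_SPEECH_TAGS = (
    "applause", "dtmf", "foreign", "hes", "inaudible",
    "laugh", "music", "noise", "ring",
)

# Matches any listed tag in its bracketed form, for example "[music]".
TAG_PATTERN = re.compile(r"\[(" + "|".join(NON_SPEECH_TAGS) + r")\]")


def count_tags(text: str) -> Counter:
    """Count how often each Non-Speech Tag occurs in the text."""
    return Counter(TAG_PATTERN.findall(text))


def strip_tags(text: str) -> str:
    """Remove Non-Speech Tags and collapse the leftover whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_tags).strip()


if __name__ == "__main__":
    sample = "[ring] [ring] Hello [hes] thanks for calling [noise]"
    print(count_tags(sample))
    print(strip_tags(sample))
```

Bracketed text that is not in the list is left untouched, so unrelated content such as "[1]" is not removed by accident.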