Voice Intelligence - Key Concepts


Warning

Public Beta

Voice Intelligence is currently available as a public beta release. Some features are not yet implemented and others may be changed before the product is declared as Generally Available. Beta products are not covered by a Twilio SLA.

Learn more about beta product support.


Language Operators

Language Operators, often referred to simply as "operators", turn transcripts into structured information using a variety of techniques, including machine learning. Two categories of operators can currently be used with Voice Intelligence.

  • Pre-built Language Operators: These operators are created, trained, and maintained by the Twilio team. They are either trained on a broad range of data and typically map to pieces of information that are agnostic to use case or industry, or they use a third-party predictive AI model or LLM (Large Language Model). Pre-built operators cannot be modified or made more specific.
  • Custom Operators: These operators are created and maintained by customers. They are specific to an individual customer's use case and data. Custom Operators are literal operators (for example, keyword or phrase based) and can be used to spot phrases or classify transcripts.

Info

To find out more about the Pre-built operators that Twilio currently makes available, and for examples of the actions they perform, please review Pre-built Language Operators.

Operator actions

Operators perform a specific action on a conversation or on a sentence within a conversation. The table below lists the types of actions an operator can perform.

| Operator Action | Status | Description | Example |
|---|---|---|---|
| Classify | Available | Classify a conversation into a predefined category | Classify if the call was transferred to another agent |
| Phrase matching | Available | Determine if an event occurred or if a piece of data or a phrase was mentioned during a conversation | Spot whether or not an agent told a customer that their call is being recorded |
| Redact | Available | Find and redact a value mentioned during a conversation | Redact a social security number that was mentioned during a call |

Custom Operators are text-parsing based operators and support Classify and Phrase Matching.
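
To make the difference between a sentence-level and a conversation-level action concrete, here is a minimal Python sketch. It does not show how Twilio's operators are implemented. The function names, the SSN format, and the transfer keywords are all hypothetical and chosen only for illustration.

```python
import re

# Assumed format for illustration only: SSNs written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical keywords suggesting that a call was transferred.
TRANSFER_KEYWORDS = ("transfer you", "transferring you")


def redact_sentence(sentence: str, placeholder: str = "[REDACTED]") -> str:
    """Redact: replace SSN-like values found in a single sentence."""
    return SSN_PATTERN.sub(placeholder, sentence)


def classify_conversation(sentences: list[str]) -> str:
    """Classify: assign one predefined category to a whole conversation."""
    text = " ".join(sentences).lower()
    if any(keyword in text for keyword in TRANSFER_KEYWORDS):
        return "transferred"
    return "not_transferred"
```

In this sketch, `redact_sentence` works on one sentence at a time, while `classify_conversation` needs every sentence before it can pick a category.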

To add a Custom Language Operator to your Service, follow the steps below:

  1. Navigate to the Language Operators tab on your Service.
  2. Click Create Custom Operator.
  3. Add the name of the operator and select Phrase Matching or Classify.
  4. Create Phrase Sets. Each Phrase Set can have multiple words or phrases to extract from the transcript. For each phrase, you can select:
    A. Exact Match: Find exact words or phrases in the transcript.
    B. Fuzzy Match: Find words or phrases in the transcript using machine learning techniques, even if they match less than 100%.
  5. Once all the Phrase Sets are created, add the new Custom Language Operator to your Service. The next time a Transcript is created, the results of the new Custom Language Operator will appear in the OperatorResults of the Transcript.
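
The difference between Exact Match and Fuzzy Match in step 4 can be sketched in Python as follows. This is an illustration only: Twilio does not document its fuzzy-matching method here, so the sketch substitutes a simple string-similarity ratio from the standard library's `difflib`. The function names and the `threshold` value are assumptions.

```python
from difflib import SequenceMatcher


def exact_match(phrase: str, transcript: str) -> bool:
    """Exact Match: the phrase appears verbatim (ignoring case)."""
    return phrase.lower() in transcript.lower()


def fuzzy_match(phrase: str, transcript: str, threshold: float = 0.8) -> bool:
    """Fuzzy Match stand-in: slide a window with the same word count as the
    phrase over the transcript and accept any window whose similarity ratio
    reaches the (assumed) threshold."""
    target = " ".join(phrase.lower().split())
    words = transcript.lower().split()
    size = len(target.split())
    for start in range(max(len(words) - size + 1, 1)):
        window = " ".join(words[start:start + size])
        if SequenceMatcher(None, target, window).ratio() >= threshold:
            return True
    return False


transcript = "this call is being recorded for quality purposes"
print(exact_match("call is being recorded", transcript))  # verbatim phrase
print(fuzzy_match("call is being recored", transcript))   # misspelled phrase
```

A fuzzy matcher of this kind tolerates small transcription or spelling differences, with a risk of false positives as the threshold is lowered.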

When audio doesn't correspond to speech, or isn't recognized by Twilio's speech recognition engine, it is labeled with a Non-Speech Tag. The following tags are currently used.

| Non-Speech Tag | Description |
|---|---|
| [applause] | Included if a participant claps on a call |
| [dtmf] | Included when a participant provides input via DTMF (Dual-Tone Multi-Frequency). This tag is only included when DTMF is embedded in the audio of the recording. Out-of-band DTMF is not captured |
| [foreign] | Included when the speech recognition engine does not recognize the audio as being part of a supported language |
| [hes] | Included when a participant says a hesitation marker like umm, uhh, or hmm |
| [inaudible] | Included when there is unclear audio that cannot be recognized by the speech recognition engine |
| [laugh] | Included when a participant laughs on a call |
| [music] | Included when music is detected on a call. This marker typically shows up with hold music |
| [noise] | Included when there are noises that are not recognized as speech |
| [ring] | Included when there is ringing on a call. This typically shows up when a call is recorded with the record-from-ringing parameter or when a bridged leg plays ringback |
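
When processing transcript text downstream, it can be useful to count or remove these tags. The Python sketch below assumes that the tags appear inline in the transcript text exactly as written in the table and that the text is available as a plain string. The helper names are hypothetical.

```python
import re
from collections import Counter

# Tag names taken from the table above.
NON_SPEECH_TAGS = {
    "applause", "dtmf", "foreign", "hes", "inaudible",
    "laugh", "music", "noise", "ring",
}

# Matches any known tag in square brackets, e.g. "[hes]".
TAG_PATTERN = re.compile(r"\[(" + "|".join(sorted(NON_SPEECH_TAGS)) + r")\]")


def count_tags(text: str) -> Counter:
    """Count how often each non-speech tag occurs in the text."""
    return Counter(TAG_PATTERN.findall(text))


def strip_tags(text: str) -> str:
    """Remove non-speech tags and collapse the leftover whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return " ".join(without_tags.split())


sample = "[ring] Hello [hes] I need help [noise] with my bill [hes]"
print(count_tags(sample))
print(strip_tags(sample))
```

Only the tags listed in the table are matched, so other bracketed text in a transcript is left untouched.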
