How to use Task Confidence
Task Confidence gives you more granular control over how you handle the output of Autopilot’s Natural Language Understanding (NLU) engine. When a task is triggered by a user’s utterance, Autopilot includes the following in its request to your application:
- a confidence score, via the CurrentTaskConfidence parameter
- the unique name of the task with the second-highest probability of matching that input, via the NextBestTask parameter
This data is also available on the Queries resource, via the confidence and next_best_task attributes of the results object. NextBestTask will be null if the fallback task is selected as the next best task.
The confidence score is a number between 0 and 1. Because Autopilot uses different machine learning models depending on the nature and quantity of your training data, the score is a relative measure of the probabilities the model returns for the two most likely tasks: (p1 - p2) / p1, where p1 and p2 are the highest and second-highest probabilities.
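To make the formula concrete, the following Python sketch computes the score from a set of per-task probabilities. The probabilities and the task_confidence helper are hypothetical and only illustrate the arithmetic; they are not part of the Autopilot API.

```python
from typing import Dict, Tuple


def task_confidence(probabilities: Dict[str, float]) -> Tuple[str, float, str]:
    """Return (top task, confidence score, next-best task).

    Implements score = (p1 - p2) / p1, where p1 and p2 are the highest and
    second-highest task probabilities.
    """
    if len(probabilities) < 2:
        raise ValueError("need probabilities for at least two tasks")
    # Rank task names by probability, highest first.
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    p1, p2 = probabilities[top], probabilities[runner_up]
    if p1 <= 0:
        raise ValueError("the top probability must be positive")
    return top, (p1 - p2) / p1, runner_up


if __name__ == "__main__":
    # Hypothetical probabilities for two appointment tasks and a fallback task.
    probs = {"reschedule_appointment": 0.5, "cancel_appointment": 0.4, "fallback": 0.1}
    top, score, next_best = task_confidence(probs)
    # (0.5 - 0.4) / 0.5 is about 0.2: a close call between the top two tasks.
    print(top, round(score, 2), next_best)
```

Because the score is normalized by p1, two nearly tied tasks produce a score near 0 whatever their absolute probabilities are.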
A high confidence score indicates a larger difference between the probabilities of the top two tasks identified by the model. A low score means the two tasks were nearly tied, so the model could not clearly tell them apart. This makes the feature especially useful for fine-tuning the user experience when the NLU engine returns a low confidence score.
There are three recommended applications of this feature that can be used independently or in conjunction with each other:
- Disambiguation
- Setting a confidence threshold
- Training and annotation
Clarify intent with Disambiguation
Disambiguation involves asking the user to clarify their intent when a low confidence score is received. For example, consider an appointment management bot that, among other things, helps users cancel and reschedule appointments via two tasks: cancel_appointment and reschedule_appointment. If a user says "I need help with my appointment", the model will likely pick one of these two tasks with a low confidence score and return the other as the next best task.
This information is included in Autopilot’s request to your application as follows:
Parameter | Description | Example |
AccountSid | Your Twilio account ID. It is 34 characters long, and always starts with the letters AC. | ACXXXXX |
AssistantSid | The Autopilot assistant ID. It is 34 characters long, and always starts with UA. | UAXXXXX |
DialogueSid | The session identifier. It is 34 characters long, and always starts with the letters UK. | UKXXXXX |
UserIdentifier | The unique user identifier coming from the channel. For Voice and SMS it will be the user's phone number. | +18304765664 |
CurrentInput | The last thing the user said. | "I need help with my appointment" |
CurrentTask | The user's current task. | reschedule_appointment |
Field_{field-name}_Value | The value recognized for the field. A separate key-value pair is sent for each field value. | Field_CLAIM_NUMBER_Value |
Field_{field-name}_Type | The type recognized for the field. A separate key-value pair is sent for each field type. | Twilio.ALPHANUMERIC |
DialoguePayloadUrl | A URL to the Dialogue JSON payload that contains the context and data collected during the Autopilot session. | https://autopilot.twilio.com/v1/Assistants/UAXXXX/Dialogues/UKXXXX |
Memory | A JSON payload that contains all the Autopilot memory values. NOTE: Memory is only sent in POST requests to prevent query parameters from being truncated. |  |
Channel | The channel on which the interaction is taking place. | SMS |
CurrentTaskConfidence | The confidence score for the detected task. | 0.7 |
NextBestTask | The unique name of the task with the second-highest probability of matching the input; null if the fallback task is the next best task. | cancel_appointment |
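The following Python sketch shows one way to pull the task-confidence fields out of an incoming request. It assumes the request body is form-encoded (application/x-www-form-urlencoded), as is typical for Twilio webhooks, and that a null NextBestTask arrives as a missing, empty, or "null" value; that wire encoding is an assumption, as are the TaskMatch and parse_task_match names.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs


@dataclass
class TaskMatch:
    current_input: str
    current_task: str
    confidence: Optional[float]
    next_best_task: Optional[str]


def parse_task_match(body: str) -> TaskMatch:
    """Extract the task-confidence fields from a form-encoded request body."""
    # keep_blank_values=True keeps "NextBestTask=" instead of silently dropping it.
    params = {key: values[0] for key, values in parse_qs(body, keep_blank_values=True).items()}
    raw_confidence = params.get("CurrentTaskConfidence")
    next_best = params.get("NextBestTask")
    return TaskMatch(
        current_input=params.get("CurrentInput", ""),
        current_task=params.get("CurrentTask", ""),
        confidence=float(raw_confidence) if raw_confidence else None,
        # Assumption: a null NextBestTask arrives as a missing, empty, or "null" value.
        next_best_task=next_best if next_best and next_best != "null" else None,
    )


if __name__ == "__main__":
    body = (
        "CurrentInput=I+need+help+with+my+appointment"
        "&CurrentTask=reschedule_appointment"
        "&CurrentTaskConfidence=0.7"
        "&NextBestTask=cancel_appointment"
    )
    print(parse_task_match(body))
```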
In this scenario, you can ask the user to clarify their intent with a response along the lines of: "I can help you with that. Would you like to reschedule your appointment or cancel your appointment?"
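A disambiguation prompt like this one could be generated from the CurrentTask and NextBestTask parameters. The sketch below assumes your application answers with an Autopilot Actions JSON document that uses the say and listen actions; verify the action names against the Actions reference. The TASK_PHRASES mapping and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical mapping from task unique names to phrases used in the prompt.
TASK_PHRASES = {
    "reschedule_appointment": "reschedule your appointment",
    "cancel_appointment": "cancel your appointment",
}


def describe(task: str) -> str:
    """Turn a task's unique name into a phrase, defaulting to the name itself."""
    return TASK_PHRASES.get(task, task.replace("_", " "))


def disambiguation_actions(
    current_task: str, next_best_task: Optional[str]
) -> Dict[str, List[Dict[str, Any]]]:
    """Build a say + listen response asking the user to choose between the top two tasks."""
    if next_best_task is None:
        # No next-best task (the fallback task ranked second), so confirm the top task only.
        question = f"Just to confirm, would you like to {describe(current_task)}?"
    else:
        question = (
            "I can help you with that. Would you like to "
            f"{describe(current_task)} or {describe(next_best_task)}?"
        )
    # Assumed Actions shape: say the question, then listen for the user's reply.
    return {"actions": [{"say": question}, {"listen": True}]}


if __name__ == "__main__":
    response = disambiguation_actions("reschedule_appointment", "cancel_appointment")
    print(json.dumps(response, indent=2))
```

Following say with listen is intended to keep the conversation open, so the user's answer can be matched to the task they actually meant.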
Disambiguating in this way creates a smarter user experience than triggering a task that may not match the user’s intent.
Specify confidence thresholds
Your application can also specify confidence thresholds to decide how to respond to the user’s query. These thresholds can then be used to decide whether to trigger a disambiguation response, trigger the fallback task, or hand off the conversation to a human agent.
Continuing the example of the appointment management bot, suppose Autopilot returns a confidence score of 0.2 for the query "I can’t find my appointment", and your application triggers the fallback task for scores below 0.3 and a disambiguation flow for scores below 0.5. Because 0.2 falls below the fallback threshold, your application responds with the fallback task instead of a disambiguation flow. The response in the fallback task can be used to get the user back on track with the bot.
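One way to express the two thresholds from this example is a small routing function. This is a sketch, assuming that a score strictly below a threshold triggers the corresponding response; the choose_response name and its return labels are hypothetical, and your application would map each label to its own response (for example, a hand-off to a human agent instead of the fallback task).

```python
# Example thresholds from the text above; tune them for your own bot.
FALLBACK_THRESHOLD = 0.3
DISAMBIGUATION_THRESHOLD = 0.5


def choose_response(
    confidence: float,
    fallback_threshold: float = FALLBACK_THRESHOLD,
    disambiguation_threshold: float = DISAMBIGUATION_THRESHOLD,
) -> str:
    """Return "fallback", "disambiguate", or "continue" for a confidence score."""
    if fallback_threshold > disambiguation_threshold:
        raise ValueError("fallback threshold must not exceed disambiguation threshold")
    if confidence < fallback_threshold:
        return "fallback"  # Too uncertain even to offer a choice.
    if confidence < disambiguation_threshold:
        return "disambiguate"  # Ask the user to pick between the top two tasks.
    return "continue"  # Confident enough to run the detected task.


if __name__ == "__main__":
    for score in (0.2, 0.4, 0.7):
        print(score, choose_response(score))
```

With these example values, a score of 0.2 routes to the fallback task, 0.4 to disambiguation, and 0.7 proceeds with the detected task.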
Many factors, such as the nature of the training data and customer expectations, differ between bot types and the experiences you want to design for your end users. It is therefore not possible to recommend a specific value for each threshold; the right values depend on your use case(s).
Training and annotation
The confidence score is also recorded for each query on the Queries page in the Autopilot console. The page also lets you filter queries by confidence score, so you can focus your training and annotation efforts on queries with low confidence scores.
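If you retrieve query data through the Queries resource instead of using the console, a filter like the one below could surface low-confidence queries for review. It assumes each query is a dictionary with a query text field and a results object carrying a confidence value, as described earlier; the "query" field name, the 0.5 default threshold, and the low_confidence_queries helper are assumptions.

```python
from typing import Any, Dict, Iterable, List


def low_confidence_queries(
    queries: Iterable[Dict[str, Any]], threshold: float = 0.5
) -> List[Dict[str, Any]]:
    """Return queries whose confidence is below threshold, lowest confidence first."""
    scored = []
    for query in queries:
        results = query.get("results") or {}
        confidence = results.get("confidence")
        if confidence is None:
            continue  # Skip queries without a recorded score.
        confidence = float(confidence)
        if confidence < threshold:
            scored.append((confidence, query))
    # Sort on the score only, so the most ambiguous queries are reviewed first.
    scored.sort(key=lambda pair: pair[0])
    return [query for _, query in scored]


if __name__ == "__main__":
    # Hypothetical query records shaped like the assumed Queries resource output.
    sample = [
        {"query": "I need help with my appointment", "results": {"confidence": 0.4}},
        {"query": "cancel my appointment please", "results": {"confidence": 0.9}},
        {"query": "I can't find my appointment", "results": {"confidence": 0.2}},
    ]
    for q in low_confidence_queries(sample):
        print(q["results"]["confidence"], q["query"])
```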