Speech Recognition is Now Generally Available

  • Convert speech to text and analyze its intent during any voice call.
  • Support for 119 languages and dialects.
  • Now Generally Available.

We are excited to announce that Speech Recognition is now Generally Available. This allows developers to convert speech to text and analyze its intent during any call coming through the Twilio Programmable Voice API.

With Speech Recognition, there are no models to train or machine learning to orchestrate. You simply specify that you’d like to take voice input and you’re good to go. It works in 119 languages and dialects and has a simple, pay-as-you-go pricing structure.

Here are some areas where Speech Recognition will come in handy:

  • IVR phone tree navigation: With Speech Recognition, you can build applications that let callers navigate IVRs by voice instead of keypad input with DTMF tones. Letting your customers just say what they need is faster and easier, and leaves them happy.
  • Data capture: Now, you don’t need an agent to capture short snippets of data such as addresses, names, and account numbers—simply use Speech Recognition in your app.
  • Conversational bots: Use Speech Recognition together with Understand to let customers talk to bots instead of agents.
  • Real-time transcriptions: Convert conversations between your customers and agents to text, then measure call success by analyzing keywords or running sentiment analysis. You can also use keyword spotting to get alerted to conversations that need your attention.

How Does it Work?

Speech Recognition is integrated directly into Twilio’s <Gather> verb, so you can update the code you already have in place. Because it supports 119 languages and dialects, you can extend your application to customers across a broad range of regions with very little effort. Adding speech is as simple as adding a new attribute, input, with the value speech to your existing <Gather>.
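A minimal TwiML sketch of a speech-enabled <Gather> might look like the following. The action URL /completed and the prompt text are placeholders chosen for illustration, not values from the original post.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- input="speech" asks Gather to listen for spoken input rather than keypad digits -->
  <!-- "/completed" is a placeholder action URL; point it at your own endpoint -->
  <Gather input="speech" action="/completed">
    <Say>Welcome. Please tell us why you are calling.</Say>
  </Gather>
</Response>
```

When the caller finishes speaking, Twilio requests the action URL and includes the transcription in the SpeechResult parameter, along with a Confidence score. Note that <Say> is nested inside <Gather>, and </Gather> closes before </Response>.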

You can also provide hints, up to 500 words, to boost the accuracy of the speech-to-text result. Customers who used the beta version of Speech Recognition found that providing good hints makes the matching very accurate.
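As a rough sketch, hints are passed as a comma-separated list in the hints attribute of <Gather>. The hint words, prompt, and /completed action URL below are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- hints lists the words or phrases you expect the caller to say -->
  <!-- the hint values and the "/completed" action URL are placeholders -->
  <Gather input="speech" hints="cats, numbers, Chuck Norris" action="/completed">
    <Say>Would you like a fact about cats, numbers, or Chuck Norris?</Say>
  </Gather>
</Response>
```

Hints work best when they match the vocabulary your prompt invites, such as product names or menu options.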

Finally, you can use speechTimeout to control how long Twilio waits after a pause in the caller’s speech before it stops recognition and returns the result to you.

If you’re expecting a “short utterance” as input, you should set speechTimeout to auto. Examples of “short utterances” include numbers, addresses, and short commands.
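For a short utterance, a sketch along these lines should work. The prompt and the /completed action URL are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- speechTimeout="auto" lets Twilio decide when the caller has finished speaking -->
  <!-- "/completed" is a placeholder action URL -->
  <Gather input="speech" speechTimeout="auto" action="/completed">
    <Say>Please say your account number.</Say>
  </Gather>
</Response>
```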

On the other hand, if you expect your users to speak “long utterances”, you’ll need to set speechTimeout to a positive number of seconds. For example, if you expect your customer to say something like “I would like help resetting my debit card PIN”, then you’ll want to specify a timeout.

Below is an example where the speechTimeout has been set to two seconds.
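The original example is not reproduced here, so the following is a reconstruction under stated assumptions: the prompt and the /completed action URL are placeholders, and only the speechTimeout value comes from the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- speechTimeout="2" waits for two seconds of silence before ending recognition -->
  <!-- "/completed" is a placeholder action URL -->
  <Gather input="speech" speechTimeout="2" action="/completed">
    <Say>Please tell us how we can help you today.</Say>
  </Gather>
</Response>
```

A longer timeout gives callers room to pause mid-sentence, at the cost of a slightly slower response once they finish.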

To see Speech Recognition in action, check out this video—you’ll learn how to build a facts hotline that returns facts about cats, numbers, and Chuck Norris. You’ll also be able to see the usefulness of Speech Recognition in interactive voice response (IVR) applications.


Speech Recognition works in 119 languages and dialects and has a simple, pay-as-you-go pricing structure. To get started, log in or create a Twilio account, and check out our docs to start building.


  • Nathan Loyer

    The code examples look incorrect. The closing tag for Gather is out of place.

    • Satgraha

      good catch.

    • Megan Speir

      Hey Nathan, good catch indeed! Can I send you a Twilio t-shirt to say thanks? Email mspeir [at] twilio [dot] com for details.

  • Jaeson Booker

    Will it incorporate VoiceBase or IBM?