Speech recognition

Convert speech to text and analyze its intent during any voice call. Available with pay-as-you-go pricing.

How speech-to-text works

    <Gather input="speech">
        <Say>Say ahoy to Twilio Speech Recognition!</Say>
    </Gather>

Using a single <Gather> TwiML verb, the Speech Recognition API captures the caller's speech in real time, transcribes it, and returns the text to your application.
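The round trip described above can be sketched in Python with only the standard library. This sketch assumes a hypothetical `/handle-speech` endpoint as the `<Gather>` `action` URL. Twilio posts the transcription to that URL as form-encoded parameters, and this sketch reads two of them, `SpeechResult` and `Confidence`. The function names are illustrative and not part of any Twilio SDK.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def build_speech_gather(prompt: str, action_url: str) -> str:
    """Return a TwiML document that speaks a prompt and listens for speech."""
    response = ET.Element("Response")
    # input="speech" turns on speech recognition; action is where the result is sent.
    # The action URL used below ("/handle-speech") is a hypothetical endpoint.
    gather = ET.SubElement(response, "Gather", {"input": "speech", "action": action_url})
    say = ET.SubElement(gather, "Say")
    say.text = prompt
    return ET.tostring(response, encoding="unicode")


def parse_speech_result(form_body: str) -> tuple[str, float]:
    """Extract the transcript and confidence score from the action webhook body."""
    params = parse_qs(form_body)
    transcript = params.get("SpeechResult", [""])[0]
    confidence = float(params.get("Confidence", ["0"])[0])
    return transcript, confidence


twiml = build_speech_gather("Say ahoy to Twilio Speech Recognition!", "/handle-speech")
print(twiml)
print(parse_speech_result("SpeechResult=Ahoy+there&Confidence=0.92"))
```

Building the TwiML with `ElementTree` instead of string concatenation means prompt text containing characters such as `&` or `<` is escaped automatically.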

Real-time transcription

Add automatic speech recognition (ASR) to any voice call with a single <Gather> verb.

No training required

Transcribe a wide range of industry-specific words and phrases out of the box, without any pre-training.

Streaming results

Build responsive voice applications that act on partial recognition results as your customer speaks.
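One way to act on partial results is sketched below in Python. It assumes that `<Gather>` is given a `partialResultCallback` URL and that each callback request includes `StableSpeechResult` (words unlikely to change), `UnstableSpeechResult` (words still being revised) and `SequenceNumber`. Verify these parameter names against Twilio's `<Gather>` reference. The keyword-spotting logic and the function name are hypothetical.

```python
from urllib.parse import parse_qs


def should_transfer_early(form_body: str, keywords: set[str]) -> bool:
    """Return True once any keyword appears in the stable part of a partial result.

    Assumption: the callback body carries StableSpeechResult / UnstableSpeechResult.
    Only the stable portion is checked, because the unstable tail may still be
    revised by the recognizer while the caller keeps talking.
    """
    params = parse_qs(form_body)
    stable = params.get("StableSpeechResult", [""])[0].lower()
    # Strip simple trailing punctuation so "billing." still matches "billing".
    words = {w.strip(".,?!") for w in stable.split()}
    return not words.isdisjoint(k.lower() for k in keywords)


body = "SequenceNumber=3&StableSpeechResult=I+need+billing&UnstableSpeechResult=help"
print(should_transfer_early(body, {"billing"}))
```

Acting only on the stable portion trades a little latency for fewer false triggers. Acting on the unstable portion would react sooner but could fire on words the recognizer later corrects.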

Multiple languages

Recognizes 119 languages and dialects (and more coming soon) to support your global user base.
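To recognize a language other than the default (`en-US`), set the `language` attribute on `<Gather>` to a language tag such as `es-ES`. The Python helper below is a hypothetical sketch. It sets the same tag on `<Say>` so that the prompt is spoken in the caller's language as well. It does not check whether a given tag is among the supported languages.

```python
import xml.etree.ElementTree as ET


def build_gather_for_language(prompt: str, language: str = "en-US") -> str:
    """Return TwiML that listens for speech in the given language.

    The same language tag is applied to <Say> so the prompt matches the
    language the recognizer expects. Supported tags are not validated here.
    """
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {"input": "speech", "language": language})
    say = ET.SubElement(gather, "Say", {"language": language})
    say.text = prompt
    return ET.tostring(response, encoding="unicode")


print(build_gather_for_language("¿En qué puedo ayudarle?", "es-ES"))
```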

Use cases

Let customers navigate menus and provide information in their own words.

  • Turn nested phone trees into simple “what can I help you with” voice prompts.

  • Voice search

    Allow customers to dial into your knowledge base and get the answers they need.

  • Form fills

    Ask customers questions and capture their answers using ASR to fill out forms and qualify leads.
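The first use case above, replacing a phone tree with one open question, comes down to mapping the transcript in `SpeechResult` to a destination. Below is a deliberately simple, hypothetical keyword router in Python. The destinations and keywords are illustrative, and a production system might use a trained intent classifier instead.

```python
import re

# Hypothetical routing table: destination -> words that suggest it.
ROUTES: dict[str, set[str]] = {
    "billing": {"bill", "billing", "invoice", "payment", "charge"},
    "support": {"broken", "error", "help", "problem", "outage"},
    "sales": {"buy", "pricing", "quote", "upgrade"},
}


def route_transcript(transcript: str, default: str = "operator") -> str:
    """Pick the destination whose keywords overlap the transcript the most.

    Falls back to `default` when no keyword matches. On a tie, the destination
    listed first in ROUTES wins.
    """
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    best, best_hits = default, 0
    for destination, keywords in ROUTES.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = destination, hits
    return best


print(route_transcript("My internet is broken, I need help."))
```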


Pay-as-you-go with no upfront costs.

Standard Models


per 15 sec of <Gather>

Enhanced Models*


per 15 sec of <Gather>

<Gather> with speech has a maximum duration of 60 seconds.
With an annual commitment of more than 100,000 <Gather> requests per month.
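Pricing is quoted per 15 seconds of `<Gather>`, and a speech `<Gather>` lasts at most 60 seconds, so a single `<Gather>` spans at most four 15-second units. The Python sketch below counts units for a given duration. It assumes that partial units are rounded up, so confirm this against Twilio's pricing terms. The per-unit price is a parameter for you to fill in from current pricing, and the value in the usage line is a placeholder, not a real rate.

```python
import math

BILLING_UNIT_SECONDS = 15   # pricing is quoted per 15 seconds of <Gather>
MAX_GATHER_SECONDS = 60     # maximum duration of a speech <Gather>


def billable_units(duration_seconds: float) -> int:
    """Number of 15-second units for one speech <Gather>.

    Assumption: partial units round up. Durations are clamped to the
    0-60 second range because a speech <Gather> cannot run longer.
    """
    capped = min(max(duration_seconds, 0), MAX_GATHER_SECONDS)
    return math.ceil(capped / BILLING_UNIT_SECONDS)


def estimate_cost(durations: list[float], price_per_unit: float) -> float:
    """Estimated cost of several <Gather> requests at a given per-unit price."""
    return sum(billable_units(d) for d in durations) * price_per_unit


# Placeholder price of 2.0 per unit: 1 + 2 + 4 = 7 units in total.
print(estimate_cost([10, 30, 60], 2.0))
```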

The Twilio difference

Business connecting to customer through preferred communication channels

*Only the phone_call model is available as an enhanced (premium) model.
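You select the enhanced model on `<Gather>` itself by setting `speechModel="phone_call"` and `enhanced="true"`. The Python helper below is a sketch that enforces the footnote's rule by raising an error when `enhanced=True` is combined with any other model. The helper's name and its choice to raise rather than silently fall back are illustrative.

```python
import xml.etree.ElementTree as ET


def build_gather(prompt: str, speech_model: str = "default", enhanced: bool = False) -> str:
    """Return TwiML for a speech <Gather>, optionally using the enhanced (premium) model."""
    # Only the phone_call model has an enhanced version, so reject other combinations.
    if enhanced and speech_model != "phone_call":
        raise ValueError("enhanced=True requires speech_model='phone_call'")
    attrs = {"input": "speech", "speechModel": speech_model}
    if enhanced:
        attrs["enhanced"] = "true"
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", attrs)
    ET.SubElement(gather, "Say").text = prompt
    return ET.tostring(response, encoding="unicode")


print(build_gather("How can I help you today?", "phone_call", enhanced=True))
```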