Introducing Speech Recognition – Public Beta Now Open

May 24, 2017

Written by

Twilion

This post is part of Twilio’s archive and may contain outdated information. We’re always building something new, so be sure to check out our latest posts for the most up-to-date insights.

Convert speech to text and analyze its intent during any voice call.
Support for 89 languages and dialects.
Available now in public beta.

Speech is a powerful and expressive medium for customer communications. With speech technology improving massively over the last four years, we were excited to leverage that progress to finally offer Twilio developers a speech recognition feature for Programmable Voice. Starting today, Twilio Speech Recognition allows developers to convert speech to text and analyze its intent during any voice call, and is available in public beta. There are no models to train or complicated machine learning to orchestrate.

Our customers have long used keypad input to navigate users through phone menus and collect their feedback on surveys. While keypad input is now universally understood by users, it can be cumbersome and imprecise, and isn’t always a great experience for the caller.

Over the next several years, we expect speech-driven interfaces to become ubiquitous. The potential for nuanced human-machine interaction driven by speech is readily apparent to anyone who has asked Alexa to play their favorite music from Spotify.

With Speech Recognition, you can now capture speech from your customers in real time. It works in 89 languages and dialects, and has a simple, pay-as-you-go pricing structure.

<Gather> with Speech

Speech recognition is integrated directly into Twilio’s <Gather> verb, so you can update the code you already have in place. Because it supports 89 languages and dialects, you can upgrade your application to support customers across a broad range of regions. Adding speech is as simple as adding a new attribute called “input” to <Gather>, as shown in the TwiML below.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Gather input="speech" action="/finalresult">
    <Say>Welcome to Twilio, how can I help you?</Say>
  </Gather>
</Response>
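
If your application builds its responses in code rather than serving static XML, the same document can be generated programmatically. The following is a minimal sketch using only Python's standard library; the function name build_speech_gather is hypothetical and is not part of any Twilio helper library.

```python
import xml.etree.ElementTree as ET


def build_speech_gather(prompt: str, action: str) -> str:
    """Return a TwiML document with a speech-enabled <Gather> wrapping a <Say>.

    Hypothetical helper for illustration; the element and attribute names
    mirror the TwiML shown above.
    """
    response = ET.Element("Response")
    gather = ET.SubElement(
        response, "Gather", {"input": "speech", "action": action}
    )
    say = ET.SubElement(gather, "Say")
    say.text = prompt
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_speech_gather("Welcome to Twilio, how can I help you?", "/finalresult"))
```

Building the document with an XML library rather than string concatenation means any special characters in the prompt are escaped for you.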

If you specify speech as an input, Twilio will add a new parameter called SpeechResult to the request it sends to your action URL.

AccountSid	AC25e16e9a616a4a1786a7c83f58e30082
ApiVersion	2010-04-01
CallSid	CA607dee6b7647243904ebc8db64a2a5c2
CallStatus	in-progress
Called	+18182004120
Confidence	0.77388394
Direction	inbound
From	+15623000628
Language	en-US
SpeechResult	I’d like to learn more about Speech Recognition
To	+18182104120
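
On the receiving end, your action handler needs to pull the transcript and its confidence score out of that request. The sketch below assumes the parameters arrive as a form-encoded body, which is how Twilio webhooks are normally delivered. The function names and the 0.5 threshold are hypothetical choices for illustration, not recommended values.

```python
from urllib.parse import parse_qs

# Hypothetical cutoff below which the app asks the caller to repeat themselves.
MIN_CONFIDENCE = 0.5


def read_speech_result(form_body: str) -> tuple:
    """Extract (SpeechResult, Confidence) from a form-encoded request body.

    Missing fields fall back to an empty transcript and a confidence of 0.0.
    """
    fields = parse_qs(form_body)
    text = fields.get("SpeechResult", [""])[0]
    confidence = float(fields.get("Confidence", ["0"])[0])
    return text, confidence


def should_reprompt(confidence: float, threshold: float = MIN_CONFIDENCE) -> bool:
    """Return True when the transcript is too uncertain to act on."""
    return confidence < threshold


if __name__ == "__main__":
    body = "SpeechResult=I%27d+like+to+learn+more&Confidence=0.77388394&Language=en-US"
    text, confidence = read_speech_result(body)
    print(text, confidence, should_reprompt(confidence))
```

In a real application this logic would sit inside whatever web framework serves your action URL; the parsing step is kept framework-free here so it can be tested on its own.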

If you’d like to build more responsive applications, we also offer the ability to get speech results in real time as we process speech. To receive these interim results while the caller is still speaking, you can specify a partial results callback:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Gather input="speech" action="/finalresult" partialResultCallback="/partialresult">
    <Say>Welcome to Twilio, how can I help you?</Say>
  </Gather>
</Response>

Once you specify a callback URL for partialResultCallback, you will get requests as your customers speak. Since HTTP requests may arrive out of order, we include a sequence number with each one so you can tell which result is the most recent and process your customer’s speech in the order it was spoken.

SequenceNumber: 1
UnstableSpeechResult: Disco
 
SequenceNumber: 2
UnstableSpeechResult: fiscal
 
SequenceNumber: 0
UnstableSpeechResult: this
 
SequenceNumber: 3
UnstableSpeechResult: Fiscal Sanity
 
SequenceNumber: 4
UnstableSpeechResult: this call Sandra
 
SequenceNumber: 5
UnstableSpeechResult: this will send
 
SequenceNumber: 7
UnstableSpeechResult: this will send requests
 
SequenceNumber: 6
UnstableSpeechResult: this will send requests
 
SequenceNumber: 8
UnstableSpeechResult: this will send requests
 
SequenceNumber: 9
UnstableSpeechResult: this will send requests
 
SequenceNumber: 10
UnstableSpeechResult: this will send requests as
 
SequenceNumber: 11
UnstableSpeechResult: This will send requests as you.
 
SequenceNumber: 12
UnstableSpeechResult: This will send requests as you see.
 
SequenceNumber: 13
UnstableSpeechResult: This will send requests as you speak.
 
SequenceNumber: 14
UnstableSpeechResult: This will send requests as you speak.
 
SequenceNumber: 15
UnstableSpeechResult: This will send requests as you speak.
 
SequenceNumber: 16
UnstableSpeechResult: This will send requests as you speak.
 
SpeechResult: This will send requests as you speak.

This allows you to evaluate the speech of your user as they speak to build responsive voice applications. A detailed explanation of Speech Recognition features and TwiML examples can be found here.
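
One way to cope with out-of-order delivery is to keep only the partial result with the highest sequence number seen so far and discard anything older. The sketch below assumes, as the listing above suggests, that each UnstableSpeechResult carries the full hypothesis so far rather than an increment. The PartialTranscript class is hypothetical and only illustrates the bookkeeping.

```python
class PartialTranscript:
    """Tracks the newest partial result for one call, ignoring stale arrivals."""

    def __init__(self) -> None:
        self.latest_seq = -1
        self.latest_text = ""

    def update(self, sequence_number: int, text: str) -> bool:
        """Record a partial result; return False if it arrived too late to matter."""
        if sequence_number <= self.latest_seq:
            return False  # an equal or newer result has already been seen
        self.latest_seq = sequence_number
        self.latest_text = text
        return True


if __name__ == "__main__":
    transcript = PartialTranscript()
    # Arrival order taken from the first few requests in the listing above.
    for seq, text in [(1, "Disco"), (2, "fiscal"), (0, "this"), (3, "Fiscal Sanity")]:
        accepted = transcript.update(seq, text)
        print(seq, repr(text), "accepted" if accepted else "ignored")
```

In a multi-call application you would keep one such object per CallSid. The final SpeechResult delivered to your action URL remains the authoritative transcript.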

Pricing

Speech Recognition uses a scalable pay-as-you-go model, with requests starting at $0.02 per 15 seconds of recognition. Those who have operated a speech recognition system know how time-consuming and difficult planning for channels or ports can be. Speech Recognition from Twilio does away with this burden and scales with your business: plug it in and it just works. If you’re planning for significant traffic, it’s important to know that volume-based discounts can cut the price of Speech Recognition to as little as $0.008 per 15 seconds. Full volume tiers can be found here.
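
For rough budgeting, the per-increment prices above can be turned into a quick estimate. The sketch below assumes that a partial 15-second increment is billed as a full one; that rounding rule is an assumption made for illustration and is not stated here, so check the pricing page before relying on it. The function name estimate_cost is hypothetical.

```python
import math
from decimal import Decimal

LIST_PRICE = Decimal("0.02")         # per 15 seconds, from the text above
BEST_VOLUME_PRICE = Decimal("0.008")  # lowest volume tier, from the text above
INCREMENT_SECONDS = 15


def estimate_cost(seconds_of_recognition: float,
                  price_per_increment: Decimal = LIST_PRICE) -> Decimal:
    """Estimate recognition cost in dollars.

    Assumes partial increments are rounded up (an assumption, not a
    documented billing rule).
    """
    increments = math.ceil(seconds_of_recognition / INCREMENT_SECONDS)
    return price_per_increment * increments


if __name__ == "__main__":
    print(estimate_cost(40))                     # 3 increments at list price
    print(estimate_cost(40, BEST_VOLUME_PRICE))  # 3 increments at the lowest tier
```

Decimal is used instead of float so that small per-increment prices add up exactly.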

How to Get Started

Speech recognition is available to all Twilio developers today. To get started, check out our docs. If you have any questions about moving your traffic or adding Speech Recognition to your Twilio application, don’t hesitate to reach out to our Sales team.

What’s Next: Understand

Speech Recognition is only the beginning for voice-driven interfaces built on Twilio. Coming soon, we will be releasing a new verb: Understand. It’s exactly what you hope it is: an API to analyze text and determine intent during a live call using natural language understanding. Powered by machine learning, Understand will give developers what they need to build intelligent, nuanced human-machine interactions in order to turn freeform text into structured data. It will work natively with both Twilio Programmable Voice and SMS, as well as Amazon Alexa.

Stay tuned for more—we can’t wait to see what you build.
