With Media Streams, we are opening up the Twilio Voice platform by providing businesses with real-time access to the raw audio stream of their phone calls. Businesses can now use the audio of their calls to improve customer experience, for example by gauging the quality of a call in real time through sentiment analysis, or by using AI-driven knowledge assistants to help agents address customer needs. With just a few lines of code, businesses can extend the capabilities of their Twilio voice application in real time by integrating with their own applications or with third-party services.
Why Media Streams?
As advancements in AI and machine learning continue, an increasing number of new technologies and services must be pulled together to create voice experiences that meet customer expectations.
Some ways businesses can take advantage of Media Streams include:
- Resolve difficult customer conversations by using sentiment analysis to flag customers having bad experiences, allowing supervisors to step in and help get things back on track.
- Increase productivity of call center agents by transcribing speech in real time and using AI/ML to recommend relevant knowledge base articles to the agent based on the content of the conversation.
- Reduce fraud and speed up authentication by incorporating voice authentication/biometrics within a call flow.
How to build with Media Streams
Media Streams works in the context of a traditional Twilio voice application, such as an IVR, that serves customers directly, and in contact centers like Twilio Flex where agents are serving consumers.
GLOBO, an early pilot participant, has driven improvements in call center effectiveness by utilizing Media Streams in their call flows.
"Media Streams ensures our customers have great experiences when they call into GLOBO's telephone interpreting services," said Jonathan De Jong, VP of Engineering at GLOBO. "With direct access to raw audio data via Media Streams, we can now use sentiment analysis to detect calls that require urgent attention and take action in real-time. This capability has dramatically increased call center productivity across our thousands of agents and interpreters."
At launch, Twilio is partnering with Google Cloud and Gridspace and working closely with Amazon Web Services to provide easy access to advanced capabilities and technologies like AI and machine learning that allow businesses to build cutting-edge customer engagement experiences.
- Real-time transcription with Google Cloud: Use Google Cloud Speech-To-Text to transcribe conversations and make suggestions to call center agents based on the content of each conversation.
- Call center optimization with Gridspace: Using insight from Gridspace’s analytics and automation platform, businesses can adapt conversations to better respond to the content and tone of the caller.
- Conversational interfaces and real-time transcription with Amazon Web Services: Use Amazon Lex to integrate conversational interfaces and Amazon Transcribe to integrate streaming transcription into voice applications.
These relationships provide developers with the opportunity to work with the services they love and trust in order to get the capabilities that best serve their voice stack needs.
How Media Streams works
Through Media Streams, businesses can fork the media of a phone call in real-time, effectively creating a copy of the initial audio stream that can be routed to their own application or to a third party to power advanced capabilities of their choosing.
How to get started with Media Streams
There are two ways to implement Media Streams: with Gridspace, our exclusive <Stream> Connector launch partner, or via a Websocket connector that lets you build your own applications directly or integrate with Google Cloud, Amazon Web Services, and more. Additional partners and <Stream> Connectors will be available in the future.
Below, we outline how to get started with Websockets and with the Gridspace <Stream> Connector.
Getting started with Websockets
Developers can start receiving the audio of a phone call in real time over a Websocket by using the <Stream> instruction. To get started, return the following TwiML instruction within your voice call:
<Start>
  <Stream url="wss://yourstream.ngrok.io/">
    <Parameter name="custom1" value="custom_value_1" />
    <Parameter name="custom2" value="custom_value_2" />
  </Stream>
</Start>
Upon receiving the TwiML instruction, Twilio forks the phone call and starts streaming media in real time to your Websocket endpoint. To learn more about the API, see the <Stream> API docs and our step-by-step tutorial.
Note: You can optionally pass custom data that will be sent in the JSON Payload so that you can manage the Websocket connections from your app.
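As a rough illustration of the receiving side, the sketch below shows how an application might process the JSON messages that arrive on the Websocket. It assumes each message carries an "event" field ("start", "media", or "stop"), that the start message includes "streamSid" and "customParameters", and that media messages hold base64-encoded audio in "media.payload". Those field names are assumptions to verify against the <Stream> API docs, and StreamSession is a hypothetical helper, not part of any Twilio library. The Websocket server itself is left out; you would call handle_message for each text frame your server receives.

```python
import base64
import json


class StreamSession:
    """Tracks one Media Streams Websocket connection (hypothetical helper)."""

    def __init__(self):
        self.stream_sid = None
        self.custom_parameters = {}
        self.audio = bytearray()
        self.stopped = False

    def handle_message(self, raw_message):
        """Process one JSON text frame and return its event name."""
        message = json.loads(raw_message)
        event = message.get("event")
        if event == "start":
            # Assumed shape: stream metadata plus the <Parameter> values.
            start = message.get("start", {})
            self.stream_sid = start.get("streamSid")
            self.custom_parameters = dict(start.get("customParameters", {}))
        elif event == "media":
            # Assumed shape: base64-encoded audio bytes in media.payload.
            payload = message.get("media", {}).get("payload", "")
            self.audio.extend(base64.b64decode(payload))
        elif event == "stop":
            self.stopped = True
        return event


if __name__ == "__main__":
    session = StreamSession()
    chunk = base64.b64encode(b"\xff\xff").decode("ascii")
    frames = [
        {"event": "start", "start": {"streamSid": "example-sid",
                                     "customParameters": {"custom1": "custom_value_1"}}},
        {"event": "media", "media": {"payload": chunk}},
        {"event": "stop"},
    ]
    for frame in frames:
        session.handle_message(json.dumps(frame))
    print(session.custom_parameters, len(session.audio), session.stopped)
```

Keeping the message handling separate from the Websocket transport makes it easy to test without a live call, and the custom parameters give your app a way to tell which connection belongs to which call.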
Getting started with Gridspace Connector
The following instructions outline the steps required to use the Gridspace Connector.
1) Navigate to the <Stream> Connectors console page and select Gridspace.
2) Click the Install button and accept the Terms of Service.
3) Enter a name for the configuration. The name must be unique across your account, and it is the value you will provide for the connectorName attribute of the <Siprec> instruction. Let’s use Gridspace_1 as the unique name.
4) Provide your Gridspace Account ID and Auth Token, then click Save to save the configuration.
5) Your Gridspace connector is now configured. Return the following TwiML instruction whenever you’d like to start streaming media to Gridspace. To access the results of the integration, navigate to the Gridspace console.
<Start>
  <Siprec connectorName="Gridspace_1">
    <Parameter name="firstname" value="Xander"/>
    <Parameter name="lastname" value="Cage"/>
    <Parameter name="role" value="Agent"/>
    <Parameter name="agentid" value="X-X-X"/>
    <Parameter name="other_role" value="Caller"/>
    <Parameter name="direction" value="both"/>
  </Siprec>
</Start>
Note: Passing metadata about the customer and the agent to Gridspace makes it easier to identify specific conversations or recordings within Gridspace.
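If your application generates TwiML dynamically, for example to fill in per-call agent metadata, the instruction can be built programmatically. The following is a minimal sketch using only Python's standard library; build_start_twiml is a hypothetical helper name, and the connector name Gridspace_1 is the example value from step 3. The sketch wraps the instruction in a <Response> element, since a complete TwiML document uses <Response> as its root.

```python
import xml.etree.ElementTree as ET


def build_start_twiml(verb, attributes, parameters):
    """Return a TwiML string with one <Start> wrapping a <Stream> or <Siprec>.

    verb:       "Stream" or "Siprec"
    attributes: attributes for the verb, e.g. {"connectorName": "Gridspace_1"}
    parameters: name/value pairs emitted as <Parameter> children
    """
    response = ET.Element("Response")
    start = ET.SubElement(response, "Start")
    child = ET.SubElement(start, verb, attributes)
    for name, value in parameters.items():
        ET.SubElement(child, "Parameter", {"name": name, "value": value})
    # ElementTree escapes attribute values, so metadata cannot break the XML.
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    twiml = build_start_twiml(
        "Siprec",
        {"connectorName": "Gridspace_1"},
        {"firstname": "Xander", "lastname": "Cage", "role": "Agent"},
    )
    print(twiml)
```

The same helper can produce the Websocket variant by passing "Stream" as the verb and a url attribute instead of connectorName.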
To learn more, see the docs.
Media Streams is priced at $0.004 per minute. You will also be charged for the associated Programmable Voice minutes and phone numbers used during the call. Programmable Voice pricing can be found here.
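To get a feel for what the per-minute rate means at volume, here is a small sketch of the arithmetic. It covers only the Media Streams charge and assumes you pass in minutes as they are billed; how partial minutes are rounded is not stated here, and Programmable Voice and phone number charges are excluded. The function name is hypothetical.

```python
from decimal import Decimal

# Rate taken from the pricing stated above.
MEDIA_STREAMS_RATE_PER_MINUTE = Decimal("0.004")


def media_streams_cost(billed_minutes):
    """Return the Media Streams charge in dollars for the given billed minutes."""
    return MEDIA_STREAMS_RATE_PER_MINUTE * Decimal(str(billed_minutes))


if __name__ == "__main__":
    # 10,000 streamed minutes at $0.004 per minute comes to $40.
    print(media_streams_cost(10000))
```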
Take control of your roadmap with Media Streams
Media Streams is the next step in our journey to give developers control over their voice stack. With Media Streams, developers now have the freedom to choose which applications they integrate into their Twilio voice stack, both those of their own creation and third-party offerings. For more information on how to use Media Streams in your custom voice builds, check out the docs.
We can’t wait to see what customer experiences you create with Media Streams!