Using Twilio’s Real-Time Transcriptions Service in Node.js

August 29, 2025
Written by
Reviewed by

Using Twilio’s Real-Time Transcriptions Service in Node.js

Twilio’s Real-Time Transcription service is now generally available bringing streaming speech recognition capabilities straight into your Twilio Voice applications.

In the past, developers often relied on third-party services coupled with websockets or the TwiML <Gather> or <Record> verbs, waiting until after the caller finished speaking to see results. Now, transcriptions arrive in real time during the call which lets you act on the caller's voice the moment they speak.

This service allows you to customize your transcription service such as using Deepgram or Google as the speech-to-text transcription provider while also remaining HIPAA-eligible and PCI-compliant for sensitive use cases. You can use it to power live captions, trigger automated workflows mid-conversation, feed conversational AI, or run live sentiment analysis with Twilio’s Conversational Intelligence without having to build your own transcription pipeline.

In this tutorial, you’ll learn how to stream transcripts via webhooks using Twilio’s Real-Time Transcriptions service in a Node.js application.

Prerequisites

Set up a Node.js application

You’ll start setting up your application by initializing a Node.js project and installing the dependencies you’ll need.

Initialize a Node.js project

Open up your terminal, navigate to your preferred directory for Node.js projects, and enter the following commands to create a folder, change into it, and initialize a Node.js project:

mkdir voice-transcription
cd voice-transcription
npm init -y

Install the required dependencies

Next, run the following command to install the dependencies you’ll need for this project:

npm install --save express twilio
  • express — This is a framework for Node.js that makes it easier to handle incoming HTTP requests while simplifying routing and server logic
  • twilio — The Twilio package will be used to access the Twilio Programmable Voice API to send phone calls, and to construct TwiML that will instruct the call what to do – in this case to start Twilio's transcription service

<Transcription> TwiML Noun

Before building out the application, let's briefly talk about the Transcription TwiML noun. The <Transcription> noun is used within a <Start> block to initiate Twilio’s real-time transcription during an active call. When called, Twilio forks the ongoing call’s audio to a speech-to-text engine and immediately returns text results. This allows you to immediately process speech in your application rather than waiting until the caller has finished speaking.

If you aren't familiar with TwiML, take a look at our docs.

The statusCallbackUrl attribute specifies where Twilio should send transcription events such as transcription-content or transcription-stopped.

You can use the transcriptionEngine to specify if you’d like to use Google or Deepgram as the provider; this is useful to leverage engine-specific features or optimizations that you can specify in the speechModel attribute.

speechModel is used to specify which model you’d like to use. The telephony speech model (default) is optimized for phone call audio and the long speech model is optimized for long-form audio such as lectures and long conversations.Additionally, you can leverage Twilio’s Conversational Intelligence using the intelligenceService attribute to store and analyze real-time transcriptions. When enabled, it’ll persist the live transcripts and run post-call language operators.

Other attributes include:

  • name — To label and reference or stop the transcription
  • languageCode — To specify which language the transcription should be performed in (defaults to "en-US" for American English)
  • track — To specify which audio track should be transcribed; whether to transcribe the caller audio, callee audio, or both.

Integrate the Real-Time Transcription Service

Open the voice-transcription project directory in your preferred IDE or editor and create a file called index.js within the directory. This file is where you’ll be writing the code that will receive calls.

Then, open the index.js file and add the following code to it:

const express = require('express');
const VoiceResponse = require('twilio').twiml.VoiceResponse;
const app = express();
app.use(express.urlencoded({ extended: true }));
// Handle Twilio webhook requests for incoming calls
app.post('/voice', (req, res) => {
    const twiml = new VoiceResponse();
    twiml.say('Hello! Please speak now.');
    const start = twiml.start();
    start.transcription({
        statusCallbackUrl: `https://${req.headers.host}/transcription`,
        name: 'Call Center Transcription',
    });
    // Rest of the voice call logic...
    // For this example, we will just pause for a long time to keep the call open
    twiml.pause({ length: 3600 }); // 1 hour pause
    res.type('text/xml');
    res.send(twiml.toString());
});

The code above sets up an Express server which responds to HTTP POST requests to the /voice route. The route will construct TwiML using Twilio's Node Helper Library and start by responding to the caller. It will then start the transcription service with the statusCallbackUrl attribute being the /transcription route (which we will get to later).

The twiml.pause({ length: 3600 }) instruction is used in this example solely to keep the call active while demonstrating the transcription service. In a production setting, you would replace this pause with actual call logic such as <Gather> to collect user input, <Record> to capture audio, or interactive voice response (IVR) menus.

Finally, the TwiML response is formatted as XML and sent back to Twilio. The response for the above route will look like the following:

<Response>
    <Say>Hello! Please speak now.</Say>
    <Start>
        <Transcription statusCallbackUrl="https://<server_host>/transcription" name="Call Center Transcription"/>
    </Start>
    <Pause length="3600"/>
</Response>

Next, you'll need to capture the transcription events that will be sent to the /transcription route specified in the statusCallbackUrl. At the end of the file, add the following route to handle the transcription callbacks:

// Handle transcription callbacks from Twilio
app.post('/transcription', (req, res) => {
  if(req.body?.TranscriptionEvent === 'transcription-content') {
      console.log('Transcription received:', req.body.TranscriptionData);
  }
  res.status(200).send('OK');
});

This route receives POST requests from Twilio containing transcription data. The code first checks if the event is a transcription-content event, since we're only interested in the actual transcription content. When Twilio sends transcription content data, the code logs the TranscriptionData to the console and responds with an HTTP 200 OK status.The incoming body of the request will look something like the following:

{
  LanguageCode: 'en-US',
  TranscriptionSid: 'GTXXXXXXXXXXXXXXXXXXXXXXX',
  TranscriptionEvent: 'transcription-content',
  CallSid: 'CAXXXXXXXXXXXXXXXXXXXXX',
  TranscriptionData: '{"transcript":"Real time transcriptions with twilio.","confidence":0.799641}',
  Timestamp: '2025-08-15T18:49:55.284035974Z',
  Final: 'true',
  AccountSid: 'ACXXXXXXXXXXXXXXXXXXXX',
  Track: 'inbound_track',
  SequenceId: '2'
}

As you can see the transcription webhook request will contain various data including the event type, call and account SID, where the track is coming from, and transcription data which includes the transcript and its confidence score.

In a production application, you would process this data according to your use case, such as storing it in a database, triggering automated responses, or feeding it to a conversational AI system. For this example, you’ll just log the TranscriptionData to demonstrate how the real-time transcription service works.

Finally, add the last bit of code to the bottom of index.js to spin up your Express server so that it can listen to webhook calls to your server:

// Start the server
app.listen(3000, () => {
    console.log('Server running at http://127.0.0.1:3000/');
});

Connect your application to Twilio

Before you run and test the transcription service, you'll first need to expose your local Express server to the internet using ngrok. Then, you’ll need to connect your ngrok tunnel to your Twilio number, so that incoming calls to your Twilio number are forwarded to your Node application.

Create an ngrok tunnel

Your Express application will run a local server on your computer, which means it isn’t accessible from the public internet by default. To allow Twilio to send webhook requests to your application, you’ll use ngrok to create a secure tunnel. ngrok will generate a public URL that forwards incoming requests directly to your local Express server.

Open a new terminal tab and run the following command to start a tunnel to port 3000 (the port your Express server is listening on):

ngrok http 3000

After running the command, your terminal will look like this

Screenshot of terminal showing ngrok tunnel info. Account info, latency, version, forwarding url and more info are shown.

The Forwarding URL is what you’ll need to connect your Twilio number to. Copy this URL as you’ll need it for the next section.

Add the forwarding URL to your Twilio number

You'll now connect your application to your Twilio phone number using the forwarding URL provided by ngrok. This URL will route all incoming calls to your Node.js application for processing.

Navigate to the Active Numbers section of your Twilio Console. You can access this by clicking Phone Numbers > Manage > Active numbers from the left navigation panel in your Console.

Click on the Twilio phone number you want to use for your transcription service. In the Voice Configuration section, select Webhook for the " A call comes in" dropdown, then enter your ngrok forwarding URL followed by "/voice" in the textbox (for example: https://abc123.ngrok.io/voice). Finally, ensure the HTTP dropdown is set to HTTP POST.

Voice configuration section of a twilio number on the twilio console

Ensure you save the configuration by scrolling down and clicking on the Save configuration button.

Test the application

Return to your terminal where ngrok is running and keep that tunnel active. Then, head over to the first terminal tab where you set up your node application (or launch another terminal tab and navigate to your project directory). Start your Express server with the following command:

node index.js

Your real-time transcription service is now live!

You can test it out by calling your Twilio phone number. After hearing the greeting message, you can speak and the transcription data will appear in your terminal console in real-time. The transcripts will be logged with confidence scores, allowing you to see how accurately the transcription engine processes your voice, such as in the example output below.

Server running at http://127.0.0.1:3000/
Transcription received: {"transcript":"Hello.","confidence":0.8637206}
Transcription received: {"transcript":" Bye.","confidence":0.8224885}

That's how to use Twilio’s Real-Time Transcription Service in Node.js

You've just built a voice application using Twilio's Real-Time Transcription service. Your application can now capture and process spoken words as they happen during phone calls. This opens up possibilities for enhancing customer experiences and automating voice interactions.

Consider expanding your application with Twilio’s Conversational Intelligence, using the intelligenceService attribute to add advanced analytics and insights to your transcriptions. When enabled, you can also store transcripts and run post-call language operators.

Dhruv Patel is a Developer on Twilio’s Developer Voices team. You can find Dhruv working in a coffee shop with a glass of cold brew or he can either be reached at dhrpatel [at] twilio.com.