Add Token Streaming and Interruption Handling to a Twilio Voice Anthropic Integration

May 13, 2025
Written by
Paul Kamp
Twilion

ConversationRelay from Twilio lets you build real-time, human-like voice applications with the AI Large Language Model of your choice. It opens a WebSocket with your app, so you can integrate your choice of AI API with Twilio Voice.

In my previous post, I showed you one way to integrate Anthropic with Twilio Voice using ConversationRelay in Node.js. This setup allowed you to create a basic real-time voice assistant using Anthropic's Claude models. However, there were some weaknesses to that basic app, particularly around latency and handling interruptions during conversations.
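As a quick refresher, ConversationRelay is wired up by returning TwiML that points Twilio at your WebSocket server. In the basic app, a small Fastify route does this; here's a rough sketch (the /twiml path, /ws WebSocket path, and NGROK_URL variable follow the base tutorial's setup, and the greeting text is just an example):

fastify.all("/twiml", async (request, reply) => {
  // Tell Twilio Voice to open a ConversationRelay WebSocket back to this server
  reply.type("text/xml").send(
    `<?xml version="1.0" encoding="UTF-8"?>
     <Response>
       <Connect>
         <ConversationRelay url="wss://${process.env.NGROK_URL}/ws" welcomeGreeting="Hi! Ask me anything." />
       </Connect>
     </Response>`
  );
});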

In this post, I'll show you how to implement token streaming to reduce voice latency and add local tracking for interruption-aware conversation turn handling. Let’s dive in!

Prerequisites

Before you get started, you’ll need a few accounts (and a few things ready):

- A Twilio account and a voice-capable Twilio phone number
- An Anthropic account and API key
- Node.js installed on your machine
- ngrok or another tunneling tool
- The basic ConversationRelay and Anthropic integration from the previous post

And with those things in hand, let’s get started…

This tutorial builds on our previous Quickstart-style integration. If you don’t have it set up, you can clone it from our repository.

Add token streaming

The first thing you’ll address from the basic app is latency. If you prompt the basic app with something requiring a longer response – for example, “name ten things that happened in 1999” – you’ll notice it takes a few seconds before you hear the AI’s response. Much of that pause comes from waiting on the full response from the AI (and you guaranteed a long one with a prompt like that!).

You're going to address some of the latency by streaming tokens from Anthropic’s Claude instead of waiting on a complete response.

What is Token Streaming?

Tokens are the smallest units of a response generated by an AI Language Model. In your basic app, you waited for the entire response – including all the tokens – before sending the AI’s text to the text-to-speech provider.

But waiting for a whole response isn’t the only mode supported by Anthropic and other providers. You can also receive partial responses from a model and incrementally stream tokens to your app. From there, you can forward them to ConversationRelay as text tokens with the attribute last: false (sending last: true only once the response is complete), and a caller will hear the AI’s response before the model has even finished generating it.
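For example, a short reply might reach ConversationRelay as a sequence of frames like this (token boundaries are up to the model, so yours will differ):

{ "type": "text", "token": "Hello", "last": false }
{ "type": "text", "token": " there", "last": false }
{ "type": "text", "token": "!", "last": false }
{ "type": "text", "token": "", "last": true }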

Modify how the AI responds

Our demo repository has branches showing every step we took to enhance the base tutorial. If you get stuck in this post, you can clone Step Two: Streaming Tokens and Step Three: Conversation Tracking from our repo. A future blog post will discuss Tool/Function Calling.

You need to make two changes to the code to add token streaming. You'll first change how you call Anthropic and wait for a response, then modify how you handle the ConversationRelay ‘prompt’ message type.

Change the aiResponse() function

Open your server.js and locate the aiResponse function. You’re going to change its name to aiResponseStream – since that’s now more descriptive – and write a little code to handle the streamed tokens.

Here’s the code before:

async function aiResponse(messages) {
  let completion = await anthropic.messages.create({
    model: "claude-3-7-sonnet-20250219",
    max_tokens: 1024,
    messages: messages,
    system: SYSTEM_PROMPT,
  });
  return completion.content[0].text;
}

Once you’ve located that block, change the code to this:

async function aiResponseStream(messages, ws) {
  const stream = await anthropic.messages.stream({
    model: "claude-3-7-sonnet-20250219",
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    messages: messages
  });
  console.log("Received response chunks:");
  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      const content = chunk.delta.text;
      // Send each token
      console.log(content);
      ws.send(JSON.stringify({
        type: "text",
        token: content,
        last: false,
      }));
    }
  }
  // Send the final "last" token when streaming completes
  ws.send(JSON.stringify({
    type: "text",
    token: "",
    last: true,
  }));
  console.log("Assistant response complete.");
  return stream.finalMessage();
}

When you use the Anthropic library’s stream() function, Anthropic changes to token streaming mode and sends one token at a time. Your app then forwards tokens to ConversationRelay with last: false so ConversationRelay knows to expect more.

And finally, when the response is done, this code sends an empty text message with last: true and returns the final accumulated response from the assistant using stream.finalMessage().

Change how you handle the ConversationRelay ‘prompt’ message

In server.js, locate case "prompt": in the switch block inside your fastify.register() call. Since you’re now calling an asynchronous function that returns a Promise, you need to wait for that Promise to resolve before adding the “assistant” response to the conversation.

Here’s the code before:

case "prompt":
          console.log("Processing prompt:", message.voicePrompt);
          const conversation = sessions.get(ws.callSid);
          conversation.push({ role: "user", content: message.voicePrompt });
          const responseText = await aiResponse(conversation);
          conversation.push({ role: "assistant", content: responseText });
          ws.send(
            JSON.stringify({
              type: "text",
              token: responseText,
              last: true,
            })
          );
          console.log("Sent response:", responseText);
          break;

Change that code to the following:

case "prompt":
          console.log("Processing prompt:", message.voicePrompt);
          const conversation = sessions.get(ws.callSid);
          conversation.push({ role: "user", content: message.voicePrompt });
          await aiResponseStream(conversation, ws).then((response) => {
            conversation.push({ role: response.role, content: response.content });
          });
          break;

As you can see, you now await the return from the aiResponseStream function, and push the role and content from the LLM into the local conversation storage.
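Note that stream.finalMessage() returns Anthropic’s complete Message object, so the assistant turn you push stores content as an array of content blocks rather than a plain string. After one exchange, the conversation array looks roughly like this (the text values are illustrative):

[
  { role: "user", content: "Name ten things that happened in 1999." },
  { role: "assistant", content: [{ type: "text", text: "Sure! First, the euro launched..." }] }
]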

Add conversation tracking and interruption handling

Nice work so far! If you test the app again – and try your open-ended prompt of “name ten things that happened in 1999” – you should notice much improved latency. But if you try to interrupt the AI you might notice another problem: while it sounds like you cut off its response, the AI doesn’t know how much of the response you actually heard!

In this step, you’ll track the conversation locally, and show one way to handle these interruptions and keep your AI informed using ConversationRelay’s utteranceUntilInterrupt attribute on interrupt messages and indexing into local conversation storage.
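For reference, the interrupt message ConversationRelay sends over the WebSocket looks roughly like this (other fields omitted; the utterance value is illustrative):

{
  "type": "interrupt",
  "utteranceUntilInterrupt": "The third thing that happened in 1999 was"
}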

Modify the AI response function again

Head back to your aiResponseStream function. You’re going to update it further to accumulate tokens as they arrive:

async function aiResponseStream(conversation, ws) {
  const stream = await anthropic.messages.create({
    model: "claude-3-7-sonnet-20250219",
    max_tokens: 1024,
    messages: conversation,
    system: SYSTEM_PROMPT,
    stream: true,
  });
  let fullResponse = "";
  for await (const chunk of stream) {
    if (
      chunk.type === "content_block_delta" &&
      chunk.delta.type === "text_delta"
    ) {
      const content = chunk.delta.text;
      // Send each token
      console.log(content);
      ws.send(
        JSON.stringify({
          type: "text",
          token: content,
          last: false,
        })
      );
      fullResponse += content;
    }
  }
  // Send final message to indicate completion
  ws.send(
    JSON.stringify({
      type: "text",
      token: "",
      last: true,
    })
  );
  return fullResponse;
}

You’re now both forwarding tokens to ConversationRelay and accumulating them in the fullResponse string. When the AI’s response is done, the function returns the complete response so the ‘prompt’ handler can push it into the session’s conversation history, where you’re storing all of the turns of the conversation. As you’ll see in a minute, you’ll index into those conversation turns to determine where an interruption happened. But first: more changes to the ‘prompt’ message handler.

Change how we manage the ‘prompt’ message

You're yet again changing how you track the conversation. Since aiResponseStream now accumulates the streamed tokens and returns the complete response text, you can simplify the ‘prompt’ message case in your switch block.

Change the ’prompt’ case to:

case "prompt":
          console.log("Processing prompt:", message.voicePrompt);
          const sessionData = sessions.get(ws.callSid);
          sessionData.push({
            role: "user",
            content: message.voicePrompt,
          });
          const response = await aiResponseStream(sessionData, ws);
          if (response) {
            sessionData.push({
              role: "assistant",
              content: response,
            });
          }
          break;

… and then move on to ‘interrupt’ messages.

Handle ‘interrupt’ messages

Now, modify the WebSocket message handling to deal with the interrupt message from ConversationRelay. Start by calling a new function, handleInterrupt, when your app receives an interrupt message:

case "interrupt":
          console.log("Handling interruption; last utterance: ", message.utteranceUntilInterrupt);
          handleInterrupt(ws.callSid, message.utteranceUntilInterrupt);
          break;

Now, turn your attention to this new function. It takes the utteranceUntilInterrupt from the interrupt message, along with the callSid the app uses to look up the conversation turns for the call.

Here’s the function to add:

function handleInterrupt(callSid, utteranceUntilInterrupt) {
  let conversation = sessions.get(callSid);
  // Find the last assistant message that contains the interrupted utterance
  const interruptedIndex = conversation.findLastIndex(
    (message) =>
      message.role === "assistant" &&
      message.utteranceUntilInterrupt !== "" &&
      message.content.includes(utteranceUntilInterrupt)
  );
  if (interruptedIndex !== -1) {
    const interruptedMessage = conversation[interruptedIndex];
    const interruptPosition = interruptedMessage.content.indexOf(
      utteranceUntilInterrupt
    );
    const truncatedContent = interruptedMessage.content.substring(
      0,
      interruptPosition + utteranceUntilInterrupt.length
    );
    // Update the interrupted message with truncated content
    conversation[interruptedIndex] = {
      ...interruptedMessage,
      content: truncatedContent,
    };
    // Remove any subsequent assistant messages
    conversation = conversation.filter(
      (message, index) =>
        !(index > interruptedIndex && message.role === "assistant")
    );
  }
  sessions.set(callSid, conversation);
}

As you saw, the handleInterrupt function is only called if a voice interaction is interrupted – that is, when ConversationRelay sends the interrupt message. This code finds the utteranceUntilInterrupt in your local conversation storage and truncates that turn of the AI’s response to what ConversationRelay told your app the user actually heard. Then, the next time you pass the conversation to Anthropic (with aiResponseStream(conversation, ws)), the AI will ‘know’ how far the caller got. Pretty cool, right?
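To make that concrete, here’s a hypothetical before-and-after of a single assistant turn in the conversation array when the caller cuts in mid-list:

// Stored assistant turn before the interrupt (the full streamed response):
{ role: "assistant", content: "In 1999: one, the euro launched. Two, SpongeBob premiered. Three, ..." }

// ConversationRelay reports utteranceUntilInterrupt: "Two, SpongeBob premiered."
// so handleInterrupt truncates the stored turn to what the caller actually heard:
{ role: "assistant", content: "In 1999: one, the euro launched. Two, SpongeBob premiered." }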

If you missed a step and it isn’t working, you can see the complete code for this step in the Step Three: Conversation Tracking branch of our repo.

Test your AI application

It’s time for the payoff; you’re ready to test!

You might already have these initial steps memorized (or finished) from the basic tutorial. Feel free to skip the setup if so.

Set up the AI app

In your terminal, open up a connection using ngrok:

ngrok http 8080

Now, update your .env with your new ngrok URL (do not include the scheme, i.e., the “https://” or “http://”):

NGROK_URL="your-ngrok-subdomain.ngrok.app"
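If you’re starting from a fresh clone, your .env also needs your Anthropic key – the Anthropic Node SDK reads ANTHROPIC_API_KEY by default – so the file ends up looking something like this (the key value is a placeholder):

ANTHROPIC_API_KEY="sk-ant-your-key-here"
NGROK_URL="your-ngrok-subdomain.ngrok.app"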

Now, start up the server:

node server.js

Go into your Twilio Console and find the phone number you want to use. Under A call comes in, choose the Webhook option and set the HTTP field to HTTP GET.

For the URL, add your ngrok URL (this time include the “https://”), and add /twiml. For example, https://abcdefgh.ngrok.app/twiml.

[Screenshot: the phone number’s “A call comes in” webhook configuration in the Twilio Console]

Hit Save, and you’re ready to test.
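(If you’d rather script this than click through the Console, the Twilio CLI can set the same webhook; something like the command below should work, assuming you’ve installed the CLI and logged in – the phone number is a placeholder.)

twilio phone-numbers:update +15551234567 --voice-url "https://abcdefgh.ngrok.app/twiml" --voice-method GET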

Call Claude

Dial that Twilio number you set up. You should soon hear a greeting – talk to your new, enhanced Claude – Anthropic’s AI Assistant!

Try interrupting the AI when it is responding to your prompts. If you’re sick of asking about 1999, ask about various dog breeds, or ask it to come up with a long, complicated sentence with a lot of commas that seems to run on and never end, despite its best intentions, even though long, complicated sentences aren’t usually appropriate for voice conversations. Whatever response you interrupt, test by asking how far the AI got before you interrupted.

Next steps

Congratulations on the new, more powerful AI voice assistant you just built with Node.js, ConversationRelay, Claude, and a few lines of code! In the next post, we’ll explore tool or function calling to expand your AI assistant’s horizons just a little bit further.

Paul Kamp is the Technical Editor-in-Chief of the Twilio Blog. He felt awkward continually interrupting the AI. You can reach Paul – and don’t worry, you aren’t interrupting – at pkamp [at] twilio.com.