Set Up a Native Integration with Conversational Intelligence and ConversationRelay using Node.js

May 14, 2025
Written by
Paul Kamp
Twilion
Jeff Eiden
Twilion

As businesses deploy AI agents and field real-world customer requests, they need to track their performance outside test environments. Observability helps teams ensure their agents act responsibly, meet user needs, and deliver the business value they were designed to provide. AI observability with Twilio is now much simpler with our newly announced integration of ConversationRelay with Conversational Intelligence.

In this tutorial, I'll show you how to add Conversational Intelligence to a ConversationRelay build to enable AI agent observability. You'll create an Intelligence Service and attach several Custom Operators to the service to judge the app's conversational performance. Then, starting from a pre-built Node.js ConversationRelay and OpenAI integration, we'll create a mock AI agent for Owl HVAC. Finally, we'll call the app on the phone and verify that we're storing transcriptions in Conversational Intelligence… and that the service is extracting insights from our conversation!

By the end of this tutorial, you should be able to replicate the demo below:


Let’s get started.

Prerequisites

To deploy this tutorial, you will need:

  • Node.js on your machine (we tested using v23.9.0)
  • A Twilio phone number with Voice capabilities (If you haven’t yet, you can sign up for Twilio here)
  • ngrok or another tunneling service
  • An OpenAI Account and API Key
  • A phone to place your outgoing call to Twilio
The Conversational Intelligence and ConversationRelay integration will only work if your account is not in PCI Mode. You need to create a new trial account to test the integration if PCI mode is enabled on your account.

Create and set up an Intelligence Service with Conversational Intelligence

At this point, you've purchased a phone number for your account. (If you haven't yet, follow these instructions and get one with Voice capabilities).

Now, let’s get acquainted with Conversational Intelligence!

From your Twilio Console, locate the Jump to… search box. Type ‘intelligence’, and select Conversational Intelligence Overview.

Screenshot of Twilio dashboard with a welcome message and admin settings highlighted.

Optionally, find the Conversational Intelligence entry in the sidebar, select the triple dot icon, then Pin to Sidebar. (This will make it easier to find in the future.)

Context menu with options Pin to sidebar and Go to Docs under Conversational Intelligence.

Set up an Intelligence Service

Under Conversational Intelligence on the left sidebar, select Services. If this is your first time creating a service, you'll see a blue Create a Service button centered on the page – hit it now!

In the next screen, fill out a creative Unique Name and Friendly Name (or, hey, use ‘CAI-CR-Demo’ and ‘Owl HVAC Assistant’, like me!), and select the language you anticipate your callers will be speaking when they call your agent.

Twilio will prompt you about our Predictive and Generative AI/ML Features Addendum if you haven't yet agreed to it. Take some time to review and choose to agree if you'd like to proceed.
Form with fields for Unique Name, Friendly Name, and Language for an Owl HVAC Assistant.

You'll also decide whether you want to redact Personally Identifiable Information (PII), automatically transcribe all new recordings in your account, and enable your data to be used to improve Twilio and our affiliates' products.

(You can set these however you like for the demo and change them later.)

Finally, Create the service.

Get the service’s Service SID

Find the service you just created on your Twilio Console's Services page. It'll list a Service SID – copy that and set it aside; we'll need it when we get down to coding.

Service configuration details for CAI-CR-Demo with highlighted Service ID.
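
If you'd rather grab the SID from the command line, you can also list your Intelligence Services with the Twilio Node.js helper library. Here's a minimal sketch – it assumes you've run npm install twilio, have TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN set in your environment, and save it in a project using "type": "module" (the file name is just a placeholder):

import twilio from "twilio";

// Uses the standard Twilio credentials from your environment
const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

// List the Intelligence Services on the account; the sid field is the
// Service SID you'll paste into .env later in this tutorial
const services = await client.intelligence.v2.services.list({ limit: 20 });
for (const service of services) {
  console.log(`${service.uniqueName}: ${service.sid}`);
}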

Add Language Operators to your service

Conversational Intelligence’s Language Operators use artificial intelligence and machine learning technologies to provide analysis and insights on transcripts. For our simulated HVAC company, operators will give us an idea of how calls are proceeding, without us needing to read each one manually.

AI agent observability is a unique use case for Language Operators. To help you get started, Twilio's Machine Learning (ML) team created a set of six specialized Generative Custom Operator examples designed to help you analyze AI agent conversations and gain insights into their behaviors. These Operators are:

 

  • Virtual Agent Task Completion: Determines if the virtual agent satisfied the customer's primary request. Use case: Helps measure the efficacy of the virtual agent in serving the needs of customers, and informs future enhancements to virtual agents for cases where customer goals were not met.
  • Human Escalation Request: Detects when a customer asks to be transferred from a virtual agent to a human agent. Use case: Helps measure the efficacy of the virtual agent in achieving self-service containment rates. Useful in conjunction with Task Completion detection above.
  • Hallucination Detection: Detects when an LLM response includes information that is factually incorrect or contradictory. Use case: Ensures the accuracy of information provided to customers, and informs fine-tuning of the virtual agent to prevent future instances of hallucination.
  • Toxicity Detection: Flags whether an LLM response contains toxic language, defined as language deemed to be inappropriate or harmful. Use case: Determines whether follow-up with the customer is necessary, and/or informs adding virtual agent guardrails to prevent future instances of toxicity. Useful in conjunction with Hallucination Detection above.
  • Virtual Agent Predictive CSAT: Derives a customer satisfaction score from 0-5 based on the virtual agent interaction. Use case: Measures overall customer satisfaction when engaging with virtual agents. Can serve as a heuristic for a variety of important customer indicators such as brand loyalty, likelihood of repurchase, retention, and churn.
  • Customer Emotion Tagging: Categorizes customer sentiment into a variety of identified emotions. Use case: Offers more fine-grained and subtle insight into how customers feel when engaging with virtual agents.


For each of the AI Observability Operators, you can find sample prompts, JSON schemas, training examples, and example outputs linked above. These examples will help you use the Operators effectively and get the most out of your AI agent observability efforts. Note that these Operators are designed to be used with the Generative Custom Operator type: you must select Generative when creating each Custom Operator in the Console, and use the provided Operator configuration linked above.

Here’s how it looked after I added them to my Intelligence Service:

Screen capture of Twilio dashboard showing various AI services like sentiment analysis, task completion, and more.

Beyond the specialized AI Observability Operators above, you can also apply additional Language Operators based on your use case. Conversational Intelligence offers both pre-built Operators, predefined in the Console for common use cases (e.g., summarization), and the ability to build your own tailored LLM tasks in natural language using Generative Custom Operators.

Awesome, your service is now ready – let’s build an AI app and test them out!

Build a virtual agent with ConversationRelay

In the last section, you set up the Intelligence Service and added some Custom Operators to monitor the virtual agent and its conversations. In this section, we'll build the scaffolding for a virtual agent.

Now, you're going to build a ConversationRelay integration with OpenAI. ConversationRelay lets you create real-time voice applications with AI Large Language Models (LLMs) that you can call using Twilio Voice.

We'll tell the LLM to pretend it's an agent for Owl HVAC for the purposes of this demo. You'll customize a greeting and have a virtual agent you can call by the end of this section.

Want to skip the step-by-step? You can find all the code you’ll need to clone in my repo.

Want to see how we built the integration? Check out my colleague Amanda Lange’s tutorial series here.

With this launch, we can set up ConversationRelay with a single attribute, intelligenceService, which will automatically transcribe our conversations into our Conversational Intelligence Service while running our chosen Language Operators.

Here’s what the underlying TwiML looks like (you’ll also see it in the code, below):

<ConversationRelay 
    url="${WS_URL}" 
    welcomeGreeting="${WELCOME_GREETING}"               
    intelligenceService="${CONVERSATIONAL_INTELLIGENCE_SID}" 
/>

Let’s work on the app.

Initialize the project

Navigate to the parent directory where you want to set up the project. Set up the project scaffolding with:

mkdir cai-cr-demo-openai-node 
cd cai-cr-demo-openai-node 
npm init -y; npm pkg set type="module";

Then, you’ll need to install the dependencies:

npm install fastify ws dotenv @fastify/formbody @fastify/websocket openai

Create the project files

Now, create the .env file:

touch .env

In that file, add the following lines:

OPENAI_API_KEY=<paste your OpenAI API Key>
CONVERSATIONAL_INTELLIGENCE_SID=<paste the Intelligence SID>
NGROK_URL=

For the OPENAI_API_KEY, add your API Key from OpenAI. Then, paste in the CONVERSATIONAL_INTELLIGENCE_SID, which is the Service SID from the Service you created in the first section.

Leave NGROK_URL blank for now; you'll come back to it.

Then, create the file index.js – that’s where you’ll be writing the code for the integration.

touch index.js

Build the ConversationRelay integration

Now, open index.js in your favorite editor (personally, I prefer ed). It’s time to code up the server.

Step 1: Import dependencies, load our Environment Variables, and initialize Fastify

Add our dependencies at the top of the file, then define a few constants. And don't worry – I'll explain WELCOME_GREETING and SYSTEM_PROMPT at the end of the section.

import Fastify from "fastify";
import fastifyWs from "@fastify/websocket";
import fastifyFormBody from "@fastify/formbody";
import OpenAI from "openai";
import dotenv from "dotenv";
dotenv.config();
const PORT = process.env.PORT || 8080;
const DOMAIN = process.env.NGROK_URL;
const CONVERSATIONAL_INTELLIGENCE_SID = process.env.CONVERSATIONAL_INTELLIGENCE_SID;
const WS_URL = `wss://${DOMAIN}/ws`;
const WEBHOOK_URL = `https://${DOMAIN}/webhook`; 
const WELCOME_GREETING = "Hi! I am a voice assistant for Owl HVAC powered by Twilio and Open AI. How can I help you today?";
const SYSTEM_PROMPT = 
`You are a virtual customer service assistant for Owl HVAC, a trusted heating and cooling company. You assist customers over voice or chat with a wide range of requests, including emergency A/C repair, scheduling service appointments, and answering general HVAC-related questions. Your tone should be empathetic, helpful, and professional—especially when customers are dealing with urgent issues like a broken A/C during off hours. 
Briefly acknowledge the customer's concern and clearly explain next steps, especially how and when a technician will be dispatched the next available time during business hours or after hours for emergencies. Do not provide any dangerous troubleshooting steps to the customer under any circumstance – you are not a licensed HVAC technician (e.g., do not even suggest 'If it feels safe to do so, try turning the unit off and back on at the breaker').
You do not have access to actual systems, but you should simulate actions such as booking appointments, escalating urgent issues, or dispatching technicians. Confirm these actions using realistic, reassuring language (e.g., 'I've scheduled a technician to come to your home first thing in the morning at 8 AM.'). Don't use any placeholders or generic responses. Instead, provide realistic details that would be relevant to the customer based on their request.
You should also simulate recognizing the customer and accessing their account based on the incoming phone number. Ask them for their name and pretend that you have a record of them because of service you did in the past, and pretend their address is 1234 Main Street, Chicago, IL.
If the customer mentions their HVAC unit, you should simulate having access to the customer's service history and equipment details, including any recent installations or service calls relevant to the request.
If a customer asks to escalate, or speak to a human, you should empathize, but you don't have the ability to do that.
Avoid technical jargon, and speak in a calm, capable, and compassionate tone. If you make a mistake or receive pushback, apologize and correct yourself in a polite, non-defensive manner. Your goal is to build trust, provide reassurance, and create the impression that real action is being taken, even though this is a simulated or demo environment.
This conversation is being translated to voice, so answer carefully. When you respond, please spell out all numbers, for example twenty not 20. Do not include emojis in your responses. Do not include bullet points, asterisks, or special symbols.`;
const sessions = new Map();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

For a more detailed description of what’s going on here, see Amanda’s tutorial. Let’s concentrate on a couple lines:

  • WELCOME_GREETING - this variable controls the first line the AI will say to the caller after it picks up
  • SYSTEM_PROMPT - this (long) variable controls how the AI will act during the call. Since this is a demo, this sample prompt asks the AI to roleplay as a customer service assistant for a fake HVAC Company (“Owl HVAC”).

You’ll also note a few lines about how the conversation is being translated to voice, and some rules to follow. You can find our prompt engineering best practices here.

Next up, we’ll work on the endpoints for our server.

Step 2: Interface with OpenAI

To improve the assistant's ability to handle interruptions – and to reduce latency and improve the caller's experience – we will stream tokens from OpenAI.

Below the above code, add a new aiResponseStream() function to handle our interaction with OpenAI:

async function aiResponseStream(conversation, ws) {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: conversation,
    stream: true,
  });
  const assistantSegments = [];
  console.log("Received response chunks:");
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    // Send each token
    console.log(content);
    ws.send(JSON.stringify({
      type: "text",
      token: content,
      last: false,
    }));
    assistantSegments.push(content);
  }
  // Send the final "last" token when streaming completes
  ws.send(JSON.stringify({
    type: "text",
    token: "",
    last: true,
  }));
  console.log("Assistant response complete.");
  const sessionData = sessions.get(ws.callSid);
  sessionData.conversation.push({ role: "assistant", content: assistantSegments.join("") });
}

Briefly, here we manage our interaction with the OpenAI API.

We use the gpt-4o-mini model, and after we send a prompt (from the caller's voice), we accumulate the streamed tokens from the model while simultaneously sending them to ConversationRelay. When the response is complete, we add the full response to our local conversation storage with the assistant role.

Step 3: Define some endpoints

For this HVAC assistant demo, we need two endpoints: one to provide instructions to Twilio when we receive an inbound call, and another to set up a WebSocket that proxies communication between the caller (via ConversationRelay) and OpenAI.

Below the aiResponseStream() function, add the following code:

const fastify = Fastify();
fastify.register(fastifyFormBody);
fastify.register(fastifyWs);
fastify.all("/twiml", async (request, reply) => {
  reply.type("text/xml").send(
    `<?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <ConversationRelay url="${WS_URL}" welcomeGreeting="${WELCOME_GREETING}" intelligenceService="${CONVERSATIONAL_INTELLIGENCE_SID}" />
      </Connect>
    </Response>`
  );
});
fastify.register(async function (fastify) {
  fastify.get("/ws", { websocket: true }, (ws, req) => {
    ws.on("message", async (data) => {
      const message = JSON.parse(data);
      switch (message.type) {
        case "setup":
          const callSid = message.callSid;
          console.log("Setup for call:", callSid, "Intelligence Service SID:", CONVERSATIONAL_INTELLIGENCE_SID);
          ws.callSid = callSid;
          sessions.set(callSid, {conversation: [{ role: "system", content: SYSTEM_PROMPT }], lastFullResponse: []});
          break;
        case "prompt":
          console.log("Processing prompt:", message.voicePrompt);
          const sessionData = sessions.get(ws.callSid);
          sessionData.conversation.push({ role: "user", content: message.voicePrompt });
          aiResponseStream(sessionData.conversation, ws);
          break;
        case "interrupt":
          console.log("Handling interruption; last utterance: ", message.utteranceUntilInterrupt);
          handleInterrupt(ws.callSid, message.utteranceUntilInterrupt);
          break;
        default:
          console.warn("Unknown message type received:", message.type);
          break;
      }
    });
    ws.on("close", () => {
      console.log("WebSocket connection closed");
      sessions.delete(ws.callSid);
    });
  });
});

Here you can see the endpoints. Briefly, here’s what’s happening:

  • /ws - this path sets up a WebSocket between our app and ConversationRelay. We’re handling three ConversationRelay message types: ‘setup’, ‘prompt’, and ‘interrupt’.
  • /twiml - this path returns instructions to Twilio on an inbound call. The instructions use a language called TwiML, or Twilio Markup Language. We use the <ConversationRelay> noun and pass it the aforementioned WELCOME_GREETING, the path to the above WebSocket, and the CONVERSATIONAL_INTELLIGENCE_SID we defined in the .env file. Adding that Intelligence Service SID is all you'll need to get Conversational Intelligence working with your ConversationRelay app!

You’re really cooking now. Let’s finish things up by handling any interrupts from the caller and running the server.

Step 4: Handle interrupts and run the server

Below all of the above, paste this last section of code:

function handleInterrupt(callSid, utteranceUntilInterrupt) {
  const sessionData = sessions.get(callSid);
  const conversation = sessionData.conversation;
  let updatedConversation = [...conversation];
  const interruptedIndex = updatedConversation.findLastIndex(
    (message) =>
      message.role === "assistant" &&
      message.content.includes(utteranceUntilInterrupt),
  );
  if (interruptedIndex !== -1) {
    const interruptedMessage = updatedConversation[interruptedIndex];
    const interruptPosition = interruptedMessage.content.indexOf(
      utteranceUntilInterrupt,
    );
    const truncatedContent = interruptedMessage.content.substring(
      0,
      interruptPosition + utteranceUntilInterrupt.length,
    );
    updatedConversation[interruptedIndex] = {
      ...interruptedMessage,
      content: truncatedContent,
    };
    updatedConversation = updatedConversation.filter(
      (message, index) =>
        !(index > interruptedIndex && message.role === "assistant"),
    );
  }
  sessionData.conversation = updatedConversation;
  sessions.set(callSid, sessionData);
}
try {
  await fastify.listen({ port: PORT });
  console.log(`Server running at http://localhost:${PORT} and wss://${DOMAIN}/ws`);
} catch (err) {
  console.error(err);
  process.exit(1);
}

Here we define a function, handleInterrupt, which is called when ConversationRelay detects that we interrupted the AI. In it, we index into our local conversation history and rewrite what the assistant replied with, ensuring that the next time we prompt the LLM it knows (approximately) what the user “heard”.

Finally, we have a bit of boilerplate to launch the server. But don't do that quite yet; we have a couple more steps to take!

Finish the setup and run the server

I know, I feel it too – you’re getting close! Before you run, though, you need to do a few things.

Start ngrok with ngrok http 8080 in a new terminal tab.

There, copy the ngrok URL without copying the scheme (leave out the “https://” or “http://”). Update your .env file by editing the NGROK_URL we left blank earlier:

NGROK_URL="your-ngrok-subdomain.ngrok.app"

This screenshot shows the URL I chose for my environment variable:

Ngrok terminal displaying service status, online status, update info, and connection endpoints.

Back in the original terminal, start the server:

node index

Configure Twilio

Return to your browser, and open your Twilio Console. Now, configure your phone number with the Voice Configuration Webhook for A call comes in set to

https://[your-ngrok-subdomain].ngrok.app/twiml (again, using your NGROK_URL).
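
(If you'd rather script this step than click through the Console, here's a minimal sketch using the Twilio Node.js helper library. It assumes npm install twilio, TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN in your environment, and a TWILIO_PHONE_NUMBER_SID variable holding your number's SID, which starts with PN – that variable name is just a placeholder for this example.)

import twilio from "twilio";

const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

// Point the number's "A call comes in" webhook at our /twiml endpoint
const number = await client
  .incomingPhoneNumbers(process.env.TWILIO_PHONE_NUMBER_SID)
  .update({
    voiceUrl: `https://${process.env.NGROK_URL}/twiml`,
    voiceMethod: "POST",
  });

console.log(`Updated ${number.phoneNumber} to use ${number.voiceUrl}`);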

Here’s an example from my Console:

Screen with voice configuration settings highlighting a webhook URL field.

Save your changes to the number. You’re now ready to test.

Call and talk to AI!

You’re now ready to make a phone call. Dial your Twilio number, and talk to the AI!

The best part about the call is that since it's Artificial Intelligence, you don't need a set script. Try asking the assistant to look up your account, schedule an emergency A/C repair, and escalate to a manager or a human.

When you hang up the call, Twilio will get to work in near real time, transcribing the conversation and running all of the Operators you added against the transcript. We'll check that it's working in a second.

Find your transcript in Conversational Intelligence

Navigate back to your Twilio Console. Select Transcripts under Conversational Intelligence in the sidebar. By now, the transcript should be complete – either select the transcript for the recent call in the list, or wait a few minutes, and return to select it. The source of the transcript in Conversational Intelligence should be set to ConversationRelay as shown below:

Screenshot of a webpage showing a list of call center conversation transcripts with filters and search options.

When you click on your transcript, you’ll notice a few things:

  • Participants have been automatically labeled, including the Virtual Agent
  • The contents of the conversation have been accurately transcribed, with each sentence corresponding to either the customer or the Virtual Agent
  • The Language Operator Results show the output of the AI analysis applied to this conversation
A transcript with Conversational Intelligence enabled with Language Operators

Here is where the rubber meets the road with AI Observability. If you used the specialized operators above, you’ll get quantifiable answers to questions like:

  • Did the AI hallucinate?
  • What emotion(s) did the customer feel when engaging with the virtual agent?
  • What tasks was the virtual agent asked to complete, and did it complete them?

…as well as the results of any additional Operator tasks you defined.

Consuming the Operator Results downstream in your apps may be helpful for automations and aggregate analysis. Conversational Intelligence offers webhooks and APIs to consume transcripts and Operator Results.
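
For example, here's a rough sketch of pulling a transcript and its Operator Results with the Twilio Node.js helper library. Treat it as a starting point rather than a drop-in script: it assumes npm install twilio, TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN in your environment, the CONVERSATIONAL_INTELLIGENCE_SID from your .env, and that the transcripts and operatorResults resources are available on your account as described in the Conversational Intelligence API docs.

import twilio from "twilio";

const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

// Fetch a recent transcript created under our Intelligence Service
const [transcript] = await client.intelligence.v2.transcripts.list({
  serviceSid: process.env.CONVERSATIONAL_INTELLIGENCE_SID,
  limit: 1,
});
console.log(`Transcript ${transcript.sid} (status: ${transcript.status})`);

// Each Operator Result corresponds to one of the Language Operators attached to the Service
const results = await client.intelligence.v2
  .transcripts(transcript.sid)
  .operatorResults.list();

for (const result of results) {
  // The fields that matter vary by Operator type, so log the name and type,
  // then inspect the full object to see what each Operator returned
  console.log(result.name, result.operatorType);
}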

Getting chatty with Conversational Intelligence

And there you have it: you built an AI Assistant that anyone can call using Twilio Voice and ConversationRelay, in Node.js. Conversational Intelligence automatically runs your Language Operators on your transcripts to help you monitor everything happening with your app and callers. Isn't it amazing what you can do with a few lines of code and a phone?

Now that you've gotten a taste of Conversational Intelligence and ConversationRelay, the sky's the limit. Next, check out this Docs guide for the ConversationRelay and Conversational Intelligence integration.

Paul Kamp is the Technical Editor-in-Chief of the Twilio Blog. He was uncomfortable yelling at the AI while testing the app – but the work needed to be done, so yell he did. Reach Paul at pkamp [at] twilio.com, but try not to yell.

Jeff Eiden is Director of Product for Twilio Conversational Intelligence