Add Function and Tool Calling to a Twilio Voice OpenAI Integration

May 16, 2025
Written by Amanda Lange
Reviewed by Paul Kamp, Twilion

ConversationRelay is a product from Twilio that allows you to build real-time, human-friendly voice applications. With ConversationRelay, you can interact with your AI Large Language Model (LLM) of choice. It opens a WebSocket so you can integrate with any AI API, allowing for a fluid, event-based interaction over a fast two-way connection.

In previous tutorials, we covered getting set up with ConversationRelay and getting your AI application running with OpenAI. After getting started with ConversationRelay and OpenAI, the next tutorial updated the application for better interruption handling using Token Streaming.

What's next for our ConversationRelay application? In this chapter of this tutorial series, you will learn how to add Tool Calling to your application. This update will allow you to integrate your AI conversation with additional APIs of your choice, using the example of calling a simple joke retrieval API, jokeapi.dev.

Tool Calling can open up a world of possibilities for your application, allowing you to interact with calendars, databases, or whatever you can imagine. Let's get to the code!

Prerequisites

To deploy this tutorial you will need:

  1. Node.js installed on your machine
  2. A Twilio phone number (Sign up for Twilio here)
  3. Your IDE of choice (such as Visual Studio Code)
  4. The ngrok tunneling service (or other tunneling service)
  5. An OpenAI Account to generate an API Key
  6. A phone to place your outgoing call to Twilio

Write the code

This tutorial is a sequel to our ConversationRelay Quickstart and Better Interruption Handling tutorials in Node.js. For this tutorial you have two options: you can either build on the existing tutorial code, cloned from this repo, or you can start your application from scratch. Instructions for both are provided here.

To view all of the code for this quickstart, please visit the repo on GitHub. The quickstart is divided into different branches so that you can follow along every step of the way. This tutorial is based on the branch Step 4: Tool Calling.

If you're starting this tutorial from scratch, start by creating a new folder for your project.

mkdir conversation-relay-node
cd conversation-relay-node

Next, initiate a new node.js project, and install the prerequisites.

npm init -y
npm install fastify @fastify/websocket @fastify/formbody openai dotenv axios
npm pkg set type="module"

If you are instead building on the tutorial from our previous post, there are only two new packages you have to install:

npm install @fastify/formbody axios

In this build you are using Fastify as your framework. It lets you quickly spin up a server for both the WebSocket you'll need, as well as the route for the instructions you're going to need to provide to Twilio.

One new addition here from previous tutorials is @fastify/formbody. This plugin allows your application to parse the form-encoded body that Twilio sends when it POSTs to your webhook endpoint (the default in the Console). The install command also adds axios, a JavaScript library for making HTTP requests, which will be used to communicate with your tool API.

Creating the Environment File

This build will also – as before, if you completed the other steps – require an API key for OpenAI.

If you try to expand this tutorial to call other APIs in the future, you will also need to store keys for those APIs in a safe place. Therefore, it’s best practice to have a project .env file. If you do not have one, create it now in your project folder.

In .env, use the following line of code, replacing the placeholder shown with your actual key from the OpenAI API keys page.

OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

If you're going to save your project on GitHub, be sure not to expose any API keys to the internet. Do this by adding your .env file to a .gitignore file, or by blanking out any API keys before committing your build, as in the provided GitHub example.
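For example, one quick way to do this from your project folder (assuming you're tracking the project with git) is:

```shell
# Add .env to .gitignore so the key is never committed
echo ".env" >> .gitignore
cat .gitignore
```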

You'll add more to this environment file as you work through the project.

Write your server code

If you're building off the old quickstart, you should have a file called server.js. If you don't have this file yet, create it now, in the same folder as your .env file. This is where the primary code for your project server will be stored.

At the top of the file are all the necessary imports.

import Fastify from "fastify";
import fastifyWs from "@fastify/websocket";
import fastifyFormBody from '@fastify/formbody';
import OpenAI from "openai";
import dotenv from "dotenv";
import axios from "axios";
dotenv.config();

This code will go immediately beneath the imports above. It has been altered from the previous tutorials' code.

Here, you're altering the WELCOME_GREETING prompt slightly from previous tutorials to let users know what specific functionality your assistant has. You're also changing the SYSTEM_PROMPT to make sure the OpenAI API understands when it's the appropriate time to access the API: the get_programming_joke function will be called when the user asks for a programming joke, which demonstrates tool calling with the additional API.

const PORT = process.env.PORT || 8080;
const DOMAIN = process.env.NGROK_URL;
const WS_URL = `wss://${DOMAIN}/ws`;
const WELCOME_GREETING =
  "Hi! I am a voice assistant powered by Twilio and Open A I . Ask me anything! Try asking me for a programming joke!";
const SYSTEM_PROMPT = `You are a helpful assistant. This conversation is being translated to voice, so answer carefully. 
When you respond, please spell out all numbers, for example twenty not 20. Do not include emojis in your responses. Do not include bullet points, asterisks, or special symbols.
You should use the 'get_programming_joke' function only when the user is asking for a programming joke (or a very close prompt, such as developer or software engineering joke). For other requests, including other types of jokes, you should use your own knowledge.`;
const sessions = new Map();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Adding the getJoke Function

Next, we're adding the getJoke function, which uses the jokeapi.dev service when a user requests a programming joke. This is a simple API that does not require any authentication. For any API you want to call in the future, you might need to add additional API keys, parameters, or security.

Add this code to your server.js.

async function getJoke() {
  // Use jokeapi.dev to fetch a clean joke
  const response = await axios.get(
    "https://v2.jokeapi.dev/joke/Programming?safe-mode"
  );
  const data = response.data;
  return data.type === "single"
    ? data.joke
    : `${data.setup} ... ${data.delivery}`;
}
async function aiResponseStream(conversation, ws) {
  const tools = [
    {
      type: "function",
      function: {
        name: "get_programming_joke",
        description: "Fetches a programming joke",
        parameters: {
          type: "object",
          properties: {},
          required: [],
          additionalProperties: false,
        },
        strict: true,
      },
    },
  ];
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: conversation,
    tools: tools,
    stream: true,
  });
  const assistantSegments = [];
  console.log("Received response chunks:");
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || "";
    const toolCalls = chunk.choices[0]?.delta?.tool_calls || [];
    for (const toolCall of toolCalls) {
      if (toolCall.function.name === "get_programming_joke") {
        const joke = await getJoke();
        // Append tool call request and the result with the "tool" role
        conversation.push({
          role: "assistant",
          tool_calls: [
            {
              id: toolCall.id,
              function: {
                name: toolCall.function.name,
                arguments: "{}",
              },
              type: "function",
            },
          ],
        });
        conversation.push({
          role: "tool",
          tool_call_id: toolCall.id,
          content: joke,
        });
        // Send the joke to Twilio as a complete ("last") token
        ws.send(JSON.stringify({ type: "text", token: joke, last: true }));
        assistantSegments.push(joke);
        console.log("Fetched joke:", joke);
      }
    }
    console.log("Chunk:", content);
    ws.send(
      JSON.stringify({
        type: "text",
        token: content,
        last: false,
      })
    );
    assistantSegments.push(content);
  }
  ws.send(
    JSON.stringify({
      type: "text",
      token: "",
      last: true,
    })
  );
  console.log("Assistant response complete.");
  const sessionData = sessions.get(ws.callSid);
  sessionData.conversation.push({
    role: "assistant",
    content: assistantSegments.join(""),
  });
  console.log(
    "Final accumulated response:",
    JSON.stringify(assistantSegments.join(""))
  );
}
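The ternary at the end of getJoke handles jokeapi.dev's two response formats: single jokes come back with a joke field, while two-part jokes use setup and delivery. This standalone sketch runs the same formatting logic against two hypothetical sample payloads (the joke text here is made up for illustration, not a real API response):

```javascript
// The same single/two-part formatting logic as getJoke, applied to
// hypothetical sample payloads instead of a live API call.
function formatJoke(data) {
  return data.type === "single"
    ? data.joke
    : `${data.setup} ... ${data.delivery}`;
}

const single = { type: "single", joke: "There are ten kinds of people." };
const twopart = {
  type: "twopart",
  setup: "Why do programmers prefer dark mode?",
  delivery: "Because light attracts bugs.",
};

console.log(formatJoke(single)); // prints the joke unchanged
console.log(formatJoke(twopart)); // prints setup and delivery joined by " ... "
```

The " ... " separator gives the text-to-speech engine a natural pause between the setup and the punchline.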

Setting up Server Configuration

This next block of code will be added below the functions you just wrote. This code sets up your server to handle the calls. It's unchanged from the previous tutorial, but replicated here to make it easier to follow along. If you're using the old server.js file, you can leave this as written.

const fastify = Fastify();
fastify.register(fastifyWs);
fastify.register(fastifyFormBody);
fastify.all("/twiml", async (request, reply) => {
  reply.type("text/xml").send(
    `<?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <ConversationRelay url="${WS_URL}" welcomeGreeting="${WELCOME_GREETING}" />
      </Connect>
    </Response>`
  );
});
fastify.register(async function (fastify) {
  fastify.get("/ws", { websocket: true }, (ws, req) => {
    ws.on("message", async (data) => {
      const message = JSON.parse(data);
      switch (message.type) {
        case "setup":
          const callSid = message.callSid;
          console.log("Setup for call:", callSid);
          ws.callSid = callSid;
          sessions.set(callSid, {
            conversation: [{ role: "system", content: SYSTEM_PROMPT }],
            lastFullResponse: [],
          });
          break;
        case "prompt":
          console.log("Processing prompt:", message.voicePrompt);
          const sessionData = sessions.get(ws.callSid);
          sessionData.conversation.push({
            role: "user",
            content: message.voicePrompt,
          });
          aiResponseStream(sessionData.conversation, ws);
          break;
        case "interrupt":
          console.log(
            "Handling interruption; last utterance: ",
            message.utteranceUntilInterrupt
          );
          handleInterrupt(ws.callSid, message.utteranceUntilInterrupt);
          break;
        default:
          console.warn("Unknown message type received:", message.type);
          break;
      }
    });
    ws.on("close", () => {
      console.log("WebSocket connection closed");
      sessions.delete(ws.callSid);
    });
  });
});
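The WebSocket handler above routes each incoming frame on its type field. This small sketch shows the three message shapes being handled, using field names taken from the handler code; the values here are hypothetical, for illustration only:

```javascript
// Hypothetical sample frames matching the three shapes the switch
// statement handles; values are made up for illustration.
const samples = [
  JSON.stringify({ type: "setup", callSid: "CA_hypothetical_sid" }),
  JSON.stringify({ type: "prompt", voicePrompt: "Tell me a programming joke" }),
  JSON.stringify({ type: "interrupt", utteranceUntilInterrupt: "Why do" }),
];

// Each frame arrives as a string, is parsed, then routed by type
const types = samples.map((data) => JSON.parse(data).type);
console.log(types); // → [ 'setup', 'prompt', 'interrupt' ]
```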

Interruption Handling

This is the code we used to handle interruptions in the previous tutorial, where it is explained in more detail. Since there is no need to remove this functionality from the updated conversation bot, this code is left in.

Use this code to finalize the bot. It handles interruptions and finishes creating your server.

function handleInterrupt(callSid, utteranceUntilInterrupt) {
  const sessionData = sessions.get(callSid);
  const conversation = sessionData.conversation;
  let updatedConversation = [...conversation];
  const interruptedIndex = updatedConversation.findLastIndex(
    (message) =>
      message.role === "assistant" &&
      message.content &&
      message.content.includes(utteranceUntilInterrupt)
  );
  if (interruptedIndex !== -1) {
    const interruptedMessage = updatedConversation[interruptedIndex];
    const interruptPosition = interruptedMessage.content.indexOf(
      utteranceUntilInterrupt
    );
    const truncatedContent = interruptedMessage.content.substring(
      0,
      interruptPosition + utteranceUntilInterrupt.length
    );
    updatedConversation[interruptedIndex] = {
      ...interruptedMessage,
      content: truncatedContent,
    };
    updatedConversation = updatedConversation.filter(
      (message, index) =>
        !(index > interruptedIndex && message.role === "assistant")
    );
  }
  sessionData.conversation = updatedConversation;
  sessions.set(callSid, sessionData);
}
try {
  fastify.listen({ port: PORT });
  console.log(
    `Server running at http://localhost:${PORT} and wss://${DOMAIN}/ws`
  );
} catch (err) {
  fastify.log.error(err);
  process.exit(1);
}
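To see what the truncation step inside handleInterrupt does, here's a standalone sketch that runs the same logic on a hypothetical sample conversation (the messages and utterance below are made up, not part of the server code):

```javascript
// Standalone sketch of handleInterrupt's truncation logic on sample data.
const conversation = [
  { role: "system", content: "You are a helpful assistant." },
  {
    role: "assistant",
    content: "Why do programmers prefer dark mode? Because light attracts bugs.",
  },
];
// Suppose the caller interrupted after hearing this much:
const utteranceUntilInterrupt = "Why do programmers prefer dark mode?";

// Find the most recent assistant message containing the heard utterance
const idx = conversation.findLastIndex(
  (m) => m.role === "assistant" && m.content?.includes(utteranceUntilInterrupt)
);
// Cut the message off exactly where the interruption happened
const cut =
  conversation[idx].content.indexOf(utteranceUntilInterrupt) +
  utteranceUntilInterrupt.length;
const truncated = {
  ...conversation[idx],
  content: conversation[idx].content.substring(0, cut),
};
console.log(truncated.content);
// → "Why do programmers prefer dark mode?"
```

The stored history now ends where the caller stopped listening, so the AI's sense of the conversation matches what was actually heard.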

Run and test

It's time to test your application. If you've followed previous tutorials, these steps should be familiar. Start by using ngrok to open a tunnel to your locally running application.

ngrok http 8080

You will need to keep the ngrok URL for use in two places: in the Twilio Console and in your environment file.

Copy the forwarding URL from the ngrok output and add it to the .env file using this line:

NGROK_URL="1234abcd.ngrok.app"

Replace the placeholder with the correct information from your ngrok URL. Note that you do not include the scheme (the “https://” or “http://”) in the environment variable.

Now you are ready to run your server. Return to your project and type the following into your terminal:

node server

Go into your Twilio console, and look for the phone number that you registered.

Set the configuration under A call comes in with the Webhook option as shown below.

In the URL space, add your ngrok URL (this time including the “https://”), and follow that up with /twiml for the correct routing. Everything else can be left as default, including HTTP POST.

Showing the webhook setup in the console

When a call is connected, Twilio will first speak the greeting message that you provided. Then it will use the provided ngrok URL to connect to the WebSocket. That WebSocket connection opens up the line for you to have a conversation with OpenAI.

Save your configurations in the console. Now dial up the number on your phone.

If everything is hooked up correctly, you should hear your customized greeting.

To test the tool calling portion of your application, try asking the AI assistant for a joke about programming. This should trigger the get_programming_joke tool call, fetch a joke from jokeapi.dev, and include it in the spoken response!

Example screenshot showing a working programming joke from the API

What else can you do with ConversationRelay?

Thank you for reading our series on ConversationRelay. With these tutorials you have learned how to:

  • Set up an application for real time phone conversations with an AI assistant
  • Use Token Streaming to decrease latency with the AI conversation
  • Interrupt your AI conversation and allow the AI to have knowledge about where that interruption occurred
  • Call Tools that allow the AI to interact with different APIs

Imagine the possibilities you can open up building with ConversationRelay! Let's build something amazing that helps you be more productive and have fun too.

My colleagues have built some awesome sample applications and demos on top of ConversationRelay. Here are a few more articles that you can check out:

Amanda Lange is a .NET Engineer of Technical Content. She is here to teach how to create great things using C# and .NET programming. She can be reached at amlange [ at] twilio.com.