Answer Questions about Twilio Voice Recordings with AssemblyAI and LangChain.js

August 19, 2023

Blog header AssemblyAI, LangChain, JS, Twilio Voice

With Natural Language Processing (NLP), you can chat with your own documents, such as a text file, a PDF, or a website. I previously wrote about how to do that via SMS in Python. You can also apply LLMs to spoken audio. Read on to learn how to use AI to answer questions about a Twilio Programmable Voice Recording with LangChain.js, AssemblyAI, and Twilio Assets using JavaScript and Node.js.

LangChain is an open-source framework that wraps many large language models (LLMs) and related tools. It is one of the easiest ways (if not the easiest way) to interact with LLMs and build applications around them.

AssemblyAI offers transcription models to transcribe audio files to text so they can be used in LangChain. This tutorial was inspired by an AssemblyAI blog post written by my former sister-teammate Niels Swimberghe.

Prefer learning via video? Check out this TikTok summarizing this tutorial!

Prerequisites

  1. A Twilio account - sign up for a free Twilio account here
  2. A Twilio phone number with Voice capabilities - learn how to buy a Twilio Phone Number here
  3. Node.js (version 18 or above) installed - download Node.js here
  4. OpenAI account and API key – make an OpenAI account here and get an OpenAI API Key here
  5. AssemblyAI account and API key – make an AssemblyAI account here and grab an AssemblyAI API Key here
  6. ngrok, a handy utility to connect the development version of our Node.js application running on your machine to a public URL that Twilio can access.

ngrok is needed for the development version of the application because your computer is likely behind a router or firewall, so it isn’t directly reachable on the Internet. You can also choose to automate ngrok as shown in this article.

Set up your Node.js project

Make a directory for this project, run npm init, and accept all the defaults.

mkdir langchain-audio-rec
cd langchain-audio-rec
npm init

Add a .env file containing your OpenAI and AssemblyAI API keys:

OPENAI_API_KEY={REPLACE-WITH-YOUR-OPENAI-API-KEY}
ASSEMBLYAI_API_KEY={REPLACE-WITH-YOUR-ASSEMBLYAI-API-KEY}
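
A missing key tends to surface later as a confusing authentication error, so it can help to check the environment up front. The following is a minimal sketch; findMissingEnv is a hypothetical helper name, not part of dotenv or any other library used here.

```javascript
// Hypothetical helper (not part of dotenv): returns the names of any
// required variables that are missing or blank in the given environment.
function findMissingEnv(requiredNames, env = process.env) {
  return requiredNames.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

// Usage: run this after your .env file has been loaded into process.env.
const missingKeys = findMissingEnv(['OPENAI_API_KEY', 'ASSEMBLYAI_API_KEY']);
if (missingKeys.length > 0) {
  console.warn(`Missing environment variables: ${missingKeys.join(', ')}`);
}
```

The check only warns rather than exiting, so it is safe to drop at the top of a script while you are still wiring things up.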

Record a phone call with Twilio Programmable Voice

Install some required packages you'll use:

npm install -S express
npm install -S twilio
npm install -S dotenv
npm install -S langchain

In a file called record.js, add the following code to record an inbound phone call to a Twilio phone number:

"use strict";
const express = require('express');
const VoiceResponse = require('twilio').twiml.VoiceResponse;
const app = express();
// Parse the form-encoded bodies Twilio sends with webhook requests
app.use(express.urlencoded({ extended: false }));

// Returns TwiML which prompts the caller to record a message
app.post('/record', (request, response) => {
    // Use the Twilio Node.js SDK to build an XML response
    const twiml = new VoiceResponse();
    twiml.say('Hello. Please leave a message after the beep.');
    // Use <Record> to record the caller's message
    twiml.record();
    
    // End the call with <Hangup>
    twiml.hangup();
    // Render the response as XML in reply to the webhook request
    response.type('text/xml');
    response.send(twiml.toString());
});

// Create an HTTP server and listen for requests on port 3000
app.listen(3000);

Run the file with node record.js and, in another terminal tab, run ngrok http 3000. Grab the forwarding URL and append /record to it (the route defined above) so you can configure your purchased Twilio phone number to send a request to your web application when a call comes in.

This callback mechanism is called a webhook and its URL will be added to the number's configuration page as shown below.

Twilio Phone Call Voice Configuration with ngrok URL
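
To make the webhook more concrete: Twilio sends the call details as a form-encoded POST body, which is why the server registers a URL-encoded body parser. The sketch below decodes such a body with Node's built-in URLSearchParams. The function name parseWebhookBody and the sample values are made up for illustration; CallSid, From, and To are standard parameters on Twilio voice webhook requests.

```javascript
// Decode a form-encoded webhook body into a plain object.
function parseWebhookBody(rawBody) {
  return Object.fromEntries(new URLSearchParams(rawBody));
}

// Sample body with made-up values, shaped like a Twilio voice webhook request.
const sampleBody = 'CallSid=CA0000&From=%2B15550001111&To=%2B15550002222';
const webhookParams = parseWebhookBody(sampleBody);
console.log(webhookParams.From); // prints "+15550001111"
```

In the Express app, the body parser does this decoding for you and exposes the result as request.body.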

Click Save configuration. Now call your purchased Twilio number and the call will be recorded! I recorded myself saying Herb Brooks' speech in the movie Miracle.

miracle-herb-brooks gif

Afterwards, you can view and hear the recording in your Incoming Calls Log.

Click on the Call SID and then download the file as an mp3 file.

Call recording with listen button and downloads and recording SID
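
If you would rather fetch the file from a script than from the Console, recordings are also exposed through the Twilio REST API, where adding .mp3 to the recording resource URL requests the MP3 version. The sketch below only builds that URL. recordingMp3Url is a hypothetical helper name and the SIDs are placeholders; depending on your account settings, downloading from the URL may require HTTP Basic authentication with your account credentials.

```javascript
// Builds the media URL for a recording in MP3 format.
// Assumes Account SIDs start with "AC" and Recording SIDs start with "RE".
function recordingMp3Url(accountSid, recordingSid) {
  if (!accountSid.startsWith('AC')) {
    throw new Error('Expected an Account SID starting with "AC"');
  }
  if (!recordingSid.startsWith('RE')) {
    throw new Error('Expected a Recording SID starting with "RE"');
  }
  return `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Recordings/${recordingSid}.mp3`;
}

// Placeholder SIDs for illustration only.
console.log(recordingMp3Url('ACxxxxxxxx', 'RExxxxxxxx'));
```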

Upload the mp3 file to Twilio Assets

Twilio Assets is a handy tool to host static assets like .txt and .mp3 files.

Make a new Service called voice-recording under Twilio Serverless here.

In your new Service, click the blue Add + button followed by Upload File.

Add+ upload file to Twilio Assets Service

Select your mp3 Call Recording and then set the Visibility to Public.

Upload Asset as Public

Click Upload followed by Deploy All.

Deploy all button under Assets and Settings and More
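
Before moving on, it may be worth confirming that the deployed Asset really is publicly reachable, since the transcription step needs to download it. Here is a minimal sketch, assuming Node.js 18+ (for the global fetch) and that the host answers HEAD requests; if it does not, a GET request would be the fallback. isPubliclyReachable is a hypothetical helper name.

```javascript
// Returns true when the URL answers with a 2xx status.
// fetchImpl defaults to the global fetch available in Node.js 18+.
async function isPubliclyReachable(url, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url, { method: 'HEAD' });
    return res.ok;
  } catch {
    return false; // network error or invalid URL
  }
}

// Usage (replace the placeholder with the URL shown for your deployed Asset):
// isPubliclyReachable('YOUR-TWILIO-ASSET-URL').then(console.log);
```

If this reports false, a likely cause is that the Asset's Visibility was left as something other than Public, or that the Service was not deployed after the upload.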

Now it's time to create a JavaScript application with LangChain.js to answer questions about the Call Recording file.

Use LangChain.js to Answer Questions about the Call Recording

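Add the following line to your package.json, above the "scripts" field. Because this switches the project to ES modules, record.js (which uses require) would need to be renamed to record.cjs if you want to run it again.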

"type": "module",

In a new file called handle_transcription.js, add the following code. It imports OpenAI so we can use their models, and LangChain's AudioTranscriptLoader, which uses AssemblyAI to transcribe the phone call recording and returns the transcript as Documents the model can read. It also imports LangChain's loadQAStuffChain to make a question-answering chain with the LLM:

import 'dotenv/config';
import { OpenAI } from "langchain/llms/openai";
import { loadQAStuffChain } from 'langchain/chains';
import { AudioTranscriptLoader } from 'langchain/document_loaders/web/assemblyai';

(async () => {
  const llm = new OpenAI({ temperature: 0.7 }); // temperature is an optional parameter for randomness/creativity
  const chain = loadQAStuffChain(llm);

  const loader = new AudioTranscriptLoader({
    // You can also use a local path to an audio file, like ./audio_file.mp3
    audio_url: "YOUR-TWILIO-ASSET-URL",
    language_code: "en_us"
  });
  const docs = await loader.load();

  const response = await chain.call({
    input_documents: docs,
    question: "Great moments are born from what?", //insert question about call recording
  });
  console.log(response.text);
})();

Now, running the file with node handle_transcription.js (using the recording of the speech from the movie Miracle) should yield the following output:

Running `node handle_transcription.js` in the terminal and getting output of "Great moments are born from opportunity and hardwork"
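
For intuition about how that answer is produced: a "stuff" chain places the text of every loaded document into a single prompt alongside the question. The sketch below illustrates the idea only. buildStuffPrompt and its prompt wording are hypothetical and are not LangChain's actual template; pageContent is the text field on LangChain Document objects.

```javascript
// Simplified illustration of "stuffing": join all document text into one
// context block, then append the question. Not LangChain's real template.
function buildStuffPrompt(docs, question) {
  const context = docs.map((doc) => doc.pageContent).join('\n\n');
  return `Use the following context to answer the question.\n\n${context}\n\nQuestion: ${question}\nAnswer:`;
}

// Made-up documents standing in for a transcript.
const sampleDocs = [
  { pageContent: 'First part of the transcript.' },
  { pageContent: 'Second part of the transcript.' },
];
console.log(buildStuffPrompt(sampleDocs, 'What is the recording about?'));
```

Because the whole transcript goes into one prompt, a very long recording may exceed the model's context window, in which case a different chain type or a retrieval step would be worth considering.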

Alternatively, you could use Twilio's own transcription of Programmable Voice call recordings instead of AssemblyAI.

What's Next for Twilio, AssemblyAI, and LangChain.js

LangChain.js makes it easy to build apps around LLMs–you could easily replace the OpenAI LLM with another from, say, Hugging Face or Cohere. There's also a lot of fun to be had with Twilio Programmable Voice data. You could store phone call recordings in databases, analyze them, take note of patterns, et cetera, and use AssemblyAI to perform summarization, content moderation, topic detection, and more. Developers could perform Retrieval Augmented Generation (RAG) question-answering (QA) on podcasts, lectures, interviews, and so many other recordings. I can't wait to see what you build–let me know online what you're working on with AI!