Answer Questions about Twilio Voice Recording Transcriptions with LangChain.js

August 21, 2023

With Natural Language Processing (NLP), you can chat with your own documents, such as a text file, a PDF, or a website. I previously wrote about how to do that via SMS in Python. You can also apply large language models (LLMs) to spoken audio! Read on to learn how to answer questions about a Twilio Programmable Voice Recording Transcription with LangChain.js.

Thank you to my sister-teammate Craig Dennis for pair programming with me in the office!

What is LangChain?

LangChain is an open-source framework that wraps many large language models (LLMs) and related tools. It is one of the easiest ways to interact with LLMs and build applications around them.

Prerequisites

  1. A Twilio account - sign up for a free Twilio account here
  2. A Twilio phone number with Voice capabilities - learn how to buy a Twilio Phone Number here
  3. Node.js (version 18 or above) installed - download Node.js here
  4. OpenAI account and API key - make an OpenAI account here and get an OpenAI API Key here
  5. ngrok, a handy utility to connect the development version of your Node.js application running on your machine to a public URL that Twilio can access.

ngrok is needed for the development version of the application because your computer is likely behind a router or firewall, so it isn’t directly reachable on the Internet. You can also choose to automate ngrok as shown in this article.

Set up your Node.js project

Make a directory for this project, run npm init, and accept all the defaults.

mkdir langchain-audio-rec
cd langchain-audio-rec
npm init

Add a .env file containing your OpenAI API key:

OPENAI_API_KEY={REPLACE-WITH-YOUR-OPENAI-API-KEY}
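A missing or placeholder key tends to surface later as a confusing authentication error, so it can help to check for it up front. Here is a minimal sketch of such a check; requireEnv is a hypothetical helper of my own, not part of dotenv or LangChain.js, and it assumes the .env values have already been loaded into process.env.

```javascript
// Hypothetical helper (not part of dotenv or LangChain.js): fail fast when a
// required environment variable is missing or still holds the {PLACEHOLDER}.
function requireEnv(name, env = process.env) {
  const value = (env[name] ?? '').trim();
  if (value === '' || value.startsWith('{')) {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

// Example with a plain object standing in for process.env (made-up key).
const exampleEnv = { OPENAI_API_KEY: 'sk-example-not-a-real-key' };
console.log(requireEnv('OPENAI_API_KEY', exampleEnv).length > 0); // prints true
```

In a real script you would call requireEnv('OPENAI_API_KEY') once at startup, before creating the model.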

Record a phone call with Twilio Programmable Voice

Install some required packages that you'll use:

npm install -S express
npm install -S twilio

In a file called record.js, add the following code to record an inbound phone call to a Twilio phone number and transcribe the call:

"use strict";
const express = require('express');
const VoiceResponse = require('twilio').twiml.VoiceResponse;
const app = express();
// Twilio sends webhook parameters form-encoded; parse them into request.body
app.use(express.urlencoded({ extended: false }));

let recentTranscription;
// Returns TwiML which prompts the caller to record a message
app.post('/record', (request, response) => {
    // Use the Twilio Node.js SDK to build an XML response
    const twiml = new VoiceResponse();
    twiml.say('Hello. Please leave a message after the beep.');
    // Use <Record> to record the caller's message, transcribe and pass to /handle_transcription
    twiml.record({
        transcribe: true,
        transcribeCallback: '/handle_transcription'});
    
    // End the call with <Hangup>
    twiml.hangup();
    // Render the response as XML in reply to the webhook request
    response.type('text/xml');
    response.send(twiml.toString());
});
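// Receives the finished transcription from Twilio
app.post('/handle_transcription', (request, response) => {
    recentTranscription = request.body.TranscriptionText; // could store this in a database
    // Acknowledge the callback so the request does not hang
    response.sendStatus(200);
});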
app.get('/recent_transcription', (request, response) => {
    return response.json({recentTranscription});
})
// Create an HTTP server and listen for requests on port 3000
app.listen(3000);
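The /handle_transcription route above stores whatever text arrives, including nothing at all if transcription fails. As a hedged sketch of a stricter version, the snippet below parses a form-encoded callback body by hand and only accepts completed transcriptions. extractTranscription and the sample body are my own illustrations; TranscriptionStatus is, to my knowledge, sent by Twilio alongside TranscriptionText, but check the Record documentation before relying on it.

```javascript
// Sketch: pull the transcription out of a form-encoded callback body.
// Returns null when the transcription did not complete or is empty.
function extractTranscription(rawBody) {
  const params = new URLSearchParams(rawBody);
  if (params.get('TranscriptionStatus') !== 'completed') return null;
  const text = params.get('TranscriptionText');
  return text && text.trim() !== '' ? text.trim() : null;
}

// Made-up sample body for illustration only.
const sampleBody =
  'TranscriptionStatus=completed&TranscriptionText=Great+moments+are+born+from+great+opportunity.';
console.log(extractTranscription(sampleBody));
// prints: Great moments are born from great opportunity.
```

Inside the Express route you would not parse the raw body yourself, since the urlencoded middleware already fills request.body; the same status check would simply read request.body.TranscriptionStatus.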

Run the file with node record.js and in another terminal tab, run ngrok http 3000. Grab that forwarding URL so you can configure your purchased Twilio phone number to send a request to your web application.

ngrok terminal with forwarding URL

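This callback mechanism is called a webhook. To set it up, open the phone number's configuration page and, under "A call comes in", choose Webhook, paste your ngrok forwarding URL with /record appended, and set the HTTP method to POST, as shown below.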

Twilio phone number Voice configuration with ngrok URL

Click Save configuration. Now call your purchased Twilio number and the call will be recorded! I recorded myself saying Herb Brooks' speech in the movie Miracle.

Miracle gif of Herb Brooks speech with text "Great moments are born from great opportunity."

Afterwards, you can view and hear the recording in your Incoming Calls Log.

arrow pointing to Call SID showing the call recording in the Twilio console

In your ngrok tab, you can see the /record and /handle_transcription endpoints were hit.

ngrok terminal with HTTP requests: Post requests to /record twice with 200 ok, a POST /handle_transcription with no HTTP code, and a GET to recent_transcription with 200 ok
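Both requests reach your ngrok host even though record.js only gives Twilio the relative path /handle_transcription. As I understand it, Twilio resolves relative URLs in TwiML against the URL it fetched the TwiML from. A minimal sketch of that resolution, using a hypothetical forwarding URL (yours will differ):

```javascript
// Hypothetical forwarding URL; yours will differ each time ngrok starts.
const forwardingUrl = 'https://example.ngrok-free.app';

// The webhook URL to paste into the phone number's Voice configuration.
const voiceWebhook = new URL('/record', forwardingUrl).href;

// The relative transcribeCallback from record.js, resolved against the webhook URL.
const transcriptionCallback = new URL('/handle_transcription', voiceWebhook).href;

console.log(voiceWebhook);          // https://example.ngrok-free.app/record
console.log(transcriptionCallback); // https://example.ngrok-free.app/handle_transcription
```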

Now go to your local web server in the browser and append /recent_transcription (http://localhost:3000/recent_transcription). You can see the audio recording transcription there in the browser!

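localhost:3000/recent_transcription viewed in the browser with an object "recentTranscription" whose value is the transcription of the Twilio phone call (in this case, it's the Miracle speech)

Next, install the packages the question-answering script needs:

npm install -S langchain dotenv

The import paths used below match the LangChain.js version available when this post was written (August 2023); later releases may differ. The script uses ES module imports, so also add the following line to your package.json. Because record.js uses require, leave it running while you make this change, or rename it to record.cjs if you need to restart it.

"type": "module",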

In a new file called handle_transcription.js, add the following code importing OpenAI so we can use their models, LangChain's loadQAStuffChain to make a chain with the LLM, and Document so we can create a Document the model can read from the audio recording transcription:

import 'dotenv/config'; // loads OPENAI_API_KEY from .env; needs "type": "module" in package.json
import { OpenAI } from "langchain/llms/openai";
import { loadQAStuffChain } from 'langchain/chains';
import { Document } from "langchain/document";

(async () => {
  const llm = new OpenAI({ temperature: 0.7 }); // temperature is an optional parameter
  const chain = loadQAStuffChain(llm);
  const url = "http://localhost:3000/recent_transcription"; //your local server with the transcription
  const res = await fetch(url);
  const json = await res.json();
  const doc = new Document({ pageContent: json.recentTranscription });  

  const response = await chain.call({
    input_documents: [doc],
    question: "Great moments are born from what?", //insert question about audio recording transcription
  });
  console.log(response.text);
})();
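The script above assumes a transcription already exists. If nobody has called the number yet, recentTranscription is still undefined on the server, and response.json({recentTranscription}) serializes to an empty object. Here is a small, hedged guard you could run on the parsed JSON before building the Document; readTranscription is a hypothetical helper, not a LangChain.js function.

```javascript
// Hypothetical guard: reject a response that has no usable transcription.
function readTranscription(json) {
  const text = json?.recentTranscription;
  if (typeof text !== 'string' || text.trim() === '') {
    throw new Error('No transcription yet: call the number and wait for the callback.');
  }
  return text;
}

// An undefined property is dropped when the server serializes the object.
console.log(JSON.stringify({ recentTranscription: undefined })); // prints {}
```

With this in place, the Document would be created from readTranscription(json) rather than from json.recentTranscription directly.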

Now, with the transcription of the speech from the movie Miracle still available on your local server, running the file with node handle_transcription.js should yield the following output:

my terminal where I ran "node handle_transcription.js" and it prints "Great moments are born from opportunity and hard work"
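That answer comes from a "stuff" chain, which gets its name from stuffing every input document into a single prompt along with the question. The sketch below shows the idea in plain JavaScript. It is an illustration only: buildStuffPrompt and the prompt wording are hypothetical and are not LangChain's actual template.

```javascript
// Illustration only: the prompt wording is hypothetical, not LangChain's template.
function buildStuffPrompt(docs, question) {
  // "Stuffing" means concatenating every document into one context block.
  const context = docs.map((doc) => doc.pageContent).join('\n\n');
  return [
    'Use the following context to answer the question.',
    '',
    context,
    '',
    `Question: ${question}`,
    'Answer:',
  ].join('\n');
}

// Plain objects stand in for LangChain Document instances here.
const docs = [{ pageContent: 'Great moments are born from great opportunity.' }];
console.log(buildStuffPrompt(docs, 'Great moments are born from what?'));
```

Because everything goes into one prompt, this approach suits short transcriptions like a voicemail; a very long recording could exceed the model's context window.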

Instead of serving the transcription from your local server, you could upload it as a .txt file to Twilio Assets to host it at a publicly accessible URL.

The complete code can be found here on GitHub.

What's Next for Twilio Voice Transcriptions, Recordings, and LangChain.js

LangChain.js makes it easy to build apps around LLMs. You could replace the OpenAI LLM with another from, say, Hugging Face or Cohere. There's also a lot of fun to be had with Twilio Programmable Voice data. You could store phone call recordings in databases, analyze them, take note of patterns, et cetera. Another Twilio blog post answers questions about Twilio Voice Recordings with LangChain.js and performs transcription with AssemblyAI.

Developers could perform Retrieval Augmented Generation (RAG) question-answering (QA) on podcasts, lectures, interviews, and so many other recordings. I can't wait to see what you build. Let me know online what you're working on with AI!