Build a Question-Answering over Docs SMS Bot with LangChain in Python

June 15, 2023
Written by
Twilion
Reviewed by
Twilion

With Natural Language Processing (NLP), you can chat with your own documents, such as a text file, a PDF, or a website. Read on to learn how to build a generative question-answering SMS chatbot that reads a document containing Lou Gehrig's Farewell Speech using LangChain, Hugging Face, and Twilio in Python.

me asking questions about Lou Gehrig's Farewell Speech like "what is a blessing" and the bot answering "When you have a father and a mother who work all their lives so you can have an education and build your body"

LangChain Q&A

LangChain is an open-source tool that wraps around many large language models (LLMs) and tools. It is one of the easiest ways, if not the easiest, to interact with LLMs and build applications around them.

LangChain makes it easy to perform question-answering over documents. Picture feeding a PDF, or maybe multiple PDF files, to a machine and then asking it questions about those files. This could be useful, for example, if you have to prepare for a test and want to ask the machine about things you didn't understand.

Prerequisites

  1. A Twilio account - sign up for a free Twilio account here
  2. A Twilio phone number with SMS capabilities - learn how to buy a Twilio Phone Number here
  3. Hugging Face Account – make a Hugging Face Account here
  4. Python installed - download Python here
  5. ngrok, a handy utility to connect the development version of our Python application running on your machine to a public URL that Twilio can access.

ngrok is needed for the development version of the application because your computer is likely behind a router or firewall, so it isn’t directly reachable on the Internet. You can also choose to automate ngrok as shown in this article.

Configuration

Since you will be installing some Python packages for this project, you will need to make a new project directory and a virtual environment.

If you're using a Unix or macOS system, open a terminal and enter the following commands:

mkdir lc-qa-sms 
cd lc-qa-sms 
python3 -m venv venv 
source venv/bin/activate 
pip install langchain
pip install requests
pip install flask
pip install faiss-cpu
pip install sentence-transformers
pip install twilio
pip install python-dotenv

If you're following this tutorial on Windows, enter the following commands in a command prompt window:

mkdir lc-qa-sms  
cd lc-qa-sms 
python -m venv venv 
venv\Scripts\activate 
pip install langchain
pip install requests
pip install flask
pip install faiss-cpu
pip install sentence-transformers
pip install twilio
pip install python-dotenv

Set up Hugging Face Hub

The Hugging Face Hub offers over 120k models, 20k datasets, and 50k demo apps, making it easy for people to collaborate on their ML workflows. As mentioned earlier, this project needs a Hugging Face Hub Access Token so that LangChain can call a Hugging Face Hub LLM on your behalf. After making a Hugging Face account, you can get a Hugging Face Access Token here by clicking on New token. Give the token a name and select the Read role.

hugging face hug webpage to get access token

Alternatively, you could use models from, say, OpenAI.

On the command line in your root directory, run the following command on a Mac to set the token as an environment variable.

export HUGGINGFACEHUB_API_TOKEN=replace-with-your-huggingfacehub-token

For the Windows command, check out this blog post on environment variables.
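For quick reference, in a Windows Command Prompt the usual equivalent is the following (verify against your shell, since Command Prompt and PowerShell differ):

```shell
set HUGGINGFACEHUB_API_TOKEN=replace-with-your-huggingfacehub-token
```

In PowerShell, use `$env:HUGGINGFACEHUB_API_TOKEN = "replace-with-your-huggingfacehub-token"` instead.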

Now let's build that LangChain question-answering chatbot application.

Answer Questions from a Doc with LangChain via SMS

Inside your lc-qa-sms directory, make a new file called app.py.

At the top of the file, add the following lines to import the required libraries.

import requests
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain import HuggingFaceHub
from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse

Beneath those import statements, make a helper function that downloads a file from a URL to local file storage and then loads it with LangChain's TextLoader. Later, the loaded file will be passed to a splitting helper to create chunks.

def loadFileFromURL(text_file_url):
    output_file = "lougehrig.txt"
    resp = requests.get(text_file_url)
    with open(output_file, "w",  encoding='utf-8') as file:
      file.write(resp.text)

    # load text doc from URL w/ TextLoader
    loader = TextLoader('./'+output_file)
    txt_file_as_loaded_docs = loader.load()
    return txt_file_as_loaded_docs

In this tutorial, the file will be Lou Gehrig's famous speech, which is why the output_file is named "lougehrig.txt".

lou gehrig farewell speaking in front of camera on field yankees

You could alternatively use a local file.

Next, add the following code for a helper function to split the document into chunks. This is important because LLMs can't process inputs that are too long. LangChain's CharacterTextSplitter helps us do this: setting chunk_size to 1000 and chunk_overlap to 10 keeps the integrity of the text by avoiding splitting words in half at chunk boundaries.

def splitDoc(loaded_docs):
    # split docs into chunks
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
    chunked_docs = splitter.split_documents(loaded_docs)
    return chunked_docs
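To build intuition for what the splitter does, here is a plain-Python sketch of fixed-size chunking with overlap. This is illustrative only: LangChain's CharacterTextSplitter actually splits on separators such as newlines, so its chunks won't match this character-for-character.

```python
# Illustrative sketch of fixed-size chunking with overlap
# (not LangChain's implementation, which splits on separators).
def chunk_text(text, chunk_size, chunk_overlap):
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With chunk_size=4 and chunk_overlap=2, consecutive chunks share 2 characters,
# so context at each boundary appears in both neighboring chunks.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap is what prevents a sentence (or word) that straddles a boundary from being visible to neither chunk.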

Now convert the chunked document into embeddings (numerical representations of words) with Hugging Face and store them in a FAISS Vector Store. Faiss is a library for efficient similarity search and clustering of dense vectors.

def makeEmbeddings(chunked_docs):
    # Create embeddings and store them in a FAISS vector store
    embedder = HuggingFaceEmbeddings()
    vector_store = FAISS.from_documents(chunked_docs, embedder)
    return vector_store

That makeEmbeddings function makes it efficient to retrieve and manipulate the stored embeddings. The next helper conducts a similarity search over the vector store to find the documents most semantically similar to a given input, which is exactly what the LLM needs to best answer a question.

def askQs(vector_store, chain, q):
    # Ask a question using the QA chain
    similar_docs = vector_store.similarity_search(q)
    resp = chain.run(input_documents=similar_docs, question=q)
    return resp
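Under the hood, a similarity search ranks stored vectors by their closeness to the query vector. Here is a naive, illustrative sketch using cosine similarity on hand-made two-dimensional vectors; FAISS uses optimized index structures and real embeddings have hundreds of dimensions, so this is a toy model of the idea, not what FAISS executes.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def toy_similarity_search(query_vec, doc_vecs, k=2):
    # Rank named document vectors by similarity to the query, highest first.
    ranked = sorted(doc_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

docs = {"doc_a": [1.0, 0.0], "doc_b": [0.9, 0.1], "doc_c": [0.0, 1.0]}
print(toy_similarity_search([1.0, 0.05], docs))
# → ['doc_a', 'doc_b']
```

The top-k documents returned this way are what get stuffed into the prompt as input_documents for the chain.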

The final helper function loads the Hugging Face Hub LLM to be used, authenticated with your Access Token, and wraps it in a question-answering chain. That chain combines the documents returned by the similarity search with the input question, enabling a question-and-answer conversation.

def loadLLM():
    llm=HuggingFaceHub(repo_id="declare-lab/flan-alpaca-large", model_kwargs={"temperature":0, "max_length":512})
    chain = load_qa_chain(llm, chain_type="stuff")
    return chain

declare-lab/flan-alpaca-large is an LLM. You could use others from Hugging Face Hub such as google/flan-t5-xl. They are trained on different corpora, and this tutorial uses flan-alpaca-large because it's faster.

Lastly, make a Flask app that calls the helper functions and uses Twilio's MessagingResponse TwiML to respond to inbound text messages with information pulled from the text file.

app = Flask(__name__)
@app.route('/sms', methods=['POST'])
def sms():
    resp = MessagingResponse()
    inb_msg = request.form['Body'].lower().strip() #get inbound text body
    chain = loadLLM()
    LOCAL_ldocs = loadFileFromURL('https://raw.githubusercontent.com/elizabethsiegle/qanda-langchain-sms-lougehrig/main/lougehrig.txt')
    LOCAL_cdocs = splitDoc(LOCAL_ldocs) #chunked
    LOCAL_vector_store = makeEmbeddings(LOCAL_cdocs)
    LOCAL_resp = askQs(LOCAL_vector_store, chain, inb_msg)
    resp.message(LOCAL_resp)
    return str(resp)

if __name__ == "__main__":
    app.run(debug=True)
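For context, str(resp) serializes the MessagingResponse into TwiML, the XML dialect Twilio reads to decide what to text back. The following standard-library sketch shows the shape of that payload (the real Twilio helper also prepends an XML declaration):

```python
import xml.etree.ElementTree as ET

def twiml_sketch(body_text):
    # Build the same XML shape TwiML uses: <Response><Message>…</Message></Response>
    root = ET.Element("Response")
    msg = ET.SubElement(root, "Message")
    msg.text = body_text
    return ET.tostring(root, encoding="unicode")

print(twiml_sketch("Hello from the bot"))
# → <Response><Message>Hello from the bot</Message></Response>
```

When Twilio receives this response from your webhook, it sends the text inside Message back to the user as an SMS.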

Your complete app.py file should look like this:

import requests
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain import HuggingFaceHub
from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse

def loadFileFromURL(text_file_url): #param: https://raw.githubusercontent.com/elizabethsiegle/qanda-langchain-sms-lougehrig/main/lougehrig.txt
    output_file = "lougehrig.txt"
    resp = requests.get(text_file_url)
    with open(output_file, "w",  encoding='utf-8') as file:
      file.write(resp.text)

    # load text doc from URL w/ TextLoader
    loader = TextLoader('./'+output_file)
    txt_file_as_loaded_docs = loader.load()
    return txt_file_as_loaded_docs

def splitDoc(loaded_docs):
    # split docs into chunks
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
    chunked_docs = splitter.split_documents(loaded_docs)
    return chunked_docs

def makeEmbeddings(chunked_docs):
    # Create embeddings and store them in a FAISS vector store
    embedder = HuggingFaceEmbeddings()
    vector_store = FAISS.from_documents(chunked_docs, embedder)
    return vector_store

def askQs(vector_store, chain, q):
    # Ask a question using the QA chain
    similar_docs = vector_store.similarity_search(q)
    resp = chain.run(input_documents=similar_docs, question=q)
    return resp

def loadLLM():
    llm=HuggingFaceHub(repo_id="declare-lab/flan-alpaca-large", model_kwargs={"temperature":0, "max_length":512})
    chain = load_qa_chain(llm, chain_type="stuff")
    return chain


app = Flask(__name__)
@app.route('/sms', methods=['POST'])
def sms():
    resp = MessagingResponse()
    inb_msg = request.form['Body'].lower().strip() #get inbound text body
    chain = loadLLM()
    LOCAL_ldocs = loadFileFromURL('https://raw.githubusercontent.com/elizabethsiegle/qanda-langchain-sms-lougehrig/main/lougehrig.txt')
    LOCAL_cdocs = splitDoc(LOCAL_ldocs) #chunked
    LOCAL_vector_store = makeEmbeddings(LOCAL_cdocs)
    LOCAL_resp = askQs(LOCAL_vector_store, chain, inb_msg)
    resp.message(LOCAL_resp)
    return str(resp)

if __name__ == "__main__":
    app.run(debug=True)

On the command line, run python app.py to start the Flask app.

Configure a Twilio Number for the SMS Chatbot

Now, your Flask app will need to be visible from the web so Twilio can send requests to it. ngrok lets you do this. With ngrok installed, run ngrok http 5000 in a new terminal tab in the directory your code is in.

ngrok http 5000 returns URLs

You should see the screen above. Grab that ngrok Forwarding URL to configure your Twilio number: select your Twilio number under Active Numbers in your Twilio console, scroll to the Messaging section, and then modify the phone number’s routing by pasting the ngrok URL with the /sms path in the textbox corresponding to when A Message Comes In as shown below:

configure twilio phone # with ngrok url

Click Save. Your Twilio phone number is now configured to forward incoming messages to the web application server running locally on your computer. Text your Twilio number a question relating to the text file and get an answer from that file over SMS!

sms example where i ask "what would lou gehrig give to beat the new york giants" and the bot responds with "lou gehrig would give his right arm to beat the New York Giants"

There's so much you can do with document-based question-answering. You could use a different LLM, use a longer document than a text file containing Lou Gehrig's famous speech, or use other types of documents like a PDF or a website (here's LangChain's docs on documents); you could also store the embeddings elsewhere, and more. Beyond an SMS chatbot, you could create an AI tutor, a search engine, an automated customer service agent, and more.

Let me know online what you're building!