Build an AI SMS Chatbot with Replicate, LLaMA 2, and LangChain

September 12, 2023

Recently, Meta and Microsoft introduced the second generation of the LLaMA LLM (Large Language Model) to help developers and organizations build generative AI-powered tools and experiences. Read on to learn how to build an AI SMS chatbot that answers questions like Ahsoka (from Star Wars) using LangChain templating, LLaMA 2, Replicate, and Twilio Programmable Messaging!

SMS example where I ask "Write me an inspirational limerick pertaining to Star Wars, please" and the response is "May the force be with you, young Padawan! Here's an inspirational limerick for you: "A long time ago, in a galaxy far away, A hero's journey began, with a brave young soul. With wisdom and courage, they faced each test, And the force, it did make their spirit whole." Remember, the force is with you, always. May you find your own inner strength and wisdom on your own journey."

Prefer learning by video? Check out this TikTok summarizing this tutorial in 1 minute!


Prerequisites

  1. A Twilio account - sign up for a free Twilio account here
  2. A Twilio phone number with SMS capabilities - learn how to buy a Twilio Phone Number here
  3. A Replicate account to host the LLaMA 2 model - make a Replicate account here
  4. Python installed - download Python here
  5. ngrok, a handy utility to connect the development version of your Python application running on your machine to a public URL that Twilio can access.

ngrok is needed for the development version of the application because your computer is likely behind a router or firewall, so it isn’t directly reachable on the Internet. You can also choose to automate ngrok as shown in this article.


Replicate offers a cloud API and tools so you can more easily run machine learning models, abstracting away some lower-level machine learning concepts and handling infrastructure so you can focus more on your own applications. You can run open-source models that others have published, or package and publish your own, either publicly or privately.


Since you will be installing some Python packages for this project, you will need to make a new project directory and a virtual environment.

If you're using a Unix or macOS system, open a terminal and enter the following commands:

mkdir replicate-llama-ai-sms-chatbot  
cd replicate-llama-ai-sms-chatbot  
python3 -m venv venv 
source venv/bin/activate 
pip install langchain replicate flask twilio

If you're following this tutorial on Windows, enter the following commands in a command prompt window:

mkdir replicate-llama-ai-sms-chatbot
cd replicate-llama-ai-sms-chatbot
python -m venv venv
venv\Scripts\activate
pip install langchain replicate flask twilio

Next, grab your API token from your Replicate account.

Replicate API token console screenshot

On the command line, set the token as an environment variable (in a Windows command prompt, use set instead of export):

export REPLICATE_API_TOKEN={replace with your api token}
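
The Replicate client reads the token from the REPLICATE_API_TOKEN environment variable, so a quick check before starting the app can save a confusing authentication error later. The helper below is a minimal sketch and not part of the original tutorial; the function name require_env is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return value


# Example use at the top of the app file:
# token = require_env("REPLICATE_API_TOKEN")
```

Failing fast here means a missing token shows up when the app starts rather than on the first incoming text.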

Now it's time to write some code!


Make a new Python file and place the following import statements at the top.

from flask import Flask, request
from langchain import LLMChain, PromptTemplate
from langchain.llms import Replicate
from langchain.memory import ConversationBufferWindowMemory
from twilio.twiml.messaging_response import MessagingResponse

Though LLaMA 2 is tuned for chat, templates are still helpful so the LLM knows what behavior is expected of it. This starting prompt is modeled on a ChatGPT-style assistant description, so the model should behave similarly.

template = """Assistant is a large language model.

Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.

Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. 

I want you to act as Ahsoka giving advice and answering questions. You will reply with what she would say.
SMS: {sms_input}"""

prompt = PromptTemplate(input_variables=["sms_input"], template=template)
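
To see what the template does with an incoming text, here is a minimal sketch that uses Python's built-in str.format as a stand-in for PromptTemplate.format, which by default performs the same kind of {placeholder} substitution. The shortened template, the helper name render_prompt, and the sample message are illustrative assumptions, not the tutorial's full prompt.

```python
# Shortened stand-in for the tutorial's template (illustrative only).
short_template = (
    "I want you to act as Ahsoka giving advice and answering questions. "
    "You will reply with what she would say.\n"
    "SMS: {sms_input}"
)


def render_prompt(template: str, sms_input: str) -> str:
    """Fill the {sms_input} placeholder with the text of the incoming message."""
    return template.format(sms_input=sms_input)


rendered = render_prompt(short_template, "a poem about yoda, please")
print(rendered)
```

The rendered string is what ultimately gets sent to the model: the fixed persona instructions followed by the user's message.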

Next, make an LLM chain, one of the core components of LangChain. This allows us to chain together prompts and keep a prompt history. The model is specified as the model name followed by the version: in this case, LLaMA 2, a 13-billion-parameter language model from Meta fine-tuned for chat completions. max_length is set to 4096, which matches LLaMA 2's context window, the maximum number of tokens the model can work with at once when generating a response.

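sms_chain = LLMChain(
    llm=Replicate(model="a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5"),
    prompt=prompt,  # the PromptTemplate defined above
    memory=ConversationBufferWindowMemory(k=2),  # k=2 is an assumption; add {history} to the template to use it
    llm_kwargs={"max_length": 4096},
)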
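
The imported ConversationBufferWindowMemory keeps only the most recent exchanges, which helps the prompt stay inside the context window. The class below is a plain-Python sketch of that idea, not LangChain's implementation; the name WindowMemory, the Human/AI line format, and the placeholder messages are assumptions for illustration.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k (user message, reply) pairs."""

    def __init__(self, k: int = 2) -> None:
        # A deque with maxlen drops the oldest turn automatically when full.
        self.turns = deque(maxlen=k)

    def save(self, user_msg: str, reply: str) -> None:
        self.turns.append((user_msg, reply))

    def history(self) -> str:
        """Render the remembered turns as text that could be placed in a prompt."""
        return "\n".join(f"Human: {u}\nAI: {r}" for u, r in self.turns)


memory = WindowMemory(k=2)
memory.save("first question", "first answer")
memory.save("second question", "second answer")
memory.save("third question", "third answer")
print(memory.history())  # only the second and third exchanges remain
```

A small window trades long-term recall for a predictable prompt size, which matters when every SMS triggers a fresh model call.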

Finally, make a Flask app to accept inbound text messages, pass that to the LLM Chain, and return the output as an outbound text message with Twilio Programmable Messaging.

app = Flask(__name__)

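@app.route("/sms", methods=['GET', 'POST'])
def sms():
    resp = MessagingResponse()
    # Twilio sends the text of the incoming message in the Body form field
    inb_msg = request.form['Body'].lower().strip()
    # Run the message through the chain to get the reply
    output = sms_chain.predict(sms_input=inb_msg)
    resp.message(output)  # add the reply to the TwiML response
    return str(resp)

if __name__ == "__main__":
    app.run()  # Flask's development server listens on port 5000 by default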

On the command line, run the file with python to start the Flask app.
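
The string the sms() function returns is TwiML, a small XML document that tells Twilio what to send back. The sketch below builds a document with the same Response/Message structure using only Python's standard library, so the shape is visible without the twilio package; in the app itself, MessagingResponse generates this for you. The helper name build_twiml is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_twiml(reply_text: str) -> str:
    """Build a minimal TwiML document containing one outbound message."""
    response = ET.Element("Response")
    message = ET.SubElement(response, "Message")
    message.text = reply_text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(response, encoding="unicode")


print(build_twiml("May the Force be with you"))
```

Seeing the raw XML can help when debugging: if the webhook returns anything other than a well-formed Response document, Twilio has nothing to send as a reply.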

Configure a Twilio Number for the SMS Chatbot

Now, your Flask app will need to be visible from the web so Twilio can send requests to it. ngrok lets you do this. With ngrok installed, run ngrok http 5000 in a new terminal tab in the directory your code is in.


You should see the screen above. Grab that ngrok Forwarding URL to configure your Twilio number: select your Twilio number under Active Numbers in your Twilio console, scroll to the Messaging section, and then modify the phone number’s routing by pasting the ngrok URL with the /sms path in the textbox corresponding to when A Message Comes In as shown below:

configure phone #

Click Save. Your Twilio phone number is now configured to route incoming messages to the web application running locally on your computer. Text your Twilio number a question and get an answer in Ahsoka's voice over SMS!

SMS example where I ask "A poem about Yoda, please" and the response SMS back is "Greetings, young Padawan! Ahsoka Tano here, ready to assist. What would you like to know or discuss? As for a poem about Yoda, how about this: "A master of the Force, wise and old Yoda stands, with a heart of gold His wisdom and teachings, we all hold dear A Jedi Master, without fear" What do you think? Would you like me to elaborate or provide more information on Yoda or the Jedi Order?"
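
Twilio delivers each inbound text to the webhook as a form-encoded POST, and the app reads the message from the Body field. To exercise the endpoint without sending a real SMS, a sketch like the following can help. The function names, the sample phone number, and the placeholder forwarding URL are all hypothetical, and the actual network call is left commented out.

```python
from urllib.parse import urlencode


def webhook_url(forwarding_url: str, path: str = "/sms") -> str:
    """Join the ngrok forwarding URL and the Flask route path."""
    return forwarding_url.rstrip("/") + "/" + path.lstrip("/")


def sms_form(body: str, from_number: str) -> bytes:
    """Form-encode the fields a test request needs (the route reads Body)."""
    return urlencode({"Body": body, "From": from_number}).encode("utf-8")


url = webhook_url("https://example.ngrok.io")  # placeholder, not a real tunnel
payload = sms_form("A poem about Yoda, please", "+15555550100")
print(url)
print(payload)

# To send it to a running app, something like this should work:
# from urllib.request import urlopen
# print(urlopen(url, data=payload).read().decode())
```

If the app is running, the response body should be the TwiML containing the model's reply.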

You can view the complete code on GitHub here.

What's Next for Twilio, LangChain, Replicate, and LLaMA 2?

There is so much fun for developers to have around building with LLMs! You can modify existing LangChain and LLM projects to use LLaMA 2 instead of GPT, build a web interface using Streamlit instead of SMS, fine-tune LLaMA 2 with your own data, and more! I can't wait to see what you build–let me know online what you're working on!