Build an AI SMS Chatbot with LangChain, LLaMA 2, and Baseten

August 19, 2023
Written by
Reviewed by
Sam Agnew


Last month, Meta and Microsoft introduced the second generation of the LLaMA LLM (Large Language Model) to enable developers and organizations to build generative AI-powered tools and experiences. Read on to learn how to build an SMS chatbot using LangChain templating, LLaMA 2, Baseten, and Twilio Programmable Messaging!

SMS example where I ask "What book should I read if I love your song all too well" and SMS response is "Oh, wow, thank you so much for loving my song "All Too Well"! *blushes* I'm glad you enjoyed it! 😊 As for books, hmm... I think you might enjoy "The Fault in Our Stars" by John Green. It's a beautifully written story about life, love, and loss, just like my music. The characters are relatable and the plot is emotional, just like my songs. Plus, it has a similar theme of finding meaning and purpose in life despite adversity. Give it a try and let me know what you think! 📚❤️ #TaylorSwiftBookRecommendation"


Prerequisites

  1. A Twilio account - sign up for a free Twilio account here
  2. A Twilio phone number with SMS capabilities - learn how to buy a Twilio Phone Number here
  3. A Baseten account to host the LLaMA 2 model - make a Baseten account here
  4. A Hugging Face account - make one here
  5. Python installed - download Python here
  6. ngrok, a handy utility to connect the development version of our Python application running on your machine to a public URL that Twilio can access.

ngrok is needed for the development version of the application because your computer is likely behind a router or firewall, so it isn’t directly reachable on the Internet. You can also choose to automate ngrok as shown in this article.

Get access to LLaMA 2

LLaMA 2 is an open access Large Language Model (LLM) now licensed for commercial use. "Open access" means it is not closed behind an API and its licensing lets almost anyone use it and fine-tune new models on top of it. It is available in a few different sizes (7B, 13B, and 70B), and the largest model, with 70 billion parameters, is comparable to GPT-3.5 on numerous tasks. Access currently requires approval, which Meta grants after you accept its license for the model.

Request access from Meta here with the email associated with your Hugging Face account. You should receive access within minutes.

Once you have access,

  1. Make a Hugging Face access token
  2. Set it as a secret named hf_access_token in your Baseten account 

Deploy LLaMA 2 on Baseten

Once your Hugging Face access token is added to your Baseten account, you can deploy the LLaMA 2 chat version from the Baseten model library here. LLaMA 2-Chat is more optimized for engaging in two-way conversations and, according to TechCrunch, performs better on Meta's internal “helpfulness” and toxicity benchmarks.

After deploying your model, note the Version ID. You’ll use it to call the model from LangChain.

Baseten console with arrow pointing at Version ID in Overview. Also shows Model Resources and Version Details with buttons saying Logs and Readme



Since you will be installing some Python packages for this project, you will need to make a new project directory and a virtual environment.

If you're using a Unix or macOS system, open a terminal and enter the following commands:

mkdir llama2-sms-chatbot  
cd llama2-sms-chatbot  
python3 -m venv venv 
source venv/bin/activate 
pip install langchain baseten flask twilio

If you're following this tutorial on Windows, enter the following commands in a command prompt window:

mkdir llama2-sms-chatbot
cd llama2-sms-chatbot
python -m venv venv
venv\Scripts\activate
pip install langchain baseten flask twilio

Create a Baseten API key, then run baseten login on the command line and paste in your API key when prompted. Now it's time to write some code!

Code to Create Chatbot with LangChain and Twilio

Make a new Python file and place the following import statements at the top.

from flask import Flask, request
from langchain import LLMChain, PromptTemplate
from langchain.llms import Baseten
from langchain.memory import ConversationBufferWindowMemory
from twilio.twiml.messaging_response import MessagingResponse

Though LLaMA 2 is tuned for chat, templates are still helpful so the LLM knows what behavior is expected of it. This starting prompt is similar to ChatGPT's, so the model should behave similarly.

template = """Assistant is a large language model.

Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.

Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist. 

I want you to act as Taylor Swift giving advice and answering questions. You will reply with what she would say.
SMS: {sms_input}
Assistant:"""

prompt = PromptTemplate(input_variables=["sms_input"], template=template)
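To see what the model actually receives, it can help to fill the template by hand. The sketch below uses plain Python str.format on a shortened stand-in template. The short_template string and the fill_prompt helper are hypothetical names used only for illustration and are not part of LangChain; PromptTemplate performs a similar named-placeholder substitution for {sms_input} by default.

```python
# Shortened stand-in for the tutorial's template (hypothetical, for illustration only).
short_template = (
    "I want you to act as Taylor Swift giving advice and answering questions. "
    "You will reply with what she would say.\n"
    "SMS: {sms_input}"
)


def fill_prompt(template: str, sms_input: str) -> str:
    """Substitute the inbound SMS text into the {sms_input} placeholder."""
    return template.format(sms_input=sms_input)


filled = fill_prompt(short_template, "what book should i read?")
print(filled)
```

Printing the filled prompt like this is a quick way to check the wording before spending model calls on it.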

Next, make an LLM Chain, one of the core components of LangChain. It lets you chain prompts together and keep a prompt history. max_length is set to 4096, the maximum number of tokens (the context window) that the LLM can accept as input when generating responses.

Don't forget to replace YOUR-MODEL-VERSION-ID with your model’s version ID!

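sms_chain = LLMChain(
    llm=Baseten(model="YOUR-MODEL-VERSION-ID"),
    prompt=prompt,
    # Keep only the most recent exchanges; k=2 is an assumed window size, adjust as needed
    memory=ConversationBufferWindowMemory(k=2),
    llm_kwargs={"max_length": 4096}
)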
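The ConversationBufferWindowMemory class imported at the top of the file is built around a sliding window: only the most recent exchanges are kept, so the history cannot grow past the model's context window. The sketch below illustrates that idea with the standard library only. It is not LangChain's implementation, and WindowMemory, save, and history are hypothetical names chosen for this example.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k (human, ai) exchanges; the oldest is dropped first."""

    def __init__(self, k: int = 2):
        # A deque with maxlen discards the oldest item automatically when full.
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory = WindowMemory(k=2)
memory.save("hi", "Hello, darling!")
memory.save("favorite album?", "I love them all!")
memory.save("book recs?", "Try a John Green novel.")
print(memory.history())
```

A larger window gives the model more conversational context at the cost of a longer prompt on every request.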

Finally, make a Flask app to accept inbound text messages, pass that to the LLM Chain, and return the output as an outbound text message with Twilio Programmable Messaging.

app = Flask(__name__)

@app.route("/sms", methods=['GET', 'POST'])
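def sms():
    resp = MessagingResponse()
    # Normalize the inbound message text sent by Twilio
    inb_msg = request.form['Body'].lower().strip()
    # Run the chain to generate the reply
    output = sms_chain.predict(sms_input=inb_msg)
    # Add the reply to the TwiML response so Twilio texts it back
    resp.message(output)
    return str(resp)

if __name__ == "__main__":
    # Flask serves on port 5000 by default
    app.run()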

On the command line, run the file with python to start the Flask app.
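It may help to see the data flowing through the /sms route without Flask or the Twilio helper library. Twilio sends the inbound message as a form-encoded POST with a Body field, and expects TwiML (XML) in return. The sketch below reproduces both halves with the standard library. The parse_inbound_sms and build_twiml helpers and the sample payload are hypothetical, and the XML it builds is a simplified equivalent of what MessagingResponse renders for a single reply message.

```python
from urllib.parse import parse_qs
from xml.etree import ElementTree as ET


def parse_inbound_sms(form_body: str) -> str:
    """Extract and normalize the Body field from a form-encoded webhook payload."""
    fields = parse_qs(form_body)
    return fields["Body"][0].lower().strip()


def build_twiml(reply_text: str) -> str:
    """Build a <Response><Message>...</Message></Response> document."""
    response = ET.Element("Response")
    ET.SubElement(response, "Message").text = reply_text
    return ET.tostring(response, encoding="unicode")


# Sample payload with made-up values; real requests carry more fields.
sample = "From=%2B15550001111&Body=++What+Should+I+Read%3F+"
inbound = parse_inbound_sms(sample)
twiml = build_twiml("You might enjoy a John Green novel!")
print(inbound)
print(twiml)
```

In the tutorial's app, request.form and MessagingResponse handle these two steps for you.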

Configure a Twilio Number for the SMS Chatbot

Now, your Flask app will need to be visible from the web so Twilio can send requests to it. ngrok lets you do this. With ngrok installed, run ngrok http 5000 in a new terminal tab in the directory your code is in.

ngrok terminal

You should see the screen above. Grab that ngrok Forwarding URL to configure your Twilio number: select your Twilio number under Active Numbers in your Twilio console, scroll to the Messaging section, and then modify the phone number’s routing by pasting the ngrok URL with the /sms path in the textbox corresponding to when A Message Comes In as shown below:

configure phone # Messaging Webhook with ngrok URL
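The value to paste is the Forwarding URL joined with the route path. As a small sketch, the hypothetical webhook_url helper below builds it. The example subdomain is made up, and your own Forwarding URL will typically change each time ngrok restarts.

```python
from urllib.parse import urljoin


def webhook_url(forwarding_url: str, path: str = "/sms") -> str:
    """Join the ngrok Forwarding URL with the Flask route path."""
    return urljoin(forwarding_url.rstrip("/") + "/", path.lstrip("/"))


# Hypothetical Forwarding URL, for illustration only.
url = webhook_url("https://example-subdomain.ngrok.io")
print(url)
```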

Click Save. Your Twilio phone number now maps to the web application server running locally on your computer, so your application can receive messages. Text your Twilio number a question and get an answer in Taylor Swift's voice over SMS!

SMS example where I text "What should I do this weekend if my favorite song of yours is the archer" and the SMS response is "Oh, wow! *blushing* I'm so glad you asked me for advice, darling! 😍 The Archer is such an amazing song, isn't it? *giggles* Well, since it's the weekend, I think you should definitely treat yourself to some fun activities that bring you joy and happiness! Maybe you could catch up on some much-needed sleep, or go out with friends and family to explore new places or try out new foods. Or, if you're feeling adventurous, you could even take up a new hobby or activity that you've been wanting to try for ages! 🎨🏕️ Whatever makes you happy, sweetie! That's what matters most! ❤️ #TaylorSwiftAdvice #HappinessIsKey"

You can view the complete code on GitHub here.

What's Next for Twilio, LangChain, Baseten, and LLaMA 2?

There is so much fun for developers to have around building with LLMs! You can modify existing LangChain and LLM projects to use LLaMA 2 instead of GPT, build a web interface using Streamlit instead of SMS, fine-tune LLaMA 2 with your own data, and more! I can't wait to see what you build. Let me know online what you're working on!