How to Start an Instagram Food Account Using Twilio WhatsApp API, OpenAI's GPT-3 Engine, and Clarifai API

November 16, 2020
Written by
Diane Phan
Twilion

The camera eats first - it's the motto that fuels a lifestyle for some people, especially those on Instagram. Nowadays, people make separate social media accounts dedicated to the food they eat - whether it's homemade, fine dining, or simply snacking. Many people often seek a huge following on their food-dedicated Instagram accounts too because that gives them the opportunity to gain potential sponsorships, shoutouts, and just the satisfaction of knowing that people are interested in sharing the love for food.

However, there's often pressure to craft the perfect post. Social media influencers go to great lengths to capture the most appealing photo or to come up with the wittiest, most eye-catching caption to go with it. A lot of creative effort goes into maintaining a social media presence. Wouldn't it be nice if you had someone else write the captions for you so that you can focus on eating delicious food?

In this article, we'll walk through how to develop a functional Python program that generates Instagram-worthy captions for your pictures using OpenAI's GPT-3 engine. The app receives pictures sent through Twilio's Programmable WhatsApp API, classifies them with Clarifai's API, and uses the resulting tags to decide what kind of caption to give each picture.

gif demo of sending a picture to WhatsApp and generating a caption

Wow, look who remembered Twilio's birthday week ;)

Tutorial Requirements

  • Python 3.6 or newer. If your operating system does not provide a Python interpreter, you can go to python.org to download an installer.
  • An OpenAI API key. Request beta access here.
  • A free or paid Twilio account. If you are new to Twilio get your free account now! (If you sign up through this link, Twilio will give you $10 credit when you upgrade.)
  • ngrok, a handy utility to connect the development version of our Python application running on your system to a public URL that Twilio can connect to. This is necessary for the development version of the application because your computer is likely behind a router or firewall, so it isn’t directly reachable on the Internet. You can also choose to automate ngrok as shown in this article.
  • A Clarifai account. Sign up for a free account to generate an API key.

Configuration

Since we will be installing some Python packages for this project, we will need to make a new project directory and a virtual environment.

If you are using a Unix or Mac OS system, open a terminal and enter the following commands to do the tasks described above:

$ mkdir foodiecaptioner
$ cd foodiecaptioner
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install openai twilio flask python-dotenv clarifai

For those of you following the tutorial on Windows, enter the following commands in a command prompt window:

$ md foodiecaptioner
$ cd foodiecaptioner
$ python -m venv venv
$ venv\Scripts\activate
(venv) $ pip install openai twilio flask python-dotenv clarifai

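The last command uses pip, the Python package installer, to install the five packages that we are going to use in this project, which are:

  • openai, the OpenAI Python client, to send requests to the GPT-3 engine
  • twilio, the Twilio Python helper library, to work with WhatsApp messages
  • flask, the web framework that serves the webhook
  • python-dotenv, to load the credentials stored in the .env file
  • clarifai, the Clarifai Python client, to classify pictures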

Set the OpenAI API Key

As mentioned above, this project requires an API key from OpenAI. At the time of writing, the only way to obtain an API key is to be accepted into their private beta program.

If you have access to the Beta page, the API key can be found in the Authentication tab in the Documentation.

OpenAI Beta Documentation Authentication page with API key

The Python application will need to have access to this key, so we are going to create a .env file where the API key will be safely stored. The application we write will be able to import the key as an environment variable later.

Create a .env file in your project directory (note the leading dot) and enter a single line of text containing the following:

OPENAI_KEY=<YOUR-OPENAI-KEY>

Make sure that the OPENAI_KEY is safe and that you do not expose the .env file in a public location.

Configure the Twilio WhatsApp Sandbox

We'll be setting up a webhook for the Twilio WhatsApp Sandbox as we go through the tutorial so that incoming WhatsApp messages are delivered to our application. If you haven't already, log in to the Twilio Console to view your Programmable Messaging dashboard. There is a section on the page that says "Building with WhatsApp? Get started here". Click on the link to learn how to set up your sandbox.

The sandbox is provided by Twilio. Once you complete your app, you can request production access for your Twilio phone number.

Twilio Sandbox for WhatsApp

Use your smartphone to send a WhatsApp message containing the join phrase shown on the sandbox page to your assigned WhatsApp number. If you are successful, you should receive a message as shown below.

Twilio sandbox confirmation message

Authenticate against Twilio and Clarifai Services

Next, we need to safely store some important credentials that will be used to authenticate against the Twilio and Clarifai services.

Open the .env file that you created earlier in your project directory and add the following lines below the OPENAI_KEY entry:


TWILIO_ACCOUNT_SID=<your Twilio account SID>
TWILIO_AUTH_TOKEN=<your Twilio auth token>
CLARIFAI_API_KEY=<your Clarifai API Key>

Find your Account SID and Auth Token in the Twilio Console and add them to the .env file as the values of TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN.

Twilio Account Credentials

To use the Clarifai API, you need to make an account and create an application in order to generate an API key for your project. Once your account is created, add the API key to the .env file as well.
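Before moving on, it can help to confirm that every credential is actually visible to Python. The following is a minimal sketch and not part of the original project: the helper name missing_settings is hypothetical, and it assumes the variables have already been loaded into the environment (for example by python-dotenv's load_dotenv(), which the project calls later).

```python
import os

# Names of the variables stored in the .env file in this tutorial
REQUIRED_SETTINGS = [
    "OPENAI_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "CLARIFAI_API_KEY",
]

def missing_settings(environ=None):
    """Return the required variable names that are unset or empty."""
    # Use the real process environment unless a mapping is passed in
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_SETTINGS if not environ.get(name)]

if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing from .env:", ", ".join(missing))
    else:
        print("All credentials found.")
```

An empty list means all four values were found. Passing a plain dictionary instead of relying on os.environ makes the check easy to try out in the Python shell.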

Set up a development Flask server

Make sure that you are currently in the virtual environment of your project directory. Since we will be utilizing Flask throughout the project, we will need to set up the development server. Add a .flaskenv file (make sure you have the leading dot) to your project with the following lines:

FLASK_APP=app.py
FLASK_ENV=development

These incredibly helpful lines will save you time when it comes to testing and debugging your project.

  • FLASK_APP tells the Flask framework where our application is located
  • FLASK_ENV configures Flask to run in debug mode

These lines are convenient because every time you save the source file, the server will reload and reflect the changes.

Then, type flask run in your terminal to start the Flask framework.

terminal showing the output of "flask run" command. flask is running with environment on development

The screenshot above displays what your console will look like after running the command flask run. The service is running privately on your computer’s port 5000 and will wait for incoming connections there. You will also notice that debugging mode is active. When in this mode, the Flask server will automatically restart to incorporate any further changes you make to the source code.

Set up a webhook with Twilio

Since this is a tutorial to create a WhatsApp chat bot, we will need to use a webhook (web callback) to allow real-time data to be delivered to our application by Twilio.

Open up another terminal window and navigate to the "foodiecaptioner" project directory if you are not already there. While Flask is running in one terminal window, start ngrok with the following command to temporarily enable the Flask service publicly over the Internet:

$ ngrok http 5000

Ngrok is a great tool because it allows you to create a temporary public domain that redirects HTTP requests to our local port 5000.

image showing the output of running the "ngrok http 5000" command with forwarding URLS

Your ngrok terminal will now look like the picture above. As you can see, there are URLs in the “Forwarding” section. These are public URLs that ngrok uses to redirect requests into our Flask server.

Copy the URL starting with https:// to the clipboard and then return to the Twilio Console. Navigate to the Programmable Messaging dashboard and look at the sidebar for Programmable Messaging to find WhatsApp Sandbox Settings under the Settings option. This is where we tell Twilio to send incoming message notifications to this URL.

Paste the URL copied from the ngrok session into the “WHEN A MESSAGE COMES IN” field and append /webhook, since that is going to be the endpoint that we will write later in the Python application. Here is my example for reference:

screenshot of ngrok URL inside the text field for the Twilio WhatsApp sandbox

The URL from ngrok in my example is https://ad7e4814affe.ngrok.io/webhook but again, yours will be different.

Before you click on the “Save” button at the very bottom of the page, make sure that the request method is set to HTTP POST.

Integrate the Clarifai API into your application

This project is a fun opportunity to test out the Clarifai API and see how it works against the user inputs. Using computer vision and artificial intelligence, the Clarifai API analyzes the image and returns the tags, or "concepts", associated with it. This API will be used to help our app identify what's going on in the picture so that we can generate a relevant Instagram caption for it.

With that said, let’s create a new Python file. I created image_classifier.py to store the code that uses Clarifai’s API. Copy the following code into the file you just created:

import os
from dotenv import load_dotenv
from clarifai.rest import ClarifaiApp

load_dotenv()
CLARIFAI_API_KEY = os.environ.get('CLARIFAI_API_KEY')
app = ClarifaiApp(api_key=CLARIFAI_API_KEY)

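def get_picture_tags(image_url):
    # Ask Clarifai to tag the picture at the given URL
    response_data = app.tag_urls([image_url])
    relevant_tags = {}  # dictionary keyed by tag name for fast lookups
    # Each concept is one tag that Clarifai associates with the picture
    for concept in response_data['outputs'][0]['data']['concepts']:
        relevant_tags[concept['name']] = 1  # placeholder value
    return relevant_tags.keys()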

The get_picture_tags function makes a request to the Clarifai API so that the picture sent in through WhatsApp can be analyzed. The response_data is parsed so that only the tags for the picture are saved in the relevant_tags dictionary. Each tag is stored as a key with the value 1, but the value doesn't really matter. The important part is the keys, which are passed to another file that handles the requests made to OpenAI's GPT-3 engine so that a caption can be generated from the picture. You could use another data structure to store the tags, but a dictionary makes it easy to expand on the project later, especially if you need to detect a particular word.
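To make the parsing step concrete, here is a small sketch that walks a hand-written dictionary shaped like the part of the response that get_picture_tags reads. The sample data, the tags_with_confidence helper, and the 0.9 threshold are illustrative assumptions. The 'value' field is the confidence score that Clarifai attaches to each concept, and storing it in place of the constant 1 is one way to put the dictionary values to use.

```python
# Hand-written sample in the shape get_picture_tags reads; not real API output
sample_response = {
    "outputs": [
        {
            "data": {
                "concepts": [
                    {"name": "food", "value": 0.99},
                    {"name": "soup", "value": 0.97},
                    {"name": "table", "value": 0.62},
                ]
            }
        }
    ]
}

def tags_with_confidence(response_data, min_confidence=0.9):
    """Map each tag name to its confidence, dropping low-confidence tags."""
    tags = {}
    for concept in response_data["outputs"][0]["data"]["concepts"]:
        if concept["value"] >= min_confidence:
            tags[concept["name"]] = concept["value"]
    return tags

tags = tags_with_confidence(sample_response)
print(list(tags))      # tag names, in the order they appear in the response
print("soup" in tags)  # dictionary membership check for a particular word
```

With the sample data, the "table" concept falls below the threshold and is left out, while a check such as "soup" in tags shows the kind of word detection that a dictionary makes cheap.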

Let's try out the Clarifai code quickly so you can see how impressive it is! Find the URL of a picture on the Internet that you would like to test the API on, then start the Python shell in your terminal. Paste in all the code from image_classifier.py, then call the get_picture_tags function at the end, outside of the definition.

Here's an example of an image I used and what the Python shell looks like:

>>> import os
>>> from dotenv import load_dotenv
>>> from clarifai.rest import ClarifaiApp
>>> 
>>> load_dotenv()
True
>>> CLARIFAI_API_KEY = os.environ.get('CLARIFAI_API_KEY')
>>> app = ClarifaiApp(api_key=CLARIFAI_API_KEY)
>>> 
>>> def get_picture_tags(image_url):
...     response_data = app.tag_urls([image_url])
...     relevant_tags = {}   #dictionary data structure for faster lookup time 
...     for concept in response_data['outputs'][0]['data']['concepts']:
...         relevant_tags[concept['name']] = 1
...     return relevant_tags.keys()
... 
>>> get_picture_tags("https://s3-media0.fl.yelpcdn.com/bphoto/sRyW5Go5dJLJjTEvvjC76g/o.jpg")
dict_keys(['dinner', 'soup', 'food', 'no person', 'lunch', 'meat', 'curry', 'pork', 'vegetable', 'dish', 'meal', 'bowl', 'hot', 'chicken', 'rice', 'chili', 'delicious', 'broth', 'parsley', 'cooking'])

Impressive classification, huh? The Clarifai API was able to describe the image of bun rieu from Pho Saigon's Yelp page and even included the tag "delicious" in the list of relevant tags, which is definitely true.

Write captions with OpenAI GPT-3's engine  

The OpenAI playground allows users to explore GPT-3 (Generative Pre-trained Transformer 3), a highly advanced language model that is capable of generating written text that sounds like an actual human worked on it. This powerful model can also read a user's input and learn about the context of the prompt to determine how it should generate more text in the same writing style.

For the GPT-3 engine to work in this case, we have to provide the AI with some material to work with. This is the time for the foodies and social media experts to pull up your favorite posts on Instagram. Look for popular hashtags and pictures of influencers and feed the captions you really like into the engine.

Here is an example of a string of text that will be passed to the engine. A variable named session_prompt will provide an instructional sentence before showing the format of how you, the user, will interact with the app.  

session_prompt="""
Below are some witty fun descriptions for Instagram pictures based on the tags describing the pictures.

The tags for this picture are: { }
Fun description: There is no picture.

The tags for this picture are:  {'food', 'sweet', 'chocolate', 'sugar', 'cake', 'milk', 'delicious', 'cup', 'candy', 'no person', 'breakfast', 'baking', 'party', 'cream', 'vacation', 'Christmas', 'coffee', 'table', 'color', 'cookie'}
Fun description: Loving this delicious Christmas dessert platter this year! Happy Holidays everyone! 

The tags for this picture are:  {'food', 'sweet', 'chocolate', 'sugar', 'cake', 'milk', 'delicious', 'cup', 'candy', 'no person', 'breakfast', 'baking', 'party', 'cream', 'vacation'}
Fun description: Took myself on vacation to enjoy some fancy chocolate. A girl's best friend!

The tags for this picture are:  {'food', 'sweet', 'chocolate', 'sugar', 'cake', 'milk', 'delicious', 'breakfast', 'baking', 'party', 'cream', 'vacation'}
Fun description: A perfectly small cake that I baked for my friend's birthday!

The tags for this picture are:  {'hot', 'cheetos', 'snack', 'yummy', 'junk', 'food', 'delicious', 'vacation'}
Fun description: I was so delighted when I found these cheetos that tasted exactly like pepperoni pizza!
"""

As you can see, the input is presented on the line that says "The tags for this picture are:". Since the tags from the Clarifai API are passed to the engine as a printed Python collection, with brackets, quotes, and comma delimiters, we have to show the engine some examples of tags in a closely matching format. Feel free to create random tags based on your favorite food picture, or a nice picture that you saw on social media. Be sure to also write an example of what you want the "Fun description" to say below each set of tags so that OpenAI GPT-3's engine has an idea of what future captions should look like based on the particular tags provided.
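How the tags are rendered inside the prompt depends on the Python type that holds them, so it is worth checking that the rendered text resembles the examples. Here is a minimal sketch, assuming a hypothetical helper named format_tags that is not in the original project, which renders any collection of tags in the curly-bracket style used by session_prompt:

```python
def format_tags(tags):
    """Render tags like the session_prompt examples: {'food', 'soup'}."""
    names = [repr(str(tag)) for tag in tags]
    if not names:
        return "{ }"  # mirrors the empty example in session_prompt
    return "{" + ", ".join(names) + "}"

picture_tags = {"food": 1, "soup": 1}.keys()

# An f-string includes the type name for dict_keys, which the examples lack
print(f"{picture_tags}")          # dict_keys(['food', 'soup'])
print(format_tags(picture_tags))  # {'food', 'soup'}
```

The engine tends to imitate the examples most closely when the new input looks like them, so a helper like this is one option for keeping the two in step.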

Great! It's time to put this information into code. Create a file named caption_generator.py and copy and paste the following code:

import os
from dotenv import load_dotenv
import openai

# Read OPENAI_KEY from the .env file
load_dotenv()
openai.api_key = os.environ.get('OPENAI_KEY')

# Both sequences already end with a colon, matching the session_prompt format
start_sequence = "\nFun description:"
restart_sequence = "\n\nThe tags for this picture are:"
session_prompt=<INSERT_YOUR_OWN>

def generate_caption(picture_tags):
    # Append the new tags and an open "Fun description:" line to the examples
    prompt_text = f'{session_prompt}{restart_sequence} {picture_tags}{start_sequence}'
    response = openai.Completion.create(
      engine="davinci",
      prompt=prompt_text,
      temperature=0.7,
      max_tokens=64,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0.3,
      stop=["\n"],  # stop at the end of the caption line
    )
    caption = response['choices'][0]['text']
    return str(caption)

Be sure to replace session_prompt with the one provided earlier or make up your own. It is essential that you keep the format of the example session_prompt.

Notice that the values for the variables start_sequence and restart_sequence match up with the ones in the session_prompt. As mentioned, this is how you maintain interaction with the OpenAI GPT-3 engine. You, the user, will send in a picture through WhatsApp. That picture will be sent to the Clarifai API to produce a list of tags related to the picture. Those tags are sent to the generate_caption function and will follow the conventions of prompt_text, which essentially puts all the variables together in order to generate content.
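The way these pieces fit together can be checked without calling the API. The sketch below rebuilds the prompt from a shortened, made-up session_prompt. The build_prompt helper is hypothetical, and it relies on the colons already contained in restart_sequence and start_sequence.

```python
start_sequence = "\nFun description:"
restart_sequence = "\n\nThe tags for this picture are:"

# Shortened stand-in for the real session_prompt, for illustration only
session_prompt = (
    "Below are some witty fun descriptions for Instagram pictures "
    "based on the tags describing the pictures."
    "\n\nThe tags for this picture are: {'food', 'cake'}"
    "\nFun description: A perfectly small cake!"
)

def build_prompt(picture_tags):
    """Append a new tags line and an open 'Fun description:' line."""
    return f"{session_prompt}{restart_sequence} {picture_tags}{start_sequence}"

prompt_text = build_prompt(["soup", "broth"])
print(prompt_text)
```

Because the prompt ends with an open "Fun description:" line and the request uses stop=["\n"], the engine completes that single line and then stops.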

After building prompt_text, this function calls the openai.Completion.create() method and passes to it a series of arguments that customize the engine’s response, including the new prompt. The max_tokens argument caps the length of the generated text, where a token is roughly a word, a piece of a word, or a punctuation mark. It was set to 64 so that the length of the caption will be appropriate - not too long, not too short. You can read more about the GPT-3 customization options in the Ultimate Guide to OpenAI-GPT3 Language Model or explore the OpenAI Playground for yourself.

If you're curious to see how the code works, you can start the Python shell in the terminal and copy and paste the entire code from caption_generator.py there. You should also take the tags from the previous section with Clarifai and pass them into the generate_caption function. I won't copy the entire OpenAI GPT-3 code since it's above, but this is an example of what should be at the bottom of the code, and a sample output that was generated:

>>> pic_tags = ['dinner', 'soup', 'food', 'no person', 'lunch', 'meat', 'curry', 'pork', 'vegetable', 'dish', 'meal', 'bowl', 'hot', 'chicken', 'rice', 'chili', 'delicious', 'broth', 'parsley', 'cooking']
>>> generate_caption(pic_tags)
' I am fascinated by the meticulous process that goes into making such a simple dish like chicken soup.'

Seems like OpenAI GPT-3's engine thinks this is a chicken soup instead of a Vietnamese soup dish, but at least the computer created a caption for us! If you want to steer the engine toward recognizing dishes from specific cuisines, add more examples to the session_prompt variable.

Now this brings us to the last part - connecting all these files together.

Build the main Instagram caption generator app

At this point we have two files - caption_generator.py and image_classifier.py - that define very important functions for the app to work. In order to call the functions in our main file, we will need to import them.

Create a file named app.py and copy and paste the following code in order to import the functions and necessary modules to run the Flask app:

import os
from dotenv import load_dotenv
from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse
from twilio.rest import Client
from image_classifier import get_picture_tags
from caption_generator import generate_caption

load_dotenv()
app = Flask(__name__)
client = Client()

It's time to write out the functions and webhook to make this application come to life. Here's the code that should be added under the client object:

def respond(message):
    # Wrap the text in TwiML so Twilio can deliver it as a WhatsApp reply
    response = MessagingResponse()
    response.message(message)
    return str(response)

@app.route('/webhook', methods=['POST'])
def reply():
    # NumMedia arrives as a string; treat a missing value as no attachments
    num_media = int(request.form.get('NumMedia', 0))
    if num_media >= 1:
        # Caption the first picture attached to the message
        pic_url = request.form.get('MediaUrl0')
        relevant_tags = get_picture_tags(pic_url)
        caption = generate_caption(relevant_tags)
        return respond(caption)
    else:
        return respond('Please send in a picture.')

As you can see, a new function respond() is created. It wraps the message in a MessagingResponse object and returns it as a string, which the webhook hands back to Twilio so that the text is delivered to the user.

The webhook is short - the user sends in a picture that they want to generate a caption for. The pic_url is passed to the get_picture_tags function defined in the image_classifier.py file. The results from that function are stored in relevant_tags, which are then passed to the generate_caption function defined in the caption_generator.py file, and the caption is finally returned to the user over WhatsApp.
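The branching in the webhook can be tried out without WhatsApp by feeding it a plain dictionary in place of request.form. This is a sketch under assumptions: first_media_url is a hypothetical helper and the sample values are made up, while the field names NumMedia and MediaUrl0 are the ones the webhook reads.

```python
def first_media_url(form):
    """Return the URL of the first attached picture, or None if there is none."""
    # Form values arrive as strings, so convert before comparing
    num_media = int(form.get("NumMedia", 0) or 0)
    if num_media >= 1:
        return form.get("MediaUrl0")
    return None

# Made-up form payloads for illustration
with_picture = {
    "NumMedia": "1",
    "MediaUrl0": "https://example.com/curry.jpg",
    "Body": "",
}
text_only = {"NumMedia": "0", "Body": "hello"}

print(first_media_url(with_picture))
print(first_media_url(text_only))
```

Keeping this decision in a small function of its own is one way to test the "picture or no picture" logic separately from the Clarifai and OpenAI calls.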

Run the WhatsApp Picture Sharing App

It’s time to wrap things up and start generating captions for your social media account! You can check out my GitHub repository to make sure you have the full project.

Make sure you have one tab running flask and one tab running ngrok. If you closed either for any reason, start it again now with the following commands in their respective tabs.

(venv) $ flask run

And in the second tab:

$ ngrok http 5000

Furthermore, make sure that your ngrok webhook URL is updated inside the Twilio Sandbox for WhatsApp. Each time you restart ngrok, the URL changes, so you will have to replace the URL. Remember to add the /webhook at the end of the ngrok forward URL.

And now the fun begins! Get your WhatsApp-enabled mobile device and text your WhatsApp number. Be aware that the OpenAI GPT-3 engine might generate a caption that you don't like. If that happens, you can submit the same picture again until you find an idea for a caption that is Instagram-worthy!

Here's an example of my dinner - think this will get 2,000 likes?

caption generator app creating an Instagram caption for the picture of aloha curry

If you liked seeing the tags that the Clarifai API had to offer, you can parse the strings from the relevant_tags and append them to a hashtags array so that they are included in your social media posts. Here's an example of what your app can also do:

screenshot of the caption generator app analyzing a picture of the curry and creating a caption for it along with relevant hashtags
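One possible way to build those hashtags is sketched below. The tags_to_hashtags helper, its limit of five, and the sample caption are assumptions made for illustration and are not taken from the original project.

```python
def tags_to_hashtags(tags, limit=5):
    """Turn tags such as 'no person' into hashtags such as '#noperson'."""
    hashtags = []
    for tag in tags:
        # Hashtags cannot contain spaces, so keep letters and digits only
        cleaned = "".join(ch for ch in str(tag).lower() if ch.isalnum())
        hashtag = f"#{cleaned}"
        if cleaned and hashtag not in hashtags:
            hashtags.append(hashtag)
    return hashtags[:limit]

relevant_tags = ["dinner", "soup", "food", "no person", "lunch", "meat"]
caption = "Warm soup for a cold night."  # made-up caption
hashtags = " ".join(tags_to_hashtags(relevant_tags))
print(f"{caption} {hashtags}")
```

In the webhook, the combined string could be passed to respond() in place of the bare caption.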

Conclusion: Building a Caption Generator Application

Congratulations on finishing this caption generator application, and most importantly, good luck on your journey to 1,000+ likes! Who knows, maybe the captions from the app aren't the best, but they can inspire you to create your own creative captions.

This simple WhatsApp tutorial is just one of the many fun projects you can do using Twilio API, Clarifai, OpenAI GPT-3, and of course, Python and Flask tools.

Perhaps you can take this project a step further by playing around with the Clarifai API to detect and reject NSFW photos, or by making your application work only for food pictures.
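As a starting point for the food-only idea, the tags that get_picture_tags already returns can be checked before a caption is requested. This is a rough sketch: the FOOD_TAGS set and the is_food_picture helper are hypothetical, and the choice of tag words is a guess that you would want to tune against your own pictures.

```python
# Hypothetical set of tags that mark a picture as food-related
FOOD_TAGS = {"food", "meal", "dish", "delicious"}

def is_food_picture(tags, required_matches=1):
    """Return True when enough of the picture's tags are food-related."""
    matches = FOOD_TAGS.intersection(tags)
    return len(matches) >= required_matches

soup_tags = {"dinner": 1, "soup": 1, "food": 1, "meal": 1}.keys()
street_tags = ["car", "road", "city"]

print(is_food_picture(soup_tags))
print(is_food_picture(street_tags))
```

The webhook could call a check like this right after get_picture_tags and reply with a friendly message when the picture does not look like food.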

What’s next for OpenAI GPT-3 projects?

If you're hungry to build more, try out these ideas:

Let me know what's cooking in your kitchen or on your computer by reaching out to me over email!

Diane Phan is a Developer for technical content on the Twilio Voices team. She loves to help beginner programmers get started on creative projects that involve fun pop culture references. She can be reached at dphan [at] twilio.com or LinkedIn.