Fun with Markov Chains, Python, and Twilio SMS

September 23, 2016

One of the many allures of Twitter is that you can tweet at your favorite celebrity and (maybe) get a response. Still though, tweeting isn’t quite as intimate as trading text messages. So we thought it’d be fun to use Markov Chains, Programmable SMS, and Python to create a bot that impersonates your favorite Twitter personality. 

We could use the code below to create an SMS chat bot that sounds like anyone with a Twitter account. But to show off its true potential, we need someone with a distinct and recognizable tweeting style. Someone with a huge personality. Someone who has the best words.

Someone like Donald Trump.

Ever wish that you could debate Trump? Drop him a text at: 847-55-TRUMP (847-558-7867).


There are three steps to create this bot:

  1. Download the tweets for a given user to create a corpus of text.
  2. Use the corpus to generate a sentence in the style of the Tweeter.
  3. Reply to a text message with that sentence.

To follow along, you’ll need Python, a Twitter account, and a free Twilio account.

Download All Tweets from a User

Before we get started, let’s give credit to Filip Hráček whose Automatic Donald Trump was the inspiration for this idea. Check out his post for an excellent explanation on how to implement Markov chains in Dart.

Markov chains begin with a corpus: a library of text used to train your model. We’ll use a modified version of this tweet_dumper script to pull down tweets from the Twitter API.

To get started, create and activate a new virtual environment:

virtualenv markov-venv
source markov-venv/bin/activate

Then go to wherever you keep your code, make a directory for your project, and create a file called get_tweets.py.

mkdir markov-bot
cd markov-bot
touch get_tweets.py

Install the tweepy package to connect to the Twitter API:

pip install tweepy

In get_tweets.py, import the tweepy, csv, and re (regular expression) modules:

import tweepy
import csv
import re

We’ll need Twitter credentials to access the API. Create a new app on the Twitter Application Manager. You can fill in any ol’ domain for the website and leave the callback URL blank. Once created, generate your credentials:


Add those credentials to the bottom of your file (you’ll want to extract those creds to environment variables if you deploy this script, but hard-coding works for now):

consumer_key = "YOURCONSUMERKEY"
consumer_secret = "YOURCONSUMERSECRET"
access_key = "YOURACCESSKEY"
access_secret = "YOURACCESSSECRET"
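
If you do deploy the script, one way to pull those values from the environment instead might look like the sketch below. The variable names (TWITTER_CONSUMER_KEY and friends) and the load_twitter_credentials helper are our own invention for illustration, not anything Twitter or tweepy requires.

```python
import os

# Hypothetical variable names; use whatever names your deployment defines.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_KEY",
    "TWITTER_ACCESS_SECRET",
)


def load_twitter_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

You could then set consumer_key = load_twitter_credentials()["TWITTER_CONSUMER_KEY"], and so on for the other three. Failing loudly on a missing variable tends to be easier to debug than an authentication error from the API later.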

We’ll create a function that pulls down all tweets for a given screen name. We have to do this iteratively, as the Twitter API only returns 200 tweets per request. Also, we can only retrieve the 3,200 most recent tweets. Once we hit that limit, our new_tweets list will be empty and we’ll know to stop iterating.

Until then, we keep querying Twitter and adding to all_tweets. To make sure we’re getting the words straight from Trump’s fingertips, we’ll only keep the ones where tweet.source == 'Twitter for Android' (delete that conditional if you create a bot for other tweeters).

To do all this, add this code to the bottom of get_tweets.py:

def get_all_tweets(screen_name):
    all_tweets = []

    # Authenticate with the credentials defined above.
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    client = tweepy.API(auth)

    # Fetch the most recent page (at most 200 tweets per request).
    new_tweets = client.user_timeline(screen_name=screen_name, count=200)

    while len(new_tweets) > 0:
        for tweet in new_tweets:
            # Keep only tweets sent from the Android client.
            if tweet.source == 'Twitter for Android':
                # Keep the text as str so the regex cleanup works on Python 3.
                all_tweets.append(tweet.text)

        print("We've got %s tweets so far" % len(all_tweets))
        # Ask for tweets strictly older than the oldest one in this page.
        max_id = new_tweets[-1].id - 1
        new_tweets = client.user_timeline(screen_name=screen_name,
                                          count=200, max_id=max_id)

    return all_tweets
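
If the max_id bookkeeping feels abstract, here is a small sketch of the same paging pattern run against a fake timeline, so you can try it without touching the Twitter API. Both fetch_all and make_fake_timeline are hypothetical helpers written for this illustration; fetch_page stands in for client.user_timeline.

```python
def fetch_all(fetch_page, page_size=200):
    """Collect every item by walking backwards with max_id until a page is empty.

    fetch_page(count, max_id) is a hypothetical stand-in for
    client.user_timeline; it returns dicts with an integer "id", newest first.
    """
    items = []
    page = fetch_page(count=page_size, max_id=None)
    while page:
        items.extend(page)
        # Ask only for items strictly older than the oldest one seen so far.
        max_id = page[-1]["id"] - 1
        page = fetch_page(count=page_size, max_id=max_id)
    return items


def make_fake_timeline(total):
    """Build a stub API backed by `total` fake tweets with ids total..1."""
    tweets = [{"id": i} for i in range(total, 0, -1)]

    def fetch_page(count, max_id):
        eligible = [t for t in tweets if max_id is None or t["id"] <= max_id]
        return eligible[:count]

    return fetch_page


if __name__ == "__main__":
    everything = fetch_all(make_fake_timeline(450), page_size=200)
    print(len(everything))  # three pages: 200 + 200 + 50 items
```

Subtracting one from the oldest id is what keeps the last tweet of one page from showing up again as the first tweet of the next.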

Add a function at the bottom of your file to strip out some miscellaneous text and make our replies feel more like text messages and less like tweets:

def clean_tweet(tweet):
    tweet = re.sub(r"https?\:\/\/", "", tweet)   # link schemes
    tweet = re.sub(r"#\S+", "", tweet)           # hashtags
    tweet = re.sub(r"\.?@", "", tweet)           # at mentions
    tweet = re.sub(r"RT.+", "", tweet)           # retweets
    tweet = re.sub(r"Video\:", "", tweet)        # videos
    tweet = re.sub(r"\n", "", tweet)             # new lines
    tweet = re.sub(r"^\.\s.", "", tweet)         # leading dot, space, and next char
    tweet = re.sub(r"\s+", " ", tweet)           # extra whitespace
    tweet = re.sub(r"&amp;", "and", tweet)       # encoded ampersands
    return tweet
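
To get a feel for what these substitutions do, here is a short sketch that runs a few of the same patterns, in the same order, over a made-up tweet. The sample text and the substitutions list are ours, purely for illustration.

```python
import re

# A made-up tweet, not taken from any real account.
sample = ".@example Thanks for the support! #Winning\nRead more: https://t.co/abc123"

# A subset of the patterns from clean_tweet, applied in the same order.
substitutions = [
    (r"https?\:\/\/", ""),  # strips the scheme only; "t.co/abc123" remains
    (r"#\S+", ""),          # drops the whole hashtag
    (r"\.?@", ""),          # drops "@" (and a leading "."), keeps the handle
    (r"\n", ""),            # removes new lines
    (r"\s+", " "),          # collapses runs of whitespace
]

cleaned = sample
for pattern, replacement in substitutions:
    cleaned = re.sub(pattern, replacement, cleaned)

print(cleaned)  # example Thanks for the support! Read more: t.co/abc123
```

Notice that the link pattern removes only the http:// or https:// prefix, so the rest of the URL stays in the corpus as an ordinary word.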

Then add a function to:

  • Take a list of raw tweets as a parameter
  • Open a CSV file for writing
  • Iterate through each tweet
  • Write each non-blank clean tweet to the file

 

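def write_tweets_to_csv(tweets):
    # Text mode with newline='' lets the csv module handle line endings.
    with open('tweets.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for tweet in tweets:
            tweet = clean_tweet(tweet)
            # Skip tweets that are empty after cleaning.
            if tweet:
                writer.writerow([tweet])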

Finally, add code to the bottom of your file to retrieve the tweets and write them to a CSV:

if __name__ == "__main__":
    tweets = get_all_tweets("realdonaldtrump")
    write_tweets_to_csv(tweets)

Run your script with python get_tweets.py. About half of Trump’s 3,200 tweets make it past our filters, so you should end up with around 1,600 rows.

Generate Sentences With a Markov Chain

You may not realize it, but you see Markov chains every day: they’re what power the auto-suggest feature on your phone’s keyboard. When it comes to sentence generation, Markov chains ask, “Based on the last word you typed and all the phrases you’ve typed in the past, what are you most likely to type next?” For an in-depth explanation of the mechanics of Markov chains, check out Filip’s post or Victor Powell’s excellent Markov Chains Explained Visually.
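
To make that question concrete, here is a deliberately tiny word-level Markov chain in plain Python. It is a sketch of the general idea only, not how any particular library implements it; build_chain and generate are names we made up, and the two-sentence corpus is a toy.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map each word to the list of words that followed it in the corpus."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        # None marks the end of a sentence so generation knows when to stop.
        for current, following in zip(words, words[1:] + [None]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from `start`, picking each next word at random."""
    words = [start]
    while len(words) < max_words:
        followers = chain.get(words[-1])
        if not followers:
            break
        next_word = rng.choice(followers)
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    chain = build_chain(corpus)
    print(generate(chain, "the"))
```

Because a follower is stored once per occurrence, words that followed more often in the corpus are proportionally more likely to be picked.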

Fortunately, we don’t need to do the Markov calculations by hand. Jeremy Singer-Vine’s markovify package abstracts the generation of text-based Markov chains, letting you generate sentences from a corpus in just two lines.

Install the markovify package:

pip install markovify

Create a file called app.py. Paste this code to import markovify, create a model based on the corpus, and print a short Trumpov chain:  

import markovify



if __name__ == "__main__":
    with open("tweets.csv") as f:
        text = f.read()
    model = markovify.Text(text)
    print(model.make_short_sentence(100))

Run python app.py and marvel at either the plausibility or absurdity of this computer generated statement. Now we just need to hook that code up to a phone number.
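
One thing to watch for: make_short_sentence returns None when markovify can’t build a sentence that fits, which is more likely with a small corpus. A minimal sketch of a wrapper that retries and then falls back to a canned reply might look like this. The generate_reply name, the attempts count, and the fallback text are all our own assumptions.

```python
def generate_reply(model, max_chars=100, attempts=5, fallback="No comment."):
    """Return a generated sentence, retrying when the model yields None.

    `model` is anything with a make_short_sentence(max_chars) method,
    such as the markovify.Text instance built above.
    """
    for _ in range(attempts):
        sentence = model.make_short_sentence(max_chars)
        if sentence:
            return sentence
    # Every attempt came back empty, so send the canned reply instead.
    return fallback
```

In app.py you would call print(generate_reply(model)) instead of printing the result of make_short_sentence directly.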

Reply to an SMS with Python and Twilio

When someone texts our Twilio number, Twilio makes an HTTP request to our app. In return, it expects an HTTP response in the form of TwiML – a simple set of XML tags that tell Twilio what to do next.
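
To show what that XML looks like, here is a sketch that builds a reply by hand with Python’s standard library. The build_twiml_reply function is a hypothetical helper for illustration; the helper library does this for you and may format its output slightly differently (for example, by adding an XML declaration).

```python
import xml.etree.ElementTree as ET


def build_twiml_reply(body):
    """Build a TwiML document that tells Twilio to reply with `body`."""
    response = ET.Element("Response")
    message = ET.SubElement(response, "Message")
    message.text = body
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_twiml_reply("Hello from Python"))
    # <Response><Message>Hello from Python</Message></Response>
```

The Message tag inside Response is the instruction to send an SMS back to whoever texted in.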

We’ll use Flask, the Python microframework, to handle that POST request. We’ll use the Twilio helper library to generate a TwiML response that sends a reply message.

Install those two packages:

pip install flask twilio

Delete everything in app.py and replace it with:

import markovify
from flask import Flask
from twilio.twiml.messaging_response import MessagingResponse


app = Flask(__name__)

@app.route("/message", methods=['GET', 'POST'])
def message():
    # Build a TwiML response containing one generated sentence.
    resp = MessagingResponse()
    resp.message(model.make_short_sentence(100))
    return str(resp)

if __name__ == "__main__":
    # Build the model once at startup from the corpus we saved earlier.
    with open("tweets.csv") as f:
        text = f.read()

    model = markovify.Text(text)
    app.run(debug=True)

Start your app with python app.py.

Assuming you’re working on your local machine, you’ll need a publicly accessible URL to localhost so that Twilio can access your script. The fastest way to do this is with ngrok.

In a separate terminal window, start ngrok and copy the URL it gives you to your clipboard:

./ngrok http 5000

Sign up for a free Twilio account if you don’t have one. Buy a phone number, then set up your number. Scroll down to the Message section and find the A Message Comes In field. Paste your ngrok URL and append the /message endpoint. Then save your configuration and text your burning question to your great Twilio phone number.


What’s Next?

Nice work! With just a few lines of Python you just:

  • Mined Twitter data using the Twitter API
  • Created Markov Chains in Python
  • Replied to an SMS in Python using Twilio

Armed with those skills, you’ll probably come up with a creation far more useful than a bot that pretends to be Donald Trump. To aid you in that endeavor, here are some resources that may be helpful:

If this post inspires you to build something cool, or if you have any questions, I’d love to hear about it. Drop me a line at gb@twilio.com or find me on Twitter at @greggyb.

PS – Please vote. You can register via HelloVote by texting HELLO to 384-387.

Many thanks to Ricky and Matt for the reviews.