One of the many allures of Twitter is that you can tweet at your favorite celebrity and (maybe) get a response. Still though, tweeting isn’t quite as intimate as trading text messages. So we thought it’d be fun to use Markov Chains, Programmable SMS, and Python to create a bot that impersonates your favorite Twitter personality.
We could use the code below to create an SMS chat bot that sounds like anyone with a Twitter account. But to show off its true potential, we need someone with a distinct and recognizable tweeting style. Someone with a huge personality. Someone who has the best words.
Someone like Donald Trump.
Ever wish that you could debate Trump? Drop him a text at: 847-55-TRUMP (847-558-7867).
There are three steps to create this bot:
- Download the tweets for a given user to create a corpus of text.
- Use the corpus to generate a sentence in the style of the Tweeter.
- Reply to a text message with that sentence.
To follow along, you’ll need Python, a Twitter account, and a free Twilio account.
Download All Tweets from a User
Before we get started, let’s give credit to Filip Hráček whose Automatic Donald Trump was the inspiration for this idea. Check out his post for an excellent explanation on how to implement Markov chains in Dart.
Markov chains begin with a corpus — a library of text to train your model. We’ll use a modified version of this tweet_dumper script to pull down tweets from the Twitter API.
To get started, create and activate a new virtual environment:
```
virtualenv markov-venv
source markov-venv/bin/activate
```
Then go to wherever you keep your code, make a directory for your project, and create a file called get_tweets.py:
```
mkdir markov-bot
cd markov-bot
touch get_tweets.py
```
Install the tweepy package to connect to the Twitter API:
```
pip install tweepy
```
In get_tweets.py, import tweepy, csv, and the regular expression packages:
```python
import tweepy
import csv
import re
```
We’ll need Twitter credentials to access the API. Create a new app on the Twitter Application Manager. You can fill in any ol’ domain for the website and leave the callback URL blank. Once created, generate your credentials:
Add those credentials to the bottom of your file (you’ll want to extract those creds to environment variables if you deploy this script, but hard-coding works for now):
```python
consumer_key = "YOURCONSUMERKEY"
consumer_secret = "YOURCONSUMERSECRET"
access_key = "YOURACCESSKEY"
access_secret = "YOURACCESSSECRET"
```
We’ll create a function that pulls down all tweets for a given screen name. We have to do this iteratively, as the Twitter API only returns 200 tweets per request. Also, we can only retrieve the 3,200 most recent tweets. Once we hit that limit, our new_tweets array will be empty and we’ll know to stop iterating.
Until then, we keep querying Twitter and adding to all_tweets. To make sure we’re getting the words straight from Trump’s fingertips, we’ll only keep the ones where tweet.source == 'Twitter for Android' (delete that conditional if you create a bot for other tweeters).
To do all this, add this code to the bottom of get_tweets:
```python
def get_all_tweets(screen_name):
    all_tweets = []
    new_tweets = []

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    client = tweepy.API(auth)

    new_tweets = client.user_timeline(screen_name=screen_name, count=200)

    while len(new_tweets) > 0:
        for tweet in new_tweets:
            if tweet.source == 'Twitter for Android':
                all_tweets.append(tweet.text)

        print("We've got %s tweets so far" % (len(all_tweets)))

        # Ask for the next page of results, starting just below the
        # oldest tweet we've seen so far.
        max_id = new_tweets[-1].id - 1
        new_tweets = client.user_timeline(screen_name=screen_name, count=200, max_id=max_id)

    return all_tweets
```
Add a function at the bottom of your file to strip out some miscellaneous text and make our replies feel more like text messages and less like tweets:
```python
def clean_tweet(tweet):
    tweet = re.sub(r"https?\:\/\/", "", tweet)  # links
    tweet = re.sub(r"#\S+", "", tweet)          # hashtags
    tweet = re.sub(r"\.?@", "", tweet)          # at-mentions
    tweet = re.sub(r"RT.+", "", tweet)          # retweets
    tweet = re.sub(r"Video\:", "", tweet)       # videos
    tweet = re.sub(r"\n", "", tweet)            # new lines
    tweet = re.sub(r"^\.\s.", "", tweet)        # leading whitespace
    tweet = re.sub(r"\s+", " ", tweet)          # extra whitespace
    tweet = re.sub(r"&amp;", "and", tweet)      # encoded ampersands
    return tweet
```
Then add a function to:
- Take an array of raw tweets as a parameter
- Open a CSV file for writing
- Iterate through each tweet
- Write each non-blank clean tweet to the file
```python
def write_tweets_to_csv(tweets):
    with open('tweets.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for tweet in tweets:
            tweet = clean_tweet(tweet)
            if tweet:
                writer.writerow([tweet])
```
Finally, add code to the bottom of your file to retrieve the tweets and write them to a CSV:
```python
if __name__ == "__main__":
    tweets = get_all_tweets("realdonaldtrump")
    write_tweets_to_csv(tweets)
```
Run your script with python get_tweets.py. About half of Trump’s 3,200 tweets make it past our filters, so you should end up with around 1,600 rows.
Generate Sentences With a Markov Chain
You may not realize it, but you see Markov chains every day — they’re what power the auto-suggest feature on your phone’s keyboard. When it comes to sentence generation, Markov chains ask, “Based on the last word you typed and all the phrases you’ve typed in the past, what are you most likely to type next?” For an in-depth explanation on the mechanics of Markov chains, check out Filip’s post or Victor Powell’s excellent Markov Chains Explained Visually.
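To make the idea concrete, here’s a toy word-level Markov chain (function and variable names are our own, purely for illustration). It records which words followed which in a corpus, then builds a “sentence” by repeatedly sampling a successor of the last word:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that followed it in the corpus.

    Duplicates are kept on purpose: a word that follows another twice
    is twice as likely to be sampled.
    """
    chain = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=8):
    """Walk the chain: repeatedly pick a random successor of the last word."""
    sentence = [start]
    for _ in range(length):
        successors = chain.get(sentence[-1])
        if not successors:
            break
        sentence.append(random.choice(successors))
    return " ".join(sentence)

corpus = "we will make deals we will make jobs we will win"
chain = build_chain(corpus)
print(generate(chain, "we"))
```

Because “will” is followed by “make” twice and “win” once in that tiny corpus, the walk picks “make” two-thirds of the time — that frequency weighting is all a basic Markov chain is.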
Fortunately, we don’t need to do the Markov calculations by hand. Jeremy Singer-Vine’s markovify package abstracts the generation of text-based Markov chains, letting you generate sentences from a corpus in just two lines.
Install the markovify package:
```
pip install markovify
```
Create a file called app.py. Paste this code to import markovify, create a model based on the corpus, and print a short Trumpov chain:
```python
import markovify

if __name__ == "__main__":
    with open("tweets.csv") as f:
        text = f.read()

    model = markovify.Text(text)
    print(model.make_short_sentence(100))
```
Run python app.py and marvel at either the plausibility or absurdity of this computer-generated statement. Now we just need to hook that code up to a phone number.
Reply to an SMS with Python and Twilio
When someone texts our Twilio number, Twilio makes an HTTP request to our app. In return, it expects an HTTP response in the form of TwiML – a simple set of XML tags that tell Twilio what to do next.
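For example, a TwiML response that texts back a single message looks like this (the message body is just a placeholder — ours will be filled in by the Markov model):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Message>Your generated sentence goes here.</Message>
</Response>
```

We’ll use Flask to receive Twilio’s request and the Twilio helper library to generate that XML for us.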
Install the Flask and Twilio packages:
```
pip install flask twilio
```
Delete everything in app.py and replace it with:
```python
import markovify
import twilio.twiml
from flask import Flask, request, redirect

app = Flask(__name__)

@app.route("/message", methods=['GET', 'POST'])
def message():
    resp = twilio.twiml.Response()
    resp.message(model.make_short_sentence(100))
    return str(resp)

if __name__ == "__main__":
    with open("tweets.csv") as f:
        text = f.read()
    model = markovify.Text(text)

    app.run(debug=True)
```
Start your app with python app.py.
Assuming you’re working on your local machine, you’ll need a publicly accessible URL to localhost so that Twilio can reach your script. The fastest way to do this is with ngrok.
In a separate terminal window, start ngrok and copy the URL it gives you to your clipboard (check out the GIF below):
```
./ngrok http 5000
```
Sign up for a free Twilio account if you don’t have one. Buy a phone number, then set up your number. Scroll down to the Message section and find the A Message Comes In field. Paste your ngrok URL and append the /message endpoint. Then save your configuration and text your burning question to your great Twilio phone number.
Nice work! With just a few lines of Python you just:
- Mined Twitter data using the Twitter API
- Created Markov Chains in Python
- Replied to an SMS in Python using Twilio
Armed with those skills, you’ll probably come up with a creation far more useful than a bot that pretends to be Donald Trump. To aid you in that endeavor, here are some resources that may be helpful:
- Matt Makai’s How to Build an SMS Slack bot
- The Twilio and Python Quickstarts
- The Twilio Tutorials (which feature clone-able, production-ready apps)
PS – Please vote. You can register via HelloVote by texting HELLO to 384-387.
Many thanks to Ricky and Matt for the reviews.