How Patrick McKenzie (patio11) Builds Twilio Apps

You may know Patrick McKenzie as patio11 on the interwebs. He’s the:

Patrick’s also self-described as “Twilio’s number one fan.” In 2010 he started Appointment Reminder which, you guessed it, sends appointment reminders using Twilio SMS and voice calls.

Along the way, he’s learned a lot about building production-grade Twilio apps. Patrick stopped by 1871 while back in his hometown of Chicago over Christmas break and he was kind enough to sit down and drop some knowledge on us.

Below the video you’ll find a full transcript with — and this is the really cool part — code snippets from Patrick that help power his Twilio based business.

One other note: there are few emails I enjoy receiving more than the ones I get from being on Patrick’s mailing list. If you’re not already on it, you should sign up here.

Patrick’s first Twilio App

My first Twilio app was something I called my automated secretary. I’d been living in Japan for the last 10 years and my mother has never quite gotten the hang of international dialogue. Sometimes she would call at, say, 2:30 AM, and obviously my phone is on in the middle of the night for emergency purposes, so I’d pick up and see it’s from Mom. I’d say, “Is everything all right?” And she would say, “Hey, how is it going? How was today?”

I thought it would be really useful if there was some way to intercept a call and say, “Hey, it’s 2:30 AM. If you want this call to go through, hit ‘1’; otherwise just call back some other time when you will not wake me up.”

That was literally about 15 minutes of just banging away, and out came a script for Twilio. It worked. It was one of those magic programming moments where it actually runs the first time you run it. It worked flawlessly, so I gave the number out to my parents and all my brothers and sisters, who have been calling me at that number for the last four or five years now.
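A call-screening flow like the one described can be sketched in a few lines. This is not Patrick's script: it is a minimal Python illustration that builds the TwiML with the standard library, and the forwarding number, the prompt wording, and both function names are placeholders invented for the example. It assumes Twilio requests the action URL with the pressed key once the caller responds.

```python
from xml.etree import ElementTree as ET

FORWARD_TO = "+15555550100"  # placeholder, not a real number


def screening_twiml(action_url: str) -> str:
    """TwiML that warns the caller and waits for a single keypress."""
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", numDigits="1", action=action_url)
    say = ET.SubElement(gather, "Say")
    say.text = "It is the middle of the night here. Press 1 to put the call through anyway."
    # Reached only if the caller presses nothing.
    fallback = ET.SubElement(response, "Say")
    fallback.text = "Please call back at a better time. Goodbye."
    ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


def keypress_twiml(digits: str) -> str:
    """TwiML for the follow-up request that carries the pressed digit."""
    response = ET.Element("Response")
    if digits == "1":
        dial = ET.SubElement(response, "Dial")
        dial.text = FORWARD_TO
    else:
        ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")
```

The two functions would sit behind two URLs: the first configured on the Twilio number, the second named as the action of the Gather verb.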

Adding another feature with Twilio was often just a matter of having another URL that does something different and then plugging it into a new number, the same number, or a Twimlet or something. For example, when I got married, my family wanted to be able to talk to my wife, but my wife does not have a US phone number, so I just bought another number from Twilio and put up another endpoint that forwards to her cell phone and then… boom, done!

I’m currently in the United States, but my wife is at home in Japan taking care of our daughter. I have a Japanese cell phone in my pocket, but calling Japan from a Japanese phone that is roaming in America is ridiculously expensive. I also have an American burner phone in my pocket which can only make calls to US numbers, and I have her number saved in that. I call that number, punch in my passcode, and boom, we can talk. We just talked this morning for a good thirty minutes.

I love it. It’s so easy to do. I’m really not trying to sell it to you guys, but it’s so easy to get stuff up and running. I’ve been at hotels before in countries that I don’t even reside in, like, say, Germany. I think, “Well, I could make an international phone call from this hotel phone to Japan, which is going to be ridiculously expensive, or I could use the Twilio REST API and bridge the call between the Japanese cell phone and the German number (this phone), without ever having a German number available.” Then boom, that happens. I love it so much.

Patrick’s first Twilio app: 

 

Moving from your first app to a business based on Twilio

In terms of character, the personal secretary app might not do the same things as everybody’s first Twilio app, but it’s organized in the same way: one quick and dirty script, like one file. Not a whole lot of QA involved. No monitoring, yadda yadda. It doesn’t have to be all that rigorous because it works for its one or two defined purposes. There’s no product development roadmap for it. If it goes down in the middle of the night, family sends me an e-mail and asks, “Hey, Patrick, why weren’t we able to call your phone number?” No business burns to the ground as a result of that.

Contrast that with Appointment Reminder. On any given Monday between the minutes of 8:45 am and 9:00 am American time, Appointment Reminder will get used by several hundred customers to send up to a couple hundred appointment reminder phone calls or SMS messages per minute.

Some of them are appointments for hair salons, and if someone doesn’t get a reminder about their appointment with the hair salon, that’s unfortunate for them and unfortunate for the hair salon, but it isn’t something that will make me feel like a failure at life. But some of them are appointments for a little old lady to come in and see her doctor, or that sort of thing. It is very much a problem if Appointment Reminder goes down for a few minutes. It’s mission critical in all the senses of the word. It has to be very reliable. I have to not break it. I have to be able to extend the app in ways that weren’t originally envisioned.

That’s a quantum leap in complexity, both versus the app that I originally made for Twilio and versus what a lot of the Twilio documentation assumes that you’re going to be building. I think we’ll talk through some of the decisions that I made and how I’m able to run this in production in a way that is reasonably reliable, even though I’m the only technical person at the company.

State Machines

One of the things that you often do with Twilio apps is to create some sort of phone tree. You’re going to play a message to someone saying, “Press one to talk to sales. Press two to talk to customer support. Press three to talk to blah blah blah.” It’s called a phone tree or an IVR, in industry parlance.

I believe that the way the documentation often describes doing this is: first you play that message. Then you take the user’s input and you use a switch/case statement or a bunch of if-thens to do the business logic and produce whatever result pressing one, two, or three should have.

You can do it that way. That will work the first time. But you don’t want to do it that way. This gets independently rediscovered by everyone who does serious Twilio apps. You probably want to model each individual work flow that goes through your application or your IVR system as a state machine.

What is a state machine? State machines are a computer science-y kind of concept where there are a bunch of states that the application can be in, or states that a particular call can be in. There are transitions between states that are triggered by input. You know that if I’m currently in the start-call state, then pressing one goes to the talk-to-the-support-team state. Pressing two goes to the talk-to-the-sales-team state.

You basically use the input to the application to drive those transitions. Then based on what transition has just happened, you produce output to Twilio to say “Okay, and now you want to do this.”
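As a rough illustration of the idea, the transitions can live in a plain table keyed by the current state and the input. The sketch below is a minimal Python version using the states from the example above; the state names and the choice to stay in place on unrecognized input are assumptions made for illustration, not details from Patrick's application.

```python
# (current state, digit pressed) -> next state
TRANSITIONS = {
    ("start_call", "1"): "talk_to_support",
    ("start_call", "2"): "talk_to_sales",
}


def next_state(state: str, digit: str) -> str:
    """Return the next state, staying put on input the state does not handle."""
    return TRANSITIONS.get((state, digit), state)
```

Adding a menu option then means adding one row to the table plus a test for that row, rather than editing a growing conditional.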

Why do we do things in a state machine? Number one, it allows you to separate your application logic from your view layer logic. Producing TwiML is typically a view layer kind of concern. One often wants to be able to test that the control flow logic does what it should (for example, that pushing ‘1’ activates the sales function rather than the support function) without testing the view logic at the same time.

If you model the interaction as a state machine, you can write unit tests proving that the state machine has the proper transitions in it, that it does the right things to the right business objects, and that that is persisted correctly to the database, without necessarily having to involve the view-level TwiML code in the mix as well.

It also allows you to test the views, so that you can say, “I don’t have to mock up an entire phone call. We’ll just assert that the state is this, and thus the view that should be generated is this.” Now, given that assertion, prove that the TwiML that is actually output by the application matches the TwiML that I expect, that it’s valid XML, and that Twilio will not die with the infamous “An application error has occurred” message.
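A view test of that kind might look roughly like the following. It is a sketch in Python with the standard library only; the prompt texts, the state names, and the two helper functions are invented for the example. The point is that the test asserts a state and checks the rendered TwiML, with no phone call involved.

```python
from xml.etree import ElementTree as ET

# Hypothetical prompt text for each state of a small phone tree.
PROMPTS = {
    "start_call": "Press 1 for support. Press 2 for sales.",
    "talk_to_support": "Connecting you to support.",
    "talk_to_sales": "Connecting you to sales.",
}


def render_twiml(state: str) -> str:
    """View layer: turn a state into the TwiML that should be served for it."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = PROMPTS[state]
    return ET.tostring(response, encoding="unicode")


def parse_twiml(xml_text: str) -> ET.Element:
    """Fail loudly if the output is not well-formed XML rooted at <Response>."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    if root.tag != "Response":
        raise ValueError(f"unexpected root element: {root.tag}")
    return root
```

Well-formed XML is a weaker guarantee than valid TwiML, so a check like this catches broken markup but not, for instance, a misspelled verb.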

Another reason to use state machines: if you use massive case statements, it’s very easy, when you’re updating the code to add one more thing to the if-then statement, to break it in some subtle fashion. I’ve done this more times than I can count.

If you’re doing state machines, typically they’re done in a very data driven fashion. You just add another state and the transitions between that state and the pre-existing states. Then you add the test for those transitions and you’re done.

Whereas often, back in the day when I was running things with if-then statements, when there were three functions, I would have to test three functions by hand. When I added a fourth function, I would have to test all four functions by hand over again. If I needed a tweak to any one of those, I would have to test everything over again just to make sure that there was no subtle dependency in the code that broke things. With a state machine, the states are provably, logically separate from each other. They shouldn’t be corrupting the results of the other states.

How to test Twilio Apps

I test the application logic. I have a Reminder class: a reminder starts in the scheduled state. It’s supposed to fire after a particular time, which is saved on the reminder object. Then it moves into the processing state. Once something is in the processing state, the queue workers can grab it and attempt to send out an SMS message or a phone call, as the case may be.

I have unit tests in the typical Ruby on Rails developer fashion to test that: “Okay, if something is in the scheduled state and the time is after the fire-after time that’s saved on the object, and we execute the method that’s supposed to pick up things for processing, then it should return this object as one of the ones that should be processed now.”
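The shape of such a test can be sketched as follows. The original is a Rails model; this is a standalone Python approximation, and the field names and the pick_up_for_processing function are hypothetical stand-ins for whatever the real Reminder class exposes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reminder:
    fire_after: datetime
    state: str = "scheduled"


def pick_up_for_processing(reminders, now):
    """Move every due, scheduled reminder to 'processing' and return those."""
    due = [r for r in reminders if r.state == "scheduled" and r.fire_after <= now]
    for reminder in due:
        reminder.state = "processing"
    return due


# A unit test in the spirit of the one described: no phone needs to ring.
now = datetime(2014, 1, 6, 8, 45)
due_reminder = Reminder(fire_after=now - timedelta(minutes=1))
future_reminder = Reminder(fire_after=now + timedelta(hours=1))
picked = pick_up_for_processing([due_reminder, future_reminder], now)
```

Passing the current time in as an argument, rather than reading the clock inside the function, is what keeps the test deterministic.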

Okay, that works in isolation. Now we can assume that that building block works and move on to testing other things independently of it, rather than having to test manually, create lots of these objects, and then make sure that the phone actually rings. You don’t have to worry that the phone actually rings. If I know that it’s getting picked up by the queue workers at the right time, then that can be assumed to work. I can test that the phone rings in isolation when the Reminder.deliver_phone method is called.

I actually haven’t found a great way of doing this in any totally automated Selenium-esque kind of fashion. I do it in two ways.

Number one, a lot of smoke testing. Early in the life cycle of Appointment Reminder, I spent an awful lot of time with my keyboard in one hand and my cell phone in the other. I heard an awful lot of “An application error has occurred.” That can occasionally get told to people in production. There’s a way to turn that off, but that’s neither here nor there.

Basically, if a customer hears that, I’ve screwed up in a serious fashion.

Number two, in addition to doing smoke testing manually with myself by the phone, I use Twilio to test Twilio. For example, if I’m doing an IVR kind of prompt where my user will be driving the functionality of the application by typing stuff into the phone, I have a bot which has Twilio call the phone number, then play key tone presses. Then, “Okay, the application should then do this.” Obviously it’s very hard to unit test stuff like “Verify that it plays the following mp3 file,” because comparing an mp3 file with something that’s coming over a phone line is a little difficult. But I can at least verify that no error happened on the call. That’s something I can query in the API. I can also verify that my script played the press of ‘1’. I assume that will fire the confirmed method, so this should get marked as confirmed in the database. I have access to the database, so I check to see if it was marked as confirmed.
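One way such a bot could be put together is sketched below in Python. It is an illustration under assumptions, not Patrick's bot: it relies on the digits attribute of the TwiML Play verb to send the tones, where each "w" is understood to add a half-second pause, and the status and state strings passed to the check are hypothetical values that the caller would look up from the Twilio API and the application database.

```python
from xml.etree import ElementTree as ET


def bot_twiml(digits: str, wait_seconds: int = 2) -> str:
    """TwiML for the calling side: wait, then send the key presses as tones."""
    response = ET.Element("Response")
    # Assumption: each "w" pauses for half a second before the tones play.
    ET.SubElement(response, "Play", digits="w" * (wait_seconds * 2) + digits)
    # Stay on the line briefly so the app under test can react.
    ET.SubElement(response, "Pause", length="5")
    return ET.tostring(response, encoding="unicode")


def check_confirmation_call(call_status: str, reminder_state: str) -> list[str]:
    """Compare what the API and the database report against expectations."""
    problems = []
    if call_status != "completed":
        problems.append(f"call ended with status {call_status!r}")
    if reminder_state != "confirmed":
        problems.append(f"reminder is {reminder_state!r}, expected 'confirmed'")
    return problems
```

An empty list from the check means the call finished without an error and the keypress had the expected effect on the record.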

How to build high-availability Twilio apps

Appointment Reminder has to be pretty high availability. Any time the queue workers went down, it was basically an immediate emergency for my business. I have a lot of defense in depth there.

I used the Scout Monitoring System. Wonderful app, by the way. If the delayed job is down when Scout checks in, which Scout checks in every minute, it immediately sends an SMS to me and an e-mail alert to get that up and running.

Then, what happens is I realize my existing monitoring setup isn’t adequate and I add more things on top of it. Occasionally I miss those SMS messages, the “Red alert, the monitoring queues are down!”

Well, all right… I’ll have a way for the computer to restart the monitoring queues automatically. I used a process monitor thing that’s written in Ruby called God. Don’t know why they picked that name for a process monitoring gem. Maybe God knows when any sparrow dies and God also knows when any delayed job instance dies. It’s supposed to restart delayed job if delayed job doesn’t work.

If I pushed code today in a way that killed delayed job for some reason, such that delayed job could not cleanly restart, or if I exhausted the memory available on the box and it couldn’t cleanly restart, then it could still die. Scout would be sending me a lot of SMS messages, but if I was ignoring my iPhone in the middle of the night, or it ran out of power, I wouldn’t see them. (I have more outages that start with, “So first I ran out of juice on my iPhone and then…” than I know what to do with.)

Alright, so what’s the third layer of defense? The third layer of defense is a filter that fires one percent of the time for any HTTP request going into Appointment Reminder. That one percent of the time, I query the database for the state of the queue. If the queue has more than a few items on it, I instantiate a queue worker within the web process, slowing down the existing request, which isn’t nearly as important as making sure that the phone calls on the queue go out.

It grabs a few jobs from the queue and fires them off, and then serves the HTTP request that was actually asked of it. It could be a totally unrelated HTTP request. Someone could be logging in to their dashboard and the web service basically is like, “Well sure, I’ll serve up your dashboard HTML in a moment, but I’ve got to send these phone calls out right now, so you’re going to wait.”
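The mechanism can be sketched in Python like this. Only the one percent rate comes from the talk; the backlog threshold, the number of jobs drained per request, the in-memory queue, and all function names are assumptions for illustration (the real queue lives in the database and the real filter is a Rails before-filter).

```python
import random
from collections import deque

SAMPLE_RATE = 0.01       # from the talk: fire on one percent of requests
BACKLOG_THRESHOLD = 3    # assumption: "more than a few items"
JOBS_PER_REQUEST = 5     # assumption: "grabs a few jobs"


def maybe_drain_queue(queue, run_job, rng=random.random):
    """On a small sample of requests, work off part of a backlog inline."""
    if rng() >= SAMPLE_RATE:
        return 0
    if len(queue) <= BACKLOG_THRESHOLD:
        return 0  # workers are keeping up, nothing to do
    done = 0
    while queue and done < JOBS_PER_REQUEST:
        run_job(queue.popleft())
        done += 1
    return done


def handle_request(queue, run_job, serve, rng=random.random):
    """Run the safety net first, then serve what was actually requested."""
    maybe_drain_queue(queue, run_job, rng)
    return serve()
```

The random source is injectable so that a test can force the filter to fire or not fire.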

In the best of all worlds, we have separation of concerns. But in a world where the choice is between delaying people who are using the website, who are generally my customers, and inconveniencing my customers’ customers by not sending them their reminders on time, I inconvenience my customers. Typically, that’s the right trade-off for them.

When I inconvenience my customers’ customers, people get very, very mad. There was a moment once when my customers’ customers had their home phones basically DDoS attacked by me. That was very unhappy, both because obviously I don’t love blowing up people’s phones, and because when I blew up people’s phones, the message that was getting played was from their service providers. That caused a problem in my customers’ relationships with their clients.

One of my customers was a martial arts teacher in I think Toledo. He instructed me very seriously that I am never to come to Toledo because he will enact vengeance upon me for telling people about their karate class 40 times in a one minute period.

Anyhow. Luckily that bug happened very early in Appointment Reminder’s life so it inconvenienced maybe 60 people rather than 60,000 people. Still, it very much hurt when it happened, which lit a fire under me to work out my reliability story a lot.

Here’s how Patrick helps prevent downtime:

 

Logging for Twilio Apps

I’ve been using Twilio for a very long time now. When I started using it back in early 2010, it didn’t really have all that great visibility into what was happening with past API requests. If you personally were seeing the API request go out and there was some error, you could catch that. Given that very rarely there’s a human like actually sitting behind the curtain watching API requests go out in what is for me the middle of the night, if an error happens, I would lose the context of what the error was: what had caused it, what the state of the application was around that error.

Twilio has since developed some very impressive tools within the Twilio dashboard, under Developer Tools. It will show any HTTP errors that you got from Twilio that were an emergency for you. By the way, any HTTP error from Twilio is an emergency for you. You should probably set up a notification for that, which happily they will do these days.

If I have any sort of error involved with a Twilio request… or actually, not even errors… even when I do successful requests to Twilio or the REST API, I take the entire response that was given by the API, serialize it as a string, and throw it in a Redis instance. It’s kept there for a period of two weeks. I guess theoretically one could log it into a log file, but some of this information is sensitive, and while I could theoretically have a log rotator that shredded the logs after two weeks, I decided instead to just put it in a Redis instance, which makes it a little more queryable.
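The pattern is a keyed log whose entries expire on their own. The sketch below imitates it in Python with an in-memory class so that it runs without a Redis server; with a real Redis instance the expiry would normally be delegated to Redis itself (for example by setting a time-to-live on each key). The class name, key format, and JSON serialization are assumptions for illustration.

```python
import json
import time

TWO_WEEKS = 14 * 24 * 60 * 60  # retention period in seconds


class ExpiringLog:
    """In-memory stand-in for a store whose entries vanish after two weeks."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # key -> list of (expires_at, serialized response)

    def record(self, key, response):
        """Serialize an API response and file it under the given key."""
        payload = json.dumps(response, sort_keys=True)
        expires_at = self._clock() + TWO_WEEKS
        self._entries.setdefault(key, []).append((expires_at, payload))

    def fetch(self, key):
        """Return the unexpired responses for a key, oldest first."""
        now = self._clock()
        live = [(e, p) for e, p in self._entries.get(key, []) if e > now]
        self._entries[key] = live
        return [json.loads(p) for _, p in live]
```

Keying entries by something that can be derived from the database record (a reminder or customer ID, say) is what makes the later lookup possible.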

I put it in Redis for two weeks and if somebody says, “Hey, what happened with regards to the call to Cindy yesterday?” I look into my database, figure out what Cindy’s ID is, and find the place in Redis where I expect the logs of conversations with her to be, or rather the logs of the Twilio API requests that facilitated those conversations. Then I step through those and see, “Oh, okay.” Here’s a non-obvious thing that could potentially screw up a phone call: someone specifies the URL of an mp3 file that they want to play on phone calls to their customers. They’re hosting the mp3 file and their hosting account expires, or they delete the mp3 file without realizing that it’s referenced by your stuff.

If getting that mp3 file 404s, then the Twilio application is going to die with “An application error has occurred.” This could possibly happen in a way that is not visible to your app in any fashion. If you can catch that it happened, you can surface it to the end user. For example, you could theoretically send them an e-mail and say, “Hey, this mp3 file that you had configured to use on your phone calls is not available, so you need to replace it ASAP.” In practice, I think it’s a better user experience, if the user gives you any mp3 file, to slurp that and save it into an S3 bucket somewhere and then host it on their behalf. Often users, especially non-technical users, don’t appreciate the consequences of needing high availability for something that they might have originally hosted out of their Dropbox, for example.

Store your own Twilio usage information

Twilio hosts, as far as I know, a limited record of all the SMS messages you have sent or received, all the phone calls you have made, and all the recordings you have made. I have never had a data durability problem with Twilio, but I still persist most of the information on my end of things, even though I have a reasonable expectation that Twilio will still have that information when I go looking for it.

I wanted to do this for simple speed reasons. Twilio exists in the cloud somewhere. My stuff exists in the cloud somewhere. My database is much closer to me cloud-wise than the Twilio cloud is to me. That can be a difference in speed which matters to the user if they’re doing some process live on the website and want to get it resolved quickly.

For example, there are certain things that one would potentially want to look up, like “search through all of the messages associated with my account that contain something.” I might expose that feature to a user. The way that gets implemented on the backend, because I run a multi-tenant application under one Twilio account, I would have to query all the SMS messages that were under my Twilio account ID, whittle them down to only the ones associated with a particular user, then display that to the user in an HTTP request-response cycle. My target for getting stuff done in an HTTP request-response cycle is 500 milliseconds or so. If things take longer than 500 milliseconds, I need to figure out some way to pre-calculate them to get under that target.

Now, querying the Twilio remote API, parsing that, and displaying the result to the user might have been viable when I first started writing the Twilio application four years ago. These days I’m sending several hundred thousand SMS messages a year. Twilio will allow you to paginate them, one hundred messages at a time. It is highly unlikely that I’ll be able to do 5,000 HTTP requests, collect the data from them, and then do the regex sorting and whatnot in a 500 millisecond time frame. So rather than doing that, I have it all persisted on my side and use my database as the sole source of truth. If, knock on wood, my database should ever fail, I can regenerate those tables from the Twilio API in a process that will probably take at least several minutes to run.
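For scale, 5,000 requests at one hundred messages per page corresponds to roughly 500,000 messages. Keeping a local copy turns the per-customer search into a single indexed query. The following Python sketch uses an in-memory SQLite database purely for illustration; the table layout, column names, and functions are assumptions, and the real application uses its own Rails models.

```python
import sqlite3


def open_store():
    """Create a local table of messages, indexed by the tenant that owns them."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages (sid TEXT PRIMARY KEY, customer_id INTEGER, body TEXT)"
    )
    db.execute("CREATE INDEX idx_messages_customer ON messages (customer_id)")
    return db


def save_message(db, sid, customer_id, body):
    """Persist a message at send or receive time, keyed by its identifier."""
    db.execute(
        "INSERT OR REPLACE INTO messages VALUES (?, ?, ?)", (sid, customer_id, body)
    )


def search_messages(db, customer_id, needle):
    """Find one customer's messages containing the given text."""
    rows = db.execute(
        "SELECT body FROM messages "
        "WHERE customer_id = ? AND instr(body, ?) > 0 ORDER BY sid",
        (customer_id, needle),
    )
    return [row[0] for row in rows]
```

Filtering by customer in the query itself also keeps one tenant's search from ever touching another tenant's messages.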

The time I accidentally bought 300 numbers

Obviously I have automated testing procedures. Like many Rails developers these days, I use something called autotest, which immediately reruns my unit tests every time I save a file. I love this because I hit command-S on my keyboard and, if all my tests are green, I get the nice little visceral experience of the Growl notification with a green light in it that says “400 of 400 tests passed.” Yay!

I once added a unit test which, unbeknownst to me, accidentally exercised the functionality for my account to buy a phone number for a new account as it was getting created on Appointment Reminder. Every time there was a change to my files, it provisioned a new account and provisioned a phone number for that account. Since Twilio is very, very reliable, I got the green light every time. I was just doing my development for 30 minutes, probably hit the command-S button about 300 times in 30 minutes, and then realized “Uh oh, I have bought $300 worth of phone numbers.” Good thing it wasn’t in a for-loop.

I’ve since tweaked things. Obviously, it’s bad news if you buy phone numbers by accident, but that’s not the worst thing in the world. Twilio is very understanding if you just send them an e-mail about that. That affects your pocketbook, but it doesn’t affect any customers. But let’s say you have a habit of logging into your development environment and just typing in random numbers to verify that something that’s ten digits long actually works. If, unbeknownst to you, your test data was actually causing actual phone calls to go out to the real POTS network, and to ring actual people’s phones 300 times in 30 minutes, that would be an entirely different level of “whoops.”

Realizing that would be a problem, I did something to defang Twilio in non-production environments. I have a whitelist where I load in phone numbers which I personally own. Five of them. I monkey patched the Twilio helper library such that if it attempts to make a phone call or send an SMS message to any phone number that is not on the whitelist, it immediately throws an exception and kills the rest of that process. An exception will generally bubble up and throw an error in the test rather than do something that could potentially affect the real world. That has a nice side effect of preventing a for-loop in a unit test sending out $10,000 worth of SMS messages.
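The same guard can be sketched in Python. To be clear about what is assumed: the talk concerns the Ruby helper library, while the client class below is a made-up stand-in with made-up method names, and the whitelisted numbers are fictional. Only the technique (wrapping the outbound methods so they raise for unknown numbers) reflects what is described.

```python
class FakeTwilioClient:
    """Hypothetical stand-in for a helper library client."""

    def __init__(self):
        self.sent = []

    def send_sms(self, to, body):
        self.sent.append((to, body))

    def make_call(self, to, url):
        self.sent.append((to, url))


WHITELIST = {"+15555550100", "+15555550101"}  # fictional numbers


class NumberNotWhitelisted(RuntimeError):
    pass


def defang(client_class, whitelist):
    """Monkey patch the outbound methods to refuse non-whitelisted numbers."""
    for name in ("send_sms", "make_call"):
        original = getattr(client_class, name)

        # Bind the original per iteration via a default argument.
        def guarded(self, to, *args, _original=original, **kwargs):
            if to not in whitelist:
                raise NumberNotWhitelisted(to)
            return _original(self, to, *args, **kwargs)

        setattr(client_class, name, guarded)
```

In a test or development setup, a single call such as defang(FakeTwilioClient, WHITELIST) at boot would apply the guard; production would simply never call it.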

Here’s how Patrick monkey-patched the Ruby Twilio helper library: 

 

Twimlets and Zapier

Twilio has this website called Twimlets which gives you very basic Twilio applications that you can create with a form. It gives you a URL and you can just copy-paste that URL onto a Twilio phone number. I’ve gotten an unreasonable amount of value from Twimlets over the years. I use them for all sorts of things, from running my company’s voicemail to monitoring tasks. For example, if my queue workers are down, there’s a cron job that checks for that every five minutes. If it detects that state of affairs, it fires a phone call against a Twimlet, which does a very simple thing that says “The queue workers are down. The queue workers are down. The queue workers are down.”

Theoretically I can host that myself, but one can imagine circumstances where the reason the queue workers are down is something that also brings down my web infrastructure. But there’s virtually nothing that can cause my queue workers to go down at the same time as the Twilio Twimlets service goes down. Having that logic be somewhat external to my own application makes it a little more reliable. Diversifying away from single points of failure is pretty important to my reliability story.

That’s something I would encourage you to do, just to go look at Twimlets. They’re easy to get and they provide a surprising amount of functionality if you use them creatively. You can also chain them together, which is really fun. They have a simultaneous ring system. If nobody picks up, then send them to another Twimlet and then another Twimlet. Send an SMS message to someone who called but you hadn’t gotten to them before. That sort of thing.

Another fun thing… If you use any sort of cloud service, you really need to look at Zapier. It’s a company that helps plug APIs into each other. The notion is that you have one API which is the source of events, then one API which is the target of actions: it should take an action in response to an event happening with the first API. Typically Twilio is not the source of an event, but rather the action I want to take. Typically I want Twilio to send an SMS in response to something happening. The sky is the limit in terms of what APIs you can hook up via Zapier.

For example, when a new lead is added to my CRM system, send an SMS to my sales representative saying, “Hey, a new lead was added to the CRM system. Click here on your iPhone to open that up.” Or when we get an e-mail which matches a certain regex, send an SMS. When a new ticket is added to our customer support system, send an SMS to our customer support folks. When a new ticket with priority greater than X is added to the customer support system, send an SMS to me. The sky is basically the limit.

Zapier integrates with hundreds of APIs, so if there’s anything where you ever thought, “Man, it would be great if this cloud service and that cloud service played together,” Zapier makes it happen. Often the services I want to play together are “Twilio plus everything else in the world.” I truly believe that once you have the box of capabilities that Twilio opens up to you, you see Twilio apps everywhere. My friends make fun of me for this, because my answer to everything is, “I could probably make that better with a Twilio app.” They say, “That’s your answer to everything. Your answer to everything is a Twilio app.”

Twilio makes magical experiences

It’s such a magical experience. It gets into some deep cultural anthropology, but we have a very deep relationship with our cell phones these days. I’ve never seen so much giddiness as when I’m demoing a boring, B2B software application on my iPad and I say, “Hey, give me your phone number. I’ll type it in here.”

Doo-tee-doo-tee-doo. Hit go. The phone rings. They’re like, “Whoa, that’s fun. What is it?” Then they slide it down and are like, “Hello?” Then I have a voice actress who recorded a short mp3 file for me that just plays in their ear. They’re going, “She’s talking to me! She’s talking to me! Like the computer is talking to me!” They light up with excitement.

“You want to see something real exciting? Hit ‘1’ on that phone call.”

They hit ‘1’ and it shows up immediately on the iPad. It’s obviously gone to Twilio, Twilio has hit a webhook on my site, and that has pushed something via socket.io to the web page that’s open on the iPad. It shows a notification within a fraction of a second after the button press happens on their cell phone. “Wow, that’s amazing!” I’ve done the demo with my application. I’ve done it with other folks’ applications. Even with hardened, “I have seen it all, done it all” software folk, giggles happen.

Use Voice Actors instead of the <Say> verb

When you’re doing Twilio apps, you’ll be using the <Say> verb a lot. The <Say> verb uses text-to-speech to read things out. It’s good enough for development, so that you can verify that the logic and flow control are working correctly.

Once you’re exposing these phone calls to customers, you typically don’t want a very robotic phone call to be representing your business or your customers’ businesses. Say for an appointment reminder, you don’t want someone’s first point of contact with their dentist to be a robotic voice that says, “This is an automated Appointment Reminder from Happy Teeth Dental. Your appointment with Dr. Benedict is at 4:47 pm on Tuesday the 12th.” It’s just not a great experience.

Instead, to the maximum extent possible, after we know what text we’re going to use, I like to have that recorded by an actual human. There’s a few ways you can do this. There’s a service called Voice Bunny, which actually has an API available. It’s a little expensive, because they’re using professionals for this. Worth the money for things which you’re going to be reusing a lot.

If you want to do things a little cheaper, for $5 you can go on Fiverr where you get basically a bored college student to record the same thing for you. The online demo of my application, which represents my company and has for the last four years, was recorded by someone in two Fiverr engagements, for a total of $10. It’s closed tens of thousands of dollars in business for us. That’s obviously an option.

One subtlety about that is that typically it’s economically non-viable to record a unique mp3 file for every call. You’ll often stitch things together via a Frankensteiny mix of different mp3 files. For the logic of my application, we basically wanted to mix and match a selection of introductions, the body of the appointment details (the date and time), and a selection of instructions to use: “Press ‘1’ to confirm your appointment. Press ‘5’ to cancel your appointment.” Or if a customer doesn’t want the cancellation option, we have a separate mp3 file that we can play which doesn’t mention it.

Now, the part in the middle there is where there’s dynamic information getting generated: “4:47 pm on Tuesday the 12th.” As a stopgap measure, you can have the computer narrate that. But here’s a little bit of a trick: if there’s a restricted range of values it can possibly be, you just pay your voice actress a little bit of extra money and have her record an mp3 file which says “One, two, three, four… twelve, fifteen, thirty, forty-five, o’clock.” Splice it up into a bunch of mp3 files and then do some Frankenstein stitching on the fly. Twilio will play any number of mp3 files in a row. It won’t sound completely natural, but it will definitely sound like human talk, with perhaps a few weird pauses depending on how good you are at mp3 editing. Folks actually think that it is a real human leaving all the messages. It’s kind of awesome.
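The stitching step amounts to picking the right clips and emitting one Play verb per clip. Here is a minimal Python sketch of that idea, assuming appointments fall on quarter hours so that the recorded words listed above are enough; the base URL, the file naming scheme, and the function names are invented for the example.

```python
from xml.etree import ElementTree as ET

BASE_URL = "https://example.com/audio"  # hypothetical host for the clips


def time_clips(hour: int, minute: int) -> list[str]:
    """Pick the pre-recorded clips that together speak a time like 'four thirty'."""
    if not 1 <= hour <= 12 or minute not in (0, 15, 30, 45):
        raise ValueError("no recorded clips for this time")
    second = "oclock" if minute == 0 else str(minute)
    return [f"{hour}.mp3", f"{second}.mp3"]


def reminder_twiml(intro: str, hour: int, minute: int, instructions: str) -> str:
    """Chain intro, spoken time, and instructions as consecutive <Play> verbs."""
    response = ET.Element("Response")
    for clip in [intro, *time_clips(hour, minute), instructions]:
        play = ET.SubElement(response, "Play")
        play.text = f"{BASE_URL}/{clip}"
    return ET.tostring(response, encoding="unicode")
```

A time outside the recorded range raises an error here; a real application would more likely fall back to text-to-speech for that part, as the stopgap mentioned above.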

Again, we have a very personal relationship with our phones. People feel a sense of intrusion when they feel like an automated process, something that doesn’t really care about them, is intruding on their personal space on their phone. You know, it’s an intrusion into someone’s psychological personal space: their iPhone in their pocket, in their hand. It’s also an intrusion on their time. You’re pushing a phone call to someone and demanding they pick up and listen to it right now. It is, by its nature, an intrusive action.

Now, it doesn’t mean we should never do it, obviously. People who have an appointment with their doctor tomorrow to get their cancer drugs adjusted should really hear about that. That is important for them. But we should be conscious of the fact that that interaction is a personal one and give them the true impression that we as a business really care about them, even if it’s an automated interaction.

That means doing human niceties. You would never have a conversation with someone like you would have a conversation with an API. There’s a Japanese word for it but I don’t know it in English: to just jump directly into the business part of the conversation without having some human niceties like, “Hey, how are you doing? How’s your day today?” Just like one would never have a solely goal directed conversation with somebody, we should also have the human niceties in the phone call and, dealing with the UX challenges with that, using a human voice to the maximum extent possible.

Dealing with voicemail

You should think of how your phone call will be played in different environments. Here’s a funny Twilio story for you….

I was not conscious of the fact that many of my phone messages that I was leaving would be played on answering machines. Let’s say you leave someone a message which says, on the assumption that someone’s live on the call, “Press ‘1’ to confirm your appointment, press ‘5’ to cancel your appointment, press ‘9’ to have your service provider contact you about your appointment.”

Someone presses ‘5’. The next thing they hear, is “This message has been erased.”

Someone is pushing buttons against their voice mail system, not against my application. My application didn’t know that. The user, who has no clue what’s going on, feels very bad. They ask their dentist about it. Their dentist doesn’t know what’s going on. This was causing problems for me for months before I realized that was happening.

Twilio has a way to detect whether you’re speaking with an answering machine or not. Unfortunately it’s not 100 percent accurate. It’s been getting a little bit better over the years. Ballpark, it’s maybe an 80 percent solution. I’d investigate turning that on. Also, if you’re running a multi-tenant application with Twilio, such that you have different clients with different client populations, give people the ability to turn on answering machine detection on a per-account basis.

Also, give them the ability to do the following trick on a per-account basis. Different client populations use different handsets and different voice mail systems. Often, a local business has a particular carrier that’s hegemonic in that region, or a particular socio-economic group associated with that business that tends to use fairly similar handsets, and those handsets have fairly similar limitations. I found that the settings that work for some clients don’t work as well for others. It’s good for me to be able to offer choices that I can tweak per client, rather than making an assertion across my customer base that is incorrect.
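One way to picture per-account settings is a small settings record that decides which parameters get sent when a call is created. This is a rough Python sketch, not Patrick's implementation: the AccountSettings class and its field names are hypothetical, and the MachineDetection parameter with the value DetectMessageEnd is what I understand Twilio's current Calls API to accept, so verify it against the documentation before relying on it.

```python
from dataclasses import dataclass


@dataclass
class AccountSettings:
    """Hypothetical per-client switches, stored alongside each account."""
    detect_answering_machine: bool = False
    play_tone_to_skip_greeting: bool = False
    message_repeats: int = 2


def call_params(settings, to, from_, url):
    """Build the form parameters for creating an outbound call."""
    params = {"To": to, "From": from_, "Url": url}
    if settings.detect_answering_machine:
        # Assumed parameter name and value; only sent for accounts
        # that opted in to answering machine detection.
        params["MachineDetection"] = "DetectMessageEnd"
    return params


dentist = AccountSettings(detect_answering_machine=True)
print(call_params(dentist, "+15555550100", "+15555550199",
                  "https://example.com/reminder"))
```

Keeping the switches in one record means a support request like "my patients' messages get cut off" becomes a settings change for that one client instead of a code change for everyone.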

If you’re using Twilio’s detection of whether you’re talking to voicemail or not, Twilio has to guess the point in the greeting that says “Leave your message after the beep,” guess when the beep happens, and then start playing your pre-recorded message. Unfortunately, it doesn’t guess correctly 100 percent of the time. A symptom your user could see as a result: Twilio starts playing the message early, the machine only starts recording after the beep, and so the recorded message starts halfway through.

They’re typically not too happy about that. But, fun fact about most voice mail systems: they will immediately skip the greeting and start recording as soon as you hit something on your keypad. You can skip to the beep by pressing ‘1’. An option that I give my customers is “We will play the tone for a button press immediately on starting every call,” which skips immediately to the recording part of the voice mail prompt, so that they almost always get 100 percent of the message recorded.

That’s a trade-off I give my customers, too. Why wouldn’t you do that all the time? Well, if it’s a live phone call, the person would hear “BEEP! This is an automated message from your dentist….” When folks say, “My customers complain about the message being cut off,” I say, “Well, okay, we have this option for you. Here’s the trade-off. Are you willing to take that trade-off?” If it’s important to them, they are.
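As a sketch of how that option might look in TwiML, the Python below optionally emits a Play verb with a digits attribute (which plays DTMF tones instead of an audio file) before the recorded message. The function name and the example.com URL are made up for illustration, and the flag is assumed to come from a per-account setting.

```python
from xml.etree import ElementTree as ET


def message_twiml(message_url, skip_greeting_with_tone):
    """Build TwiML for a reminder, optionally led by a '1' key tone."""
    response = ET.Element("Response")
    if skip_greeting_with_tone:
        # Plays the tone for the '1' key so most voice mail systems
        # jump straight to recording. A live listener hears the beep too,
        # which is the trade-off the customer has to accept.
        ET.SubElement(response, "Play", digits="1")
    ET.SubElement(response, "Play").text = message_url
    return ET.tostring(response, encoding="unicode")


print(message_twiml("https://example.com/clips/reminder.mp3", True))
```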

Another thing you can do, just like we build reliability into our systems and assume that queue workers will fail some of the time, we might assume that the message fails some of the time. One of the things I do is I play messages in a loop, at least twice. Let’s say a typical message is, “This is an automated message from your doctor, your appointment is at… Press ‘1’….”

Let’s say the first half of that message gets cut off. The message which somebody actually hears might be “… December 24th, press ‘1’ to confirm. Press ‘2’ to cancel. Press ‘9’ to have us contact you.”

That’s not a very useful message. But if you repeated the original message and say, “This is an automated Appointment Reminder from the doctor” at the end of the cut off part. “Your appointment is at 10:00 o’clock on December 24th. Press ‘1’ to do this, press ‘2’ to do this, press ‘3’ to do this,” then that’s a much more comprehensible message.

The level of complaints my customers were getting back about unintelligible messages went way down after I put the message text in a simple for i=1; i<=2; i++ loop to repeat it.
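The loop can be sketched in Python as follows, assuming the message is a list of clip URLs played with TwiML Play verbs. The function name and URLs are illustrative only; the point is that the whole sequence is emitted twice, so a recording that misses the start of the first pass still captures one complete pass.

```python
from xml.etree import ElementTree as ET


def repeated_message_twiml(clip_urls, repeats=2):
    """Emit the full clip sequence `repeats` times in one TwiML response."""
    response = ET.Element("Response")
    for _ in range(repeats):
        for url in clip_urls:
            ET.SubElement(response, "Play").text = url
    return ET.tostring(response, encoding="unicode")


clips = [
    "https://example.com/clips/intro.mp3",
    "https://example.com/clips/details.mp3",
]
print(repeated_message_twiml(clips))
```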

How Patrick got a Twilio track jacket

This is actually my favorite jacket in the world. I wear it almost every day. Twilio is very protective of their jackets. You have to either be an employee or perform a conspicuous service for them. The conspicuous service that I performed for Twilio back in the day was that there was a security vulnerability in the Twilio application.

They’ve got a team of crack engineers, but everybody has a problem once in a while. I ran into a bug and, as I was debugging it with the very helpful support folks, they said something which really piqued my interest.

If the thing they said was true, it was very, very bad news for us, because at the time (they fixed it three years ago, so don’t worry about it) it would have allowed any user of my application to control my application’s use of Twilio. That would be bad because it would allow someone to place phone calls to any number with any sort of content they wanted, drain my Twilio account, harass people, and get information out of my application that they shouldn’t get. Very bad news.

After dealing with the Twilio folks and realizing that this bug was a symptom of an issue which would allow someone to hijack Twilio applications, I reported that to the Twilio team. No lie, after I CC’ed the issue to their security contact, three minutes later I got an e-mail from the CEO of the company. One minute after that I got an e-mail from the CTO of the company that they were on it. They fixed it within probably an hour. I was very happy with their responsiveness there.

I’m generally pretty happy with Twilio’s responsiveness to everything over the last couple years. They’ve been very, very good to me. I strongly suggest that you use them. They’re far and away my favorite company that I do business with.

I love talking Twilio and all sorts of things to people in the software industry. My e-mail address is [patrick at kalzumeus dot com] or [patrick at appointmentreminder dot org]. They’re both the same and will send me an e-mail about the same time. Send me an email anytime, I love talking about this stuff. If there’s ever anything I can do for you, drop me a line.

Photo credit: The Business of Software Conference
