
How to build an On-Demand Translation Service Using Twilio MMS, CamFind and Microsoft Translator


Picture the scene: you’re out on a romantic evening with your partner and you find yourself staring at a beautiful sunset. You suddenly think you could gain some brownie points by describing the scene in French, but you don’t speak French! If only you could send a picture message to a handy on-demand translator service, you’d know that the mots d’amour would be “un magnifique coucher de soleil”!

Translated Sunset

Thankfully, using an MMS-enabled Twilio number and a few APIs (CamFind and Microsoft Translator) it is pretty easy to build this type of translation service. After I show you how to build the basic translator you’ll be well equipped to whip out the romantic French phrases, or phrases in any language for that matter!

What you will need

How it works


The initial version of the app will support translating an image into five different languages: Spanish, French, Italian, German and Klingon. The user will send an MMS message to our Twilio number containing a photo and a language to translate the description of that image into. We will also allow the user to send in the word “list” to receive a list of supported languages.

As soon as we receive a valid MMS message from the user we will send back a response indicating that we are performing analysis on the image and a translation will be coming soon. The image URL from the MMS message is sent off to the CamFind API for analysis. The results of the CamFind API call are sent to the Microsoft Translator API to be translated to the user’s requested language.

Once we have all of the pieces in place, a text message is sent back to the user stating the best guess for what is in the image and how to say that in the target language.

Try it now by sending a picture to:

(202) 800-1180

Make sure to include text with your picture to let the translator know which language to translate your picture into.

The full project code is available if you want to follow along: GitHub

Setting up the project

To get started on this project you will need Ruby and RubyGems installed. If you are on a Mac, this should already be the case. For Windows users, I would recommend checking out Ruby Installer. For Linux users who might need a refresher on package management, here’s a guide for using apt-get in Ubuntu.

Now that we have that prerequisite squared away, open up a terminal window and create a new folder to hold our app:

mkdir translator_mms

Change into this newly created directory so we can install the gems we need:

cd translator_mms
[sudo] gem install sinatra twilio-ruby unirest sinatra-run-later bing_translator

Next we’ll create the file that will hold our application:

touch app.rb

We will build the translation buddy using Sinatra, a lightweight Ruby web framework. We’ll also use the Twilio Ruby gem to make the Twilio APIs a little easier to work with. I’m using the Unirest gem to make REST API calls to the CamFind API, but you are free to make those calls another way if you have a preference (e.g. Rest Client).

Now that our project structure is set up, let’s translate some pictures!

Building the translation service

Our server code will only require one endpoint to handle incoming MMS messages so this project is a great fit for a lightweight framework like Sinatra. Keeping it simple on the server-side allows us to focus on the logic needed to translate our images.

Let’s get started by setting up the Sinatra server and the various dependencies we’ll need:

require 'sinatra'
require 'twilio-ruby'
require 'unirest'
require 'sinatra/run-later'
require 'bing_translator'

post '/translate' do
  # Server code will go here...
end

The endpoint we set up at /translate will be called by Twilio when a message is received by our Twilio number. We’ll use a before filter, which runs ahead of each request, to set up some variables that we can use throughout the app:

before do
  # Set up some variables to use in the run_later code.
  # to_s guards against a missing Body so the later nil/empty check can run.
  @requested_language = params[:Body].to_s.strip
  @picture_url = params[:MediaUrl0]
  @incoming_number = params[:From]
end

Here we are storing the text the user sends with the message. This will either be the language the user wishes to translate to or the “list” keyword. We also store their phone number and the URL of the picture they sent. The first thing we need to check is whether or not the user sent a language. If the user doesn’t send a language we will default the translator to French:

if @requested_language.nil? || @requested_language.empty?
  # Default to French
  @requested_language = "French"
end

Next, we’ll check if the text the user sent is the keyword “list”. In this case we want to return a text message to the user indicating valid languages for translation:

if @requested_language.downcase == "list"
  # Return the allowed language list...
  twiml = Twilio::TwiML::Response.new do |r|
    r.Message "Supported languages for translation are: Spanish, French, German, Italian, and Klingon. Please send one of these with a picture and I'll translate it for you! Default language is French if one is not specified."
  end

  return twiml.text
end
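
For reference, the TwiML that goes back to Twilio is a small XML document: a Response element wrapping a Message element. The sketch below builds one by hand with plain Ruby so the shape is visible. Note that message_twiml is a hypothetical helper I'm introducing for illustration; in the app itself the twilio-ruby builder produces this for us.

```ruby
# Hypothetical helper: build a one-message TwiML document by hand.
def message_twiml(text)
  escaped = text.encode(xml: :text)  # escapes &, < and > for use in XML text
  %(<?xml version="1.0" encoding="UTF-8"?><Response><Message>#{escaped}</Message></Response>)
end

puts message_twiml("Supported languages: Spanish & French")
```

Seeing the raw XML can help when debugging: if Twilio reports an error for your endpoint, comparing your response body against this shape is a reasonable first check.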

Now that we’ve determined whether or not there is text in the incoming message we also need to check whether or not there is a picture. If there isn’t, we won’t have anything to translate so we’ll alert the user with a response text message:

if @picture_url.nil? || @picture_url.empty?
  twiml = Twilio::TwiML::Response.new do |r|
    r.Message "No image sent. Please send a picture with text indicating a supported translation language."
  end

  return twiml.text
end

The next thing we have to do is cross reference the language sent in by the user with a list of languages that we support. Add the following method to the top of app.rb right below the require statements:

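def check_language(language)
  # Map a (downcased) language name to its Microsoft Translator code.
  case language
  when /spanish/
    return "es"
  when /french/
    return "fr"
  when /german/
    return "de"
  when /italian/
    return "it"
  when /klingon/
    return "tlh"
  else
    # Not a language we support
    return nil
  end
end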
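
As an alternative to the case statement, the same mapping could live in a hash, which also lets us generate the supported-language list instead of typing it into every message. This is only a sketch: SUPPORTED_LANGUAGES, language_code and supported_language_names are names I'm making up for illustration and are not part of the original project.

```ruby
# Hypothetical table of supported languages and their translator codes.
SUPPORTED_LANGUAGES = {
  "spanish" => "es",
  "french"  => "fr",
  "german"  => "de",
  "italian" => "it",
  "klingon" => "tlh"
}.freeze

# Returns the code for the first supported language mentioned in the text,
# or nil when none is mentioned.
def language_code(text)
  normalized = text.to_s.downcase
  match = SUPPORTED_LANGUAGES.find { |name, _code| normalized.include?(name) }
  match && match.last
end

# Builds a readable list of the supported language names for use in replies.
def supported_language_names
  names = SUPPORTED_LANGUAGES.keys.map(&:capitalize)
  "#{names[0..-2].join(', ')}, and #{names.last}"
end
```

With this approach, adding a sixth language would be a one-line change to the hash rather than an edit to the method and to each message string.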

The check_language method compares the target language against our supported languages. If it is a match we return the shorthand format Microsoft Translator will be expecting in the translation process. Now we need to call this method from our /translate endpoint with the requested language:

# Check language
@language_format = check_language(@requested_language.downcase)

If @language_format is nil, the language the user requested is not supported. Let’s inform the user of this and let them know what is supported:

if @language_format.nil?
  twiml = Twilio::TwiML::Response.new do |r|
    r.Message "#{@requested_language} is not a supported translator language. Supported languages for translation are: Spanish, French, German, Italian, and Klingon. Please send one of these along with a picture and I'll translate it for you!"
  end

  return twiml.text
end
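
Taken together, the checks so far amount to a small decision procedure, and it can be handy to see it in one place (or to unit test it without Sinatra or Twilio in the picture). Here is a minimal sketch of that logic as a plain function. The name classify_request and the symbols it returns are my own invention, not something from the project code.

```ruby
# Hypothetical helper: decide how to respond to an incoming message.
# Returns one of :list, :no_image, :unsupported, or :translate.
def classify_request(body, media_url, supported = %w[spanish french german italian klingon])
  language = body.to_s.strip.downcase
  language = "french" if language.empty?  # default when no text was sent

  return :list if language == "list"
  return :no_image if media_url.to_s.empty?
  return :unsupported unless supported.any? { |name| language.include?(name) }
  :translate
end

puts classify_request("list", nil)
puts classify_request("Italian", "https://example.com/photo.jpg")
```

The order of the checks mirrors the order used in the endpoint: default language first, then the “list” keyword, then the picture, then the language itself.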

One last thing to do before we do the heavy lifting of translating the image is to let the user know this might take a little bit of time:

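content_type "text/xml"

# Provide a quick response before processing the image with CamFind.
twiml = Twilio::TwiML::Response.new do |r|
  r.Message "Analyzing your image...then I'll translate it. This may take a few..."
end

# The TwiML is the last expression, so it becomes the response body.
twiml.text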


Since our image analysis and translation process will take about a minute we want to make sure we do this work after the user has been informed the process has started. The Sinatra gem sinatra-run-later allows us to run code after our /translate endpoint has returned. Add the following code to the top of the /translate endpoint:

# This block will execute after the /translate endpoint returns
run_later do
  # Code to execute after /translate returns goes here...
end

Image analysis with the CamFind API takes a little bit of time. Some computer vision analysis is done on the image and then, more often than not, a person looks at the image and describes it. When we make the request to the API we will be given a token we can use to request the image analysis details at a later time. We’ll store that token and then wait a minute to allow the analysis process to happen:

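# POST the picture URL to the CamFind image request endpoint.
token_response = Unirest.post "",
  headers: { "X-Mashape-Key" => mashape_key },
  parameters: {
    "image_request[locale]" => "en_US",
    "image_request[remote_image_url]" => @picture_url
  }

token = token_response.body['token']

# Need to wait for image analysis
sleep 60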
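
A fixed one-minute wait is simple, but it makes quick results slow and gives up on anything that takes longer. One alternative is to check several times with a pause between attempts. The sketch below is deliberately generic: poll_until_ready is a hypothetical helper, and the block you hand it stands in for whatever call fetches the analysis result, returning nil while the result is still pending.

```ruby
# Hypothetical helper: call the block until it returns a non-nil value,
# pausing between attempts. Returns nil if every attempt comes back empty.
def poll_until_ready(attempts: 6, delay: 10)
  attempts.times do |i|
    result = yield
    return result unless result.nil?
    sleep delay if i < attempts - 1  # no need to pause after the last try
  end
  nil
end

# Simulated fetcher for demonstration: pending twice, then done.
responses = [nil, nil, "red sunset over water"]
description = poll_until_ready(attempts: 5, delay: 0) { responses.shift }
puts description
```

In the app, the block would wrap the request for the analysis details and return nil until a description is available. How the API signals "not ready yet" is something to confirm against the CamFind documentation.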

At this point we can request the results from the CamFind API:

# Get the details from the analysis
image_response = Unirest.get "{token}",
headers:{"X-Mashape-Key" => mashape_key}

description = image_response.body['name']

The description variable will have CamFind’s description of the picture in English. This is exactly what we need to pass to the Microsoft Translator API to help the user say it in another language. Let’s use the bing_translator gem to make a request to Microsoft Translator with our English text and the target language the user specified:

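# Client ID and secret for Microsoft Translator (bing_translate_id is an assumed variable name).
translator = BingTranslator.new(bing_translate_id, bing_translate_secret)
translated = translator.translate description, :from => 'en', :to => @language_format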

We now have the final translated version of what the user requested. We can now use the twilio-ruby gem to make a REST API call to send a text message back to the user with the translated text. For comparison purposes we’ll make sure to let them know what CamFind thought the image was in English as well:

client = Twilio::REST::Client.new twilio_accountsid, twilio_authtoken
client.messages.create(
  to: @incoming_number,
  from: '+12028001180',
  body: "Got it, I think your picture contains: #{description}. In #{@requested_language} that would be: #{translated}"
)
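
The snippets in this post use variables such as mashape_key, twilio_accountsid, twilio_authtoken and bing_translate_secret without showing where they come from. One common approach, sketched here as an assumption rather than a description of the project's actual setup, is to read them from environment variables so they stay out of source control. The environment variable names and the load_credentials helper are hypothetical.

```ruby
# Hypothetical environment variable names for each credential.
CREDENTIAL_KEYS = {
  mashape_key:           "MASHAPE_KEY",
  twilio_accountsid:     "TWILIO_ACCOUNT_SID",
  twilio_authtoken:      "TWILIO_AUTH_TOKEN",
  bing_translate_secret: "BING_TRANSLATE_SECRET"
}.freeze

# Reads every credential from the given environment (ENV by default).
# fetch raises a KeyError naming the variable if one is missing.
def load_credentials(env = ENV)
  CREDENTIAL_KEYS.transform_values { |name| env.fetch(name) }
end
```

Calling load_credentials once at startup means a missing key fails loudly when the server boots, rather than a minute into processing someone's picture.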

That’s it! Now you can translate any picture into a description in another language just by sending an MMS. You should deploy your server code somewhere publicly accessible so that Twilio will be able to contact your server. I recommend Heroku for this, and this guide will show you how to deploy your Sinatra app to Heroku. You can view the full project on GitHub.

Hooking up our app to Twilio

To get our translation service working we need to connect our app to Twilio so that incoming texts will be routed to our app logic. Log in to Twilio and head over to the numbers portal. Click on the number you wish to use for the translator and configure the Messaging URL to point at your newly deployed Sinatra server:

Number config

Your on-demand translation buddy is good to go! Send in a picture and a language to translate it to and pretty soon you’ll be describing the world around you in multiple languages.
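
If you want to poke at the endpoint without sending a real MMS, you can imitate the webhook yourself. Twilio delivers incoming messages as a form-encoded POST, and our app only reads the Body, MediaUrl0 and From parameters. The sketch below builds such a request with Ruby's standard library. The build_webhook_request helper, the example image URL and the phone number are all made up for illustration, and a real Twilio request carries many more parameters than these three.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a POST shaped like Twilio's incoming-message
# webhook, limited to the three parameters our app reads.
def build_webhook_request(url, body:, media_url:, from:)
  request = Net::HTTP::Post.new(URI(url))
  request.set_form_data("Body" => body, "MediaUrl0" => media_url, "From" => from)
  request
end

request = build_webhook_request(
  "http://localhost:4567/translate",  # 4567 is Sinatra's default port
  body: "French",
  media_url: "https://example.com/sunset.jpg",
  from: "+15555550100"
)
puts request.body

# To send it to a locally running server:
# uri = URI("http://localhost:4567/translate")
# Net::HTTP.start(uri.host, uri.port) { |http| puts http.request(request).body }
```

Keep in mind that a request like this will still kick off the CamFind and translation calls, so use a picture URL that is actually reachable if you want to see the whole flow.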

Next steps

I think the idea of having an on-demand translation buddy in your pocket is an awesome thing, but I’m even more stoked to see how you extend it. Here are some ideas for extending what we’ve built here:

  • Create a language learning flash card game based on your translation results
  • Make a voice call using Twilio to read out the results instead of receiving a text message

I think we have only scratched the surface of what is possible with Twilio MMS, and the best use cases will be the ones you amazing developers create. Please don’t hesitate to share what you build with me. You can email me or hit me up on Twitter @brentschooley.
