Using WhatsApp, Twilio and Azure to Generate Photo Alt-text in Java

March 29, 2019
People programming together

AI services like Computer Vision (CV) are getting easier and easier to play with, and we can have some fun by making them available to use from our cellphones. In this post, we will use Java to connect the Twilio API for WhatsApp with Azure’s CV APIs to create a bot that can describe photos. It would be neat to use this for generating alt-text to help make your images more accessible online, for example.

We will need the following to get started with this post

Overview of our app

How it works

When Twilio receives a WhatsApp message it will send an HTTP request to a URL we provide.

Our mission is to create a Java app that handles those requests. The app will take the URL of any photo in the WhatsApp message and pass it to the Azure CV API, which will generate a description of whatever is in the picture. The app will then use Azure’s caption as its reply to the original message on WhatsApp.
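To make that request concrete: Twilio delivers the webhook as a form-encoded POST, and the URL of the first attached photo arrives in a parameter named MediaUrl0. The sketch below decodes such a body with plain Java. It is an illustration only, since SparkJava does this parsing for us in the real app; the class name, phone number and image URL are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: in the real app SparkJava parses the form body for us.
final class WebhookParamsSketch {

    // Splits an application/x-www-form-urlencoded body into decoded key/value pairs.
    static Map<String, String> parseForm(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return params;
        }
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Hypothetical webhook body: the sender and media URL are made up.
        String body = "From=whatsapp%3A%2B15550100&NumMedia=1"
                + "&MediaUrl0=https%3A%2F%2Fexample.com%2Fphoto.jpg";
        System.out.println(parseForm(body).get("MediaUrl0"));
    }
}
```

A message without a photo simply has no MediaUrl0 parameter, which is the case the app has to handle before calling Azure.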

Are You Ready?

Let's Get Started!

Create a new project

If you would like to check out the completed code, it can be found on my GitHub repo. Or you can follow along with the post where we will be building a fresh app using Maven.

mvn archetype:generate \
 -DarchetypeGroupId=pl.org.miki \
 -DarchetypeArtifactId=java8-quickstart-archetype \
 -DarchetypeVersion=1.0.0 \
 -DtestLibrary=none

This command will prompt for a groupId and artifactId. If you’re not sure what those are, check the Maven naming guide. I used lol.gilliard and twilio-whatsapp-azure. We can accept the defaults for version and package. The project will be created in a subdirectory with the same name as the artifactId. Open the project in your favourite IDE.

Create an HTTP server to listen for Twilio webhooks

SparkJava is a microframework for creating web applications in Java - it doesn’t need much code to get started so it’s perfect for this project.

Add SparkJava to the <dependencies> section of pom.xml:

<dependency>
    <groupId>com.sparkjava</groupId>
    <artifactId>spark-core</artifactId>
    <version>2.7.2</version>
</dependency>

Create an App.java file in src/main/java and add a main method that configures SparkJava to respond to HTTP requests:

import static spark.Spark.*;

public class App {

    public static void main(String[] args) {
        get("/", (req, res) -> "Hello \uD83D\uDC4B");
    }
}

Run the project from the IDE and browse to http://localhost:4567. You will see the following in the browser:

image of localhost

Now that the app is up and running, add the endpoint which will respond to Twilio’s webhooks. Add these handy error messages to the top of your class:

private static final String NO_IMAGE_MESSAGE =
       "<Response><Message>I can't help if you don't send an image \uD83D\uDE09</Message></Response>";
private static final String NO_DESCRIPTION_MESSAGE =
       "<Response><Message>Sorry, I couldn't describe that \uD83D\uDE23</Message></Response>";

Then put the following code inside the main method, after the get call we wrote previously:

post("/msg", (req, res) -> {

    String mediaUrl = req.queryParams("MediaUrl0");
    if (mediaUrl == null) return NO_IMAGE_MESSAGE;

    String description = getAzureCVDescription(mediaUrl);
    if (description == null) return NO_DESCRIPTION_MESSAGE;

    // Return TwiML to send the description back to WhatsApp
    return "<Response><Message>It’s " + description + "</Message></Response>";
});

The XML returned here is Twilio Markup Language (TwiML). For something this small TwiML can be written by hand, but there is also a comprehensive Java helper library for Twilio which can generate TwiML.
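One thing to watch when writing TwiML by hand is that the caption is dropped into XML as-is, so a description containing & or < would produce invalid markup. Here is a minimal sketch of a helper that escapes the text first. The class and method names are my own, not part of Twilio's library, and it only covers the characters that matter inside element text.

```java
// Hypothetical helper for hand-written TwiML; not part of the Twilio helper library.
final class TwimlSketch {

    // Escapes the characters that are special inside XML element text.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps a plain-text reply in the same TwiML structure used in the handler.
    static String messageResponse(String body) {
        return "<Response><Message>" + escapeXml(body) + "</Message></Response>";
    }

    public static void main(String[] args) {
        System.out.println(messageResponse("a cat & a dog"));
    }
}
```

In the handler, the final return could then become return TwimlSketch.messageResponse("It’s " + description); if you want that extra safety.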

The IDE will show an error as we haven’t written the getAzureCVDescription method yet.

Call the Azure CV API

To use Azure’s APIs we will need a free Azure account. Signing up requires a credit card, but everything in this tutorial is available from Azure’s free trial.

Microsoft has written a great quickstart for calling the Azure CV API from Java. We can use the code they provide with a couple of modifications to be able to call it from our own main method, and to extract the image caption from the response.

Add the following Maven dependencies next to spark-core in the pom.xml:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.6</version>
</dependency>

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpcore</artifactId>
    <version>4.4.10</version>
</dependency>

<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20180813</version>
</dependency>

Add these imports to App.java:

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;
import org.json.JSONObject;

import java.net.URI;

Also add the getAzureCVDescription method underneath the main method. The Azure docs explain how to get a subscription key for the CV API. Don’t forget to add yours to the code:

private static String getAzureCVDescription(String mediaUrl) {

    // Replace <Subscription Key> with your valid subscription key.
    // SECURITY WARNING: Do NOT commit this to GitHub or put it anywhere public
    String subscriptionKey = "<Subscription Key>";

    // The region in this URL (westcentralus) must match the region of your subscription key.
    String uriBase = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/analyze";

    CloseableHttpClient httpClient = HttpClientBuilder.create().build();

    try {
        URIBuilder builder = new URIBuilder(uriBase);

        // Request parameters. All of them are optional
        builder.setParameter("visualFeatures", "Description");
        builder.setParameter("language", "en");

        // Prepare the URI for the REST API method.
        URI uri = builder.build();
        HttpPost request = new HttpPost(uri);

        // Request headers.
        request.setHeader("Content-Type", "application/json");
        request.setHeader("Ocp-Apim-Subscription-Key", subscriptionKey);

        // Request body.
        StringEntity requestEntity = new StringEntity("{\"url\":\"" + mediaUrl + "\"}");
        request.setEntity(requestEntity);

        // Call the REST API method and get the response entity.
        HttpResponse response = httpClient.execute(request);
        HttpEntity entity = response.getEntity();

        if (entity != null) {
            // Parse the JSON response.
            String jsonString = EntityUtils.toString(entity);
            JSONObject json = new JSONObject(jsonString);

            // This extracts the caption from the JSON returned by Azure CV
            return json
                    .getJSONObject("description")
                    .getJSONArray("captions")
                    .getJSONObject(0)
                    .getString("text");
        }
    } catch (Exception e) {
        // Display error message.
        System.out.println(e.getMessage());
    }
    return null;
}
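Two details of the method above are worth a closer look: the key is pasted into the source, and the request body is built by string concatenation. The sketch below shows one possible alternative using only the standard library. The environment variable name AZURE_CV_KEY and the class and method names are assumptions of mine, not something Azure or the quickstart prescribes.

```java
import java.util.Map;

// A sketch under stated assumptions; AZURE_CV_KEY is a made-up variable name.
final class AzureRequestSketch {

    // Reads the key from the given environment instead of hard-coding it.
    static String subscriptionKey(Map<String, String> env) {
        String key = env.get("AZURE_CV_KEY");
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("Set the AZURE_CV_KEY environment variable");
        }
        return key.trim();
    }

    // Escapes a value so it is safe inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the {"url": ...} body that the analyze endpoint expects.
    static String requestBody(String mediaUrl) {
        return "{\"url\":\"" + jsonEscape(mediaUrl) + "\"}";
    }

    public static void main(String[] args) {
        // Hypothetical image URL, used only to show the resulting body.
        System.out.println(requestBody("https://example.com/photo.jpg"));
    }
}
```

In the app, subscriptionKey(System.getenv()) would replace the hard-coded string. Since org.json is already on the classpath, new JSONObject().put("url", mediaUrl).toString() should produce an equivalent body without the hand-rolled escaping.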

App.java should now look like the final code on GitHub. With this new code added, restart the app in your IDE.
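Before involving Twilio at all, it can be handy to check the /msg endpoint by simulating a webhook locally. This is a rough sketch using only the JDK: it form-encodes a MediaUrl0 parameter and POSTs it to the running app. The class name and the image URL are placeholders of mine, so swap in a real, publicly reachable photo if you want Azure to produce a caption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical local test client; it imitates only the part of the webhook our app reads.
final class WebhookSimulatorSketch {

    // Encodes parameters as application/x-www-form-urlencoded.
    static String formBody(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return body.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // POSTs the form body to the endpoint and returns the HTTP status code.
    static int post(String endpoint, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        // Placeholder image URL: replace with any publicly reachable photo.
        params.put("MediaUrl0", "https://example.com/photo.jpg");
        try {
            System.out.println("HTTP " + post("http://localhost:4567/msg", formBody(params)));
        } catch (IOException e) {
            System.out.println("Is the app running? " + e.getMessage());
        }
    }
}
```

Sending an empty parameter map instead exercises the no-image branch of the handler.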

Expose your SparkJava server via ngrok

Ngrok is a great tool for helping to develop webhooks. It provides a temporary internet-accessible URL for your development environment. Install it and run:

ngrok http 4567

Ngrok will start, and part of the output will show the hostname that ngrok has created for our web server:

image of ngrok port forwarding

Set up Twilio Sandbox for WhatsApp

Once logged into your Twilio account, visit the Twilio Sandbox for WhatsApp to add your phone number to the sandbox. My sandbox code is salmon-finally but yours will be different:

image of sandbox participants invitation

Configure the sandbox with the ngrok URL and a path of /msg, to be called when a message comes in:

image of sandbox configuration

Save the settings, and we are ready to go.

Play with your IRL Alt-text generator

Everything is up and running. Twilio Sandbox for WhatsApp is configured to call a webhook when it gets a message. The Java app receives the HTTP request from Twilio, extracts the MediaUrl0 parameter, forwards it to the Azure CV API, and responds with TwiML that sends back a description of the photo in your message. Nice!
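If you would like to check that flow without Twilio or Azure in the loop, the decision logic can be pulled out of the HTTP handler. The following is a sketch of one way to do that, not how the app in this post is structured: the describer parameter is a stand-in of mine for getAzureCVDescription, and the messages are shortened versions of the ones above.

```java
import java.util.function.Function;

// Sketch: the reply logic from the /msg handler, separated from HTTP and Azure.
final class ReplyLogicSketch {

    static final String NO_IMAGE_MESSAGE =
            "<Response><Message>I can't help if you don't send an image</Message></Response>";
    static final String NO_DESCRIPTION_MESSAGE =
            "<Response><Message>Sorry, I couldn't describe that</Message></Response>";

    // describer stands in for the Azure call; it returns null when no caption is available.
    static String reply(String mediaUrl, Function<String, String> describer) {
        if (mediaUrl == null) {
            return NO_IMAGE_MESSAGE;
        }
        String description = describer.apply(mediaUrl);
        if (description == null) {
            return NO_DESCRIPTION_MESSAGE;
        }
        return "<Response><Message>It's " + description + "</Message></Response>";
    }

    public static void main(String[] args) {
        // A fake describer, so no network access is needed.
        System.out.println(reply("https://example.com/photo.jpg", url -> "a dog on a beach"));
    }
}
```

Passing a lambda that returns a fixed caption, or null, covers all three branches the bot can take.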

Try it out:

image of whatsapp chat

What Next?

There is so much you can make...

I can’t wait to see what you build - let me know about it by email or on Twitter:

mgilliard@twilio.com

@MaximumGilliard

The image at the top of this post is a modified version of an image from WOCINTECH stock photos