Using WhatsApp, Twilio and Azure to Generate Photo Alt-text in Java
AI services like Computer Vision (CV) are getting easier and easier to play with, and we can have some fun by making them available from our cellphones. In this post, we will use Java to connect the Twilio API for WhatsApp with Azure’s CV APIs to create a bot that can describe photos. One neat use for this is generating alt-text to help make your images more accessible online.
We will need the following to get started with this post
Overview of our app
How it works
When Twilio receives a WhatsApp message it will send an HTTP request to a URL we provide.
Our mission is to create an app in Java which can handle those requests. The app will take the URL of any photo in the WhatsApp message and pass it to the Azure CV API which will generate a description of whatever is in the picture. The app will then grab Azure’s caption and use it as a reply to the original message on WhatsApp.
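To make that flow concrete before we build the real thing, here is a minimal sketch in plain Java with no dependencies. It assumes Twilio delivers the webhook as a form-encoded body and that the first attachment arrives in a parameter called MediaUrl0. The class name, the method names, the fallback message and the describer function (a stand-in for the Azure call) are all made up for illustration and are not the code from the finished app.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the webhook flow; not the code from the finished app. */
final class WebhookFlowSketch {

    /** Decodes an application/x-www-form-urlencoded body into a key/value map. */
    static Map<String, String> parseForm(String body) {
        Map<String, String> params = new HashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String rawKey = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(rawKey, StandardCharsets.UTF_8),
                       URLDecoder.decode(rawValue, StandardCharsets.UTF_8));
        }
        return params;
    }

    /**
     * Picks the photo URL out of the webhook body and asks the describer for a caption.
     * The describer is a placeholder for the call to the Azure CV API.
     */
    static String replyFor(String formBody, Function<String, String> describer) {
        String mediaUrl = parseForm(formBody).get("MediaUrl0");
        if (mediaUrl == null || mediaUrl.isEmpty()) {
            return "Please send a photo"; // illustrative fallback text
        }
        return describer.apply(mediaUrl);
    }

    public static void main(String[] args) {
        // A fake describer so the sketch runs without any network access.
        Function<String, String> fakeDescriber = url -> "a description of " + url;
        System.out.println(replyFor(
                "NumMedia=1&MediaUrl0=https%3A%2F%2Fexample.com%2Fdog.jpg", fakeDescriber));
        System.out.println(replyFor("NumMedia=0&Body=hello", fakeDescriber));
    }
}
```

Passing the describer in as a function keeps the flow testable without an Azure account. In the real app a web framework does the form parsing for us.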
Are You Ready?
 
 
Create a new project
If you would like to check out the completed code, it can be found on my GitHub repo. Or you can follow along with the post where we will be building a fresh app using Maven.
The Maven command that generates the project will prompt for a groupId and artifactId. If you’re not sure what those are, check the Maven naming guide. I used lol.gilliard and twilio-whatsapp-azure. We can accept the defaults for version and package. The project will be created in a subdirectory with the same name as the artifactId. Open the project in your favourite IDE.
Create an HTTP server to listen for Twilio webhooks
SparkJava is a microframework for creating web applications in Java - it doesn’t need much code to get started so it’s perfect for this project.
Add SparkJava to the <dependencies> section of pom.xml
Create an App.java file in src/main/java and add a main method that configures SparkJava to respond to HTTP requests:
Run the project from the IDE and browse to http://localhost:4567. You will see the following in the browser:
 
 Now that the app is up and running, add the endpoint which will respond to Twilio’s webhooks. Add these handy error messages to the top of your class:
Then put the following code inside the main method, after the get call we wrote previously:
The XML returned here is Twilio Markup Language (TwiML). For something this small TwiML can be written by hand, but there is also a comprehensive Java helper library for Twilio which can generate TwiML.
The IDE will show an error as we haven’t written the getAzureCVDescription method yet.
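For a sense of what hand-written TwiML involves, here is a minimal sketch that wraps some reply text in a Response, Message and Body element. The class and method names are hypothetical. The XML escaping is an addition of this sketch, on the assumption that a caption coming back from another service could contain characters like & or <.

```java
/** Hypothetical helper for hand-writing a small TwiML reply. */
final class TwimlSketch {

    /** Escapes the five characters that have special meaning in XML. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;") // must run first so later entities are not re-escaped
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    /** Wraps the reply text in the TwiML that tells Twilio to send a message back. */
    static String messageResponse(String replyText) {
        return "<Response><Message><Body>"
                + escapeXml(replyText)
                + "</Body></Message></Response>";
    }

    public static void main(String[] args) {
        System.out.println(messageResponse("a cat & a dog on a sofa"));
    }
}
```

For anything much bigger than this, the Twilio helper library is probably the safer choice, since it takes care of escaping and structure for us.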
Call the Azure CV API
To use Azure’s APIs we will need a free Azure account. Note that to sign up you will need a credit card but everything in this tutorial is available from Azure’s free trial.
Microsoft has written a great quickstart for calling the Azure CV API from Java. We can use the code they provide with a couple of modifications to be able to call it from our own main method, and to extract the image caption from the response.
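To show roughly what such a request contains, here is a sketch that builds one (without sending it) using the java.net.http client that ships with Java 11 and later. This is a swapped-in client chosen so the example needs no extra dependencies, not the one from the quickstart. The endpoint host, the v2.0 API version in the path, and the class and method names are assumptions of this sketch, so check the Azure docs for the values that match your own resource.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch of the request sent to the Azure CV "analyze" operation. */
final class AzureRequestSketch {

    /**
     * Builds (but does not send) a POST asking Azure to describe the image at imageUrl.
     * The endpoint looks like https://<region>.api.cognitive.microsoft.com (assumed form).
     */
    static HttpRequest describeRequest(String endpoint, String subscriptionKey, String imageUrl) {
        // Minimal JSON escaping for the one string value we embed in the body.
        String escapedUrl = imageUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        String jsonBody = "{\"url\":\"" + escapedUrl + "\"}";

        return HttpRequest.newBuilder()
                // API version and query parameters are assumptions; adjust to your resource.
                .uri(URI.create(endpoint + "/vision/v2.0/analyze?visualFeatures=Description&language=en"))
                .header("Content-Type", "application/json")
                .header("Ocp-Apim-Subscription-Key", subscriptionKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = describeRequest(
                "https://westcentralus.api.cognitive.microsoft.com", // placeholder region
                "YOUR_SUBSCRIPTION_KEY",
                "https://example.com/dog.jpg");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending it would be a matter of passing the request to HttpClient.newHttpClient().send(...) with a string body handler.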
Add the following Maven dependencies next to spark-core in the pom.xml:
Add these imports to App.java:
Also add the getAzureCVDescription method underneath the main method. Don’t forget to add your CV API subscription key. There are Azure docs on getting a subscription key:
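The caption-extraction step can be illustrated on its own. The sketch below assumes the JSON response holds the caption at description.captions[0].text, and pulls it out with a regular expression purely so that the example runs without a JSON library. The class and method names are hypothetical, and a real JSON parser is the more robust choice in an actual app.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pull the first caption out of an Azure CV response. */
final class CaptionSketch {

    // Matches the "text" value of the first object in the "captions" array.
    // Assumes the caption contains no escaped quotes; a JSON parser would handle those.
    private static final Pattern FIRST_CAPTION = Pattern.compile(
            "\"captions\"\\s*:\\s*\\[\\s*\\{[^}]*?\"text\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the first caption, or empty if the response has none. */
    static Optional<String> firstCaption(String json) {
        Matcher matcher = FIRST_CAPTION.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample response in the assumed shape, trimmed to the relevant fields.
        String sample = "{\"description\":{\"tags\":[\"dog\"],"
                + "\"captions\":[{\"text\":\"a dog sitting on a couch\",\"confidence\":0.93}]}}";
        System.out.println(firstCaption(sample).orElse("No caption found"));
    }
}
```

Returning an Optional makes the "no caption" case explicit, which is where one of the error messages defined earlier would come in.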
App.java should now look like the final code on GitHub. With this new code added, restart the app in your IDE.
Expose your SparkJava server via ngrok
Ngrok is a great tool for helping to develop webhooks. It provides a temporary internet-accessible URL for your development environment. Install it and run it against port 4567, where SparkJava is listening (the command for that is ngrok http 4567).
Ngrok will start, and part of the output will show the hostname that ngrok has created for our web server:
 
 Set up Twilio Sandbox for WhatsApp
Once logged into your Twilio account, visit the Twilio Sandbox for WhatsApp to add your phone number to the sandbox. My sandbox code is salmon-finally but yours will be different:
 
 Configure the sandbox with the ngrok URL and a path of /msg, to be called when a message comes in:
 
 Save the settings, and we are ready to go.
Play with your IRL Alt-text generator
Everything is up and running. Twilio Sandbox for WhatsApp is configured to call a webhook when it gets a message. The Java app receives the HTTP request from Twilio, extracts the MediaUrl and forwards it to the Azure CV API, then responds with TwiML that sends back a description of the photo in your message. Nice!
Try it out:
 
 What Next?
There is so much you can make...
- Check out the Twilio API for WhatsApp docs
- See what else you can do with Twilio in our Java Quickstarts and blog posts
- Have a look at the other Azure Cognitive Services APIs and see where your imagination takes you - how about using the handwriting recognition API to recognize phone numbers in text, for example?
I can’t wait to see what you build - let me know about it by email or on Twitter:
The image at the top of this post is a modified version of an image from WOCINTECH stock photos