Build a Text to Image Service via SMS

October 25, 2022
Written by Anthony Wong
Reviewed by Paul Kamp, Twilion

This past summer, OpenAI announced that DALL·E is open for public beta. DALL·E is a “text-to-image” service: you tell DALL·E what you want to see, and within a few minutes you receive AI-generated images. You’ve probably seen memes where creative users have asked DALL·E for silly images like “Yoda as a waiter” or “aliens at the Eiffel Tower”.

At the time of writing, DALL·E doesn’t expose a public API. However, there are multiple competitors in this space, and some of them do offer public APIs. In this tutorial, we will use one of them, Stable Diffusion (through the DreamStudio API), to generate images from the text messages we send.

Generating "Fancy Cakes" with MMS and Stable Diffusion

Prerequisites

To complete the tutorial, you will need the following:

  • Twilio Account SID and Auth Token. If you don’t already have a Twilio account, you can sign up for one for free.
  • Twilio API Key. You can create one programmatically or within the Twilio console.
  • Twilio Phone Number. You can buy a local number in seconds, either programmatically or within the console.
  • DreamStudio API Key. DreamStudio, from Stability AI, offers a suite of generative media tools. It’s free to create an account, and when you sign up, you get some free testing credits. 🥳
  • Ngrok. ngrok gives an application running on your machine a public URL, so Twilio can reach it while you develop locally. You can create an ngrok account for free on their website.
  • Docker. Docker is a platform designed to help developers build, share, and run modern applications. Download and install the correct Docker version for your operating system.
  • Optional: a Render account (paid), if you want to deploy to the cloud.

Bannerbear

  1. Create a Bannerbear account.
  2. After creating the account, add this template to your project.

Bannerbear interface showing the template
  3. Note the Template ID. You will need this later.
  4. Note the Project API Key. You can find it on your Project → Settings page.

Set up your SMS-to-image service

Ngrok

Execute the following command in a terminal window:

ngrok http 8080

Take a note of the Forwarding URL.

ngrok interface showing locally developed Stable Diffusion app

Twilio

  1. Navigate to your Twilio Phone Numbers.
  2. Select one of your phone numbers.
  3. Scroll down to the Messaging section.
  4. In the webhook for incoming messages, add your Forwarding URL with /sms appended to it, and keep the method set to HTTP POST. It should look something like this: https://1337.ngrok.io/sms
Adding a Twilio messaging webhook
  5. Click Save.
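If you prefer to configure the webhook from code rather than in the console, the Twilio Node.js SDK can update a phone number's incoming-message URL. A minimal sketch, assuming client is created with require('twilio')(accountSid, authToken); the helper name configureSmsWebhook and the PN... phone number SID you pass in are placeholders, not part of the project:

```javascript
// Sketch: point a Twilio number's incoming-message webhook at this app's /sms route.
// `client` is assumed to be created with require('twilio')(accountSid, authToken).
async function configureSmsWebhook(client, phoneNumberSid, baseUrl) {
  // Drop trailing slashes so we never produce "https://host//sms"
  const smsUrl = `${baseUrl.replace(/\/+$/, '')}/sms`;

  // Updates the IncomingPhoneNumber resource; the Express route only accepts POST
  return client.incomingPhoneNumbers(phoneNumberSid).update({
    smsUrl,
    smsMethod: 'POST',
  });
}

// Example (placeholders):
// const client = require('twilio')(accountSid, authToken);
// configureSmsWebhook(client, 'PNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 'https://1337.ngrok.io')
//   .then((number) => console.log(number.smsUrl));
```

Remember to rerun this whenever your ngrok Forwarding URL changes.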
Docker

  1. Execute the following commands in your terminal:

$ git clone https://github.com/anthonywong555/Stable-Diffusion-SMS
$ cd Stable-Diffusion-SMS
$ npm install
$ cp .env-example .env

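2. Open the .env file in your favorite text editor and fill in your credentials. Set PRODUCTION_BASE_URL to your ngrok Forwarding URL (without a trailing slash): the code uses it both to validate Twilio's request signature and to build the public image URLs that Bannerbear downloads.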

3. Execute the following command:

docker compose -f "docker-compose.dev.yml" up -d --build

4. Start texting your phone number!
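Several of these values are read directly by the code shown later in this post, so a missing or malformed one tends to surface as a confusing runtime error. A minimal sketch of a startup check you could add to index.js yourself; it lists only the variable names that appear in the code excerpts below (STABLE_DIFFUSION_SAMPLES and NODE_ENV are left out because the code works without them, and the Twilio and Bannerbear key names from .env-example are not shown in this post). The helper names are hypothetical:

```javascript
// Environment variables read by the index.js excerpts in this post (not the full .env-example list).
const REQUIRED_ENV_VARS = [
  'TWILIO_AUTH_TOKEN',
  'PRODUCTION_BASE_URL',
  'DREAMSTUDIO_API_KEY',
  'BANNER_BEAR_IMAGE_TEMPLATE_ID',
  'BANNER_BEAR_IMAGE_TEMPLATE_IMAGE_WIDTH',
  'BANNER_BEAR_IMAGE_TEMPLATE_IMAGE_HEIGHT',
];

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env = process.env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => !env[name] || !String(env[name]).trim());
}

// The code builds URLs as `${PRODUCTION_BASE_URL}/sms` and `${PRODUCTION_BASE_URL}/${filePath}`,
// so the value should be an http(s) URL without a trailing slash.
function checkBaseUrl(baseUrl = '') {
  const problems = [];
  if (!/^https?:\/\//.test(baseUrl)) problems.push('should start with http:// or https://');
  if (baseUrl.endsWith('/')) problems.push('should not end with "/"');
  return problems;
}

// Possible startup check in index.js:
// const missing = findMissingEnvVars();
// if (missing.length) throw new Error(`Missing environment variables: ${missing.join(', ')}`);
// checkBaseUrl(process.env.PRODUCTION_BASE_URL).forEach((p) => console.warn(`PRODUCTION_BASE_URL ${p}`));
```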

How does the text to AI image app work?

Here is a high-level diagram of how the whole application works:

Architecture diagram of a text-to-AI image program

Now let’s take a closer look at the code. The main code is in index.js.

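// Assumes express.urlencoded({ extended: false }) is registered, so Twilio's form fields populate req.body
app.post('/sms', (req, res) => {
  try {
    const { headers, body = {} } = req;
    const twilioSignature = headers['x-twilio-signature'];
    // Must match the webhook URL configured in Twilio exactly
    const url = `${process.env.PRODUCTION_BASE_URL}/sms`;

    // Enforce signature validation only in production, so local testing stays easy
    if (process.env.NODE_ENV === 'production') {
      const requestIsValid = twilio.validateRequest(
        process.env.TWILIO_AUTH_TOKEN,
        twilioSignature,
        url,
        body
      );
      if (!requestIsValid) {
        return res.status(403).send('Forbidden');
      }
    }

    const prompt = (body.Body || '').trim();
    res.type('text/xml');

    if (!prompt) {
      return res.send('<Response><Message>Please send a detailed description of what you want to see.</Message></Response>');
    }

    // Reply right away (Twilio waits at most 15 seconds for a webhook response),
    // then generate and send the images in the background
    res.send('<Response><Message>Processing...</Message></Response>');
    driver(body).catch((e) => console.error(`Image generation failed: \n${e}`));
  } catch (e) {
    console.error(`An error has occurred: \n${e}`);
    // Only send a 500 if no response has gone out yet
    if (!res.headersSent) {
      res.status(500).send('Internal Server Error');
    }
  }
});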

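When someone texts your Twilio number, Twilio forwards the message to our Node.js application as an HTTP POST request to /sms. When NODE_ENV is production, the endpoint checks that the request really came from Twilio by validating the X-Twilio-Signature header against your Auth Token, and rejects it with a 403 otherwise. If the message contains text, we immediately reply with “Processing...” and hand the message to the main logic in driver(); if it is empty, we ask the user for a description instead.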
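twilio.validateRequest handles this check for you. To show what it verifies, here is a simplified sketch of the signing scheme Twilio documents for POST webhooks: append each POST parameter name and value, sorted by name, to the full webhook URL, compute an HMAC-SHA1 of that string keyed with your Auth Token, and Base64-encode the result. The function names are hypothetical and the sketch is for understanding only; keep using the SDK in the app, since it also covers edge cases this sketch does not.

```javascript
const crypto = require('crypto');

// Rebuilds the expected X-Twilio-Signature for a form-encoded POST webhook.
function computeTwilioSignature(authToken, url, params) {
  // Full URL, then each parameter name + value sorted by name, with no delimiters
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  // HMAC-SHA1 keyed with the Auth Token, Base64-encoded
  return crypto.createHmac('sha1', authToken).update(data, 'utf8').digest('base64');
}

// Compares the received header with the expected value in constant time.
function isValidTwilioSignature(authToken, signature, url, params) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(signature || '');
  // timingSafeEqual throws on a length mismatch, so check the length first
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

This is also why PRODUCTION_BASE_URL has to match the webhook URL exactly: the URL is part of the signed string, so any difference (an extra slash, http vs https) changes the signature.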

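We pass the user's message to generateStableDiffusionImages, which calls the DreamStudio API through the unofficial stability-ts SDK and saves the resulting images to the public folder.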

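// Assumes generate() comes from the unofficial stability-ts SDK, e.g. const { generate } = require('stability-ts');
const generateStableDiffusionImages = (prompt) => {
  return new Promise((resolve, reject) => {
    // Number of images to request (defaults to 1)
    const samples = process.env.STABLE_DIFFUSION_SAMPLES ? parseInt(process.env.STABLE_DIFFUSION_SAMPLES, 10) : 1;
    const results = [];

    const stabilityClient = generate({
      prompt,
      samples,
      apiKey: process.env.DREAMSTUDIO_API_KEY,
      // Image size taken from the Bannerbear template settings
      width: process.env.BANNER_BEAR_IMAGE_TEMPLATE_IMAGE_WIDTH,
      height: process.env.BANNER_BEAR_IMAGE_TEMPLATE_IMAGE_HEIGHT,
      // Images are saved here so the app can serve them at a public URL
      outDir: 'public'
    });

    // Emitted once per generated image; resolve when all requested samples have arrived
    stabilityClient.on('image', ({ buffer, filePath }) => {
      results.push({ buffer, filePath });
      if (results.length === samples) {
        resolve(results);
      }
    });

    // Emitted when the request finishes; reject if the API reported an error
    stabilityClient.on('end', (response) => {
      if (!response.isOk) {
        reject(response);
      }
    });
  });
};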
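generateStableDiffusionImages resolves once the expected number of 'image' events has arrived and rejects only when 'end' reports a failure. If fewer images arrive and 'end' still reports success, the Promise never settles. A minimal sketch of the same event-to-Promise pattern with a timeout added; it uses Node's built-in EventEmitter as a stand-in for the stability-ts client, and the helper name collectImages and the two-minute default are assumptions, not part of the project:

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: resolves once `expected` 'image' events arrive, rejects if 'end'
// reports a failure, and rejects after `timeoutMs` so a short batch can't hang forever.
function collectImages(emitter, expected, timeoutMs = 120000) {
  return new Promise((resolve, reject) => {
    const results = [];
    const timer = setTimeout(() => {
      reject(new Error(`Timed out with ${results.length} of ${expected} images`));
    }, timeoutMs);

    // Clears the timer before settling the Promise either way
    const finish = (settle, value) => {
      clearTimeout(timer);
      settle(value);
    };

    emitter.on('image', ({ buffer, filePath }) => {
      results.push({ buffer, filePath });
      if (results.length === expected) finish(resolve, results);
    });

    emitter.on('end', (response) => {
      if (!response || !response.isOk) finish(reject, response);
    });
  });
}

// Demo with a plain EventEmitter standing in for the stability-ts client
const fakeClient = new EventEmitter();
collectImages(fakeClient, 2).then((images) => console.log(images.map((image) => image.filePath)));
fakeClient.emit('image', { buffer: Buffer.from('a'), filePath: 'public/a.png' });
fakeClient.emit('image', { buffer: Buffer.from('b'), filePath: 'public/b.png' });
// logs [ 'public/a.png', 'public/b.png' ]
```

Inside generateStableDiffusionImages, this would amount to calling collectImages(stabilityClient, samples) instead of wiring the listeners by hand.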

After getting all the images back from DreamStudio, we combine each image with the user's text into a single image, so the result is easier to share. We use Bannerbear to do this.

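// Assumes bbClient was created with the bannerbear npm package, e.g. new Bannerbear(<your Project API Key>)
const generateBannerBearImage = async (image_url, text, templateId) => {
  // Passing true as the last argument waits until the image has finished rendering
  return bbClient.create_image(templateId, {
    modifications: [
      // Each modification targets a layer in the template by its name
      { name: 'image', image_url },
      { name: 'title', text }
    ]
  }, true);
};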

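This function is straightforward: we pass in the public-facing URL of an image, the text from the user, and our Bannerbear template ID. The name values (image and title) refer to layers in the Bannerbear template, so they must match the layer names in the template you added earlier.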
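Because Bannerbear downloads the image from image_url, that URL must be reachable from the internet. The driver function further down builds it by stripping the container path /home/node/app/ from the local file path and prefixing PRODUCTION_BASE_URL. A slightly more defensive sketch of that conversion, assuming the same container path and that the app serves files relative to its root, as the driver's string replace implies (toPublicUrl is a hypothetical name):

```javascript
const path = require('path');

// Hypothetical helper: converts a file path inside the container into the URL Bannerbear will fetch.
// Mirrors the driver's logic: '/home/node/app/public/abc.png' -> `${baseUrl}/public/abc.png`.
function toPublicUrl(localFilePath, baseUrl, appRoot = '/home/node/app') {
  // Treat relative paths as relative to the app root
  const absolutePath = path.posix.isAbsolute(localFilePath)
    ? localFilePath
    : path.posix.join(appRoot, localFilePath);

  // e.g. 'public/abc.png'; refuse files outside the app root
  const relativePath = path.posix.relative(appRoot, absolutePath);
  if (relativePath === '..' || relativePath.startsWith('../')) {
    throw new Error(`${localFilePath} is outside ${appRoot}`);
  }

  // Percent-encode each segment, and make sure the base ends with '/'
  // so the relative path is appended rather than replacing the last segment
  const encodedPath = relativePath.split('/').map(encodeURIComponent).join('/');
  const base = baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
  return new URL(encodedPath, base).toString();
}
```

Unlike a plain string replace, this tolerates a trailing slash on the base URL and fails loudly if the image was written somewhere the app does not serve.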

Once we get a response back from Bannerbear, we clean up the results to get a list of PNG URLs and UIDs.

const bannerBearImages = await Promise.all(bannerBearPromises);
const bannerBearImageURLs = bannerBearImages.map((aBBImage) => aBBImage.image_url_png);
const bannerBearImageUIDs = bannerBearImages.map((aBBImage) => aBBImage.uid);

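Next, we iterate through the array of PNG URLs and send each one back to the user as an MMS. Note that to and from are swapped relative to the incoming request: we reply to the sender (From) from the Twilio number they texted (To).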

// Send each Bannerbear PNG back to the user as an MMS
const twilioPromises = bannerBearImageURLs.map((mediaUrl) =>
  twilioClient.messages.create({
    mediaUrl,
    to: From, // the user who texted in
    from: To  // the Twilio number they texted
  })
);

const twilioSMSResponses = await Promise.all(twilioPromises);
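The code above sends one MMS per image. Twilio accepts up to 10 media URLs on a single message, so if you raise STABLE_DIFFUSION_SAMPLES, one possible alternative is to batch the images. A minimal sketch, assuming the same twilioClient from index.js; sendImagesAsMms and chunk are hypothetical helpers:

```javascript
// Twilio allows up to 10 media URLs on a single message.
const MAX_MEDIA_PER_MESSAGE = 10;

// Splits a list into chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Sends the images in as few MMS as possible and returns the message SIDs in order.
async function sendImagesAsMms(twilioClient, mediaUrls, { to, from }) {
  const batches = chunk(mediaUrls, MAX_MEDIA_PER_MESSAGE);
  const messages = await Promise.all(
    batches.map((mediaUrl) => twilioClient.messages.create({ mediaUrl, to, from }))
  );
  return messages.map((message) => message.sid);
}

// Possible use inside driver():
// const twilioSMSSIDs = await sendImagesAsMms(twilioClient, bannerBearImageURLs, { to: From, from: To });
```

The trade-off is that the user receives fewer messages, but each one is larger.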

And we are done! Putting it all together, the driver function looks like this:

const driver = async (twilioRequest) => {
  // Errors propagate to the caller (the /sms handler)
  const { To, From, Body } = twilioRequest;

  // 1. Generate images with Stable Diffusion; they are saved under ./public
  const generatedImages = await generateStableDiffusionImages(Body);
  const generatedImagesLocalURLs = generatedImages.map((aGeneratedImage) => aGeneratedImage.filePath);

  console.log(`generatedImagesLocalURLs: ${generatedImagesLocalURLs}`);

  // 2. Build each image's public URL and ask Bannerbear to add the prompt text.
  //    Assumes the app lives in /home/node/app/ inside the Docker container.
  const bannerBearPromises = generatedImagesLocalURLs.map((localURL) => {
    const filePath = localURL.replace('/home/node/app/', '');
    const fullURL = `${process.env.PRODUCTION_BASE_URL}/${filePath}`;
    return generateBannerBearImage(fullURL, Body, process.env.BANNER_BEAR_IMAGE_TEMPLATE_ID);
  });

  const bannerBearImages = await Promise.all(bannerBearPromises);
  const bannerBearImageURLs = bannerBearImages.map((aBBImage) => aBBImage.image_url_png);
  const bannerBearImageUIDs = bannerBearImages.map((aBBImage) => aBBImage.uid);

  console.log(`bannerBearImageUIDs: ${bannerBearImageUIDs}`);

  // 3. Send each rendered PNG back to the user as an MMS
  const twilioPromises = bannerBearImageURLs.map((mediaUrl) =>
    twilioClient.messages.create({
      mediaUrl,
      to: From, // the user who texted in
      from: To  // the Twilio number they texted
    })
  );

  const twilioSMSResponses = await Promise.all(twilioPromises);
  const twilioSMSSIDs = twilioSMSResponses.map((aSMSResponse) => aSMSResponse.sid);

  console.log(`twilioSMSSIDs: ${twilioSMSSIDs}`);
};

Optional: Deploy the app to the Cloud

In the GitHub repository, I also offer a one-click deploy to Render. Render is an alternative to Heroku. Just keep in mind that it will cost you money. 💸

  1. Create a Render account.
  2. Click the Deploy to Render button in the repository.
  3. Set your Service Group Name to “stable-diffusion-sms” and click the Create New Resource button.
Create a new Render service

How to set environment variables in Render
  4. Click Environment.
  5. Add all the environment variables from .env-example, fill in their values, and click the Save Changes button. Set PRODUCTION_BASE_URL to your Render service URL (without /sms or a trailing slash), and set NODE_ENV to production if you want Twilio signature validation enforced.
Setting environment variables in Render
  6. Copy the service URL, append /sms to the end, and paste it into your Twilio phone number's incoming message webhook.
Public Render URL to paste into Twilio webhook

Conclusion

Congratulations! Now you can share this phone number with your friends and family so they can try it out. There are two points I want you to take note of:

  1. As more text-to-image services become available, you can swap Stable Diffusion out for a different service and keep the rest of the logic, since image generation is isolated in generateStableDiffusionImages.
  2. Since we didn’t hard-code the Twilio phone number (replies are sent from whichever number received the message), you can buy additional phone numbers and connect them to the same application. You can also add your own custom logic to limit who can access this service.
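As one example of that custom logic, here is a minimal sketch of a sender allowlist. The ALLOWED_NUMBERS environment variable and both helper names are hypothetical; nothing like them exists in the project as published.

```javascript
// Hypothetical env var: ALLOWED_NUMBERS="+15550001111,+15552223333" (comma-separated E.164 numbers).
// Turns the raw string into a Set; an empty or missing value yields an empty Set.
function parseAllowlist(raw = '') {
  return new Set(
    raw
      .split(',')
      .map((number) => number.trim())
      .filter(Boolean)
  );
}

// An empty allowlist lets everyone through, which matches the app's current behavior.
function isSenderAllowed(from, allowlist) {
  return allowlist.size === 0 || allowlist.has(from);
}

// Possible use inside app.post('/sms', ...), before driver(body) is called:
//   const allowlist = parseAllowlist(process.env.ALLOWED_NUMBERS);
//   if (!isSenderAllowed(body.From, allowlist)) {
//     return res.type('text/xml').send('<Response><Message>Sorry, this number is not enabled.</Message></Response>');
//   }
```

Because replies go out from whichever number received the message, you could extend the same idea to keep a separate allowlist per Twilio number, keyed by body.To.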

If you want to learn more about integrating AI services with Twilio, please check out the following blog posts:

Anthony Wong is a Principal Solutions Engineer at Twilio. He’s focused on building cool and fun demos using Twilio. He is best known for his Salesforce expertise. He can be found at anwong [at] twilio.com