Build an AI Buddy with Go, OpenAI, Retell AI, and Twilio Programmable Voice

May 27, 2024
Written by
Quadri Sheriff
Contributor
Opinions expressed by Twilio contributors are their own
Reviewed by

Build an AI Buddy with Go, OpenAI, Retell AI, and Twilio Programmable Voice

In this tutorial, you will learn how to build an AI buddy that you can call and chat with about how your day went. You will be using OpenAI to set up the chatbot, Twilio Programmable Voice to initiate voice calls, and Retell AI API to convert responses from your chatbot responses to human-like speech.

Prerequisites

To follow along with this tutorial, you need the following:

Create the Go app

The first step in this tutorial is to set up your project’s directory and install the required dependencies. First, create a new folder for the project.

mkdir go-ai-buddy
cd go-ai-buddy

Then, initialize a new Go module.

go mod init go-ai-buddy

Install the required packages

Now, you need to install four Go packages for this tutorial. These are:

  • Gin: An HTTP web framework for Go
  • Gorilla WebSocket: A Go package for implementing the WebSocket protocol
  • Go OpenAI: An OpenAI wrapper for Go
  • Twilio-go: A Go package for interacting with Twilio APIs
  • GoDotEnv: A Go package for loading environment variables from a .env file.

To install the packages, run the following command in your terminal.

go get github.com/gin-gonic/gin github.com/gorilla/websocket github.com/sashabaranov/go-openai github.com/twilio/twilio-go/twiml github.com/joho/godotenv

Retrieve the required environment variables

Then, create a .env file in your project’s root folder and add the following code to the file.

OPENAI_API_KEY=<ENTER_OPENAI_SECRET_KEY> RETELL_API_KEY=<ENTER_RETELL_AI_SECRET_KEY>

Replace <ENTER_OPENAI_SECRET_KEY> with your OpenAI API key. Then, replace <ENTER_RETELL_AI_SECRET_KEY> with your Retell AI API key.

To retrieve your OpenAI API key, log in to your OpenAI home page, select API Keys in the sidebar, and click "+ Create new secret key". Enter a name for your secret key, set the permissions, and then click Create secret key. Finally, copy your secret key in the modal that appears.

To retrieve your Retell AI API key, log in to your Retell AI dashboard, navigate to API Keys, and copy your API key under Credentials.

Write the go code

Following that, create a main.go file, which will serve as the entry point to your application, in your project’s root folder, and add the following code to the file.

package main

import (
	"log"
	"os"

	"github.com/joho/godotenv"
)

func main() {
	err := godotenv.Load()
	if err != nil {
		log.Fatal("cannot retrieve env file")
	}
}

func GetRetellAISecretKey() string {
	return os.Getenv("RETELL_API_KEY")
}

func GetOpenAISecretKey() string {
	return os.Getenv("OPENAI_API_KEY")
}

The godotenv.Load() command in the main() function loads your environment variables from your .env file to your application’s environment using GoDotEnv. The GetOpenAISecretKey() function will be used to retrieve your OpenAI secret key, while the GetRetellAISecretKey() function will be used to retrieve your Retell AI secret key.

Set up the Twilio Programmable Voice webhook

The next step is to set up a webhook handler for Twilio phone calls. Webhooks are HTTP handlers that allow for event-driven communications between two applications, i.e., in this project, communications between Twilio and your AI server.

To enable them, add the following code to the bottom of your main.go file.

type RegisterCallRequest struct {
   AgentID                string `json:"agent_id"`
   AudioEncoding          string `json:"audio_encoding"`
   AudioWebsocketProtocol string `json:"audio_websocket_protocol"`
   SampleRate             int    `json:"sample_rate"`
}

type RegisterCallResponse struct {
   AgentID                string `json:"agent_id"`
   AudioEncoding          string `json:"audio_encoding"`
   AudioWebsocketProtocol string `json:"audio_websocket_protocol"`
   CallID                 string `json:"call_id"`
   CallStatus             string `json:"call_status"`
   SampleRate             int    `json:"sample_rate"`
   StartTimestamp         int    `json:"start_timestamp"`
}

func Twiliowebhookhandler(c *gin.Context) {
   agent_id := c.Param("agent_id")

   callinfo, err := RegisterRetellCall(agent_id)
   if err != nil {
       c.JSON(http.StatusInternalServerError, "cannot handle call atm")
       return
   }

   twilloresponse := &twiml.VoiceStream{
       Url: "wss://api.retellai.com/audio-websocket/" + callinfo.CallID,
   }

   twiliostart := &twiml.VoiceConnect{
       InnerElements: []twiml.Element{twilloresponse},
   }

   twimlResult, err := twiml.Voice([]twiml.Element{twiliostart})
   if err != nil {
       c.JSON(http.StatusInternalServerError, "cannot handle call atm")
       return
   }

   c.Header("Content-Type", "text/xml")
   c.String(http.StatusOK, twimlResult)
}

func RegisterRetellCall(agent_id string) (RegisterCallResponse, error) {
   request := RegisterCallRequest{
       AgentID:                agent_id,
       AudioEncoding:          "mulaw",
       SampleRate:             8000,
       AudioWebsocketProtocol: "twilio",
   }

   request_bytes, err := json.Marshal(request)
   if err != nil {
       return RegisterCallResponse{}, err
   }

   payload := bytes.NewBuffer(request_bytes)

   request_url := "https://api.retellai.com/register-call"
   method := "POST"

   var bearer = "Bearer " + GetRetellAISecretKey()

   client := &http.Client{}
   req, err := http.NewRequest(method, request_url, payload)
   if err != nil {
       return RegisterCallResponse{}, err
   }

   req.Header.Add("Authorization", bearer)
   req.Header.Add("Content-Type", "application/json")
   res, err := client.Do(req)
   if err != nil {
       return RegisterCallResponse{}, err
   }
   defer res.Body.Close()

   body, err := io.ReadAll(res.Body)
   if err != nil {
       return RegisterCallResponse{}, err
   }

   var response RegisterCallResponse

   json.Unmarshal(body, &response)

   return response, nil
}

Then, update the import list to match the following:

import (
   "bytes"
   "encoding/json"
   "io"
   "log" 
   "net/http"
   "os"

   "github.com/gin-gonic/gin"
   "github.com/joho/godotenv"
   "github.com/twilio/twilio-go/twiml"
)

The code defines two structs. The first (RegisterCallRequest) helps pass the required request parameters in the request to Retell AI. The second (RegisterCallResponse) helps marshall the JSON in the body of the response from the request to Retell AI. 

Then, there's an HTTP handler that receives the webhook request from Twilio after someone makes a call to your Twilio phone number.  The handler registers a call with Retell AI, setting the call encoding algorithm(AudioEncoding) to mulaw and the call sample rate (SampleRate) to 8000 as required by Twilio Media Streams, and receives the call's ID. 

The call ID is then used to configure a websocket URL that will be returned to Twilio via the webhook request. The websocket server will handle all audio exchanges between Twilio and your backend AI server via Retell AI.

Also, to allow for bidirectional audio exchange between Twilio and Retell AI over the websocket, the websocket URL is connected to Twilio using the TwiML markup language Voice Stream instruction wrapped in a <Connect> verb.

Next up, update the main() function in main.go to match the following. It registers your webhook route and configures the application to listen on port 8081.

func main() {
   err := godotenv.Load()
   if err != nil {
   	log.Fatal("cannot retrieve env file")
   }

   app := gin.Default()
   app.POST("/twilio-webhook/:agent_id", Twiliowebhookhandler)
   app.Run("localhost:8081")
}

Set up your backend AI server

The next step is to set up the backend AI server. When Twilio connects to your Retell AI agent via the Retell AI WebSocket URL returned in the Twilio voice webhook, Retell AI connects the call to your backend AI server using the WebSocket connection that you are about to create.

Add the following code to the bottom of your main.go file to create a websocket route.

type Transcripts struct {
   Role    string `json:"role"`
   Content string `json:"content"`
}

type Request struct {
   ResponseID      int           `json:"response_id"`
   Transcript      []Transcripts `json:"transcript"`
   InteractionType string        `json:"interaction_type"`
}
type Response struct {
   ResponseID      int    `json:"response_id"`
   Content         string `json:"content"`
   ContentComplete bool   `json:"content_complete"`
   EndCall         bool   `json:"end_call"`
}

func Retellwshandler(c *gin.Context) {
   upgrader := websocket.Upgrader{}

   upgrader.CheckOrigin = func(r *http.Request) bool {
       return true
   }

   conn, err := upgrader.Upgrade(c.Writer, c.Request, nil)
   if err != nil {
       log.Fatal(err)
   }

   response := Response{
       ResponseID:      0,
       Content:         "Hello, I'm your AI buddy. How can I help you today?",
       ContentComplete: true,
       EndCall:         false,
   }

   err = conn.WriteJSON(response)
   if err != nil {
       log.Fatal(err)
   }

   for {
       messageType, ms, err := conn.ReadMessage()
       if err != nil {
           conn.Close()

           break
       }

       if messageType == websocket.TextMessage {
           var msg Request
           json.Unmarshal(ms, &msg)
	    log.Println(msg)
       }
   }
}

Then, add the following to your import list:

"github.com/gorilla/websocket"

In this code, we created a websocket route that returns "Hello, I'm your AI buddy. How did your day go?" on a successful connection, and logs your messages to the terminal. 

Now, register the websocket route in your main() function, by updating the function to match the following code.

func main() {
   err := godotenv.Load()
   if err != nil {
       log.Fatal("cannot retrieve env file")
   }

   app := gin.Default()
   app.Any("/llm-websocket/:call_id", Retellwshandler)
   app.POST("/twilio-webhook/:agent_id", Twiliowebhookhandler)
   app.Run("localhost:8081")
}

Implement your AI server

To do so, add the following code to the bottom of your main.go file.

func HandleWebsocketMessages(msg Request, conn *websocket.Conn) {
   client := openai.NewClient(GetOpenAISecretKey())

   if msg.InteractionType == "update_only" {
       log.Println("update interaction, do nothing.")
       return
   }

   prompt := GenerateAIRequest(msg)

   req := openai.ChatCompletionRequest{
       Model:       openai.GPT3Dot5Turbo,
       Messages:    prompt,
       Stream:      true,
       MaxTokens:   200,
       Temperature: 1.0,
   }
   stream, err := client.CreateChatCompletionStream(context.Background(), req)
   if err != nil {
       log.Println(err)
       conn.Close()
   }

   defer stream.Close()
   var i int
   for {
       response, err := stream.Recv()
       if err != nil {
           var s string
           if (errors.Is(err, io.EOF) && i == 0) || (!errors.Is(err, io.EOF)) {
               s = "[ERROR] NO RESPONSE, PLEASE RETRY"
           }

           if errors.Is(err, io.EOF) && i != 0 {
               s = "\n\n###### [END] ######"
           }
           airesponse := Response{
               ResponseID:      msg.ResponseID,
               Content:         s,
               ContentComplete: false,
               EndCall:         false,
           }
         

           out, err := json.Marshal(airesponse)
           if err != nil {
               log.Println(err)
               conn.Close()
           }

           err = conn.WriteMessage(websocket.TextMessage, out)
           if err != nil {
               log.Println(err)
               conn.Close()
           }

           break
       }
       if len(response.Choices) > 0 {
           s := response.Choices[0].Delta.Content

           airesponse := Response{
               ResponseID:      msg.ResponseID,
               Content:         s,
               ContentComplete: false,
               EndCall:         false,
           }
           log.Println(airesponse)

           out, _ := json.Marshal(airesponse)

           err = conn.WriteMessage(websocket.TextMessage, out)
           if err != nil {
               log.Println(err)
               conn.Close()
           }
       }
       i = i + 1
   }
}

func GenerateAIRequest(msg Request) []openai.ChatCompletionMessage {
   var airequest []openai.ChatCompletionMessage

   systemprompt := openai.ChatCompletionMessage{
       Role:    "system",
       Content: "##Objective\\n You are an AI voice agent engaging in a human-like voice conversation with a user. You will respond based on your given instruction and the provided transcript and be as human-like as possible\\n\\n## Style Guardrails\\n- [Be concise] Keep your response succinct, short, and get to the point quickly. Address one question or action item at a time. Do not pack everything you want to say into one utterance.\\n- [Do not repeat] Do not repeat what is in the transcript. Rephrase if you have to reiterate a point. Use varied sentence structures and vocabulary to ensure each response is unique and personalized.\\n- [Be conversational] Speak like a human as though you are speaking to a close friend -- use everyday language and keep it human-like.\\n\\n## Role\\n\r\nTask: As an AI friend, you are to have a chat with the user about how his or her day went. Your role involves giving advice, listening, and acting as a close friend.\\n\\nConversational Style: Communicate concisely and conversationally. Aim for responses in short, clear prose, ideally under 10 words. This succinct approach helps in maintaining clarity and focus during your interaction with your friend.\\n\\nPersonality: Your approach should be empathetic, understanding, and informal. Do not repeat what is in the transcript.",
   }

   airequest = append(airequest, systemprompt)

   for _, response := range msg.Transcript {
       var p_response openai.ChatCompletionMessage

       if response.Role == "agent" {
           p_response.Role = "assistant"
       } else {
           p_response.Role = "user"
       }

       p_response.Content = response.Content

       airequest = append(airequest, p_response)
   }

   return airequest
}

Then, update the imports list to match the following:

import (
    "bytes"
    "context"
    "encoding/json"
    "errors"
    "io"
    "log"
    "net/http"
    "os"

    "github.com/gin-gonic/gin"
    "github.com/gorilla/websocket"
    "github.com/joho/godotenv"
    "github.com/sashabaranov/go-openai"
    "github.com/twilio/twilio-go/twiml"
)

In this code, you created two functions, HandleWebsocketMessages() and GenerateAIRequest(), for generating AI responses. The HandleWebsocketMessages() function:

  • Takes in messages from your WebSocket handler

  • Converts the messages to an OpenAI ChatGPT prompt using the GenerateAIRequest() function

  • Makes a call to the OpenAI API to generate a response

  • Returns the response back to Retell AI via the WebSocket connection

Update your WebSocket handler code

Now, update your Retellwshandler() websocket handler to match the following so that it calls the new HandleWebsocketMessages() function. Messages sent to your websocket route will be transferred to the HandleWebsocketMessages() to generate an AI response with the OpenAI API, which the WebSocket route will return.

func Retellwshandler(c *gin.Context) {
   upgrader := websocket.Upgrader{}

   upgrader.CheckOrigin = func(r *http.Request) bool {
       return true
   }

   conn, err := upgrader.Upgrade(c.Writer, c.Request, nil)
   if err != nil {
       log.Fatal(err)
   }

   response := Response{
       ResponseID:      0,
       Content:         "Hello, I'm your AI buddy. How can I help you today?",
       ContentComplete: true,
       EndCall:         false,
   }

   err = conn.WriteJSON(response)
   if err != nil {
       log.Fatal(err)
   }

   for {
       messageType, ms, err := conn.ReadMessage()
       if err != nil {
           conn.Close()

           break
       }

       if messageType == websocket.TextMessage {
           var msg Request
           json.Unmarshal(ms, &msg)
          
           HandleWebsocketMessages(msg, conn)
       }
   }
}

Then, update your import code to the following to import the Go packages that you used.

import (
   "bytes"
   "context"
   "encoding/json"
    "errors"
   "io"
   "log"
   "net/http"
   "os"


   "github.com/gin-gonic/gin"
   "github.com/gorilla/websocket"
   "github.com/joho/godotenv"
   "github.com/sashabaranov/go-openai"
   "github.com/twilio/twilio-go/twiml"
)

Update your main() function to the following to register your LLM WebSocket route.

func main() {
   err := godotenv.Load()
   if err != nil {
       log.Fatal("cannot retrieve env file")
   }
   app := gin.Default()
   app.Any("/llm-websocket/:call_id", Retellwshandler)
   app.POST("/twilio-webhook/:agent_id", Twiliowebhookhandler) 
   app.Run("localhost:8081")
}

Then, start your Go server.

go run main.go

After that,  in a new terminal session or tab run the following command. This connects the server running on your local machine to Twilio and Retell AI.

ngrok http 8081

Create a Retell AI agent for your AI Buddy

Then, run the request in a new terminal session or tab to generate a Retell AI agent for your AI server. Make sure to change <ENTER_RETELL_AI_SECRET_KEY> to your Retell AI secret key, and <YOUR_LLM_WEBSOCKET_URL> to your WebSocket URL. Your WebSocket should be in the following format: wss://<ngrok_url>/llm-websocket/

This agent will convert the AI response text to speech.

curl --request POST \
  --url https://api.retellai.com/create-agent \
  --header 'Authorization: Bearer <ENTER_RETELL_AI_SECRET_KEY>' \
  --header 'Content-Type: application/json' \
  --data '{
  "llm_websocket_url": "<YOUR_LLM_WEBSOCKET_URL>",
  "voice_id": "11labs-Adrian",
  "enable_backchannel": true,
  "agent_name": "Jarvis"
}'

Your AI agent should be added to your Retell AI dashboard if the registration was successful.

Connect your webhook URL to Twilio

To connect your webhook URL to Twilio, open your Twilio Console and navigate to Phone Numbers > Manage > Active Numbers. Select your Twilio phone number, and open the Configure tab.

Under Voice Configuration, set your Configure with dropdown to "Webhook, TwiML Bin, Function, Studio Flow, Proxy Service". Set the "A call comes in" dropdown to "Webhook", add your webhook URL to your webhook URL, and set "HTTP" to HTTP POST. Your webhook URL should be in the following format: https://<ngrok_url>/twilio-webhook/<retell_ai_agent_id>. You can find the Retell AI agent id in the response from the curl request to Retell AI.

Finally, scroll down and click Save configuration to register your changes. 

Chat with your AI Buddy

To chat with your AI buddy, make a call to your Twilio phone number. Your AI buddy should pick up your call and have a chat with you.

That's how to build an AI buddy with Go, OpenAI, Retell AI, and Twilio Programmable Voice

In this tutorial, you learned how to build an AI buddy that you can call and chat with about how your day went, using Twilio Programmable Voice, Retell AI, and OpenAI. You can also edit the AI prompt and convert your AI from a friend to a therapist, a coach, etc. 

The entire code for this tutorial can be found on GitHub.

Quadri Sheriff is a Software developer and tech-savvy technical writer who specializes in Developer and API documentation and has strong knowledge of docs-as-code tools.