Implementing Multi-Party Calls with VoIP and GSM using the Programmable Voice API

September 19, 2019
Written by
Sasankh Munukutla
Contributor
Opinions expressed by Twilio contributors are their own

At Tarjimly, we provide free on-demand translation services for refugees and people in need of humanitarian service. We are supported by the Twilio.org Impact Fund, as we are a tech nonprofit tackling the world’s toughest problems through the power of communications. 

Using Twilio Programmable Voice, our new feature allows translators and aid workers to add additional people to two-way calls (for example, a lawyer or a doctor who can give advice remotely). While you can do this with a regular conference call, we make the experience feel like a regular call to the end user, backed by a robust backend infrastructure that supports up to 250 people on the line.

In this guide, we aim to take your existing VoIP two-user call capability and build a robust infrastructure that can handle dialing in other users – using only their phone numbers.

Tarjimly Three-Way Calls diagram

This guide will focus on the backend, a Python Flask App for Twilio. The endpoints in the backend are called from your frontend (Web/Mobile App) and the Twilio console.

Getting Started with VoIP Calls

To complete this tutorial, you will need the following:

Twilio Console Setup

Now that you have your Twilio account and all of the prerequisites, we’ll go through Twilio’s console setup. 

Conference Type Setup

To make it seamless to add additional people to a call, we now set up every call as a conference behind the scenes, including two-way calls. The conferences need to be Agent Conferences so that we can add participants (make outbound calls from the conference and add the people who answer).

In the Twilio Programmable Voice Console go to Programmable Voice > Conferences > Settings and ensure your conferences are Agent Conferences. There is no difference in cost for an agent conference.

Voice Conference settings page

Set up a Voice Request URL for your TwiML App

In the Twilio Voice Console, now go to Programmable Voice > TwiML > TwiML Apps.

Click on your app and set up the Voice Request URL. This is the endpoint that Twilio calls when your frontend app asks Twilio to make a call.

TwiML Apps page

For example, "myURL.com/makeCall" [placeholder] is the endpoint Twilio will call when your app makes a call.

If you don’t have a Voice URL set up yet, click on your app and configure the Voice URL to your desired endpoint. Make sure the method is "HTTP POST", as you will pass in request values for your endpoint, like the To and From for the call.
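
To make the shape of those request values concrete, here is a minimal sketch of what a form-encoded POST body looks like once parsed. The sample values are hypothetical, and SessionID is assumed to be a custom parameter that your frontend passes along when it starts the call. In the Flask app itself, `request.values` does this parsing for you.

```python
from urllib.parse import parse_qs


def parse_voice_request(body: str) -> dict:
    """Flatten a form-encoded webhook body into a dict of single values."""
    parsed = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; we only expect one value for each.
    return {key: values[0] for key, values in parsed.items()}


# Hypothetical body, similar in shape to what the /makeCall endpoint reads.
sample_body = "To=translator-42&From=aidworker-7&SessionID=session-123"
params = parse_voice_request(sample_body)
print(params["To"], params["From"], params["SessionID"])
```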

New TwiML App settings page

Here is how you would set up your endpoint for Twilio Console 

Start a two-user call from your App

Now, let’s write the logic for the endpoint you configured earlier to create a call between two users. For reference, the caller is the person who initiates the call and the callee is the person who receives an incoming call as the target of the caller.

 

"""
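Creates an endpoint that can be used in your TwiML App as the Voice Request Url.
In order to make an outgoing call using Twilio Voice SDK, you need to provide a
TwiML App SID in the Access Token. You can run your server, make it publicly
accessible and use `/makeCall` endpoint as the Voice Request Url in your TwiML App.
"""
import json

from flask import Flask, jsonify, request
from twilio.rest import Client
from twilio.twiml.voice_response import Dial, VoiceResponse

app = Flask(__name__)
app.config.from_pyfile('config.py')  # assumption: file name; see the Config File section

# global dictionaries used by the endpoints below
sessionID_to_callsid = {}  # session_id -> call sid of the outgoing call to the callee
sessionID_to_confsid = {}  # session_id -> conference sid

@app.route('/makeCall', methods=['GET', 'POST'])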
def makeCall():
   client = Client(app.config['TWILIO_ACCOUNT_SID'], app.config['TWILIO_AUTH_TOKEN'])
   print("## Making a call")
   resp = VoiceResponse()
   to_id = request.values['To']
   from_id = request.values['From']
   session_id = request.values['SessionID']  # name we use to uniquely identify conference 

 
   # creates the call from the from_id to the to_id, adding the to_id to the conference
   call = client.calls.create(from_="client:"+from_id, to="client:"+to_id,
                              url=app.config['MY_URL']+'/joinConf/'+str(session_id),
                              status_callback_event=['completed'],
                              status_callback=app.config['MY_URL']+'/completeCall/'+str(session_id))
   # updates global dictionary to handle edge case where caller leaves before callee picks up  
   global sessionID_to_callsid
   sessionID_to_callsid[session_id] = call.sid

   # dials the caller (from_id) into the conference
   dial = Dial()
   dial.conference(session_id,
                   waitUrl='https://twimlets.com/holdmusic?Bucket=my-static-music',
                   status_callback=app.config['MY_URL']+'/leave',
                   status_callback_event="leave join")
   resp.append(dial)

   return str(resp)

# this is an endpoint to add the callee into the conference call
@app.route('/joinConf/<string:call_session_id>', methods=['GET', 'POST'])
def conferenceCall(call_session_id):
   print("## Making a conference call")
   resp = VoiceResponse()
   dial = Dial()
   dial.conference(call_session_id,
                   waitUrl='',
                   status_callback=app.config['MY_URL']+'/leave',
                   status_callback_event="leave")
   resp.append(dial)
   return str(resp)

The entire code is presented above for completeness, and each significant portion is repeated in the subsections below for clarity.

Start the call

This endpoint receives the following values: 

  • From - the client id of the caller
  • To - the client id of the callee
  • SessionID - a unique, friendly name for the conference so we can be sure to add all users to the conference

We first create a call from the `from_id` to the `to_id`. The `url` we specify points to the `joinConf` endpoint (defined below), which serves the TwiML that adds the callee (`to_id`) to this conference if they accept the call.

client = Client(app.config['TWILIO_ACCOUNT_SID'], app.config['TWILIO_AUTH_TOKEN'])
print("## Making a call")
resp = VoiceResponse()
to_id = request.values['To']
from_id = request.values['From']
session_id = request.values['SessionID']  # name we use to uniquely identify conference

# creates the call from the from_id to the to_id, adding the to_id to the conference
call = client.calls.create(from_="client:"+from_id, to="client:"+to_id,
                           url=app.config['MY_URL']+'/joinConf/'+str(session_id),
                           status_callback_event=['completed'],
                           status_callback=app.config['MY_URL']+'/completeCall/'+str(session_id))
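
Because the callback URLs are assembled by string concatenation, a missing `+` or a doubled slash is easy to introduce. One possible way to guard against that is a small helper like the sketch below. The name `callback_url` is hypothetical and not part of the original app; it simply joins the base URL, the endpoint, and an optional percent-encoded session ID.

```python
from urllib.parse import quote


def callback_url(base_url: str, endpoint: str, session_id: str = "") -> str:
    """Join base URL, endpoint name, and an optional session ID into one URL."""
    base = base_url.rstrip("/")
    path = endpoint.strip("/")
    if session_id:
        # Percent-encode the session ID so spaces and similar characters are safe.
        path += "/" + quote(str(session_id), safe="")
    return f"{base}/{path}"


# Hypothetical base URL standing in for app.config['MY_URL'].
print(callback_url("https://example.com/", "joinConf", "session 123"))
print(callback_url("https://example.com", "/leave"))
```

With a helper like this, the `url` and `status_callback` arguments could be written as `callback_url(app.config['MY_URL'], 'joinConf', session_id)` and so on.
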
We specify a `status_callback_event` and `status_callback` URL when creating the call to handle the edge case where the callee rejects the call while the caller is still in the conference. (The code and explanation for that endpoint come later in this guide.) We also update a global dictionary `sessionID_to_callsid` (defined at the top of the file) to map the `session_id` to this particular `call.sid`. This mapping lets us end the outgoing call to the callee when we hit the edge case where the original caller leaves before the callee picks up.

# updates global dictionary to handle edge case where caller leaves before callee picks up
global sessionID_to_callsid
sessionID_to_callsid[session_id] = call.sid
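
A plain module-level dictionary works for a single-process server, but it never forgets finished sessions and is not shared across worker processes. As a rough sketch, assuming a single process with multiple threads, the mapping could be wrapped in a small lock-protected class. `SidRegistry` is a hypothetical name, not something the original app defines, and the SID value is a made-up placeholder.

```python
import threading
from typing import Dict, Optional


class SidRegistry:
    """Thread-safe map from a session ID to a Twilio SID (call or conference)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sids: Dict[str, str] = {}

    def set(self, session_id: str, sid: str) -> None:
        with self._lock:
            self._sids[session_id] = sid

    def get(self, session_id: str) -> Optional[str]:
        # Returns None instead of raising if the session is unknown.
        with self._lock:
            return self._sids.get(session_id)

    def forget(self, session_id: str) -> None:
        # Call this once the session is over so the map does not grow forever.
        with self._lock:
            self._sids.pop(session_id, None)


registry = SidRegistry()
registry.set("session-123", "CA-example-call-sid")  # placeholder SID
print(registry.get("session-123"))
```

For a deployment with several worker processes, an external store would be needed instead; that is outside the scope of this sketch.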

Add the caller to the conference call

Next, we add the caller into the conference call using the session_id as the friendly name.

Note that during the wait between when the caller is added to the conference and the callee joins the conference, the caller will hear wait music (they will be in the conference alone). To make this experience replicate a regular call you can customize the wait music to a phone dialing tune, so the caller feels like they are just waiting on the callee to pick up. 

You can provide a waitUrl inside the console which serves TwiML and the MP3 wait music of your choice. Here is a detailed guide on how to do this; you will need a public Amazon S3 bucket and an audio file.
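
For illustration, the TwiML behind such a waitUrl can be as small as a single Play verb. The sketch below builds it with the standard library only; the audio URL is a hypothetical placeholder, and `loop="0"` is used on the assumption that you want the tone to repeat until the callee joins.

```python
import xml.etree.ElementTree as ET


def wait_twiml(audio_url: str) -> str:
    """Build a TwiML document that plays one audio file on repeat."""
    response = ET.Element("Response")
    # loop="0" asks Twilio to keep repeating the file.
    play = ET.SubElement(response, "Play", {"loop": "0"})
    play.text = audio_url
    return ET.tostring(response, encoding="unicode")


# Hypothetical audio location; replace it with your own hosted file.
twiml = wait_twiml("https://example.com/ringback.mp3")
print(twiml)
```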


We also specify a status_callback_event and status_callback URL to handle the case where only 1 person is left in the conference. We use the leave event and will use it in other cases as well. For this case, we also include the join event, as we need to get the conference_sid when the callee rejects the call. The code and explanation for these endpoints come later in this guide.

# dials the caller (from_id) into the conference
   dial = Dial()
   dial.conference(session_id,
                   waitUrl='https://twimlets.com/holdmusic?Bucket=my-static-music',
                   status_callback=app.config['MY_URL']+'/leave',
                   status_callback_event="leave join")
   resp.append(dial)

   return str(resp)

joinConf Endpoint

The joinConf endpoint adds the callee to the conference call if they accept the call. 

Its logic is very similar to how we add the caller to the conference call. We pass an empty waitUrl in this case: when the callee joins, the caller is already in the conference, so there is nothing to wait for and we do not need the join event. We only need leave for the status_callback_event.

# this is an endpoint to add the callee into the conference call
@app.route('/joinConf/<string:call_session_id>', methods=['GET', 'POST'])
def conferenceCall(call_session_id):
   print("## Making a conference call")
   resp = VoiceResponse()
   dial = Dial()
   dial.conference(call_session_id,
                   waitUrl='',
                   status_callback=app.config['MY_URL']+'/leave',
                   status_callback_event="leave")
   resp.append(dial)
   return str(resp)

Dial additional participants into the call using their phone number

This endpoint is called by your frontend app when you attempt to add a third person to your call. Unlike starting a call, this request goes straight to your backend rather than through Twilio.

@app.route('/addUser', methods=['POST'])
def addUser():
   data = json.loads(request.data)
   phone_number = data['PhoneNumber']
   session_id = data['SessionID']
   client = Client(app.config['TWILIO_ACCOUNT_SID'], app.config['TWILIO_AUTH_TOKEN'])
   print("Attempting to add phone number to call: " + phone_number)

   participant = client.conferences(session_id).participants.create(
       from_=app.config['MY_TWILIO_PHONE_NUMBER'],
       to=phone_number,
       conference_status_callback=app.config['MY_URL']+'/leave',
       conference_status_callback_event="leave",)

   data = {
       "status_code": 200,
   }
   print(participant)
   resp = jsonify(data)
   return resp

This endpoint receives the following values: 

  • PhoneNumber - the phone number of the person you wish to add to the call
  • SessionID - a unique, friendly name for the conference so we can add this phone number to the conference

You will also need a Twilio Phone Number, in order to dial another number into this conference. 

Using the participants property of a conference, we add the phone number to this conference. The from_ is our Twilio Phone Number and the to is the phone number we wish to add.

As earlier, we specify a conference_status_callback_event and conference_status_callback URL. This will handle the case where only 1 person is left in the conference. We use the ‘leave’ event here so no one gets stuck in conference limbo.
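
Since the phone number comes straight from the frontend, it may be worth normalizing it before dialing. Twilio expects numbers in E.164 format (a plus sign followed by the country code and number). The following is a minimal sketch of such a check; `normalize_phone_number` is a hypothetical helper that is not part of the original endpoint.

```python
import re

# E.164: a plus sign, a non-zero leading digit, and at most 15 digits in total.
E164_PATTERN = re.compile(r"^\+[1-9][0-9]{1,14}$")


def normalize_phone_number(raw: str) -> str:
    """Strip common separators and verify the result looks like E.164."""
    cleaned = re.sub(r"[\s\-().]", "", raw)
    if not E164_PATTERN.match(cleaned):
        raise ValueError("phone number is not in E.164 format: " + repr(raw))
    return cleaned


print(normalize_phone_number("+1 (415) 555-0100"))
```

In the endpoint, a ValueError could be turned into a 400 response so the frontend can prompt the user to correct the number.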

Ensure the call ends when only one person is left 

This endpoint helps us end the call when only 1 participant is left. It also handles the edge case where the original caller leaves the call before the callee picks up.

# this endpoint is called whenever a participant leaves a conference call
# used to end the call depending on participants left and to handle the edge case where
# the original caller leaves the call before the callee picks up
# also called when the original caller first joins the conference, to map the sessionID
# to the conference_sid and handle the edge case where the callee rejects the call
@app.route('/leave', methods=['GET', 'POST'])
def leaveCall():
   event = request.values['SequenceNumber']
   conference_sid = request.values['ConferenceSid']
   global sessionID_to_confsid
   sessionID_to_confsid[request.values['FriendlyName']] = conference_sid
   client = Client(app.config['TWILIO_ACCOUNT_SID'], app.config['TWILIO_AUTH_TOKEN'])

   if request.values['StatusCallbackEvent'] == 'participant-leave':
      print("A Participant Left Call")
      participants = client.conferences(conference_sid).participants
      # ends conference call if only 1 participant left
      if len(participants.list()) == 1:
         client.conferences(conference_sid).update(status='completed')
         print("Call ended")
      # ends the call to the callee if original caller leaves before callee picks up
      elif len(participants.list()) == 0 and event == '2':
         client.calls(sessionID_to_callsid[request.values['FriendlyName']]).update(status='completed')
         print("Call ended")

   data = {
      "status_code": 200,
   }
   resp = jsonify(data)
   return resp

When this event is fired, we first get the sequence number. We then update a global dictionary sessionID_to_confsid (defined at the top of the file) to map the session_id to the conference_sid.

This is useful when we handle the edge case of the callee rejecting the call (covered next). Note this part of the code runs whenever this endpoint is called (including the case where the caller first joins the conference call, allowing the global dictionary to have a mapping from the session_id  to the conference_sid if the callee rejects the call).

We then handle participants leaving the call by determining if that’s why the endpoint was called (this excludes the case when the caller first joins the conference call). 

We determine the number of active participants left in this call. Twilio calls this endpoint whenever a person leaves the conference, and additionally when the original caller first joins it. So if only 1 person is left, we end the conference by updating the status of the conference to `completed`. This means the last remaining user has their call ended.

The next statements handle the edge case to end the conference call if the caller leaves before the callee picks up. 

In this case, there are no participants in the call when the endpoint is triggered, but since the first person joined (Sequence Number 0) and was removed (Sequence Number 1) the Sequence Number will now be 2. Using these two conditions, we take advantage of the global dictionary used in the makeCall endpoint that maps a sessionID to a specific call_sid to end the specific call to the callee (as the caller has ended the call). 
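
The branching described above can be summarized as a small pure function, which also makes it easy to test without calling Twilio. This is only a sketch of the decision logic; `decide_leave_action` and its return labels are hypothetical names chosen for illustration.

```python
def decide_leave_action(status_event: str, participants_left: int,
                        sequence_number: str) -> str:
    """Return which action the leave endpoint should take for a callback."""
    if status_event != "participant-leave":
        # For example the join callback, which only records the conference SID.
        return "none"
    if participants_left == 1:
        # Nobody left to talk to, so the conference should be completed.
        return "end_conference"
    if participants_left == 0 and sequence_number == "2":
        # The caller hung up before the callee answered: stop ringing the callee.
        return "end_callee_call"
    return "none"


print(decide_leave_action("participant-leave", 1, "3"))
print(decide_leave_action("participant-leave", 0, "2"))
```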

Edge Case: Callee rejects call

The completeCall endpoint handles the edge case when the callee rejects the original call from the caller.

# this is an endpoint to end the conference call if the callee rejects the call
@app.route('/completeCall/<string:call_session_id>', methods=['GET', 'POST'])
def completeCall(call_session_id):
   print("## Ending conference call, callee rejected call")
   client = Client(app.config['TWILIO_ACCOUNT_SID'], app.config['TWILIO_AUTH_TOKEN'])
   print(request.values)
   global sessionID_to_confsid
   participants = client.conferences(sessionID_to_confsid[call_session_id]).participants

   # only does so if 1 participant left in the conference call (i.e. the caller)
   if len(participants.list()) == 1:
       client.conferences(sessionID_to_confsid[call_session_id]).update(status='completed')
       print("Call ended")
   data = {
       "status_code": 200,
   }
   resp = jsonify(data)
   return resp

Taking advantage of the global dictionary used in the leave endpoint that maps a sessionID to the conference_sid, we first find out the number of participants in the conference. 

If there is only 1 participant in the call (after the callee rejects the call), then we end the conference by updating the status of the conference to completed.
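
One thing to watch for is a lookup of a session that has no conference mapping, which would raise a KeyError with a plain dictionary access. A defensive variant of the check might look like the sketch below. The function name and the `count_participants` callable are hypothetical; in the real endpoint the count would come from the Twilio client.

```python
from typing import Callable, Dict, Optional


def conference_to_end(session_to_conf: Dict[str, str], session_id: str,
                      count_participants: Callable[[str], int]) -> Optional[str]:
    """Return the conference SID to complete, or None if nothing should end."""
    conference_sid = session_to_conf.get(session_id)
    if conference_sid is None:
        # No mapping recorded for this session, so there is nothing to end.
        return None
    # Only end the conference when the caller is the sole participant left.
    if count_participants(conference_sid) == 1:
        return conference_sid
    return None


# Stand-in data and counter for demonstration purposes only.
mapping = {"session-123": "CF-example-conference-sid"}
print(conference_to_end(mapping, "session-123", lambda sid: 1))
```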

Config File

Make sure to define a config file and have the following values defined: 

  • TWILIO_ACCOUNT_SID - obtained from Twilio Console
  • TWILIO_AUTH_TOKEN - obtained from Twilio Console
  • MY_URL - the URL where your Flask App is hosted
  • MY_TWILIO_PHONE_NUMBER - obtained from Twilio console
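
One way to supply these values, sketched here under the assumption that you keep secrets in environment variables rather than in source control, is a small loader that fails early when something is missing. `load_config` is a hypothetical helper, and the values in the example are placeholders.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "MY_URL",
    "MY_TWILIO_PHONE_NUMBER",
)


def load_config(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Read the required settings, raising if any is missing or empty."""
    source = os.environ if environ is None else environ
    missing = [key for key in REQUIRED_KEYS if not source.get(key)]
    if missing:
        raise RuntimeError("missing config values: " + ", ".join(missing))
    return {key: source[key] for key in REQUIRED_KEYS}


# Placeholder values for demonstration; real values come from the Twilio Console.
example_env = {
    "TWILIO_ACCOUNT_SID": "ACxxxxxxxx",
    "TWILIO_AUTH_TOKEN": "your-auth-token",
    "MY_URL": "https://example.com",
    "MY_TWILIO_PHONE_NUMBER": "+15005550006",
}
config = load_config(example_env)
print(sorted(config))
```

The resulting dictionary could then be passed to `app.config.update(...)` when the Flask app starts.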

Translating for Humanity: one conference call at a time

All over the world, the inability to communicate has stopped people from gaining access to basic human rights and needs. If you speak multiple languages, even if you can just hold a conversation, you can be very useful. While reading and writing are useful skills, they aren't necessary for helping! A minute or two of your time can change a life. Download Tarjimly, sign up and help! 

If you know of people who would benefit from our service, connect us! Caseworkers, asylum offices, doctors' offices, food banks, civil rights activists, and journalists are just some of the people who benefit from Tarjimly!

About the Author

Sasankh Munukutla is a Software Engineer Intern at Tarjimly, a tech nonprofit that provides on-demand humanitarian language translation, and an undergraduate student at Stanford University. sasankh@tarjim.ly