Hacking Twilio Client to Play Videos Over the Phone

April 15, 2019

Video over phone

How can I play a YouTube video over WebRTC?
Is there a way to play dynamic audio over a Twilio Voice call?

Many people have asked about playing dynamic audio over the phone using Twilio. While TwiML does allow you to generate dynamic responses based on user input, it doesn't help if you want to start and stop audio on demand or play a video over the phone. I can imagine some enterprising person using these powers to prank robocallers, spammers and even the occasional pesky family member. In this post, we are going to look at one quick way to inject any MediaStream into a phone call using the WebRTC-based Twilio Client SDK.

Prerequisite: This post picks up at the very end of the Twilio Client quickstart, so if you haven't completed it yet, do so now. These concepts could be applied to any web app hosted on a server, but if you want this demo to just work, start from the finished quickstart.

Using a video as your source for the phone call

Before we dive into the Twilio Client piece, we first need to add a media element to our HTML from which we can capture a MediaStream. This could be done in a myriad of ways, including creating the media element dynamically with JavaScript when the page renders. However, let's keep it simple and use an HTML5 video element.

If you finished the quickstart, you should have an index.html file. At the top of the <body>, add the following:

<video src="http://storage.com/filthy.mp4" id="videoStream" preload="auto" controls playsinline loop height=200></video>

This inserts a video element at the top of our page that we can now use as the audio source for our phone call. This could just as easily have been an <audio> tag, but I think using video is a little more fun.
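If you would rather create the element from JavaScript when the page renders, a minimal sketch could look like the following. The function name createSourceVideo and its doc parameter are hypothetical; passing document in explicitly only makes the function easy to try outside a browser. It mirrors the attributes of the markup above. One caveat to keep in mind: browsers may refuse to capture audio from media served by another origin unless that host sends CORS headers.

```javascript
// Hypothetical helper: builds the same <video> element as the markup above.
// `doc` is the page's `document` (passed in so the function can be tried with a stub).
function createSourceVideo(doc, src) {
  const video = doc.createElement("video");
  video.id = "videoStream"; // the id the rest of the code looks up
  video.src = src;
  video.preload = "auto";
  video.controls = true;
  video.playsInline = true;
  video.loop = true;
  video.height = 200;
  // Insert at the top of <body>, matching the placement used in index.html
  doc.body.insertBefore(video, doc.body.firstChild);
  return video;
}

// In the browser:
// createSourceVideo(document, "http://storage.com/filthy.mp4");
```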

The next thing we need to do is create a global JavaScript variable, which we'll call audioStream, that the Twilio Client SDK can use as the audio source for the call. By default, the Twilio library takes its audio input from a device on our computer, such as a microphone, by calling getUserMedia() to obtain a MediaStream from the selected input device. In our case we don't want the audio to come from any of these devices; we want it to come from the video element. Luckily, media elements have a very handy method for capturing a MediaStream of whatever they are playing, called captureStream().

Add the following to the bottom of your index.html.

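<script type="text/javascript">
  // The <video> element we added at the top of <body>
  let video = document.getElementById("videoStream"),
      // Global MediaStream of whatever the video plays; the patched twilio.js reads it
      audioStream = video.captureStream();
</script>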

And with that, we now have a global variable called audioStream that contains a MediaStream from our video.
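captureStream() is not available under that name everywhere: Chromium-based browsers expose HTMLMediaElement.captureStream(), while Firefox exposes the prefixed mozCaptureStream(). A small feature-detecting wrapper keeps the snippet above from failing outright; the name captureElementStream is hypothetical.

```javascript
// Hypothetical helper: capture a MediaStream from a <video> or <audio> element.
// Uses the standard captureStream() when present, otherwise Firefox's
// prefixed mozCaptureStream(); anything else is reported as unsupported.
function captureElementStream(mediaElement) {
  if (typeof mediaElement.captureStream === "function") {
    return mediaElement.captureStream();
  }
  if (typeof mediaElement.mozCaptureStream === "function") {
    return mediaElement.mozCaptureStream();
  }
  throw new Error("This browser cannot capture a MediaStream from a media element");
}

// In index.html this could replace the direct call:
// let video = document.getElementById("videoStream"),
//     audioStream = captureElementStream(video);
```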

Hacking the Twilio Client

As mentioned above, the Twilio Client SDK expects a MediaStream obtained from an input device via getUserMedia(), but that isn't what we want. Instead, we are going to take the audioStream variable we created earlier and substitute it for the device stream inside our Twilio Client instance. To do this, we need to edit the Twilio Client JavaScript SDK itself.

First, download the JavaScript SDK here. Once you have downloaded the twilio.js file, open it and edit the _createAnalyser function so that it analyzes audioStream instead of the stream passed in:

PeerConnection.prototype._createAnalyser = function (stream, audioContext) {
    var analyser = audioContext.createAnalyser();
    analyser.fftSize = 32;
    analyser.smoothingTimeConstant = 0.3;
    var streamSource = audioContext.createMediaStreamSource(audioStream); // NEW!! analyze our global audioStream instead of `stream`
    streamSource.connect(analyser);
    return analyser;
};
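The SDK appears to use AnalyserNodes like this one to report volume levels, which is why the patch points it at our stream. The sketch below shows the same Web Audio pattern on its own, plus a hypothetical readLevel() helper that averages the frequency bins into a rough 0-255 level; it illustrates the technique and is not twilio.js's own polling code. One detail worth knowing: createMediaStreamSource() throws if the stream has no audio track, which can happen when the stream was captured before the video loaded.

```javascript
// Sketch of the Web Audio pattern used by _createAnalyser, outside of twilio.js.
// `audioContext` is an AudioContext; `stream` is a MediaStream with an audio track.
function createVolumeAnalyser(audioContext, stream) {
  // createMediaStreamSource() rejects streams without audio tracks,
  // e.g. a captureStream() taken before the video has loaded.
  if (stream.getAudioTracks().length === 0) {
    throw new Error("stream has no audio tracks yet");
  }
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 32; // small FFT: only a rough level is needed
  analyser.smoothingTimeConstant = 0.3;
  audioContext.createMediaStreamSource(stream).connect(analyser);
  return analyser;
}

// Hypothetical helper: average the frequency bins into a single 0-255 level.
function readLevel(analyser) {
  const bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins);
  let sum = 0;
  for (const value of bins) sum += value;
  return bins.length ? sum / bins.length : 0;
}
```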

We also need to add two lines to the _setupPeerConnection function so that the peer connection carries audioStream:

PeerConnection.prototype._setupPeerConnection = function (rtcConstraints, iceServers) {
    var self = this;
    var version = this._getProtocol();
    version.create(this.log, rtcConstraints, iceServers);
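    // NEW!! Re-capture from the video if the stream is not active (no live tracks)
    audioStream = audioStream.active ? audioStream : document.getElementById("videoStream").captureStream();
    // NEW!! Attach our stream to the RTCPeerConnection
    addStream(version.pc, audioStream);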
    var eventName = 'ontrack' in version.pc
        ? 'ontrack' : 'onaddstream';
    version.pc[eventName] = function (event) {
        var stream = self._remoteStream = event.stream || event.streams[0];
        if (self._isSinkSupported) {
            self._onAddTrack(self, stream);
        }
        else {
            self._fallbackOnAddTrack(self, stream);
        }
        self._startPollingVolume();
    };
    return version;
};
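The two new lines re-capture the stream if it is no longer active and then attach it to the RTCPeerConnection. A MediaStream reports active === false when none of its tracks is live, which is the case for a stream captured before the video has loaded any media (it has no tracks yet) or after its tracks have ended. The sketch below spells that logic out with hypothetical helper names. attachAudioTracks is an assumption about what the addStream() helper called above might do, written against the standard RTCPeerConnection.addTrack() with the older addStream() as a fallback; it sends only audio tracks, since a phone call has no use for the video track.

```javascript
// Hypothetical helper mirroring the first NEW!! line: keep the current stream
// while it is active, otherwise ask `recapture` for a fresh one.
function resolveAudioStream(current, recapture) {
  return current && current.active ? current : recapture();
}

// Hypothetical helper: send only the audio tracks of `stream` over `pc`.
// Prefers the standard addTrack(); falls back to the legacy addStream().
function attachAudioTracks(pc, stream) {
  if (typeof pc.addTrack === "function") {
    return stream.getAudioTracks().map((track) => pc.addTrack(track, stream));
  }
  pc.addStream(stream); // legacy API: adds every track, including video
  return [];
}

// In the browser (inside the patched _setupPeerConnection):
// audioStream = resolveAudioStream(audioStream,
//   () => document.getElementById("videoStream").captureStream());
// attachAudioTracks(version.pc, audioStream);
```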

If you’d like a completed, monkey-patched version of twilio.js, you can download it here.

As a reminder, I prefer http-server for running the app locally. Type the following into your terminal to serve it on port 8080:

$ npm install http-server -g
$ http-server

Now all we need to do is open our app and call our personal number. After you’ve answered the call (it should be coming from your Twilio number), press play to begin the video. You should now hear the video playing through the phone call!
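If you want to place the call from your own code rather than the quickstart's call button, a sketch like the following could work. It assumes the twilio.js 1.x Device from the quickstart is already set up, and that your TwiML app reads the number to dial from a To parameter; both are assumptions, so check them against your own quickstart files. The helper name callPersonalNumber is hypothetical, and the E.164 check only guards against obvious typos.

```javascript
// Hypothetical helper: place an outbound call through a twilio.js 1.x Device.
// Assumes `device` is already set up and the TwiML app reads the `To` parameter.
const E164 = /^\+[1-9]\d{1,14}$/; // "+", then at most 15 digits, no leading zero

function callPersonalNumber(device, number) {
  const to = number.replace(/[\s()-]/g, ""); // tolerate spaces, dashes, parentheses
  if (!E164.test(to)) {
    throw new Error(`Not an E.164 phone number: ${number}`);
  }
  return device.connect({ To: to });
}

// In the browser, with the quickstart's device:
// const connection = callPersonalNumber(device, "+1 555 555 0100");
```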

Watch the demo now: https://youtu.be/fuyxHzp5fDg

To see all of the code, check out this GitHub repo: https://github.com/jarodreyes/client-audio-hack

What’s next?

A lot of fun can be had once you can inject dynamic audio into a phone call. Remember, this demo works just as well with an <audio> tag in place of the video element. I can imagine this being used for lots of things, including:

  • Mash-ups
  • Virtual Instruments
  • Testing Audio via PSTN
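For the mash-up idea, one approach (not part of the demo above) is to mix several media elements into a single MediaStream with the Web Audio API and assign the result to the global audioStream. A MediaStreamAudioDestinationNode exposes its output as a .stream property, and since that stream stays active, the patched twilio.js should use it in place of the video capture. This is a sketch assuming all elements share one AudioContext; the helper name and the second element id are hypothetical. Note that createMediaElementSource() reroutes an element's audio into the graph, so the sketch also connects each source to the speakers to keep it audible locally.

```javascript
// Hypothetical helper: mix several <audio>/<video> elements into one MediaStream.
// `audioContext` is a shared AudioContext; `elements` are media elements.
function mixMediaElements(audioContext, elements, monitorLocally = true) {
  // Everything connected to this node comes out as a single MediaStream.
  const mixBus = audioContext.createMediaStreamDestination();
  for (const element of elements) {
    // Once wrapped, the element's audio plays only through the Web Audio graph.
    const source = audioContext.createMediaElementSource(element);
    source.connect(mixBus);
    if (monitorLocally) {
      source.connect(audioContext.destination); // keep hearing it on the speakers
    }
  }
  return mixBus.stream;
}

// In the browser, before placing the call:
// const ctx = new AudioContext();
// audioStream = mixMediaElements(ctx, [
//   document.getElementById("videoStream"),
//   document.getElementById("backingTrack"), // hypothetical second element
// ]);
```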

I hope you enjoyed this post. If you have more fun ideas for injecting audio into Twilio phone calls, send me a note.