Getting Started with Twilio Video

Hey there! This blog post covers an older, pre-release version of the Twilio Video SDK, so the code below likely won’t work anymore. For the very latest, check out the Twilio Video quickstart in the language of your choice.

If a picture is worth a thousand words, what is a video worth? A million? More? That’s the question we’ll look for developers like you to answer with Twilio Video, which we’ve released in a limited beta at Signal. Twilio Video makes it easy for you to connect your users by capturing every wave, groan, and belly laugh in a high-quality, peer-to-peer video conversation built on top of standard technologies like WebRTC.

But Twilio Video is about much more than providing the server-side infrastructure you’d need and drastically reducing the amount of code to write video applications with WebRTC. Twilio’s new SDKs enable cross-platform video conversations between web and native mobile clients (iOS and Android targeted initially), and cross-platform use of APIs like WebRTC’s data channel, which allows you to send arbitrary data between clients (think chat/IP messaging or screen sharing). You will also have server-side control over conversations with a rich REST API, allowing you to intelligently manage the client-side experience from your back-end code when necessary.

Twilio’s mission is to empower developers to change the way the world communicates forever, and we feel this technology is a critical step along that path. We are excited beyond words to see what you’ll build with Twilio Video.

In this tutorial, we’ll show you how to get started with Twilio Video in desktop web browsers that support WebRTC (recent Chrome and Firefox builds should do the trick). We’ll show you the server-side code you’ll need to write to power video (spoiler alert, it’s not much) and many of the JavaScript APIs you’ll have available to build a communication experience in the browser. Sound like a plan? Then let’s get cracking!

What You’ll Need

  • Twilio Account and Access to the Video Beta Program (All Signal Attendees!)
  • Node.js installed on your system for the required back-end components (we’re using Node.js in this example, but Twilio’s helper libraries are also available in C#, Java, Ruby, PHP, and Python)
  • A WebRTC enabled browser – the latest Chrome and Firefox releases should do nicely

What You’ll Build

  • A basic video calling application with two parties in the browser
  • A backend to handle generating capability tokens using your Twilio account

Let’s get to it!

Building the Video Calling Interface

Most of the fireworks for Twilio Video will be launched in the browser, so let’s start here! We’ll begin by creating a single HTML page that will become our video chat application. This page will be served by your web application as-is, and with a little bit of JavaScript and CSS will form 100% of the UI for this example.

In the static asset directory of our application, we have a file called “index.html”. We need two chunks of UI to power our application – one to allow the user to enter a name (which will allow other users to video call them using that name), and another chunk to either accept or initiate an outbound video call.

Here’s the markup we need to allow the user to enter their name.

And here’s the markup for the actual video call UI, which is initially hidden.

Twilio’s JavaScript SDK will handle inserting HTML5 video elements into the divs we’re targeting: #me for your local video feed, and #you for the remote participant.

All together, the markup (plus a tiny bit of CSS) for the UI looks like this.

Now, we’re ready to write the JavaScript code that will power the video conversation.

Starting the Conversation

The next component you’ll need is the Twilio JavaScript SDK. This file must be loaded from a CDN managed by Twilio. The WebRTC APIs that our tools are built on are evolving rapidly, and loading the JS SDK from our CDN ensures you always have code that is compatible with the latest revisions.

We include the JS SDK using a script tag beneath our UI markup but above the </body> tag, like so:

Next, we include a version of jQuery from a CDN to make our event handling and DOM manipulation a little easier:

Next, we’ll create a script tag that will contain some actual code to drive our UI. The first thing we need to do is allow the user to specify what name they will be reachable at by other users. This name is the unique address for an object we call an “Endpoint” in the API. An Endpoint is a person or entity that can be involved in a “Conversation”, which is a shared communication channel between multiple Endpoints. An Endpoint could be a browser based client like the one we’re building, an old-school telephone on the PSTN, or an iPad application running Twilio’s iOS SDK.

To allow the user to specify their unique name, we provided a text field and a button in a basic form-like interface in our markup. We begin by attaching an event handler to the button click like so.

When the button is clicked, we need to create an Endpoint with the name the user entered. To do that, we’ll use the “Twilio.Endpoint” constructor along with a secure access token that allows our browser-based client to communicate with Twilio. This token has to be generated on the server rather than in the browser, so we’ll fetch it via an Ajax request. We’ll check out that server code in a bit, but for now just understand that it generates a one-time-use token, for a client with our unique name, that allows our browser to communicate with Twilio.

Here’s the initialization code all together.

At the end of the click handler, we pass the Endpoint we created to an “init” function that will set up the actual video calling UI. Let’s see how that works next.

Reach Out And Video Call Someone

After creating our endpoint, we need to configure it to accept incoming video calls, as well as initiate outbound calls to any other user we choose. That process begins in the “init” function we used a moment ago – let’s look at the key steps of that function.

The first order of business is to register an event listener for incoming calls via the “invite” event.

This event handler is passed an “Invitation” object, which in this case we will immediately accept (you could reject it as well). The “accept” function executes an async process to connect your browser to another client, and will notify you when that process is done using a promise. When the conversation is established, we pass in a function called “showConversation”, which will handle rendering the conversation video feeds in our UI. We’ll check out how that works in just a bit.
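The flow just described can be sketched in a few lines. This is a minimal, runnable illustration that does not load the Twilio SDK: “FakeEndpoint” and “FakeInvitation” are hypothetical stand-ins built on Node’s EventEmitter, and the shape of the conversation object is an assumption. Only the “invite” event name, “accept”, and “showConversation” come from the text above.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the SDK's Invitation object.
class FakeInvitation {
  constructor(from) {
    this.from = from;
  }
  // accept() resolves asynchronously with a conversation-like object.
  accept() {
    return Promise.resolve({ participants: [this.from] });
  }
}

// Hypothetical stand-in for an Endpoint; it only emits events.
class FakeEndpoint extends EventEmitter {}

const shown = [];
function showConversation(conversation) {
  // The real function would attach video feeds to the page.
  shown.push(conversation);
  return conversation;
}

function registerInviteHandler(endpoint) {
  endpoint.on('invite', (invitation) => {
    // Accept immediately, then render once the promise resolves.
    // The promise is kept on the endpoint only so callers can wait on it.
    endpoint.lastAccept = invitation.accept().then(showConversation);
  });
}

const endpoint = new FakeEndpoint();
registerInviteHandler(endpoint);
endpoint.emit('invite', new FakeInvitation('alice'));
```

To reject a call instead, the handler would simply skip the “accept” step; how the real SDK expresses a rejection is not covered here.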

The next thing we need to do to initialize our calling UI is attach event handlers to handle the user initiating a call themselves.

When the #call button is clicked, we create a new Conversation between our endpoint and another endpoint with the name the user entered into the #other-name input box. Just like accepting a call, we specify “showConversation” as the callback to run when our browser is connected to the person we are trying to call.

Finally, we need to tell our endpoint to start listening for inbound calls:

“listen” returns a promise as well, but for brevity we are omitting a callback function to handle it. Everything will work on the first try always, right? Right?
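For the days when everything does not work on the first try, here is a small sketch of handling both outcomes of that promise. The endpoint below is a hypothetical stand-in, not the Twilio SDK; what the real “listen” promise resolves or rejects with is an assumption, as are the status messages.

```javascript
// Hypothetical stand-in: listen() resolves when ready, rejects on failure.
function makeFakeEndpoint(shouldFail) {
  return {
    listen() {
      return shouldFail
        ? Promise.reject(new Error('could not register'))
        : Promise.resolve('listening');
    },
  };
}

// Wrap listen() so both outcomes produce a status the UI could display.
function startListening(endpoint) {
  return endpoint.listen().then(
    () => ({ ok: true, message: 'Ready for incoming calls' }),
    (err) => ({ ok: false, message: 'Could not listen: ' + err.message })
  );
}
```

Because the wrapper always resolves, the calling code can update the page from a single “then” callback instead of handling success and failure separately.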

All together, the init function looks like this:

Now our app is all set to both make and receive calls. The next thing we need to do is actually show the video feeds associated with the call. That happens in the “showConversation” function, which we’ll look at next.

Video Killed the… Um… I Can’t Think of a Good “Radio Star” Pun

The “showConversation” function has two jobs: attaching the local media stream (the input from your own web cam) and the remote stream (the video feed coming from the other person) to elements in the UI. This is done using functions called “attach”, each of which takes a selector string for an element and appends a video tag with the feed to that element.

Here’s what that looks like:

Now, all together, our front end code (in a single HTML file) looks like this:

That’s it for the front end – but remember that token we had to fetch from our server via Ajax? Well, it’s not going to generate itself, so let’s hop into our server code to see what we need to do to generate this token.

Our Express Webapp

In this example, our back end application is a simple Node.js web application using the popular Express web framework. Our usage of it is fairly minimal – we create an HTTP server and use Express to handle incoming requests to it. We also use the built-in Express middleware for serving static assets (HTML, CSS, JavaScript) from the “public” folder of our app. That will handle sending “index.html” to the browser when a user visits the root URL of the app.

We define only a single route, which the browser requests via Ajax. This route generates the access token we’ll need to allow our browser-based code to talk to Twilio. On startup, the server also initializes our application by using the Twilio REST API to generate a secure keypair, which we use to sign the access token we send to the browser.
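As a rough sketch of what such a route can look like, the handler below follows the Express (req, res) convention but is written as a plain function so it runs without Express installed. The “name” query parameter, the error message, and the stubbed “generateToken” are assumptions made for illustration, not the original listing.

```javascript
// Stand-in for the token module's generateToken; returns a fake string.
function generateToken(name) {
  return 'fake-token-for-' + name;
}

// Express-style handler: reads ?name=... and responds with JSON.
function tokenRoute(req, res) {
  const name = req.query && req.query.name;
  if (!name) {
    // Without a name there is no Endpoint to mint a token for.
    res.status(400).json({ error: 'name is required' });
    return;
  }
  res.json({ token: generateToken(name) });
}

// With Express installed, this would be wired up roughly as:
//   const app = express();
//   app.use(express.static('public'));
//   app.get('/token', tokenRoute);
```

Keeping the handler separate from the app setup also makes it easy to exercise with fake request and response objects.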

Here’s our server code all in one shot:

Most of the token generation logic is found in the “token.js” module, which exports two functions. The “initialize” function fetches the keys that we use to sign our token. The “generateToken” function generates the secure string token we send to the browser.

Our call to “initialize” happens only once on startup, after which the module-level SIGNING_KEY_SID and SIGNING_KEY_SECRET variables are populated. We’ll need these values to mint our token. We won’t dive into this code right now – eventually you will be able to create and save these values in the account portal, which is probably going to be easier than using the REST API to create them.
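One way to make sure that setup really does run only once is to cache the promise it returns. The sketch below assumes a hypothetical “createKey” function standing in for the REST API call and assumes the key object has “sid” and “secret” properties; neither detail comes from the original code.

```javascript
let SIGNING_KEY_SID = null;
let SIGNING_KEY_SECRET = null;
let initPromise = null;

// createKey is a hypothetical async function that returns { sid, secret }.
function initialize(createKey) {
  if (!initPromise) {
    initPromise = createKey().then(
      (key) => {
        // Populate the module-level values used later to sign tokens.
        SIGNING_KEY_SID = key.sid;
        SIGNING_KEY_SECRET = key.secret;
      },
      (err) => {
        // Forget the failed attempt so a later call can retry.
        initPromise = null;
        throw err;
      }
    );
  }
  return initPromise;
}
```

Repeated calls share the same in-flight promise, so a second caller never triggers a second key to be created.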

Where we will spend some time is in the code that generates the access token we send to the browser. Ultimately, this code will return a JSON Web Token (JWT), serialized as a string that we’ll include with our response on the “/token” route. Let’s take a look at the code we need to write to make this happen.
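If JWTs are new to you, it may help to see what one is made of before handing the job to a helper library. The sketch below builds and signs a token by hand with Node’s built-in crypto module. It assumes the HS256 algorithm and uses placeholder values for the standard “iss”, “sub”, and “exp” claims; it is not Twilio’s helper, and the claims a real Twilio token needs are not shown here.

```javascript
const crypto = require('crypto');

// base64url without padding, the encoding JWTs use for each segment.
function base64url(input) {
  return Buffer.from(input)
    .toString('base64')
    .replace(/=/g, '')
    .replace(/\+/g, '-')
    .replace(/\//g, '_');
}

// Sign a payload with HMAC-SHA256 (the JWT "HS256" algorithm).
function signJwt(payload, secret) {
  const header = { alg: 'HS256', typ: 'JWT' };
  const signingInput =
    base64url(JSON.stringify(header)) + '.' + base64url(JSON.stringify(payload));
  const signature = crypto
    .createHmac('sha256', secret)
    .update(signingInput)
    .digest();
  // A JWT is three dot-separated segments: header.payload.signature
  return signingInput + '.' + base64url(signature);
}

// Placeholder values for illustration only.
const token = signJwt(
  { iss: 'SKxxxx', sub: 'ACxxxx', exp: 1700000000 },
  'not-a-real-secret'
);
```

Anyone holding the token can read the header and payload, since they are only encoded; the signature is what lets the receiving service check that the token was minted with the secret.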

First, we’ll need to create a new access token, which is a helper object that will help us build our JWT. To this constructor, you’ll pass in the signing key SID from our “initialize” function (SIGNING_KEY_SID) and your Twilio Account SID (found on your dashboard).

Next, we’ll need to configure the token we generate to have a unique Endpoint name, and have permission to both accept and send conversation invites:

We also need to grant our browser-based client the ability to create NAT traversal tokens to assist in connecting browsers peer-to-peer:

Finally, we sign and generate a string representation of the token:

All together, the generateToken function looks like this:

And that’s it! Now you’re ready to start making and receiving video calls in the browser.

Wrapping Up

Video is the first step on a longer journey to open up scalable IP communications to every developer in every application. Using video, you’ll be able to connect your users in rich conversations where more emotion and meaning are sent over the wire than what could be accomplished in a voice-only call.

In a short time, you’ll be able to create cross-platform interactions of this kind, connecting iOS, Android, and web apps seamlessly using the same infrastructure. We can’t wait to see what you build! Please hit us up with any questions, and we’d love to help you out.

  • Mick Stevens

    Hey Kevin, I think the name value in the app.json is too long (>30 chars?) so it busts Heroku’s deploy button rules? Hope it’s ok to flag it here, I’m being lazy in not doing a pull request to change the name to “Twilio Video Quickstart”.

  • Marcos Placona

    Thanks for this, Mick. I’ve just sent Kevin a Pull Request for this, and we should be merging it in soon.

    • Jonathan Ekwempu

      Good writeup. Just curious, can one use JavaScript to write a cross-platform interaction application or do I need different SDKs for the different platforms? What strategy is Twilio adopting?

  • Jonathan Ekwempu

    Thanks Kevin for the good post. Just curious, can one use JavaScript to write a cross-platform interaction application or do I need different SDKs for the different platforms? What strategy is Twilio likely to adopt?

    • kevinwhinnery

      Hey Jonathan,

      This particular JavaScript SDK is targeted at the browser. We will provide native Android and iOS SDKs in the near future as well. It’s possible the community will wrap the native SDKs for video for use in Cordova, Titanium, React Native or similar SDKs, but it’s not likely to be a priority in the short term.


      • 9 months later…

        Hey Kevin, what’s the current state of this? Can we use twilio to do cross-platform video calls from cordova-based mobile apps?

        • kevinwhinnery

          Sorry I missed this – currently there are no plans to do cross-platform wrappers. We’ll definitely make noise about it on the blog/twitter if this changes.

      • usman

        10 months later… :)

        • kevinwhinnery

          See above – no plans as yet.

          • usman


      • Do you know of anyone in the community doing this? I have Cordova + Android working just fine using the Telerik Webview (new projects would probably use Crosswalk). iOS is 90% of the way there, as in I have media capture and iosrtc plugins working, but the actual connection to Twilio’s wss video endpoint appears to be failing (just speculation).

        It has not been a huge hassle, other than addressing certain security and permissions issues on the various platforms. This final issue with iOS is the only thing that’s stopping us from releasing Twilio Video + Cordova for iOS.

    • I would love to see Twilio’s video capabilities opened up to the React Native world.

  • Jason

    Hi There,
    I am planning to use Twilio for my project based on Cordova.
    But I have some questions:
    1. Should we use Javascript SDK for my project? any risks/cons with Webview?
    3. Could I build a plugin base on iOS, Android SDKs the same with

    • Jason, did you ever make progress with this? We are looking for a webview-based solution. We are using the Telerik Webview. Android works fine. iOS appears to have functioning media capture with a plugin. But the Twilio Video session is failing to connect to a Room. We are seeing a very generic-seeming “Open failed” error in the console, with an attempt to connect to a wss (websockets) endpoint immediately before this.

      Were you able to proceed with Cordova and Twilio Video for this project?

  • Sandesh Sardar

    I have the following queries regarding Twilio:

    Supported text chatting with video chat (like Skype)?

    Supported attachment sending functionality?

    Supported video recording?

    Supported screen sharing?

    Supported multi-user video calling?

    • Jorge

      Can you share your answers, Sandesh?

  • Ali Habib

    So both I and the other side will require a token from the server? Will this cost money, or only the video call after it starts? I am not sure what other-name means: do you mean the other side’s token, or something else?

  • Akshay Champavat

    I am getting error in my console like

    Endpoint ERROR b { toString=function(), clone=function()}twilio-….min.js (line 95)INVALID_TOKEN: (intermediate value).forEach is not a function

  • Krishna Karki

    Any documentation with angular 2 typescript?

    • Megan Speir

      Hi Krishna, This is not currently covered within docs on the Twilio website. You may find some folks actively developing Video with Angular 2 and TypeScript if you keep an eye on the Github repo: