Get Started with TypeScript and Twilio Programmable Video

January 28, 2021
Written by
Jamie Corkhill
Contributor
Opinions expressed by Twilio contributors are their own

This article is for reference only. We're not onboarding new customers to Programmable Video. Existing customers can continue to use the product until December 5, 2024.


We recommend migrating your application to the API provided by our preferred video partner, Zoom. We've prepared this migration guide to assist you in minimizing any service disruption.

In this article, you’ll learn to use TypeScript and Twilio Programmable Video to build a development version of a video chatting application. You’ll use the Twilio Video JavaScript Client Library (with TypeScript) to manage participant connection logic without worrying about handling authentication and identity management.

Twilio Programmable Video is a suite of tools for building real-time video apps that scale as you grow, from free 1:1 chats with WebRTC to larger group rooms with many participants. You can sign up for a free Twilio account to get started using Programmable Video.

TypeScript is an extension of JavaScript - a “superset” if you will - that adds static typing to the language. It enforces type safety, makes code easier to reason about, and permits the implementation of classic patterns in a more “traditional” manner. Because it is a superset, all JavaScript code is syntactically valid TypeScript, and TypeScript compiles down to JavaScript.
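As a small illustration of the static typing described above, the sketch below annotates a parameter so that the compiler can reject a wrong argument before the code ever runs. The function name formatParticipantCount is made up for this example and is not part of the project.

```typescript
// Hypothetical helper, not part of the project. The `count: number` and
// `: string` annotations are the only TypeScript-specific syntax here.
function formatParticipantCount(count: number): string {
    return count === 1 ? '1 participant' : `${count} participants`;
}

const label = formatParticipantCount(2);

// Uncommenting the next line makes the compiler report an error, because
// a string is not assignable to a parameter of type number.
// formatParticipantCount('two');
```

Remove the two annotations and what remains is plain JavaScript, which is the sense in which TypeScript is a superset.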

Parcel is a fast, zero-configuration web application bundler that supports hot module replacement and bundles and transforms your assets. You’ll use it in this article to work with TypeScript on the client without having to worry about transpilation, bundling, or configuration.

Requirements

  • Node.js - Consider using a tool like nvm to manage Node.js versions.
  • A Twilio Account for Programmable Video. If you are new to Twilio, you can create a free account.

Project configuration

Begin by globally installing Parcel, which will greatly simplify the bundling process:

npm i -g parcel-bundler

Next, create a directory called typescript-video in which to bootstrap your project, and point your editor or IDE at it. Create an index.html file at the root of your project as well as a src folder. Inside your src folder, create an index.ts file, like so:

mkdir typescript-video
cd typescript-video
touch index.html
mkdir src
touch src/index.ts

Open up index.html in your favorite text editor and add the following code:

<html>
    <body>
        <script src="./src/index.ts"></script>
    </body>
</html>

Inside of src/index.ts, add the following alert message:

alert('Hello, World!');

From your terminal or command prompt you can now run Parcel, pointing at your root HTML file, and it should do all the work of bundling the project and installing the required dependencies:

parcel index.html

With that, you should see your project running on http://localhost:1234 (or whatever port Parcel selects). If you navigate there in your web browser, you should see the alert message on the screen. After you see that, bring the server down with Ctrl + C.

At this point, you should see that a node_modules folder (among others) has been created. In some cases, that doesn’t happen. If you find that the node_modules folder hasn’t been created, even if other folders were, you’ll have to initialize the project manually. Perform the following steps if this applies to you:

npm init -y
npm i typescript

Next, initialize the TypeScript project with this command:

node_modules/.bin/tsc --init

This will enable you to take advantage of the TypeScript Compiler’s strict type checking. With that, you can delete the code inside your index.html file and remove your index.ts file. You only created the latter to allow Parcel to create the project for TypeScript.

Generating Access Tokens for testing

The final step of configuration is to generate two tokens that you’ll use to permit two “users” to participate in the video call within a specific “room”. A room represents a real-time session through which data such as audio and video can be shared. All participants within a room can choose to share their streams with the other members of that room.

Twilio’s servers require that each client that joins a video call have an Access Token which grants them access to a specific room. Generally, you would introduce a server that manages the creation of tokens for specific users with one of Twilio’s server-side libraries - a given user could make a request to receive a token, and the server would respond with one. That user would then connect to the room granted by the token.

To simplify things for your purposes, you won’t introduce that server here, and will instead pre-generate a couple of tokens that each represent a user’s identity. You’ll hardcode those tokens and implement a local storage-based pointer system to choose which client gets which token. Alternatively, you can learn how to generate access tokens with Twilio Functions.

Navigate to the Testing Tools page in the Programmable Video section of the Twilio Console. In the Client Identity field, enter the name “John” and under the Room Name field for Access Token, use “room-name”. Refer to the screenshot below:

[Screenshot: the Testing Tools page in the Twilio Console]

Click the Generate Access Token button, and save the result somewhere. Refresh the page, and perform all the same steps again, only this time use the name “Alice” for Client Identity instead of “John”. Save that Access Token as well.

These tokens expire within an hour, so from time to time you’ll have to generate new ones.
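Twilio Access Tokens are JSON Web Tokens, so a token's expiry can be read from the standard exp claim in its payload. The following is a minimal sketch of such a check, assuming a runtime that provides a global atob (browsers and recent Node.js versions do). The helper names readTokenExpiry and isTokenExpired are hypothetical, and the code only decodes the token without verifying its signature.

```typescript
/**
 * Reads the `exp` claim (seconds since the Unix epoch) from a JWT payload.
 * This only decodes the token; it does not verify the signature.
 */
function readTokenExpiry(token: string): number {
    const payloadSegment = token.split('.')[1];
    if (!payloadSegment)
        throw new Error('Not a JWT: expected dot-separated segments.');

    // JWTs use base64url, so map it back to plain base64 and restore padding.
    const base64 = payloadSegment.replace(/-/g, '+').replace(/_/g, '/');
    const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);

    const payload = JSON.parse(atob(padded));
    if (typeof payload.exp !== 'number')
        throw new Error('Token payload has no numeric exp claim.');

    return payload.exp;
}

/**
 * Returns true when the token's expiry is at or before `nowSeconds`.
 */
function isTokenExpired(
    token: string,
    nowSeconds: number = Math.floor(Date.now() / 1000)
): boolean {
    return readTokenExpiry(token) <= nowSeconds;
}
```

Calling isTokenExpired() on a saved token before you start a test session tells you whether it is time to generate a fresh pair.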

Build the application

You’re now ready to build the application. This will be divided into two parts - the HTML structure and the TypeScript logic. We'll be starting with the former.

Create the HTML structure

Open index.html and add the following content:

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Twilio Video Development Demo</title>

    <style>
        .media-container {
            display: flex;
        }

        .media-container > * + * {
            margin-left: 1.5rem;
        }
    </style>
</head>
<body>
    <div class="media-container">
        <div id="local-media-container"></div> 
        <div id="remote-media-container"></div>
    </div>
    
    <button id="join-button">Join Room</button>

    <script src="./src/video.ts"></script>
</body>
</html>

This will scaffold a basic HTML structure, consisting of a container to hold both local and remote media, as well as a button to join a room. Since you are only building a basic development version of this application, you don’t need a button for leaving the room, because you’ll accomplish that by simply refreshing the page.

The <div> element with ID local-media-container will be used to display a preview window, while the <div> with ID remote-media-container will hold video elements that display other people on the call.

While it’s generally considered bad practice to place CSS within the HTML document itself, you can get away with it here because there are only a few styles and this is a development-only version.

The styles simply align the elements horizontally and add a left margin between them for spacing purposes. The * + * selector is called a “Lobotomized Owl”.

Create the main logic

Add three files to your src folder - token-repository.ts, types.ts, and video.ts with the commands below:

touch src/token-repository.ts
touch src/types.ts
touch src/video.ts

Open up the types.ts file and add the following code:

/**
 * Permits a type T to be null.
 */
export type Nullable<T> = T | null;

This code makes use of union types to define a type that can be of type T or type null. We'll see this being used in the video.ts file.
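To see what the union buys you, here is a short self-contained sketch. The functions findIdentity and greet are invented for this illustration and do not appear in the project. With strict null checks enabled, the compiler insists that the null case is handled before the value is used as a string.

```typescript
type Nullable<T> = T | null;

// Hypothetical lookup used only for this illustration.
function findIdentity(identities: string[], prefix: string): Nullable<string> {
    const match = identities.find(identity => identity.startsWith(prefix));
    return match ?? null;
}

function greet(identity: Nullable<string>): string {
    // Without this check, calling toUpperCase() below would be a
    // compile-time error under strict null checks.
    if (identity === null)
        return 'Nobody is here yet.';

    return `Hello, ${identity.toUpperCase()}!`;
}
```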

Open up the token-repository.ts file and add the following:

/**
 * Creates an instance of a token repository.
 */
function makeTokenRepository() {
    const tokens = [
        '[TOKEN]',
        '[TOKEN]'
    ];

    return {
        /**
         * Provides the token for the current pointer, and iterates the 
         * pointer for the next usage.
         */
        getNextToken() {
            if (!localStorage.getItem('tokenPointer'))
                localStorage.setItem('tokenPointer', String(0));

            const tokenPointer = parseInt(localStorage.getItem('tokenPointer')!, 10);

            if (tokenPointer >= tokens.length) {
                alert(`Maximum client count reached. Refresh all ${tokens.length} pages.`);
                localStorage.setItem('tokenPointer', String(0));
                throw new Error('Maximum client count reached.');
            }
        
            const token = tokens[tokenPointer];

            // Increment pointer
            localStorage.setItem('tokenPointer', (tokenPointer + 1).toString());

            return token;
        }
    }
}

/**
 * An instance of a token repository.
 */
export const tokenRepository = makeTokenRepository();

Since you’re not implementing a token server, you can hardcode the two tokens you created in the Generating Access Tokens for testing section in place of the [TOKEN] placeholders in the array. By the end, the array should have a length of two, with element 0 being the first token and element 1 being the second.

The name “Repository” borrows from the Repository Pattern (used in a front-end context), and you’re using it here to mimic the use of an external server that generates tokens. Since you store both tokens in the array, you have to ensure that the first user who joins the call uses the first token, and the second user who joins the call uses the second token.

To do that, you store a pointer in local storage. This “pointer” is an integer pointing to the index of the array from which to pull the token. When the first user joins, they’ll use the token in position 0.

Then, the pointer will be incremented to 1, which means the next user who joins will receive the token in position 1. If more than two participants try to join the room, there will be a failure since there are only two tokens.

You don’t need to spend too much time trying to make sense of this. It’s simply here so that you can focus on implementing the client-side video logic without having to think about how the tokenization process works for authentication and identity on the server.
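If you would like to watch the pointer advance without opening a browser, the sketch below re-creates the same logic against a small in-memory stand-in for localStorage. The names KeyValueStore, makeMemoryStore, and makePointerRepository are hypothetical and exist only for this illustration, and the alert() call is left out.

```typescript
// Minimal subset of the Storage interface that the repository relies on.
interface KeyValueStore {
    getItem(key: string): string | null;
    setItem(key: string, value: string): void;
}

// In-memory stand-in for localStorage (an assumption made for this sketch).
function makeMemoryStore(): KeyValueStore {
    const items = new Map<string, string>();
    return {
        getItem: key => items.get(key) ?? null,
        setItem: (key, value) => { items.set(key, value); }
    };
}

function makePointerRepository(store: KeyValueStore, tokens: string[]) {
    return {
        getNextToken() {
            const pointer = parseInt(store.getItem('tokenPointer') ?? '0', 10);

            if (pointer >= tokens.length) {
                // Reset so that the next attempt starts from the first token.
                store.setItem('tokenPointer', '0');
                throw new Error('Maximum client count reached.');
            }

            store.setItem('tokenPointer', String(pointer + 1));
            return tokens[pointer];
        }
    };
}

const store = makeMemoryStore();
const repository = makePointerRepository(store, ['token-for-john', 'token-for-alice']);

const first = repository.getNextToken();  // 'token-for-john'
const second = repository.getNextToken(); // 'token-for-alice'
// A third call throws and resets the pointer to 0.
```

Passing the storage in as a parameter is only a convenience for experimenting; the project's version talks to localStorage directly so that separate tabs share the pointer.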

Now that you have the utility type and the repository created, it’s time to make use of Twilio’s client library for Programmable Video. Install it first with the following commands:

npm i twilio-video
npm i @types/node @types/twilio-video --save-dev

Open the video.ts file and add the following import statements and component handles:

import { 
    connect,
    createLocalVideoTrack,
    RemoteAudioTrack, 
    RemoteParticipant, 
    RemoteTrack, 
    RemoteVideoTrack, 
    Room
} from 'twilio-video';

import { tokenRepository } from './token-repository';

import { Nullable } from './types';

// UI Element Handles
const joinButton = document.querySelector('#join-button') as HTMLButtonElement;
const remoteMediaContainer = document.querySelector('#remote-media-container') as HTMLDivElement;
const localMediaContainer = document.querySelector('#local-media-container') as HTMLDivElement;

Here, you capture a handle to the button for joining the room and the containers that will hold both types of media. Next, you’ll want to add an entry point - that is, a function you’ll set to run when the page is first loaded. This will also perform some initial operations. Add the code below underneath the prior code, which is shortened to “…” in the snippet.


… 

/**
 * Entry point.
 */
async function main() {
    // Provides a camera preview window.
    const localVideoTrack = await createLocalVideoTrack({ width: 640 });
    localMediaContainer.appendChild(localVideoTrack.attach());
}

// Entry point.
main();

The main() function generates a local video track from the user’s onboard camera and displays it within the localMediaContainer element via the attach() method that Twilio makes available on the track. This creates a “preview window”, so to speak, that the user can see prior to joining the room.

A “track” represents a stream of media, such as audio or video, or of data.

Here, since these tracks are coming from the user’s webcam, they are considered to be local tracks (or, local media) for that user’s session. But, in general, whether a given track is local or remote depends on the context - the tracks belonging to a user named Alice would be considered local tracks from Alice’s perspective.

Similarly, tracks belonging to a user Bob would be considered local from Bob’s perspective. If they both publish their tracks, Alice’s tracks would now be considered remote to Bob, and Bob’s tracks would be remote to Alice.

Calling attach() on a track means to attach it to the DOM. In this case, since you call attach() with no parameters on a video track, a video element is created and its srcObject is set to be the video stream. Further, the playsInline and autoPlay attributes are both automatically set to true. The former prevents the video from expanding to full screen while the latter sets the video to play automatically. To learn more about tracks, visit the Twilio Video Documentation. To learn about the internal workings of tracks, visit the Twilio Video Client Library GitHub Repository.

If you run the project with the parcel index.html command and navigate to http://localhost:1234 (or whatever port Parcel runs on for you), you should immediately see your webcam stream on the page after accepting the relevant permissions (if prompted). Bring the server down after you test it.

Next, you’ll want to create the function that will be fired when the user clicks the Join Room button. This function will use the token repository to pull a token, connect to the room, and wire up a few event handlers. You haven’t created the event handlers used by the code below yet, so don’t worry if you see errors.

Place the new code below, and any future code, underneath your main() function definition but before your main() function invocation.


async function main() { … }

/**
 * Triggers when the join button is clicked.
 */
async function onJoinClick() {
    joinButton.disabled = true;

    const room = await connect(tokenRepository.getNextToken(), {
        name: 'room-name',
        audio: true,
        video: { width: 640 }
    });

    // Attach the remote tracks of participants already in the room.
    room.participants.forEach(
        participant => manageTracksForRemoteParticipant(participant)
    );

    // Wire-up event handlers.
    room.on('participantConnected', onParticipantConnected);
    room.on('participantDisconnected', onParticipantDisconnected);
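    // Use a block body so the handler returns undefined; a returned value
    // can make the browser show a "leave site?" confirmation prompt.
    window.onbeforeunload = () => { room.disconnect(); };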
}

// Entry point.
main();

The onJoinClick() function is the click event handler for the join button, which will be wired up later. It connects the current user to the room, with hardcoded media constraints and a hardcoded room name (the one you specified in the Generating Access Tokens for testing section).

The connect() function, provided by Twilio, performs the process of connecting the current participant to the room in question, and handles signaling, codecs, room creation, etc. If you want, take a look at the implementation of the connect() function. After connecting to the room, you can send and receive tracks with other connected participants. Visit the Twilio Video JavaScript Getting Started Guide to learn more.

In the implementation of the Twilio Video JavaScript Client Library, objects like Room and Track extend Node’s EventEmitter class. This allows them to emit events, such as connection events, that you can listen for.

The two room.on() statements beneath connect() do just that. When a participant connected event is fired, meaning a new participant has just joined the room, the onParticipantConnected() callback function, which you have yet to create, will be called. The same goes for a disconnection event.

Note that these events are not fired for existing users in the room at the time that you join - rather, you’ll receive these events for every participant who joins after you do (because you can only start listening for events after you join, you miss out on events before). The payload for these events is the specific RemoteParticipant in question, which allows you to access their tracks and identity.
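The effect of registering a listener "too late" can be reproduced in a few lines. The sketch below uses a hand-rolled stand-in called TinyRoom, which is hypothetical and far simpler than Twilio's Room, to show why you need both steps: enumerating the participants who are already present and listening for those who arrive later.

```typescript
type Listener = (identity: string) => void;

// Hypothetical stand-in for a room; it only tracks participant identities.
class TinyRoom {
    readonly participants = new Set<string>();
    private readonly listeners = new Map<string, Listener[]>();

    on(event: string, listener: Listener): void {
        const registered = this.listeners.get(event) ?? [];
        registered.push(listener);
        this.listeners.set(event, registered);
    }

    join(identity: string): void {
        this.participants.add(identity);
        // Only listeners registered before this point are notified.
        (this.listeners.get('participantConnected') ?? [])
            .forEach(listener => listener(identity));
    }
}

const room = new TinyRoom();
room.join('John'); // John joins before anyone is listening.

const seen: string[] = [];

// Step 1: handle participants who are already present.
room.participants.forEach(identity => seen.push(identity));

// Step 2: handle participants who join from now on.
room.on('participantConnected', identity => seen.push(identity));

room.join('Alice');
// seen is now ['John', 'Alice']; without step 1, John would be missed.
```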

Display Twilio video

In order to display video streams on the screen, you need to do a few things. First, you have to display any streams that are already published by participants who are already in the room, which you do right after calling connect() by enumerating the room’s current participants. You also have to listen for subscription events of those participants so that you can display streams that the already-connected participants might publish in the future.

Additionally, you’ll have to display any streams published, now or in the future, by remote participants who join after you do, which is where the two room.on() statements come in.

The manageTracksForRemoteParticipant() function, which will be created shortly, will manage subscribing to remote tracks available on a given remote participant and displaying them to the screen.

Thus, you’ll need to call it not only for all the participants currently in the room when you join, which you do right after calling connect(), but also for each participant that enters the room later, within the onParticipantConnected() function.

At the bottom of the file but right above the main() function invocation, add the following helper functions:

…
/**
 * Attaches all attachable published tracks from the remote participant.
 * 
 * @param participant 
 * The remote participant whose published tracks should be attached.
 */
function attachAttachableTracksForRemoteParticipant(participant: RemoteParticipant) {
    participant.tracks.forEach(publication => {
        if (!publication.isSubscribed)
            return;

        if (!trackExistsAndIsAttachable(publication.track))
            return;

        attachTrack(publication.track);
    });
}


/**
 * Attaches a remote track.
 * 
 * @param track 
 * The remote track to attach.
 */
function attachTrack(track: RemoteAudioTrack | RemoteVideoTrack) {
    remoteMediaContainer.appendChild(track.attach());
}

/**
 * Guard that a track is attachable.
 * 
 * @param track 
 * The remote track candidate.
 */
function trackExistsAndIsAttachable(track?: Nullable<RemoteTrack>): track is RemoteAudioTrack | RemoteVideoTrack {
    return !!track && (
        (track as RemoteAudioTrack).attach !== undefined ||
        (track as RemoteVideoTrack).attach !== undefined
    );
}

// Entry point.
main();

Twilio’s TypeScript Type Definitions define a RemoteTrack to be one of three track types - a RemoteVideoTrack, a RemoteAudioTrack, or a RemoteDataTrack. Only the former two are attachable to the DOM (and thus have an attach and detach method), so that’s why you added the trackExistsAndIsAttachable() function above.

The trackExistsAndIsAttachable function acts as a type guard and helps ensure that the track you’re dealing with is either an audio or video track. Using Duck Typing, it performs both a runtime and compile-time check to assert that the track is of the right type. See the Type Guards section of the TypeScript Documentation to learn more.
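The same idea can be tried in isolation. The sketch below is an illustration only: FakeMediaTrack, FakeDataTrack, and isAttachable are hypothetical stand-ins, whereas the real track types come from the twilio-video typings. It shows a runtime duck-typing check that doubles as a compile-time narrowing through the `track is ...` return type.

```typescript
// Hypothetical stand-ins for the remote track types.
interface FakeMediaTrack {
    kind: 'audio' | 'video';
    attach(): string;
}

interface FakeDataTrack {
    kind: 'data';
}

type FakeTrack = FakeMediaTrack | FakeDataTrack;

// Runtime check (duck typing) plus compile-time narrowing via `track is`.
function isAttachable(track: FakeTrack | null | undefined): track is FakeMediaTrack {
    return !!track && typeof (track as FakeMediaTrack).attach === 'function';
}

const candidates: Array<FakeTrack | null> = [
    { kind: 'video', attach: () => '<video>' },
    { kind: 'data' },
    null,
    { kind: 'audio', attach: () => '<audio>' }
];

// After filtering with the guard, the compiler knows every entry has attach().
const attached = candidates.filter(isAttachable).map(track => track.attach());
// attached is ['<video>', '<audio>']
```

Without the `track is FakeMediaTrack` return type, the call to track.attach() in the last statement would not compile, because the data track and null would still be possible.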

The attachTrack() function attaches either an audio or video track to the DOM. You use it to attach the tracks of remote participants.

Finally, the attachAttachableTracksForRemoteParticipant() function will accept a remote participant and enumerate their tracks. For each track, if you’ve subscribed to it and if it is of the right type, it’ll be attached to the DOM via the attachTrack function.

You’re almost done with this section. The next steps are to add the manageTracksForRemoteParticipant() function, which manages track subscriptions and track attachment, as well as the individual track and participant event handlers. This is the function that will handle all tracks for every remote participant.

After the onJoinClick() function but before the attachAttachableTracksForRemoteParticipant() function, add the following code:

…

/**
 * Triggers when a remote participant connects to the room.
 * 
 * @param participant 
 * The remote participant
 */
function onParticipantConnected(participant: RemoteParticipant) {
    manageTracksForRemoteParticipant(participant);
}

/**
 * Triggers when a remote participant disconnects from the room.
 * 
 * @param participant 
 * The remote participant
 */
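function onParticipantDisconnected(participant: RemoteParticipant) {
    // No element is ever given the participant's SID as its ID, so detach
    // the participant's tracks and remove any elements still attached.
    participant.tracks.forEach(publication => {
        if (trackExistsAndIsAttachable(publication.track))
            publication.track.detach().forEach(element => element.remove());
    });
}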

/**
 * Triggers when a remote track is subscribed to.
 * 
 * @param track 
 * The remote track
 */
function onTrackSubscribed(track: RemoteTrack) {
    if (!trackExistsAndIsAttachable(track))
        return;

    attachTrack(track);
}

/**
 * Triggers when a remote track is unsubscribed from.
 * 
 * @param track 
 * The remote track
 */
function onTrackUnsubscribed(track: RemoteTrack) {
    if (trackExistsAndIsAttachable(track))
        track.detach().forEach(element => element.remove());
}

/**
 * Manages track attachment and subscription for a remote participant.
 * 
 * @param participant 
 * The remote participant
 */
function manageTracksForRemoteParticipant(participant: RemoteParticipant) {
    // Handle tracks that this participant has already published.
    attachAttachableTracksForRemoteParticipant(participant);

    // Handles tracks that this participant eventually publishes.
    participant.on('trackSubscribed', onTrackSubscribed);
    participant.on('trackUnsubscribed', onTrackUnsubscribed);
}

…

Most of these functions are event handlers:

  • When a participant is connected, subscribe to their track events and handle any already published tracks.
  • When a participant is disconnected, find their video element and remove it.
  • When a new track is subscribed to, check that it’s attachable and attach it if so.
  • When a track is unsubscribed, remove it from the DOM.

Lastly, you need to wire up the onJoinClick() event handler, which you can do right after the trackExistsAndIsAttachable() type guard function but right before the main() function invocation:

// Button event handlers.
joinButton.addEventListener('click', onJoinClick);

A step by step walk-through

You added a lot of code in the last section, so let’s take a moment to walk through everything that happens in a typical session.

First, when the page loads, the main() function is invoked. It uses the createLocalVideoTrack() function from Twilio’s video library to create a local video track at runtime, and then attaches that track to the DOM within the localMediaContainer div.

When the user clicks the Join Room button, the onJoinClick() event handler will be executed. It utilizes the tokenRepository to capture a token and then uses that token to connect to the room. Next, it goes through each of the participants who are already in the room at the time of joining.

For each of those participants, any tracks they already have published that are of an audio or video nature are attached to the DOM so they can be displayed. Event handlers are registered for each existing participant to capture any tracks that they may publish at some point in the future. These last few steps all stem from the manageTracksForRemoteParticipant function, which is called for each existing participant within onJoinClick.

Inside the onJoinClick() function, a few more event handlers are registered:

  • One for new participants connecting.
  • One for existing participants disconnecting.
  • One that disconnects the current user from the room if they leave the page.

The onParticipantConnected() function passes straight through to the manageTracksForRemoteParticipant() function, which subscribes to the relevant events and delegates calls to attach tracks. The onParticipantDisconnected() function removes that user’s media elements from the DOM.

Finally, when a user tries to leave the page, they’re disconnected from the room thanks to the window.onbeforeunload event handler.

Note: In a production app you should also recycle their token at this point so that they can reconnect without both users having to refresh their pages.
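One possible reading of "recycling" a token is to track which tokens are in use and free one when its user leaves. The sketch below illustrates that idea with a hypothetical makeTokenPool() helper whose acquire() and release() methods are invented for this example. It keeps its state in memory only, so in practice the bookkeeping would have to live somewhere that all clients share, such as a token server.

```typescript
// Hypothetical pool: hands out the lowest free token and takes it back later.
function makeTokenPool(tokens: string[]) {
    const inUse = new Set<number>();

    return {
        acquire() {
            const index = tokens.findIndex((_, i) => !inUse.has(i));
            if (index === -1)
                throw new Error('All tokens are in use.');

            inUse.add(index);
            return { index, token: tokens[index] };
        },

        // Call this when the user leaves, for example next to room.disconnect().
        release(index: number) {
            inUse.delete(index);
        }
    };
}

const pool = makeTokenPool(['token-for-john', 'token-for-alice']);

const john = pool.acquire();
const alice = pool.acquire();

pool.release(john.index);        // John leaves the page.
const rejoined = pool.acquire(); // The freed token is handed out again.
```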

Run and test the application

Now that the application is complete, run the project with parcel index.html. Open two browser windows or tabs and, in each, navigate to the location where the project is running.

After agreeing to any permission requests, you should immediately see your webcam stream in the preview window. After you click the Join Room button on both pages, a second video should appear on each page showing the webcam stream from the other client. In this case, both videos show the same picture because this is a local project and both clients use the same webcam.

Conclusion

In this project, you learned how to set up a Parcel and TypeScript project that makes use of Twilio Programmable Video. You learned how to manage participant connection states and deal with published tracks. To view this project’s source code, visit the “getting-started” branch at its GitHub Repository. You can also expand your app by checking out the next post in this series: Get Started with Twilio Programmable Video Authentication and Identity Using TypeScript.

Jamie is an 18-year-old software developer located in Texas. He has particular interests in enterprise architecture (DDD/CQRS/ES), writing elegant and testable code, and Physics and Mathematics. He is currently working on a startup in the business automation and tech education space, and when not behind a computer, he enjoys reading and learning.