Choosing cameras in JavaScript with the mediaDevices API

Most smartphones come with a front and a back camera. When you’re building a video application for mobile, you may want to choose or switch between them.

If you’re building a chat app you probably want the front camera, but if you’re building a camera app then you’re more interested in the rear camera. In this post we’re going to see how to choose or switch between cameras using the mediaDevices API and media constraints.

What you’ll need

To follow along with this post you’ll need:

  • An iOS or Android device with two cameras to test with (if you have two webcams, this will work on your laptop too)
  • ngrok so you can easily access the project from your mobile device (and because I think ngrok is awesome)
  • The code from this GitHub repo to get you started

To get the code, clone the project and check out the starter project tag.

This starter project gives you some HTML and CSS so we can concentrate on the JavaScript. You can open the index.html file directly, but I recommend you serve these files with a web server. I like to use the npm module serve, and I’ve included it in the repo too. To use it, first install the dependencies with npm install and then start the server.

Once the server is running, open up a tunnel to it using ngrok. serve hosts the files on port 5000. To tunnel to that port with ngrok, run ngrok http 5000 on the command line in a new window.

Now you have a publicly available version of the site. Open it on your mobile device so that you can test it later. Make sure you open the HTTPS URL, as the APIs we are using only run in a secure context.

The ngrok window shows two URLs you can use; pick the HTTPS one.

The app should look like this:

The app should have a title saying 'Camera fun' with a button and an empty drop down box.

Getting the media stream

Our first challenge is getting the video stream from any camera onto the screen. Once that is complete we will investigate the options for selecting the specific camera. Open up app.js and start by selecting the button and video elements from the DOM.

We’ll request access to the camera using the mediaDevices API when the user clicks or touches the button. To do so, we call navigator.mediaDevices.getUserMedia, passing an object of media constraints. We’ll start with a simple set of constraints: we only want video, so we’ll set video to true and audio to false.

getUserMedia returns a promise. When it resolves, we have access to a media stream from the camera. Set the video’s srcObject to the stream and we will see it on screen.
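
Putting those steps together, here is a minimal sketch of what app.js might look like at this point. The element IDs video and button are assumptions about the starter HTML, so adjust them to match your markup. The startCamera helper name is mine too, and the sketch assumes the <video> element is set to autoplay.

```javascript
// Constraints for this first step: video only, no audio.
const constraints = { video: true, audio: false };

// Requests a camera stream and attaches it to the given <video> element.
// Returns the promise so that callers can chain further work.
function startCamera(mediaDevices, video) {
  return mediaDevices.getUserMedia(constraints).then(stream => {
    video.srcObject = stream;
    return stream;
  });
}

// Browser-only wiring. The IDs 'video' and 'button' are assumptions
// about the starter HTML; change them to match your markup.
if (typeof document !== 'undefined') {
  const video = document.getElementById('video');
  const button = document.getElementById('button');

  button.addEventListener('click', () => {
    startCamera(navigator.mediaDevices, video).catch(error => {
      // Reached if the user denies permission or no camera is found.
      console.error(error);
    });
  });
}
```

Passing navigator.mediaDevices and the video element in as arguments is only a design choice that keeps the browser-specific code in one place. Calling getUserMedia directly inside the click handler works just as well.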

Save the file, reload the page and click the button. You should be presented with a permissions dialog requesting access to your camera. Once the permissions are granted, your video will appear on screen. Try this on your computer and your phone. When I tried with my iPhone, the camera selected was the front facing camera.

The camera app, now with my face in the previously blank space!

If you are using an iPhone, make sure you check in Safari as this doesn’t seem to work with other browsers.

What cameras are available?

The mediaDevices API gives us a way to enumerate all the available devices for both audio and video input. We’ll use the enumerateDevices function to build up a set of options for a <select> box so we can use it to choose the camera we want to see. Open up app.js again and start by selecting the <select> from the DOM.

enumerateDevices returns a promise, so let’s write a function we can use to receive the result of the promise. The function will take a list of media devices as an argument.

The first thing to do is empty the <select> of any existing options and append one empty <option>. Then we loop through the devices, filtering out any that aren’t of kind “videoinput”. We then create an <option> using the device’s ID as the value and the device’s label for the text. We also handle the case where a device doesn’t report a label by generating a simple “Camera n” label.

At the end of app.js make the call to enumerateDevices.
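
The steps above could be sketched like this. The ID select is an assumption about the starter HTML, and cameraOptions is a helper name I’ve made up to keep the filtering and labelling separate from the DOM work. Treat it as one possible shape for gotDevices rather than the only one.

```javascript
// Turns the full device list into { value, label } pairs for cameras only.
function cameraOptions(mediaDevices) {
  return mediaDevices
    .filter(device => device.kind === 'videoinput')
    .map((device, index) => ({
      value: device.deviceId,
      // Fall back to a generated name when the browser withholds the label.
      label: device.label || `Camera ${index + 1}`
    }));
}

// Browser-only wiring. The ID 'select' is an assumption about the
// starter HTML; change it to match your markup.
if (typeof document !== 'undefined') {
  const select = document.getElementById('select');

  const gotDevices = mediaDevices => {
    // Empty the <select>, then add one blank option at the top.
    select.innerHTML = '';
    select.appendChild(document.createElement('option'));

    cameraOptions(mediaDevices).forEach(({ value, label }) => {
      const option = document.createElement('option');
      option.value = value;
      option.textContent = label;
      select.appendChild(option);
    });
  };

  navigator.mediaDevices.enumerateDevices().then(gotDevices);
}
```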

Refresh the page and take a look at the drop down select next to the button. If you’re on Android, or using Chrome or Firefox, you will see the names of the cameras you have available.

On an iPhone, however, you will see the generically named “Camera 1” and “Camera 2” from our function. On iOS you will not get the labels of the cameras until you have granted permission for the site to use at least one of the cameras. This makes our interface less useful for selecting a camera as, even though you do get the ID of the devices, you can’t tell which camera is which.

On the iPhone you only see the labels we made up, 'Camera 1' and 'Camera 2'.

We have not yet hooked up the drop down select to change the camera. Before we do, let’s look at another way we can influence which camera we want to select.

Facing mode

An alternative approach that we can use to select a camera is the facingMode constraint. This is a less exact way of picking a camera than getting its ID from the enumerateDevices function, but works really well for mobile devices. There are four options you can use for the constraint: user, environment, left and right. The constraints are explained in the MDN documentation. For the purposes of this post, we’re going to use user and environment, as they map nicely to front facing and back facing cameras on a mobile device.

To use the facingMode constraint we need to change the constraints we are using in our call to getUserMedia. Rather than just saying true for video we need an object of these constraints. Update the code to select the front facing camera like this:
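
As a sketch, the updated constraints object might look like the one below. It replaces the earlier { video: true, audio: false } object that is passed to getUserMedia in the click handler.

```javascript
// Prefer the front facing camera. Change 'user' to 'environment'
// to prefer the rear facing camera instead.
const constraints = {
  video: { facingMode: 'user' },
  audio: false
};
```

Written this way, facingMode is a preference rather than a requirement, so a device without a matching camera should still return some video stream.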

Test from your mobile device now. You should find the front facing camera is selected. Update the facingMode to environment and try again. Now the rear facing camera should be selected.

Let’s put this code together with the results we got from enumerateDevices above to build a camera switcher once we’ve got permission to read the camera data.

Switching cameras

We have the code to pick a user or environment camera on the first selection, but if we want to switch cameras there’s a little more work to do.

First up, we should retain a reference to the current stream so that we can stop it when we switch to another one. Add one more variable and a utility function to stop the tracks in a stream to the top of app.js.

The function stopMediaTracks takes a stream and loops through each media track in the stream, stopping each of them.
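
A minimal sketch of the variable and the utility function could look like this:

```javascript
// Holds the stream currently shown in the <video> element, if any.
let currentStream;

// Stops every track in the stream, which releases the camera.
function stopMediaTracks(stream) {
  stream.getTracks().forEach(track => {
    track.stop();
  });
}
```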

We’ll change cameras when we press the same button, so we need to update the event listener. First, if we have a currentStream then we should stop it. Then we’ll check the <select> to see if we are choosing a particular device and build up the video constraints based on that.

Update the button’s click handler and the video constraints like so:
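
Here is one way the constraint-building part of the handler might be sketched. The buildConstraints helper is a name I’ve made up for illustration, and defaulting to the environment camera when nothing is selected is my assumption; use user if you prefer the front camera.

```javascript
// Builds getUserMedia constraints from the <select>'s current value.
// An empty string means no specific camera has been chosen yet.
function buildConstraints(selectedDeviceId) {
  const videoConstraints = {};
  if (selectedDeviceId === '') {
    // No choice yet: state a preference, not a hard requirement.
    videoConstraints.facingMode = 'environment';
  } else {
    // A specific camera was picked, so require exactly that device.
    videoConstraints.deviceId = { exact: selectedDeviceId };
  }
  return { video: videoConstraints, audio: false };
}
```

Inside the click handler you would first stop currentStream if there is one, then call buildConstraints with the <select>’s value and pass the result to getUserMedia.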

When we want to select a device by its deviceId we use the exact constraint. We avoid that for the facingMode constraint though, as that could fail on a device that doesn’t recognise having a “user” or “environment” facing mode, leaving us with no media at all.

Still within the click handler, when we get permission to use the video we are going to change a couple more things. Set the currentStream to the new stream passed to the function, so that we can stop it later, and set off another call to enumerateDevices.

enumerateDevices returns a promise, so we can return it from our then function and chain a new then for the result which will then be handled by our gotDevices function.

Replace your existing call to getUserMedia with the following:
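
A sketch of that chain is below. I’ve wrapped it in a hypothetical switchCamera function that takes its dependencies as arguments. In app.js you could equally put the chain straight into the click handler, using navigator.mediaDevices, your video element and the gotDevices function from earlier.

```javascript
// Holds the active stream so that it can be stopped before the next switch.
let currentStream;

// Requests a stream for the given constraints, shows it in the video
// element, then refreshes the device list via the gotDevices callback.
function switchCamera(mediaDevices, video, constraints, gotDevices) {
  return mediaDevices
    .getUserMedia(constraints)
    .then(stream => {
      currentStream = stream;
      video.srcObject = stream;
      // Permission has been granted by now, so labels should be available.
      return mediaDevices.enumerateDevices();
    })
    .then(gotDevices)
    .catch(error => {
      console.error(error);
    });
}
```

Returning enumerateDevices from the first then is what lets the second then receive the device list once it is ready.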

When you’ve added all that code, your app.js should look like this completed one. Refresh the page and you can play about selecting and changing cameras. This works on both mobile and desktop too.

The finished result, this is an animation showing that you can select one camera then change and go from viewing the back camera to the front camera.

Next steps

We’ve seen how to select a user’s camera with the facingMode or deviceId constraint. Remember, facingMode is more reliable before you have permission to use the camera, but selecting a deviceId is more accurate. You can get all the code from this blog post in the GitHub repo and you can try out the application live here.

If you are using Twilio Video to build a video application, you can use these constraints when calling either connect or createLocalVideoTrack.

Selecting or switching cameras is a useful feature for video chat, allowing users to pick the exact camera they want to use within your application’s interface, and it could go hand in hand with sharing your screen during a video call too.

Are there other video features you’d like to see that would be useful in video chats? Or any questions about this feature? Let me know in the comments or on Twitter at @philnash.