Introducing Audio Processor APIs: Power High Quality Audio Experiences with Twilio Voice

March 11, 2024
Written by
Reviewed by
Paul Kamp
Twilion


The Twilio Voice JS SDK now exposes Audio Processor APIs, enabling access to raw audio input and the ability to modify audio data before sending it to Twilio. This feature enables client-side noise cancellation use cases in addition to many others. In this post, we’ll introduce you to the feature, give you an overview of how it works, and show you how to use it with our launch partner Krisp’s Noise Cancellation product.

With this new feature, the following use cases are now straightforward to achieve on the client side:

  • Remove background noise using a noise cancellation library of your choice (“bring your own”).
  • Inject music playback when placing a call on hold.
  • Apply custom audio filters to manipulate speech, boost or suppress target frequencies, and more.
  • Capture and pipe client-side audio to leverage in your AI workflows or model training.
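To make the custom filter use case a little more concrete, here is a minimal sketch of a processor that routes microphone audio through a high-pass filter to cut low-frequency rumble. It follows the same two-method shape (`createProcessedStream` and `destroyProcessedStream`) that the Krisp example later in this post implements. The class name `HighPassAudioProcessor`, the 120 Hz default cutoff, and the `*Like` types are assumptions made for illustration. The types are simplified stand-ins so the sketch can run outside a browser; in a real application you would use the browser's `AudioContext` and `MediaStream` and declare that the class implements the SDK's `AudioProcessor`.

```typescript
// Simplified stand-ins for Web Audio types (assumptions for this sketch).
// In a browser, the real AudioContext and its nodes provide these members.
type StreamLike = unknown;

interface NodeLike {
  connect(target: NodeLike): unknown;
  disconnect(): void;
}

interface FilterLike extends NodeLike {
  type: string;
  frequency: { value: number };
}

interface DestinationLike extends NodeLike {
  stream: StreamLike;
}

interface ContextLike {
  createMediaStreamSource(stream: StreamLike): NodeLike;
  createBiquadFilter(): FilterLike;
  createMediaStreamDestination(): DestinationLike;
}

// Hypothetical processor: microphone -> high-pass filter -> processed stream.
class HighPassAudioProcessor {
  private context: ContextLike;
  private cutoffHz: number;
  private source?: NodeLike;
  private filter?: FilterLike;

  constructor(context: ContextLike, cutoffHz: number = 120) {
    this.context = context;
    this.cutoffHz = cutoffHz;
  }

  async createProcessedStream(stream: StreamLike): Promise<StreamLike> {
    const source = this.context.createMediaStreamSource(stream);
    const filter = this.context.createBiquadFilter();
    const destination = this.context.createMediaStreamDestination();

    // Attenuate content below the cutoff frequency.
    filter.type = 'highpass';
    filter.frequency.value = this.cutoffHz;

    // Wire the graph: source -> filter -> destination.
    source.connect(filter);
    filter.connect(destination);

    // Keep references so the graph can be torn down later.
    this.source = source;
    this.filter = filter;

    // The destination's stream is what gets encoded and sent.
    return destination.stream;
  }

  async destroyProcessedStream(_stream: StreamLike): Promise<void> {
    // Disconnect the nodes and drop the references.
    this.source?.disconnect();
    this.filter?.disconnect();
    this.source = undefined;
    this.filter = undefined;
  }
}
```

In a browser you would pass a real `AudioContext` to the constructor and register the instance with `device.audio.addProcessor`, just like the Krisp processor shown later. Taking the context as a constructor argument is a design choice of this sketch: it makes the processor easy to exercise with a fake context in tests.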

Why audio quality matters

Studies show that audio quality affects not only a user’s call experience but also meeting effectiveness. Here are a few examples:

  • Audio Quality Directly Impacts Credibility: Research suggests when our brains have a hard time understanding information, we are more inclined to dismiss the information.
  • Background Noise Reduces the Ability to Concentrate: Background or low-level noise often disrupts people’s ability to concentrate and can cause stress.
  • Audio Quality is more important than Video Quality: A 2022 research study showed that audio quality influences the perceived quality of the entire experience. Thus, bad audio lowers the perception of your overall experience, which is why you can watch a grainy video with clear audio but will stop watching an HD video with bad audio.

The bad news is that unwanted audio can come from anyone, anywhere. It only takes one person with a noisy background to ruin an entire call. Improving your audio experience is why we decided to release the Audio Processor API for the Twilio Voice JS SDK.

Best-in-Class Noise Cancellation in Twilio Voice with Krisp.ai

We’re thrilled to launch this feature with a new integration provided by our partners at Krisp.

Krisp.ai’s technology is built on a leading-edge Deep Neural Network that can differentiate between background sounds and the human voice. Krisp’s Voice AI SDKs are designed to identify the primary human voice actively speaking and cancel other sounds so users are not distracted by background noises and voices. Krisp SDKs are available for browsers (WASM JS), desktop apps (Win, Mac, Linux), and mobile apps (iOS, Android). 

The Krisp Audio JS SDK is a lightweight audio processor that can run inside your Twilio Voice application and create crystal clear audio. To get started, you need to receive an email invitation from Krisp to access the Krisp SDK Portal, download the Krisp Audio JS SDK, and place it in the assets of your Twilio Voice project. We’ll walk you through the integration steps with code samples in the next section.

Add Krisp Noise Cancellation to Twilio Voice

The Krisp SDK is loaded alongside the Twilio Voice JS SDK and runs as part of the audio pipeline between the microphone and the audio encoder, in a step called preprocessing. During this step, the AI-based noise cancellation algorithm does its magic, removing unwanted sounds like barking dogs, construction noises, honking horns, and coffee shop chatter.

After the preprocessing step, the audio gets encoded and delivered to the end user. Importantly, everything happens on your device, with almost no latency, and none of your media is ever sent to a server.

The following example shows how to integrate the Krisp SDK into a sample Twilio Voice SDK application:

import { AudioProcessor, Device } from '@twilio/voice-sdk';  
import KrispSDK from '/noisecancellation/krisp/latest.js.version/dist/krispsdk.mjs';

let audioContext = null;

class NoiseCancellationAudioProcessor implements AudioProcessor {  
  constructor() {  
    if (!audioContext) {  
      audioContext = new AudioContext();  
    }  
  }

  async init() {  
    // Initialize the Krisp SDK  
    this.krispSDK = new KrispSDK({  
      params: {  
        models: {  
          modelBVC: '/noisecancellation/krisp/latest.js.version/dist/models/model_bvc.kw',  
          model8: '/noisecancellation/krisp/latest.js.version/dist/models/model_8.kw',  
          model16: '/noisecancellation/krisp/latest.js.version/dist/models/model_16.kw',  
          model32: '/noisecancellation/krisp/latest.js.version/dist/models/model_32.kw',  
        }  
      }  
    });  
    await this.krispSDK.init();  
  }

  async createProcessedStream(stream) {  
    if (!this.krispSDK) {  
      await this.init();  
    }  
    // Create Audio Filter  
    // This will create an audioworklet processor, and return AudioWorkletNode  
    this.filterNode = await this.krispSDK.createNoiseFilter({ audioContext, stream }, () => {  
      // Ready callback  
      // Enable it once ready  
      this.filterNode.enable();  
    });

    // Create source and destination
    this.source = audioContext.createMediaStreamSource(stream);
    this.destination = audioContext.createMediaStreamDestination();

    // Connect source to filter and filter to destination
    this.source.connect(this.filterNode);
    this.filterNode.connect(this.destination);

    // Return the resulting stream
    return this.destination.stream;
  }

  async destroyProcessedStream(stream) {  
    // Cleanup  
    if (this.source) {  
      this.source.disconnect();  
    }  
    if (this.destination) {  
      this.destination.disconnect();  
    }  
    if (this.filterNode) {  
      this.filterNode.disconnect();  
      await this.filterNode.dispose();  
    }  
  }  
}

// Construct a device object, passing your own token and desired options
const device = new Device(token, options);

// Construct the AudioProcessor  
const processor = new NoiseCancellationAudioProcessor();

// Add the processor  
await device.audio.addProcessor(processor);  
// Or remove it later  
// await device.audio.removeProcessor(processor);
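Because processors can be added and removed at runtime, you may want to let users switch noise cancellation on and off from your UI. The following is a minimal sketch of one way to do that. The `NoiseCancellationToggle` class and the `ProcessorHost` type are hypothetical names introduced here; `ProcessorHost` is a stand-in for the part of `device.audio` used above (`addProcessor` and `removeProcessor`), so the sketch can be run without the SDK.

```typescript
// Stand-in for the part of `device.audio` used in this post (an assumption
// made so the sketch is self-contained).
interface ProcessorHost<P> {
  addProcessor(processor: P): Promise<void>;
  removeProcessor(processor: P): Promise<void>;
}

// Hypothetical helper that tracks whether the processor is currently attached.
class NoiseCancellationToggle<P> {
  private host: ProcessorHost<P>;
  private processor: P;
  private enabled = false;

  constructor(host: ProcessorHost<P>, processor: P) {
    this.host = host;
    this.processor = processor;
  }

  get isEnabled(): boolean {
    return this.enabled;
  }

  // Adds or removes the processor only when the requested state differs
  // from the current one. State changes only after the call succeeds, so
  // a rejected call leaves the toggle where it was.
  async setEnabled(next: boolean): Promise<boolean> {
    if (next === this.enabled) {
      return this.enabled;
    }
    if (next) {
      await this.host.addProcessor(this.processor);
    } else {
      await this.host.removeProcessor(this.processor);
    }
    this.enabled = next;
    return this.enabled;
  }
}

// Possible usage with the objects from the example above:
// const toggle = new NoiseCancellationToggle(device.audio, processor);
// await toggle.setEnabled(true);  // attach noise cancellation
// await toggle.setEnabled(false); // detach it again
```

This sketch does not serialize overlapping calls, so if your UI can fire several toggles in quick succession you may want to disable the control until the returned promise settles.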

Get started today

Audio Processor APIs for Twilio Programmable Voice were introduced in version 2.9.0 of the Twilio Voice JS SDK. For additional information, please see our developer documentation. We can’t wait to hear what you build!