Add-on

Remeeting Automatic Speech Recognition

Fast and accurate transcription, with speaker labels and word-level alternatives

Convert Twilio voice recordings into readable text, using state-of-the-art innovations in neural networks and artificial intelligence. Remeeting transcribes calls at faster than real-time speed and achieves over 93% transcription accuracy on an industry-standard evaluation.

See remeeting.com/benchmark to compare vendors.

The service is designed for compatibility with the IBM Watson Speech to Text API, and can return identically formatted results.

  • Accurate Transcription

    Recordings sent from Twilio are quickly processed in parallel by Remeeting, using a proprietary speech engine to deliver industry-leading accuracy. See remeeting.com/benchmark for a vendor comparison.

  • Compatible with IBM Watson

    Remeeting can serve as a drop-in replacement for the IBM Watson Speech to Text API, producing results that are identically formatted and oftentimes more accurate. Current support is limited to US English, with additional languages in development.

  • Speaker Labels

    Reliably identify who spoke when, even if two speakers are mixed down to a single channel, in which case Remeeting automatically separates their voices. (Dual-channel recordings are processed faster and also enable recognition of overlapped speech.)

  • Word-level timestamps & alternatives

    Every word in the transcript can be shown with its start and end times. In addition, results can list acoustically similar alternative words and their corresponding confidence levels.

Sample annotation responses

API Response JSON

Remeeting converts audio input into written text, similar to IBM Watson's service. A sample JSON payload is included below, and conforms to the industry-standard Web Speech API Specification: https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

{
  "result_index": 0,
  "speaker_labels": [...],
  "results": [
    {
      "word_alternatives": [...],
      "final": true,
      "alternatives": [
        {
          "transcript": "hello world ",
          "timestamps": [["hello", 0.03, 0.33], [ "world", 0.33, 0.84]],
          "confidence": 0.9947,
          "word_confidence": [["hello", 0.9999], ["world", 1.0]],
          "speaker_label": 1,
          "timestamp":  [0.03, 0.84]
        }
      ]
    },
    {
      "word_alternatives": [...],
      "final": true,
      "alternatives": [
        {
          "transcript": "um hi ",
          "timestamps": [["um", 1.08, 1.44], ["hi", 1.62, 1.95]],
          "confidence": 0.9936,
          "word_confidence": [["um", 0.9905], ["hi", 0.9995]],
          "speaker_label": 0,
          "timestamp":  [1.08, 1.95]
        }
      ]
    }
  ]
}
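
As a rough illustration of how a client might consume this payload, the Python sketch below turns the sample into a list of speaker turns. It is not an official client: the helper name speaker_turns is hypothetical, the elided "speaker_labels" and "word_alternatives" arrays are left out rather than guessed at, and it assumes the first entry in each "alternatives" list is the preferred hypothesis.

```python
import json

# Trimmed copy of the sample payload above. The elided "speaker_labels" and
# "word_alternatives" arrays are omitted here rather than filled in.
SAMPLE = json.loads("""
{
  "result_index": 0,
  "results": [
    {"final": true, "alternatives": [{
        "transcript": "hello world ",
        "timestamps": [["hello", 0.03, 0.33], ["world", 0.33, 0.84]],
        "confidence": 0.9947,
        "word_confidence": [["hello", 0.9999], ["world", 1.0]],
        "speaker_label": 1,
        "timestamp": [0.03, 0.84]}]},
    {"final": true, "alternatives": [{
        "transcript": "um hi ",
        "timestamps": [["um", 1.08, 1.44], ["hi", 1.62, 1.95]],
        "confidence": 0.9936,
        "word_confidence": [["um", 0.9905], ["hi", 0.9995]],
        "speaker_label": 0,
        "timestamp": [1.08, 1.95]}]}
  ]
}
""")


def speaker_turns(response):
    """Return (speaker_label, start, end, text) for each final result.

    Assumption: the first alternative of a result is the preferred one.
    """
    turns = []
    for result in response.get("results", []):
        if not result.get("final") or not result.get("alternatives"):
            continue  # skip interim or empty results
        best = result["alternatives"][0]
        start, end = best["timestamp"]
        # Transcripts in the sample carry a trailing space; strip it.
        turns.append((best["speaker_label"], start, end, best["transcript"].strip()))
    return turns


for speaker, start, end, text in speaker_turns(SAMPLE):
    print(f"[{start:.2f}-{end:.2f}] speaker {speaker}: {text}")
```

The per-word "timestamps" and "word_confidence" arrays can be walked the same way when finer granularity than a whole turn is needed.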

Alternative result format (not IBM-compatible)

Specifying the format as "sherlock" will return the result as Remeeting's custom JSON, encoding a more complete set of information in a reasonably compact data structure. Beware that this format is subject to frequent unannounced changes as we update our system.

{
    "confidence": 0.994,
    "duration": 2.0,
    "duration_processed": 2.7725,
    "segments": [
        {
            "confidence": 0.994664,
            "interval": [
                0.0325,
                0.8425
            ],
            "speaker": 1,
            "transcript": "hello world",
            "word_alternatives": [
                {
                    "alternatives": [
                        {
                            "confidence": 0.99994,
                            "word": "hello"
                        },
                        {
                            "confidence": 5.7718e-05,
                            "word": ""
                        }
                    ],
                    "interval": [
                        0.0325,
                        0.3325
                    ]
                },
                {
                    "alternatives": [
                        {
                            "confidence": 1.0,
                            "word": "world"
                        }
                    ],
                    "interval": [
                        0.3325,
                        0.8425
                    ]
                }
            ]
        },
        {
            "confidence": 0.993598,
            "interval": [
                1.085,
                1.955
            ],
            "speaker": 0,
            "transcript": "um hi",
            "word_alternatives": [
                {
                    "alternatives": [
                        {
                            "confidence": 0.99052,
                            "word": "um"
                        },
                        {
                            "confidence": 0.0062398,
                            "word": "uh"
                        },
                        {
                            "confidence": 0.0018398,
                            "word": "hello"
                        },
                        {
                            "confidence": 0.00079183,
                            "word": ""
                        },
                        {
                            "confidence": 0.00033265,
                            "word": "i'm"
                        },
                        {
                            "confidence": 0.00010554,
                            "word": "umm"
                        },
                        {
                            "confidence": 5.7848e-05,
                            "word": "ah"
                        },
                        {
                            "confidence": 3.8915e-05,
                            "word": "hum"
                        },
                        {
                            "confidence": 2.3183e-05,
                            "word": "oh"
                        },
                        {
                            "confidence": 1.9575e-05,
                            "word": "er"
                        },
                        {
                            "confidence": 8.5311e-06,
                            "word": "uh-huh"
                        },
                        {
                            "confidence": 7.8966e-06,
                            "word": "m"
                        },
                        {
                            "confidence": 7.8357e-06,
                            "word": "hi"
                        },
                        {
                            "confidence": 6.5019e-06,
                            "word": "em"
                        }
                    ],
                    "interval": [
                        1.085,
                        1.445
                    ]
                },
                {
                    "alternatives": [
                        {
                            "confidence": 0.99954,
                            "word": "hi"
                        },
                        {
                            "confidence": 0.0003845,
                            "word": "high"
                        },
                        {
                            "confidence": 4.8966e-05,
                            "word": ""
                        },
                        {
                            "confidence": 1.3676e-05,
                            "word": "hello"
                        },
                        {
                            "confidence": 8.6633e-06,
                            "word": "hai"
                        }
                    ],
                    "interval": [
                        1.625,
                        1.955
                    ]
                }
            ]
        }
    ],
    "speakers": [
        "Left_Channel",
        "Right_Channel"
    ],
    "word_count": 4
}
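
To show how the extra detail in this format might be used, the Python sketch below lists each word with its speaker, its interval, and any runner-up alternatives above a confidence threshold. It rests on several assumptions that the description above does not confirm: that "speaker" is an index into the top-level "speakers" list, that an empty "word" means "no word spoken", and that alternatives are not guaranteed to arrive sorted. The helper name words_with_alternatives and the 1e-3 threshold are hypothetical, and the sample is trimmed to a few alternatives per word.

```python
# Trimmed copy of the "sherlock" sample above (only the top few
# alternatives per word are kept, for brevity).
SAMPLE = {
    "segments": [
        {
            "interval": [0.0325, 0.8425],
            "speaker": 1,
            "transcript": "hello world",
            "word_alternatives": [
                {"alternatives": [{"confidence": 0.99994, "word": "hello"},
                                  {"confidence": 5.7718e-05, "word": ""}],
                 "interval": [0.0325, 0.3325]},
                {"alternatives": [{"confidence": 1.0, "word": "world"}],
                 "interval": [0.3325, 0.8425]},
            ],
        },
        {
            "interval": [1.085, 1.955],
            "speaker": 0,
            "transcript": "um hi",
            "word_alternatives": [
                {"alternatives": [{"confidence": 0.99052, "word": "um"},
                                  {"confidence": 0.0062398, "word": "uh"},
                                  {"confidence": 0.0018398, "word": "hello"},
                                  {"confidence": 0.00079183, "word": ""}],
                 "interval": [1.085, 1.445]},
                {"alternatives": [{"confidence": 0.99954, "word": "hi"},
                                  {"confidence": 0.0003845, "word": "high"}],
                 "interval": [1.625, 1.955]},
            ],
        },
    ],
    "speakers": ["Left_Channel", "Right_Channel"],
}


def words_with_alternatives(result, min_confidence=1e-3):
    """Return (speaker, start, end, best_word, other_words) per word slot."""
    names = result.get("speakers", [])
    rows = []
    for segment in result["segments"]:
        idx = segment["speaker"]
        # Assumption: "speaker" indexes into the "speakers" list.
        speaker = names[idx] if 0 <= idx < len(names) else str(idx)
        for slot in segment["word_alternatives"]:
            # Sort explicitly instead of relying on the order in the payload.
            ranked = sorted(slot["alternatives"],
                            key=lambda alt: alt["confidence"], reverse=True)
            best, rest = ranked[0], ranked[1:]
            # Assumption: an empty "word" means silence, so it is skipped.
            others = [alt["word"] for alt in rest
                      if alt["word"] and alt["confidence"] >= min_confidence]
            start, end = slot["interval"]
            rows.append((speaker, start, end, best["word"], others))
    return rows


for row in words_with_alternatives(SAMPLE):
    print(row)
```

Because this format is subject to unannounced changes, a real integration would likely want to validate that the expected keys are present before indexing into them.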