Twilio Changelog | May 06, 2026

New Public Beta: V3 Batch Transcription Configuration API

What is Batch Transcription Configuration?

The Batch Transcription Configuration API enables you to transcribe completed call recordings with configurable speech-to-text engines, models, and language settings. Create reusable transcription configurations, then submit any recording for transcription — results are delivered via webhook with sentence-level detail including speaker separation, timestamps, and confidence scores.

API Endpoints

New Configuration API

Create, manage, and reuse transcription configurations.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v2/Configurations/Transcription | Create a transcription configuration |
| GET | /v2/Configurations/Transcription | List all configurations |
| GET | /v2/Configurations/Transcription/{id} | Get a specific configuration |
| PUT | /v2/Configurations/Transcription/{id} | Update a configuration |
| DELETE | /v2/Configurations/Transcription/{id} | Delete a configuration |


New Transcription API (V3)

Submit recordings for transcription and check status.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v3/Transcriptions | Submit a recording for transcription (returns 202) |
| GET | /v3/Transcriptions/{id} | Check transcription status |
| GET | /v3/Transcriptions | List transcriptions (to be released as a fast follow) |
| DELETE | /v3/Transcriptions/{id} | Delete a transcription (to be released as a fast follow) |


Configuration Object

Each configuration specifies how a recording should be transcribed:

  • transcriptionEngine — Speech-to-text provider (deepgram, google, or twilio_managed)

  • speechModel — Model variant (nova-3, nova-2, chirp_2, or twilio_managed)

  • language — Language code for transcription (e.g., en-US, es-ES, de-DE)

  • participantDefaults — Audio channel → participant type mapping for speaker separation

  • transcriptionStatusCallback — Webhook URL + method for receiving completed transcripts

  • conversationConfigurationId — Optional link to a conversation configuration
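As a minimal sketch of creating a configuration, assuming the JSON body uses the camelCase field names listed above and HTTP Basic auth as described under Base URLs (confirm the exact schema against the API reference), a stdlib-only request might look like:

```python
import base64
import json
import urllib.request

CONFIG_URL = "https://voice.twilio.com/v2/Configurations/Transcription"


def build_config_payload(engine: str, model: str, language: str,
                         callback_url: str) -> dict:
    """Assemble a configuration body.

    Field names mirror the bullet list in this announcement; the exact
    JSON casing and nesting are assumptions, not confirmed schema.
    """
    return {
        "transcriptionEngine": engine,
        "speechModel": model,
        "language": language,
        "transcriptionStatusCallback": {
            "url": callback_url,
            "method": "POST",
        },
    }


def create_configuration(payload: dict, account_sid: str,
                         auth_token: str) -> dict:
    """POST the payload with HTTP Basic auth (Account SID : Auth Token)."""
    creds = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    req = urllib.request.Request(
        CONFIG_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_config_payload(
        "deepgram", "nova-3", "en-US", "https://example.com/transcripts")
    print(json.dumps(payload, indent=2))
```

Separating payload construction from the HTTP call keeps the body easy to validate and reuse across configurations.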

Access & Enablement

Important: If you are a current V2 Conversational Intelligence customer, you must request that the V3 account access flag be enabled on your account. This flag keeps a clear separation between your existing V2 production workloads and the new V3 Conversation Intelligence and Batch Transcription Configuration.

Once enabled, the V3 Transcription APIs and the Console Configuration Wizard become available under Transcriptions in your Twilio Console.

To request access, contact your Twilio account team or submit a request through the Twilio Console > Voice > Transcriptions page. Enablement is typically processed within 1 business day.

Supported Engines, Models & Languages

| Engine | Models | Languages |
| --- | --- | --- |
| deepgram | nova-3, nova-2 | en-US, en-GB, en-AU, es-ES, es-US, es-MX, de-DE, fr-FR, it-IT, pt-BR, pt-PT, nl-NL, no-NO, pl-PL, sv-SE, da-DK, multi |
| google | chirp_2 | en-US, en-GB, en-AU, es-ES, es-US, de-DE, fr-FR, it-IT, pt-BR, pt-PT, nl-NL, no-NO, pl-PL, sv-SE, da-DK |
| twilio_managed | twilio_managed | en-US, en-GB, en-AU, es-ES, es-US, es-MX, de-DE, fr-FR, it-IT, pt-BR, pt-PT, nl-NL, no-NO, pl-PL, sv-SE, da-DK, multi |

Note: google/chirp_2 does not support es-MX. Use es-ES or es-US for Spanish with Google. The twilio_managed engine automatically selects the best available model for your language.
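The engine/language matrix above can be enforced client-side before a configuration is created, which turns an eventual API error into a fast local one. A small validation sketch (the sets below are transcribed from the table; the helper itself is illustrative, not part of the API):

```python
# Languages supported by every engine, per the table above.
COMMON = {
    "en-US", "en-GB", "en-AU", "es-ES", "es-US", "de-DE", "fr-FR",
    "it-IT", "pt-BR", "pt-PT", "nl-NL", "no-NO", "pl-PL", "sv-SE", "da-DK",
}

# deepgram and twilio_managed add es-MX and multi; google (chirp_2) does not.
SUPPORTED_LANGUAGES = {
    "deepgram": COMMON | {"es-MX", "multi"},
    "google": set(COMMON),
    "twilio_managed": COMMON | {"es-MX", "multi"},
}


def validate_language(engine: str, language: str) -> None:
    """Raise ValueError before submitting a config the engine can't serve."""
    langs = SUPPORTED_LANGUAGES.get(engine)
    if langs is None:
        raise ValueError(f"unknown engine {engine!r}")
    if language not in langs:
        hint = ""
        if engine == "google" and language == "es-MX":
            hint = " (use es-ES or es-US with google/chirp_2)"
        raise ValueError(f"{engine} does not support {language}{hint}")
```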

Submitting a Transcription

POST /v3/Transcriptions with:

  • sourceId — Recording SID (RE...) of a completed call recording

  • transcriptionConfigurationId — ID of a previously created configuration

  • participants — Array of participant objects with type, address, and audioChannelIndex

Participant types: CUSTOMER, HUMAN_AGENT, AI_AGENT
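Putting those three fields together, a submission body might be assembled as follows. The key casing mirrors the bullet list above and should be confirmed against the API reference; the validation rules simply restate the constraints in this section:

```python
ALLOWED_TYPES = {"CUSTOMER", "HUMAN_AGENT", "AI_AGENT"}


def build_transcription_request(recording_sid: str,
                                configuration_id: str,
                                participants: list[dict]) -> dict:
    """Assemble a POST /v3/Transcriptions body from the fields above."""
    if not recording_sid.startswith("RE"):
        raise ValueError("sourceId must be a Recording SID (RE...)")
    for p in participants:
        if p["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown participant type {p['type']!r}")
    return {
        "sourceId": recording_sid,
        "transcriptionConfigurationId": configuration_id,
        "participants": participants,
    }


if __name__ == "__main__":
    body = build_transcription_request(
        "RE" + "0" * 32,          # placeholder Recording SID
        "my-config-id",           # placeholder configuration ID
        [
            {"type": "CUSTOMER", "address": "+15551230001",
             "audioChannelIndex": 0},
            {"type": "HUMAN_AGENT", "address": "+15551230002",
             "audioChannelIndex": 1},
        ],
    )
    print(body["sourceId"])
```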

Webhook Delivery

When transcription completes, a POST is sent to your configured callback URL containing the full transcript with:

  • Sentence-level segments with text content

  • Speaker/participant identification per sentence (via audio channel mapping)

  • Start and end timestamps (seconds) for each sentence

  • Word-level timestamps within each sentence

  • Confidence scores per sentence

  • Resolved configuration showing which engine/model/language was used

  • Participant metadata (type, address, channel)

  • Duration of the transcribed audio
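A webhook receiver typically flattens that structure into per-sentence records. The sketch below assumes illustrative field names (sentences, participant, startTime, endTime, text); the actual payload schema is defined in the webhook reference, so treat every key here as a placeholder:

```python
def summarize_transcript(payload: dict) -> list[str]:
    """Flatten a webhook body into 'speaker [start-end]: text' lines.

    All field names are assumptions for illustration; consult the
    webhook reference for the real schema.
    """
    lines = []
    for s in payload.get("sentences", []):
        lines.append(
            f"{s['participant']} [{s['startTime']:.1f}-{s['endTime']:.1f}s]: "
            f"{s['text']}"
        )
    return lines


# Hypothetical example payload, shaped after the bullet list above.
EXAMPLE = {
    "sentences": [
        {"participant": "CUSTOMER", "startTime": 0.0, "endTime": 2.4,
         "text": "Hi, I need help with my order.", "confidence": 0.97},
        {"participant": "HUMAN_AGENT", "startTime": 2.8, "endTime": 4.1,
         "text": "Happy to help.", "confidence": 0.99},
    ],
}

if __name__ == "__main__":
    print("\n".join(summarize_transcript(EXAMPLE)))
```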

Idempotency

The POST /v3/Transcriptions endpoint supports idempotent requests to prevent duplicate transcription processing during retries.

  • Header: Idempotency-Key

  • Format: UUIDv7

  • Behavior: Duplicate submissions with the same key return the original response instead of creating a new transcription
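Python's standard library only gained a built-in UUIDv7 generator recently, so a minimal RFC 9562-style generator is sketched below for older runtimes (48-bit Unix-millisecond timestamp, version and variant bits, random fill). The header name comes from this section; everything else is generic UUID construction:

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Minimal RFC 9562 UUIDv7: 48-bit ms timestamp + 74 random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF   # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (
        (ts_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
        | 0x7 << 76                      # bits  79..76: version 7
        | rand_a << 64                   # bits  75..64: rand_a
        | 0b10 << 62                     # bits  63..62: RFC variant
        | rand_b                         # bits  61..0 : rand_b
    )
    return uuid.UUID(int=value)


if __name__ == "__main__":
    # One fresh key per logical submission; reuse it only on retries.
    headers = {"Idempotency-Key": str(uuid7())}
    print(headers["Idempotency-Key"])
```

Because the timestamp occupies the high bits, keys generated this way sort roughly by creation time, which is the main reason UUIDv7 is preferred over UUIDv4 for idempotency keys.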

Transcription Status Lifecycle

| Status | Description |
| --- | --- |
| QUEUED | Transcription request accepted and queued for processing |
| PROCESSING | Audio is being transcribed by the configured engine |
| COMPLETED | Transcription finished successfully; webhook delivered with results |
| FAILED | Transcription could not be completed (invalid audio, engine error, etc.) |


Integration with Recording Configuration & Conversation Intelligence

The V3 Transcription Configuration integrates across the Voice platform to enable automated end-to-end workflows:

Auto-Transcribe via Recording Configuration

Attach a V3 Transcription Configuration to your Recording Configuration to automatically transcribe recordings as they complete — no additional API call required. When a call ends and the recording is ready, the platform automatically submits it for transcription using your configured engine, model, and language settings.

Downstream Analysis with V3 Conversation Intelligence

Completed transcription results can be sent to the new V3 Conversation Intelligence platform for downstream analysis including sentiment detection, topic extraction, compliance monitoring, and custom operator evaluation. This creates a fully automated pipeline: Call → Recording → Transcription → Conversation Intelligence — configured once, executed automatically on every call.

Known Limitations (Beta)

  • Transcript sentence content is delivered via webhook only; the GET endpoint returns status and metadata but not sentence text

  • Google chirp_2 engine has longer processing times compared to Deepgram (minutes vs seconds)

  • Default configuration fallback is not yet available — a transcriptionConfigurationId must be provided on each submission

  • Maximum recording duration for transcription is subject to engine-specific limits

Getting Started

1. Create a transcription configuration via POST /v2/Configurations/Transcription specifying your preferred engine, model, language, and callback URL

2. Place a call and enable recording (dual-channel recommended for speaker separation)

3. Once the recording is complete (status: completed), submit it via POST /v3/Transcriptions with the recording SID and your configuration ID

4. Receive the completed transcript at your webhook URL with sentence-level detail

5. Optionally poll GET /v3/Transcriptions/{id} to check status before webhook delivery
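Step 5 can be sketched as a small polling loop over the status lifecycle above. The fetcher is injected as a callable (in practice a wrapper around GET /v3/Transcriptions/{id}) so the loop itself stays independent of HTTP details; the interval and timeout values are illustrative:

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_transcription(fetch_status, interval_s: float = 5.0,
                           timeout_s: float = 600.0) -> str:
    """Poll until the transcription reaches a terminal state.

    fetch_status: zero-arg callable returning the current status string,
    e.g. a wrapper around GET /v3/Transcriptions/{id}.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"transcription still {status} "
                               f"after {timeout_s}s")
        time.sleep(interval_s)
```

In production the webhook callback should be the primary delivery path; polling like this is mainly useful for scripts and smoke tests.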

Base URLs

  • Configuration API: https://voice.twilio.com/v2/Configurations/Transcription

  • Transcription API: https://voice.twilio.com/v3/Transcriptions

  • Authentication: HTTP Basic (Account SID : Auth Token)

