Twilio Changelog | May 06, 2026

New Public Beta: V3 Batch Transcription Configuration API

What is Batch Transcription Configuration?

The Batch Transcription Configuration API enables you to transcribe completed call recordings with configurable speech-to-text engines, models, and language settings. Create reusable transcription configurations, then submit any recording for transcription — results are delivered via webhook with sentence-level detail including speaker separation, timestamps, and confidence scores.

API Endpoints

New Configuration API

Create, manage, and reuse transcription configurations.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v2/Configurations/Transcription | Create a transcription configuration |
| GET | /v2/Configurations/Transcription | List all configurations |
| GET | /v2/Configurations/Transcription/{id} | Get a specific configuration |
| PUT | /v2/Configurations/Transcription/{id} | Update a configuration |
| DELETE | /v2/Configurations/Transcription/{id} | Delete a configuration |


New Transcription API (V3)

Submit recordings for transcription and check status.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v3/Transcriptions | Submit a recording for transcription (returns 202) |
| GET | /v3/Transcriptions/{id} | Check transcription status |
| GET | /v3/Transcriptions | List transcriptions (to be released as a fast follow) |
| DELETE | /v3/Transcriptions/{id} | Delete a transcription (to be released as a fast follow) |


Configuration Object

Each configuration specifies how a recording should be transcribed:

  • transcriptionEngine — Speech-to-text provider (deepgram, google, or twilio_managed)

  • speechModel — Model variant (nova-3, nova-2, chirp_2, or twilio_managed)

  • language — Language code for transcription (e.g., en-US, es-ES, de-DE)

  • participantDefaults — Audio channel → participant type mapping for speaker separation

  • transcriptionStatusCallback — Webhook URL + method for receiving completed transcripts

  • conversationConfigurationId — Optional link to a conversation configuration
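As a minimal sketch of creating a configuration, assuming the JSON body uses the camelCase field names listed above and HTTP Basic auth as described under Base URLs (confirm the exact schema against the API reference), a stdlib-only request might look like:

```python
import base64
import json
import urllib.request

CONFIG_URL = "https://voice.twilio.com/v2/Configurations/Transcription"


def build_config_payload(engine: str, model: str, language: str,
                         callback_url: str) -> dict:
    """Assemble a configuration body.

    Field names mirror the bullet list in this announcement; the exact
    JSON casing and nesting are assumptions, not confirmed schema.
    """
    return {
        "transcriptionEngine": engine,
        "speechModel": model,
        "language": language,
        "transcriptionStatusCallback": {
            "url": callback_url,
            "method": "POST",
        },
    }


def create_configuration(payload: dict, account_sid: str,
                         auth_token: str) -> dict:
    """POST the payload with HTTP Basic auth (Account SID : Auth Token)."""
    creds = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    req = urllib.request.Request(
        CONFIG_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_config_payload(
        "deepgram", "nova-3", "en-US", "https://example.com/transcripts")
    print(json.dumps(payload, indent=2))
```

Separating payload construction from the HTTP call keeps the body easy to validate and reuse across configurations.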

Access & Enablement

Important: If you are a current V2 Conversational Intelligence customer, you must request that the V3 account access flag be enabled on your account. This flag keeps a clear separation between your existing V2 production workloads and the new V3 Conversation Intelligence and Batch Transcription Configuration.

Once enabled, the V3 Transcription APIs and the Console Configuration Wizard become available under Transcriptions in your Twilio Console.

To request access, contact your Twilio account team or submit a request through the Twilio Console > Voice > Transcriptions page. Enablement is typically processed within 1 business day.

Supported Engines, Models & Languages

| Engine | Models | Languages |
| --- | --- | --- |
| deepgram | nova-3, nova-2 | en-US, en-GB, en-AU, es-ES, es-US, es-MX, de-DE, fr-FR, it-IT, pt-BR, pt-PT, nl-NL, no-NO, pl-PL, sv-SE, da-DK, multi |
| google | chirp_2 | en-US, en-GB, en-AU, es-ES, es-US, de-DE, fr-FR, it-IT, pt-BR, pt-PT, nl-NL, no-NO, pl-PL, sv-SE, da-DK |
| twilio_managed | twilio_managed | en-US, en-GB, en-AU, es-ES, es-US, es-MX, de-DE, fr-FR, it-IT, pt-BR, pt-PT, nl-NL, no-NO, pl-PL, sv-SE, da-DK, multi |

Note: google/chirp_2 does not support es-MX. Use es-ES or es-US for Spanish with Google. The twilio_managed engine automatically selects the best available model for your language.
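The engine/language matrix above can be enforced client-side before a configuration is created, which turns an eventual API error into a fast local one. A small validation sketch (the sets below are transcribed from the table; the helper itself is illustrative, not part of the API):

```python
# Languages supported by every engine, per the table above.
COMMON = {
    "en-US", "en-GB", "en-AU", "es-ES", "es-US", "de-DE", "fr-FR",
    "it-IT", "pt-BR", "pt-PT", "nl-NL", "no-NO", "pl-PL", "sv-SE", "da-DK",
}

# deepgram and twilio_managed add es-MX and multi; google (chirp_2) does not.
SUPPORTED_LANGUAGES = {
    "deepgram": COMMON | {"es-MX", "multi"},
    "google": set(COMMON),
    "twilio_managed": COMMON | {"es-MX", "multi"},
}


def validate_language(engine: str, language: str) -> None:
    """Raise ValueError before submitting a config the engine can't serve."""
    langs = SUPPORTED_LANGUAGES.get(engine)
    if langs is None:
        raise ValueError(f"unknown engine {engine!r}")
    if language not in langs:
        hint = ""
        if engine == "google" and language == "es-MX":
            hint = " (use es-ES or es-US with google/chirp_2)"
        raise ValueError(f"{engine} does not support {language}{hint}")
```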

Submitting a Transcription

POST /v3/Transcriptions with:

  • sourceId — Recording SID (RE...) of a completed call recording

  • transcriptionConfigurationId — ID of a previously created configuration

  • participants — Array of participant objects with type, address, and audioChannelIndex

Participant types: CUSTOMER, HUMAN_AGENT, AI_AGENT
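Putting those three fields together, a submission body might be assembled as follows. The key casing mirrors the bullet list above and should be confirmed against the API reference; the validation rules simply restate the constraints in this section:

```python
ALLOWED_TYPES = {"CUSTOMER", "HUMAN_AGENT", "AI_AGENT"}


def build_transcription_request(recording_sid: str,
                                configuration_id: str,
                                participants: list[dict]) -> dict:
    """Assemble a POST /v3/Transcriptions body from the fields above."""
    if not recording_sid.startswith("RE"):
        raise ValueError("sourceId must be a Recording SID (RE...)")
    for p in participants:
        if p["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown participant type {p['type']!r}")
    return {
        "sourceId": recording_sid,
        "transcriptionConfigurationId": configuration_id,
        "participants": participants,
    }


if __name__ == "__main__":
    body = build_transcription_request(
        "RE" + "0" * 32,          # placeholder Recording SID
        "my-config-id",           # placeholder configuration ID
        [
            {"type": "CUSTOMER", "address": "+15551230001",
             "audioChannelIndex": 0},
            {"type": "HUMAN_AGENT", "address": "+15551230002",
             "audioChannelIndex": 1},
        ],
    )
    print(body["sourceId"])
```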

Webhook Delivery

When transcription completes, a POST is sent to your configured callback URL containing the full transcript with:

  • Sentence-level segments with text content

  • Speaker/participant identification per sentence (via audio channel mapping)

  • Start and end timestamps (seconds) for each sentence

  • Word-level timestamps within each sentence

  • Confidence scores per sentence

  • Resolved configuration showing which engine/model/language was used

  • Participant metadata (type, address, channel)

  • Duration of the transcribed audio
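A webhook receiver typically flattens that structure into per-sentence records. The sketch below assumes illustrative field names (sentences, participant, startTime, endTime, text); the actual payload schema is defined in the webhook reference, so treat every key here as a placeholder:

```python
def summarize_transcript(payload: dict) -> list[str]:
    """Flatten a webhook body into 'speaker [start-end]: text' lines.

    All field names are assumptions for illustration; consult the
    webhook reference for the real schema.
    """
    lines = []
    for s in payload.get("sentences", []):
        lines.append(
            f"{s['participant']} [{s['startTime']:.1f}-{s['endTime']:.1f}s]: "
            f"{s['text']}"
        )
    return lines


# Hypothetical example payload, shaped after the bullet list above.
EXAMPLE = {
    "sentences": [
        {"participant": "CUSTOMER", "startTime": 0.0, "endTime": 2.4,
         "text": "Hi, I need help with my order.", "confidence": 0.97},
        {"participant": "HUMAN_AGENT", "startTime": 2.8, "endTime": 4.1,
         "text": "Happy to help.", "confidence": 0.99},
    ],
}

if __name__ == "__main__":
    print("\n".join(summarize_transcript(EXAMPLE)))
```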

Idempotency

The POST /v3/Transcriptions endpoint supports idempotent requests to prevent duplicate transcription processing during retries.

  • Header: Idempotency-Key

  • Format: UUIDv7

  • Behavior: Duplicate submissions with the same key return the original response instead of creating a new transcription
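Python's standard library only gained a built-in UUIDv7 generator recently, so a minimal RFC 9562-style generator is sketched below for older runtimes (48-bit Unix-millisecond timestamp, version and variant bits, random fill). The header name comes from this section; everything else is generic UUID construction:

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Minimal RFC 9562 UUIDv7: 48-bit ms timestamp + 74 random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF   # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (
        (ts_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
        | 0x7 << 76                      # bits  79..76: version 7
        | rand_a << 64                   # bits  75..64: rand_a
        | 0b10 << 62                     # bits  63..62: RFC variant
        | rand_b                         # bits  61..0 : rand_b
    )
    return uuid.UUID(int=value)


if __name__ == "__main__":
    # One fresh key per logical submission; reuse it only on retries.
    headers = {"Idempotency-Key": str(uuid7())}
    print(headers["Idempotency-Key"])
```

Because the timestamp occupies the high bits, keys generated this way sort roughly by creation time, which is the main reason UUIDv7 is preferred over UUIDv4 for idempotency keys.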

Transcription Status Lifecycle

| Status | Description |
| --- | --- |
| QUEUED | Transcription request accepted and queued for processing |
| PROCESSING | Audio is being transcribed by the configured engine |
| COMPLETED | Transcription finished successfully; webhook delivered with results |
| FAILED | Transcription could not be completed (invalid audio, engine error, etc.) |


Integration with Recording Configuration & Conversation Intelligence

The V3 Transcription Configuration integrates across the Voice platform to enable automated end-to-end workflows:

Auto-Transcribe via Recording Configuration

Attach a V3 Transcription Configuration to your Recording Configuration to automatically transcribe recordings as they complete — no additional API call required. When a call ends and the recording is ready, the platform automatically submits it for transcription using your configured engine, model, and language settings.

Downstream Analysis with V3 Conversation Intelligence

Completed transcription results can be sent to the new V3 Conversation Intelligence platform for downstream analysis including sentiment detection, topic extraction, compliance monitoring, and custom operator evaluation. This creates a fully automated pipeline: Call → Recording → Transcription → Conversation Intelligence — configured once, executed automatically on every call.

Known Limitations (Beta)

  • Transcript sentence content is delivered via webhook only; the GET endpoint returns status and metadata but not sentence text

  • Google chirp_2 engine has longer processing times compared to Deepgram (minutes vs seconds)

  • Default configuration fallback is not yet available — a transcriptionConfigurationId must be provided on each submission

  • Maximum recording duration for transcription is subject to engine-specific limits

Getting Started

1. Create a transcription configuration via POST /v2/Configurations/Transcription specifying your preferred engine, model, language, and callback URL

2. Place a call and enable recording (dual-channel recommended for speaker separation)

3. Once the recording is complete (status: completed), submit it via POST /v3/Transcriptions with the recording SID and your configuration ID

4. Receive the completed transcript at your webhook URL with sentence-level detail

5. Optionally poll GET /v3/Transcriptions/{id} to check status before webhook delivery
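Step 5 can be sketched as a small polling loop over the status lifecycle above. The fetcher is injected as a callable (in practice a wrapper around GET /v3/Transcriptions/{id}) so the loop itself stays independent of HTTP details; the interval and timeout values are illustrative:

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_transcription(fetch_status, interval_s: float = 5.0,
                           timeout_s: float = 600.0) -> str:
    """Poll until the transcription reaches a terminal state.

    fetch_status: zero-arg callable returning the current status string,
    e.g. a wrapper around GET /v3/Transcriptions/{id}.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"transcription still {status} "
                               f"after {timeout_s}s")
        time.sleep(interval_s)
```

In production the webhook callback should be the primary delivery path; polling like this is mainly useful for scripts and smoke tests.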

Base URLs

  • Configuration API: https://voice.twilio.com/v2/Configurations/Transcription

  • Transcription API: https://voice.twilio.com/v3/Transcriptions

  • Authentication: HTTP Basic (Account SID : Auth Token)

