How to Create WhatsApp Voice Transcripts with Rust

December 09, 2025
Written by
Popoola Temitope
Contributor
Opinions expressed by Twilio contributors are their own

Transcribing WhatsApp voice messages opens up use cases such as customer support, analytics, and automation. Twilio delivers these messages to your application via webhooks, while Rust provides the performance and reliability needed for efficient media handling.

In this tutorial, you’ll learn how to build a Rust server that receives audio messages from WhatsApp through Twilio's WhatsApp Business API, downloads the media, and transcribes it with AssemblyAI.

Prerequisites

To complete this tutorial, you should have the following:

  • Rust v1.87 or higher installed
  • Ngrok installed and linked to an active account
  • A Twilio account (free or paid). Create one if you don't have one already.
  • An AssemblyAI account
  • A mobile/cell phone with WhatsApp installed and an active WhatsApp account

Create a new Rust project

To get started, open your terminal, navigate to your desired directory, and run the following commands to initialize a new Rust project with Cargo.

cargo new twilio-whatsapp-audio
cd twilio-whatsapp-audio

Once the project has been created, open the project folder in your preferred IDE or code editor, such as Neovim or RustRover.

Add the application's dependencies

Next, let’s add the necessary dependencies for the application. To do this, open the Cargo.toml file in the project’s root directory and add the following under the [dependencies] section.

actix-web = "4.4.0"
serde = { version = "1.0.188", features = ["derive"] }
tokio = { version = "1.32.0", features = ["full"] }
reqwest = { version = "0.11.20", features = ["json"] }
dotenv = "0.15.0"
tracing = "0.1.37"
tracing-subscriber = "0.3.17"
uuid = { version = "1.4.1", features = ["v4"] }
tempfile = "3.8.0"

Here's a breakdown of the dependencies:

  • actix-web: A web framework for building HTTP servers and APIs
  • dotenv: Loads environment variables from a .env file
  • reqwest: An HTTP client for sending requests and handling responses
  • serde: A library for serializing and deserializing Rust data structures
  • tempfile: Securely creates temporary files for media processing
  • tokio: An asynchronous runtime used by Actix and other async libraries
  • tracing: Structured, event-based logging for diagnostics
  • tracing-subscriber: Formats and collects tracing log data
  • uuid: Generates unique identifiers, useful for tracking and filenames

Create the environment variables

Now, let’s create a .env file to store the application credentials. To do this, inside the project’s root folder, create a new file named .env, and add the following environment variables to it.

ASSEMBLYAI_API_KEY=<ASSEMBLYAI_API_KEY>
HOST=127.0.0.1
PORT=8080
TWILIO_ACCOUNT_SID=<TWILIO_ACCOUNT_SID>
TWILIO_AUTH_TOKEN=<TWILIO_AUTH_TOKEN>

Retrieve your Twilio credentials

Let’s retrieve the Twilio Account SID and Auth Token. Log in to your Twilio Console dashboard, where you’ll find them under the Account Info section, as shown in the screenshot below.

Twilio dashboard with account SID, Auth Token, and verification resource links displayed.

Copy the Account SID and Auth Token, and replace the <TWILIO_ACCOUNT_SID> and <TWILIO_AUTH_TOKEN> placeholders in the .env file accordingly with those values.

Retrieve your AssemblyAI API key

To transcribe incoming WhatsApp audio messages, you need an AssemblyAI API key. Log in to your AssemblyAI dashboard and click on API Keys in the left-hand side navigation menu to access your API key, as shown in the screenshot below.

Screenshot showing API Key management with options to edit project and copy API key.

Copy the API key and replace the <ASSEMBLYAI_API_KEY> placeholder in the .env file with it.

Create the data models

Let’s now create the application's data models. To do this, navigate to the src folder, open the main.rs file, and replace its contents with the following code.

use actix_web::{web, App, HttpServer, HttpResponse, Error};
use dotenv::dotenv;
use serde::{Deserialize, Serialize};
use std::env;
use tracing::{info, error};
use uuid::Uuid;
#[derive(Debug, Deserialize)]
struct TwilioWebhook {
    #[serde(rename = "MediaContentType0")]
    media_content_type: Option<String>,
    #[serde(rename = "MediaUrl0")]
    media_url: Option<String>,
}
#[derive(Debug)]
enum AudioProcessingError {
    DownloadError(String),
    TranscriptionError(String),
    FileSystemError(String),
    EnvVarError(std::env::VarError),
    RequestError(reqwest::Error),
}
impl std::fmt::Display for AudioProcessingError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::DownloadError(msg) => write!(f, "Failed to download audio: {}", msg),
            Self::TranscriptionError(msg) => write!(f, "Failed to transcribe audio: {}", msg),
            Self::FileSystemError(msg) => write!(f, "File system error: {}", msg),
            Self::EnvVarError(e) => write!(f, "Environment variable error: {}", e),
            Self::RequestError(e) => write!(f, "Request error: {}", e),
        }
    }
}
impl std::error::Error for AudioProcessingError {}
impl From<std::env::VarError> for AudioProcessingError {
    fn from(err: std::env::VarError) -> Self {
        Self::EnvVarError(err)
    }
}
impl From<reqwest::Error> for AudioProcessingError {
    fn from(err: reqwest::Error) -> Self {
        Self::RequestError(err)
    }
}
impl From<std::io::Error> for AudioProcessingError {
    fn from(err: std::io::Error) -> Self {
        Self::FileSystemError(err.to_string())
    }
}

In the code above, we:

  • Imported the crates needed for web handling, environment variables, serialization, logging, and UUID generation
  • Created the TwilioWebhook model to represent the incoming webhook payload from Twilio, mapping Twilio's MediaContentType0 and MediaUrl0 fields to Rust fields using Serde
  • Created the AudioProcessingError enum as a custom error type that covers potential failures, including downloading, transcription, file system operations, environment variables, and HTTP requests, along with From implementations so that the ? operator can convert the underlying errors automatically
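
To illustrate why the From implementations matter, here is a minimal sketch using only the standard library. SketchError, read_setting(), and file_size() are hypothetical names invented for this illustration; they are a trimmed-down stand-in for AudioProcessingError, not part of the tutorial's code.

```rust
use std::env::VarError;
use std::fmt;

// Hypothetical, trimmed-down stand-in for the tutorial's AudioProcessingError.
#[derive(Debug)]
enum SketchError {
    EnvVar(VarError),
    FileSystem(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::EnvVar(e) => write!(f, "Environment variable error: {}", e),
            Self::FileSystem(msg) => write!(f, "File system error: {}", msg),
        }
    }
}

impl std::error::Error for SketchError {}

impl From<VarError> for SketchError {
    fn from(err: VarError) -> Self {
        Self::EnvVar(err)
    }
}

impl From<std::io::Error> for SketchError {
    fn from(err: std::io::Error) -> Self {
        Self::FileSystem(err.to_string())
    }
}

// The `?` operator calls `From::from` on the error value, so the VarError
// becomes a SketchError::EnvVar without an explicit `map_err`.
fn read_setting(name: &str) -> Result<String, SketchError> {
    let value = std::env::var(name)?;
    Ok(value)
}

// Here `?` converts a std::io::Error into SketchError::FileSystem.
fn file_size(path: &str) -> Result<usize, SketchError> {
    let bytes = std::fs::read(path)?;
    Ok(bytes.len())
}
```

Because each underlying error type has its own From implementation, functions that mix environment lookups, file access, and HTTP calls can share a single Result type and still use ? throughout.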

Create the application's functions

Next, let’s create a function to handle and process incoming Twilio WhatsApp audio messages. To do this, add the following code to the end of main.rs.

// Escapes the characters that are special in XML so that arbitrary text,
// such as a transcript containing "&" or "<", cannot break the TwiML document.
fn escape_xml(text: &str) -> String {
    let mut escaped = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => escaped.push_str("&amp;"),
            '<' => escaped.push_str("&lt;"),
            '>' => escaped.push_str("&gt;"),
            '"' => escaped.push_str("&quot;"),
            '\'' => escaped.push_str("&apos;"),
            _ => escaped.push(ch),
        }
    }
    escaped
}
// Wraps a message in a TwiML <Response> so that Twilio replies to the sender.
fn twiml_response(message: &str) -> HttpResponse {
    HttpResponse::Ok()
        .content_type("application/xml")
        .body(format!(
            r#"<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Message>{}</Message>
</Response>"#,
            escape_xml(message)
        ))
}
async fn handle_webhook(form: web::Form<TwilioWebhook>) -> Result<HttpResponse, Error> {
    info!("Received webhook: {:?}", form);
    if let (Some(media_url), Some(content_type)) = (&form.media_url, &form.media_content_type) {
        if content_type.starts_with("audio/") {
            info!("Audio message detected: {}", media_url);
            // Reply with the transcript on success, or a generic apology on failure.
            return match process_audio_message(media_url).await {
                Ok(transcription) => {
                    info!("Transcription complete: {}", transcription);
                    Ok(twiml_response(&format!("Transcription: {}", transcription)))
                }
                Err(e) => {
                    error!("Error processing audio: {:?}", e);
                    Ok(twiml_response("Sorry, there was an error processing your audio message."))
                }
            };
        }
        info!("Non-audio media received with content type: {}", content_type);
    } else {
        info!("No media content detected in request");
    }
    Ok(twiml_response("Please send an audio message for transcription."))
}
async fn process_audio_message(media_url: &str) -> Result<String, AudioProcessingError> {
    info!("Starting to process audio from URL: {}", media_url);
    let twilio_account_sid = env::var("TWILIO_ACCOUNT_SID")?;
    let twilio_auth_token = env::var("TWILIO_AUTH_TOKEN")?;
    let client = reqwest::Client::new();
    let response = client
        .get(media_url)
        .basic_auth(&twilio_account_sid, Some(&twilio_auth_token))
        .send()
        .await?;
    let status = response.status();
    if !status.is_success() {
        let error_body = response.text().await.unwrap_or_else(|_| "Could not read error body".to_string());
        error!("Failed to download audio: HTTP {} - {}", status, error_body);
        return Err(AudioProcessingError::DownloadError(format!(
            "HTTP error: {} - {}", status, error_body
        )));
    }
    let temp_dir = tempfile::tempdir()?;
    let file_name = format!("{}.ogg", Uuid::new_v4());
    let file_path = temp_dir.path().join(&file_name);
    let audio_data = response.bytes().await?;
    tokio::fs::write(&file_path, &audio_data).await?;
    info!("Successfully downloaded audio file to: {:?}", file_path);
    let transcription = transcribe_audio(&file_path).await?;
    Ok(transcription)
}
async fn transcribe_audio(file_path: &std::path::Path) -> Result<String, AudioProcessingError> {
    let assemblyai_api_key = env::var("ASSEMBLYAI_API_KEY")?;
    let file_data = tokio::fs::read(file_path).await?;
    let client = reqwest::Client::new();
    #[derive(Deserialize)]
    struct UploadResponse {
        upload_url: String,
    }
    let upload_response = client
        .post("https://api.assemblyai.com/v2/upload")
        .header("Authorization", &assemblyai_api_key)
        .body(file_data)
        .send()
        .await?;
    if !upload_response.status().is_success() {
        let error_text = upload_response.text().await.unwrap_or_else(|_| "Could not read error body".to_string());
        return Err(AudioProcessingError::TranscriptionError(format!("Upload failed: {}", error_text)));
    }
    let upload_result: UploadResponse = upload_response.json().await?;
    #[derive(Serialize)]
    struct TranscriptionRequest {
        audio_url: String,
    }
    #[derive(Deserialize)]
    struct TranscriptionResponse {
        id: String,
    }
    let transcription_request = TranscriptionRequest {
        audio_url: upload_result.upload_url,
    };
    let transcription_response = client
        .post("https://api.assemblyai.com/v2/transcript")
        .header("Authorization", &assemblyai_api_key)
        .header("Content-Type", "application/json")
        .json(&transcription_request)
        .send()
        .await?;
    if !transcription_response.status().is_success() {
        let error_text = transcription_response.text().await.unwrap_or_else(|_| "Could not read error body".to_string());
        return Err(AudioProcessingError::TranscriptionError(format!("Transcription request failed: {}", error_text)));
    }
    let transcription_result: TranscriptionResponse = transcription_response.json().await?;
    let transcript_id = transcription_result.id;
    #[derive(Deserialize)]
    struct TranscriptResult {
        status: String,
        text: Option<String>,
        error: Option<String>,
    }
    let polling_url = format!("https://api.assemblyai.com/v2/transcript/{}", transcript_id);
    let mut attempts = 0;
    const MAX_ATTEMPTS: u32 = 30;
    const POLLING_INTERVAL: std::time::Duration = std::time::Duration::from_secs(2);
    loop {
        if attempts >= MAX_ATTEMPTS {
            return Err(AudioProcessingError::TranscriptionError("Timed out waiting for transcription".to_string()));
        }
        tokio::time::sleep(POLLING_INTERVAL).await;
        attempts += 1;
        let poll_response = client
            .get(&polling_url)
            .header("Authorization", &assemblyai_api_key)
            .send()
            .await?;
        if !poll_response.status().is_success() {
            let error_text = poll_response.text().await.unwrap_or_else(|_| "Could not read error body".to_string());
            return Err(AudioProcessingError::TranscriptionError(format!("Polling failed: {}", error_text)));
        }
        let transcript_status: TranscriptResult = poll_response.json().await?;
        match transcript_status.status.as_str() {
            "completed" => return Ok(transcript_status.text.unwrap_or_default()),
            "error" => return Err(AudioProcessingError::TranscriptionError(
                transcript_status.error.unwrap_or_else(|| "Unknown error".to_string()),
            )),
            "queued" | "processing" => continue,
            _ => return Err(AudioProcessingError::TranscriptionError(
                format!("Unknown transcription status: {}", transcript_status.status),
            )),
        }
    }
}

In the code above:

  • The handle_webhook() function processes incoming Twilio WhatsApp messages by checking whether the request includes an audio file, based on the media URL and content type. If it does, the function calls process_audio_message() and returns a TwiML (XML) response containing either the transcription or a generic error message. Requests without audio media receive a reply asking for an audio message.
  • The process_audio_message() function downloads the audio file from Twilio using HTTP basic authentication with the Account SID and Auth Token, saves it to a temporary directory, and passes it to the transcribe_audio() function for transcription
  • The transcribe_audio() function uploads the audio file to AssemblyAI, requests a transcript, and polls every two seconds (up to 30 attempts) until the transcript is ready, then returns the transcribed text
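
The bounded polling used while waiting for the transcript can be sketched on its own. The version below is a simplified, synchronous illustration that uses std::thread::sleep instead of tokio::time::sleep; PollState and poll_until_done() are hypothetical names introduced here and do not appear in the application code.

```rust
use std::thread::sleep;
use std::time::Duration;

// Outcome of a single status check. The variants mirror the
// "queued"/"processing", "completed", and "error" states that
// transcribe_audio() handles.
enum PollState {
    Pending,
    Done(String),
    Failed(String),
}

// Calls `check` up to `max_attempts` times, sleeping `interval` before each call.
// Returns the first finished result, or a timeout error if every attempt
// reported that the work was still pending.
fn poll_until_done<F>(
    mut check: F,
    max_attempts: u32,
    interval: Duration,
) -> Result<String, String>
where
    F: FnMut() -> PollState,
{
    for _ in 0..max_attempts {
        sleep(interval);
        match check() {
            PollState::Pending => continue,
            PollState::Done(text) => return Ok(text),
            PollState::Failed(reason) => return Err(reason),
        }
    }
    Err("Timed out waiting for transcription".to_string())
}
```

With the tutorial's values of 30 attempts and a two-second interval, the loop sleeps for at most about 60 seconds in total before reporting a timeout.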

Add the application's entry-point function

Let’s add a main() function to serve as the application’s entry point. To do this, add the following code to the end of main.rs.

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    dotenv().ok();
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();
    let required_vars = vec!["TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "ASSEMBLYAI_API_KEY"];
    for var in required_vars {
        if env::var(var).is_err() {
            error!("Required environment variable {} is not set", var);
            std::process::exit(1);
        }
    }
    let host = env::var("HOST").unwrap_or_else(|_| "127.0.0.1".to_string());
    let port = env::var("PORT")
        .unwrap_or_else(|_| "8080".to_string())
        .parse::<u16>()
        .expect("PORT must be a number");
    info!("Starting server at http://{}:{}", host, port);
    HttpServer::new(|| App::new().route("/webhook", web::post().to(handle_webhook)))
        .bind((host, port))?
        .run()
        .await
}

The main() function loads the environment variables, initializes logging, verifies that the required credentials are set, and starts an Actix Web HTTP server on the configured host and port. The server routes POST requests for the "/webhook" endpoint to the handle_webhook() function.
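
The startup checks can also be written as small, testable helpers. The following is a minimal sketch, assuming you would rather collect every missing variable before exiting and return an error for an invalid port instead of panicking; missing_vars() and parse_port() are hypothetical names, not functions from the code above.

```rust
// Returns the names in `required` for which `lookup` yields no value.
// Passing the lookup as a closure keeps the function independent of the
// real process environment, which makes it easy to test.
fn missing_vars<F>(required: &[&str], lookup: F) -> Vec<String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut missing = Vec::new();
    for &name in required {
        if lookup(name).is_none() {
            missing.push(name.to_string());
        }
    }
    missing
}

// Parses the optional PORT value, falling back to 8080 when it is absent,
// which matches the default used in main().
fn parse_port(raw: Option<&str>) -> Result<u16, String> {
    match raw {
        None => Ok(8080),
        Some(text) => text
            .trim()
            .parse::<u16>()
            .map_err(|e| format!("PORT must be a number between 0 and 65535: {}", e)),
    }
}
```

In an application, the lookup closure could be |name| std::env::var(name).ok(), so that a single log message lists all of the missing credentials at once.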

Start the application

To start the application, run the command below.

cargo run

Make the application accessible over the internet

Now, let’s make the application accessible over the internet using ngrok. To do this, open a new terminal tab or window and run the command below.

ngrok http 8080

The command above will generate a Forwarding URL in your terminal, as shown in the screenshot below. Copy it and keep it handy for the next step.

Terminal window with ngrok command output displaying online status and HTTPS session details.

Connect the app to Twilio's WhatsApp Sandbox

To connect your WhatsApp to the Twilio WhatsApp Sandbox, navigate from the Twilio dashboard to Explore Products > Messaging > Try it Out > Send a WhatsApp Message, as shown in the screenshot below.

Instructions to connect to Twilio's WhatsApp Sandbox with QR code and phone number example.

Next, on the Try WhatsApp page, copy your Twilio WhatsApp number and send the displayed join message to that number, as shown in the screenshot below.

Screenshot of a WhatsApp chat integrating with Twilio Sandbox, confirming setup and providing instructions.

Configure the Twilio WhatsApp webhook

For the application to receive incoming WhatsApp messages, you need to add the application endpoint to the Twilio WhatsApp webhook. To do this, go to the Twilio Try WhatsApp page, click on Sandbox Settings, and configure the settings as follows.

  • When a message comes in: add the generated ngrok forwarding URL and append "/webhook" to the end of the URL
  • Method: POST
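
As a small illustration of the first setting, the value for the "When a message comes in" field is the forwarding URL with the route path appended. The sketch below assumes a hypothetical helper named webhook_url(); trimming trailing slashes avoids producing a double slash if the copied URL ends with one.

```rust
// Builds the webhook address from the ngrok forwarding URL.
// "/webhook" matches the route registered in main().
fn webhook_url(forwarding_url: &str) -> String {
    format!("{}/webhook", forwarding_url.trim_end_matches('/'))
}
```

For a made-up forwarding URL such as https://example.ngrok-free.app/, the helper returns https://example.ngrok-free.app/webhook.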

After configuring the settings, click the Save button to apply your changes, as shown in the screenshot below.

Screenshot of the sandbox configuration settings in a web application, highlighting the receiver endpoint URL and methods.

Test the application

From your WhatsApp number, send a voice note to the Twilio WhatsApp number. You should then receive the transcribed text of your message, as shown in the screenshot below.

Screenshot of a WhatsApp conversation with Twilio Sandbox, including a voice message and a text response.

That’s how to create WhatsApp voice transcripts with Rust

In this tutorial, you learned how to create transcripts of Twilio WhatsApp voice messages using Rust, Twilio's WhatsApp Business API and AssemblyAI. Whether you’re building a WhatsApp customer service bot or a voice-driven analytics system, handling voice messages can significantly enhance customer engagement by enabling users to send voice notes to interact with your services.

Popoola Temitope is a mobile developer and a technical writer who loves writing about frontend technologies. He can be reached on LinkedIn.