Latency Extension for A2A: Optimizing Agent Routing for Real-Time Communications

July 29, 2025
Written by
Paul Kamp
Twilion
Back in April, our friends at Google announced the Agent2Agent (A2A) protocol, an open framework that enables AI agents to communicate securely with each other, exchange information, and coordinate actions across various enterprise platforms or applications. This June, A2A joined the Linux Foundation, enabling companies to collaborate closely on fostering an open and interoperable ecosystem for AI agents using the A2A protocol and other interoperability technologies.

Today, we’re excited to share an extension for the A2A protocol that enables agents to declare their expected latency and make decisions based on it. This extension gives developers new tools to build truly responsive, latency-aware applications for communications and other real-world use cases that require synchronous behavior.

Why latency matters (and why bring it to A2A)

We’ve watched A2A gain traction as the protocol of choice for agent-to-agent interactions, and we set out to understand how it could help our customers meet their goals. Here at Twilio, several of our products focus on synchronous communications, and during this exploration we saw how extending the protocol to be latency-aware could be beneficial.

In synchronous domains – such as voice, chatbots, or IVRs – the difference between a 500-millisecond and a 5-second response can be the difference between a satisfied caller and a hangup. And much of that latency is determined by the agents powering those experiences, whether because of model size, hardware performance, backend setup, or geographic location. Because the original A2A protocol doesn't require agents to declare their expected latency, we built this extension to add that functionality. It allows you to:

  • Select the best-fit agent for the response time your experience demands, or
  • Adapt gracefully – for example, adding an indicator, playing a filler prompt, or adding typing sounds if a high-latency agent is the only option

This kind of flexibility makes A2A more robust for a broader range of communications use cases – including the ones our customers care about most.

Demo: Latency-aware agent selection

In our sample application, we built a voice agent on Twilio ConversationRelay, with our new latency-aware A2A extension aiding agent selection.

ConversationRelay lets you integrate human-like voice AI agents into any stack with Twilio Voice, combining fast speech-to-text (STT) and text-to-speech (TTS) with an LLM of your choice. We used it to spin up a quick movie and actor demo.

Video: latency-aware agent selection demo

For this demo:

  • Three remote movie/actor agents were exposed, each advertising their latency in the A2A Agent Card.
  • When a user calls in (“Ask me about a movie or actor”), the app identifies all available agents and routes the call to the one that matches the required skill with the lowest latency for the task.
  • If a higher-latency agent is selected, the system can automatically play a holding prompt or take other actions.
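The routing step above can be sketched in TypeScript. This is illustrative only: the `CandidateAgent` shape, the 1.5-second threshold, and the `route` helper are our own assumptions for the sketch, not part of ConversationRelay or the A2A SDK.

```typescript
// A candidate agent as seen by the router, with the latency it
// advertised via the A2A latency extension.
interface CandidateAgent {
  name: string;
  url: string;
  latencyMs: number;
}

// Illustrative threshold: above this, play a holding prompt.
const FILLER_THRESHOLD_MS = 1500;

// Pick the lowest-latency candidate and decide whether to cover the wait
// with a filler prompt (or typing sounds, an indicator, etc.).
function route(candidates: CandidateAgent[]): {
  agent: CandidateAgent;
  playFiller: boolean;
} {
  const agent = candidates.reduce((best, c) =>
    c.latencyMs < best.latencyMs ? c : best
  );
  return { agent, playFiller: agent.latencyMs > FILLER_THRESHOLD_MS };
}
```

In the demo, the `playFiller` decision is where you would hook in a holding prompt before handing the call off to a slower agent.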

Technical Details: A2A and latency deep dive

When the Client Agent fetches an available AgentCard, a structured summary of the card is provided to the Client as context. This includes server information such as the endpoint, authentication, capabilities, and skills. The latency extension supplies additional contextual information, which the Client can use to make an informed decision about how to route a request to a given server.
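As a sketch of that lookup step (the minimal `AgentCard` interfaces below cover only the fields we read; `findLatencyParams` is our own helper, not part of the A2A SDK), a client can locate the extension in a fetched card by its URI:

```typescript
// Minimal shapes for the parts of an Agent Card read here.
interface AgentExtension {
  uri: string;
  required?: boolean;
  params?: Record<string, unknown>;
}

interface AgentCard {
  name: string;
  url: string;
  capabilities: { extensions?: AgentExtension[] };
}

const LATENCY_EXT_URI =
  "https://github.com/twilio-labs/a2a-latency-extension";

// Pull the latency extension's params out of a card, if declared.
function findLatencyParams(
  card: AgentCard
): Record<string, unknown> | undefined {
  return card.capabilities.extensions?.find(
    (e) => e.uri === LATENCY_EXT_URI
  )?.params;
}
```

A client would run this over each card it fetches, then feed the resulting params into its routing decision.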

Example Agent Card JSON showing skillLatency

Using the AgentCard's AgentExtension field, we propose the following extension:

{
  "uri": "https://github.com/twilio-labs/a2a-latency-extension",
  "description": "",
  "required": true,
  "params": {
    "supportsTaskLatencyUpdates": true,
    "skillLatency": {
      "minLatency": ...,
      "maxLatency": ...,
      // --or--
      "p50Latency": ...,
      "p75Latency": ...,
      "p90Latency": ...,
      "p95Latency": ...,
      "p99Latency": ...,
      // --or--
      "skillName": {
        ...
      }
    }
  }
}

This lets the Client know the latency of each of a server’s individual skills. The LLM can then decide which server to use to accomplish a given task.

{  
  "name": "Movie Agent",
  "description": "An agent that can answer questions about movies and actors using TMDB.",
  "url": "http://localhost:41242",
  "version": "0.0.2",
  "capabilities": {
    "streaming": true,
    "pushNotifications": false,
    "stateTransitionHistory": true,
    "extensions": [
      {
        "uri": "https://github.com/twilio-labs/a2a-latency-extension",
        "description": "This extension provides latency updates for tasks. The server will send DataPart messages with latency information for each tool call.",
        "required": true,
        "params": {
          "skillLatency": {
            "searchMovies": 1000,
            "searchPeople": 2000
          },
          "supportsTaskLatencyUpdates": true
        }
      }
    ]
  },
  "defaultInputModes": [
    "text"
  ],
  "defaultOutputModes": [
    "text",
    "task-status"
  ],
  "skills": [
    {
      "id": "general_movie_chat",
      "name": "General Movie Chat",
      "description": "Answer general questions or chat about movies, actors, directors.",
      "tags": [
        "movies",
        "actors",
        "directors"
      ],
      "examples": [
        "Tell me about the plot of Inception.",
        "Recommend a good sci-fi movie.",
        "Who directed The Matrix?",
        "What other movies has Scarlett Johansson been in?",
        "Find action movies starring Keanu Reeves",
        "Which came out first, Jurassic Park or Terminator 2?"
      ],
      "inputModes": [
        "text"
      ],
      "outputModes": [
        "text",
        "task-status"
      ]
    }
  ],
  "supportsAuthenticatedExtendedCard": false
}
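Given several cards like the one above, a client can compare the advertised per-skill latencies directly. A minimal sketch, assuming each card's extension params carry a `skillLatency` map keyed by tool name as in the example (the `LatencyCard` shape and `fastestFor` helper are illustrative):

```typescript
// Skill/tool name -> advertised latency in milliseconds.
type SkillLatencyMap = Record<string, number>;

// A card reduced to what the selector needs, with skillLatency
// already extracted from the extension's params.
interface LatencyCard {
  name: string;
  url: string;
  skillLatency: SkillLatencyMap;
}

// Return the agent advertising the lowest latency for the requested
// skill, skipping agents that don't declare it.
function fastestFor(
  skill: string,
  cards: LatencyCard[]
): LatencyCard | undefined {
  return cards
    .filter((c) => skill in c.skillLatency)
    .sort((a, b) => a.skillLatency[skill] - b.skillLatency[skill])[0];
}
```

With the example card above advertising `searchMovies` at 1000 ms, a second agent advertising 400 ms for the same skill would win the selection, while requests for a skill no agent declares would fall back to whatever default routing you choose.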

Try it, fork it, contribute: next steps

We’re excited to see how you apply latency-aware selection to your A2A workflows. We at Twilio recognize the importance of latency management for communications workflows – but we know the community will come up with other incredible use cases.

  • Browse the repository, try the samples, and adapt the extension for your own A2A-powered agents.
  • Found an edge case, have feedback, or want to propose improvements? We welcome your issues and PRs – let’s make real-time agent experiences better together!

Appendix and more resources

About Twilio Forward

Twilio Forward focuses on Horizon-3 initiatives that drive step-change innovation, empowering builders and unlocking Twilio’s next era of growth. As an incubation lab, we explore bold new ideas, from the most advanced, almost unimaginable technologies to emerging solutions that address today’s real-world challenges. Our mission is to push boundaries, reimagine what’s possible, and build what comes next.



Rikki Singh is a product and engineering leader based in the Bay Area, California. At Twilio, she leads the emerging technology and innovation group called Twilio Forward. Outside of work, Rikki enjoys hiking and camping with her husband and toddler.

Kousha Talebian is a Principal Engineer from Vancouver, BC, working on the Emerging Technology and Innovation team. You can reach him at ktalebian [at] twilio.com. Outside of work, Kousha enjoys running with his dog and experimenting with various cuisines from around the world.

Paul Kamp is the Technical Editor-in-Chief of the Twilio Blog. You can reach him at pkamp [at] twilio.com.