Twilio Request Telemetry

September 27, 2015
Written by
Doug Black

In 2012, NASA's Curiosity rover landed on Mars. In the three years since, NASA has been monitoring the health of the rover's systems and operations by collecting telemetry: measurements and data that the rover records and sends back to Earth for monitoring and diagnostics.

When something unexpected happens, NASA is able to piece together what happened hundreds of millions of miles away by examining the collected telemetry.

While Twilio isn't quite as complex an undertaking as space exploration, we need systems that help us understand failure all the same. We have many hundreds of systems, each writing thousands of lines of diagnostic information into log files every minute. But with every API request possibly producing hundreds of new log lines across dozens of individual hosts, collating those lines to tell the story of a request's lifetime becomes difficult and time-consuming.

To address this, last year Twilio rolled out request IDs internally so that we can match each generated log line with a public API request. This has greatly reduced the effort and guesswork required to understand how each request behaved in our cluster, and has shortened incident response time.

Introducing Request IDs and Durations

Today we begin exposing some of this telemetry to you, so that you have more context about what happened with each and every API request your application makes.

Specifically, we are now returning two new HTTP headers as part of every API response. Here's an example:

Twilio-Request-Id: RQ5cedae4e7e7e4e70937a8198f5d2d1c0
Twilio-Request-Duration: 0.270

  • Twilio-Request-Id: This is the ID of your API request. If you always log this header in your own application and provide it to us when you report an issue, we can investigate exactly what happened with your individual request.
  • Twilio-Request-Duration: This is the length of time, in seconds, that your request spent being processed in Twilio's cluster. By always logging this header and comparing it to the amount of time your application perceived the request to take, you can better understand where any latency in the system is being introduced.
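To make the two headers concrete, here is a minimal sketch of how an application might log them and compare Twilio's reported duration with its own measurement. It is an illustration under stated assumptions, not an official Twilio helper: the function name `summarize_telemetry`, the logger name, and the shape of the returned dictionary are all hypothetical, and it assumes you have already measured how long the request took on your side and have the response headers available as a mapping.

```python
import logging

# Hypothetical logger name; use whatever your application already uses.
logger = logging.getLogger("twilio_telemetry")


def summarize_telemetry(headers, elapsed_seconds):
    """Log Twilio's telemetry headers and estimate time spent outside Twilio.

    headers: mapping of response header names to values.
    elapsed_seconds: request time as measured by your own application.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    request_id = normalized.get("twilio-request-id")
    raw_duration = normalized.get("twilio-request-duration")

    # The duration arrives as text (for example "0.270"); tolerate a
    # missing or malformed value instead of failing the caller.
    try:
        twilio_seconds = float(raw_duration)
    except (TypeError, ValueError):
        twilio_seconds = None

    # Time not accounted for by Twilio's own processing: network transit,
    # TLS setup, client-side work, and so on.
    outside_seconds = None
    if twilio_seconds is not None:
        outside_seconds = max(elapsed_seconds - twilio_seconds, 0.0)

    logger.info(
        "twilio request_id=%s twilio_seconds=%s elapsed_seconds=%.3f",
        request_id, twilio_seconds, elapsed_seconds,
    )
    return {
        "request_id": request_id,
        "twilio_seconds": twilio_seconds,
        "outside_seconds": outside_seconds,
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Header values copied from the example response above; the 0.5 second
    # elapsed time is an invented measurement for illustration.
    example_headers = {
        "Twilio-Request-Id": "RQ5cedae4e7e7e4e70937a8198f5d2d1c0",
        "Twilio-Request-Duration": "0.270",
    }
    print(summarize_telemetry(example_headers, elapsed_seconds=0.5))
```

In practice you would take `elapsed_seconds` from a timer wrapped around your HTTP call (for example, two readings of `time.monotonic()`) and pass in the headers from whatever HTTP client you use. A large gap between the elapsed time and Twilio's reported duration suggests the latency is being introduced outside Twilio's cluster.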

Visit REST API: Twilio's Response docs for more.