Replacing Absolute Paging and Related Properties

April 15, 2015

Hello Everyone,

tl;dr: On 9/14/2015, we are going to sunset the following properties and parameters across all endpoints that return List Resources:

Parameters:

  • page (without the PageToken)

Properties:

  • numpages
  • total
  • lastpageuri

Also, accounts created after Tuesday, 6/16/2015 will no longer have access to these properties, just as new accounts created today don't have permission to use the page parameter by itself. Note: If you are using recent helper libraries and aren't storing the counts mentioned above, you should be unaffected by this deprecation.

We have spent the past 3 years deprecating these features, changing the docs, and updating our official helper libraries, so this change should affect very few customers. Still, this hasn't been an easy decision for us, and we want to explain why we're making it. As we say at Twilio: "Let's start with the why...".

Phase 1: Absolute Paging

Way back in 2008, and then again in 2010, we had to decide how to let people page through the records returned by the API. The most straightforward solution was Page Numbers. They were easy for customers to understand: want page 10? Just ask for page 10. They were also easy for us to implement in MySQL:

SELECT …
LIMIT Page * PageSize, PageSize
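To make the offset query concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The `messages` table, its columns, and the Sid format are hypothetical stand-ins, and pages are assumed to be zero-based, which is what the `Page * PageSize` offset implies. An `ORDER BY` is added because, without one, SQL does not guarantee which rows a LIMIT returns.

```python
import sqlite3


def make_db(n):
    """Build an in-memory table of n hypothetical message records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (sid TEXT PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO messages (sid, body) VALUES (?, ?)",
        [(f"SM{i:06d}", f"message {i}") for i in range(n)],
    )
    return conn


def fetch_page(conn, page, page_size):
    """Absolute paging with zero-based page numbers (page 0 starts at offset 0)."""
    # Same shape as MySQL's "LIMIT Page * PageSize, PageSize", written in the
    # LIMIT ... OFFSET ... form. ORDER BY keeps page boundaries stable.
    rows = conn.execute(
        "SELECT sid FROM messages ORDER BY sid LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    )
    return [sid for (sid,) in rows]


conn = make_db(120)
print(fetch_page(conn, 2, 50)[:3])  # ['SM000100', 'SM000101', 'SM000102']
```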

We could also perform the same query with a count, and then provide all kinds of nifty information: the total number of records, the number of pages, and the first, next, previous, and last page URIs. What could go wrong?
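The count-based fields can all be derived from a single total. This sketch assumes zero-based page numbers and a hypothetical URI layout; `total`, `numpages`, and `lastpageuri` mirror the sunset properties listed above, while the other key names are illustrative. Every field depends on `total`, which means running a separate `COUNT(*)` over the filtered result set.

```python
def page_uri(base_uri, page, page_size):
    # Hypothetical URI layout, for illustration only.
    return f"{base_uri}?Page={page}&PageSize={page_size}"


def paging_metadata(base_uri, page, page_size, total):
    """Derive count-based paging fields from a total (zero-based pages)."""
    num_pages = (total + page_size - 1) // page_size  # integer ceil(total / page_size)
    last_page = max(num_pages - 1, 0)
    return {
        "total": total,
        "numpages": num_pages,
        "firstpageuri": page_uri(base_uri, 0, page_size),
        "previouspageuri": page_uri(base_uri, page - 1, page_size) if page > 0 else None,
        "nextpageuri": page_uri(base_uri, page + 1, page_size) if page < last_page else None,
        "lastpageuri": page_uri(base_uri, last_page, page_size),
    }


meta = paging_metadata("/Messages", 0, 50, 120)
print(meta["numpages"], meta["lastpageuri"])  # 3 /Messages?Page=2&PageSize=50
```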

The first problem we hit was people using Twilio too much. Tables grew to millions, then billions of records, and beyond. LIMIT 100, 50 comes back quite quickly, but query time grows with the offset, because the database still has to read and discard every row before it; LIMIT 150000, 50 is not nearly as snappy. We looked for technical solutions like caching offsets or using deferred joins; we tried a lot of transparent ways to solve this problem.
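A deferred join is one of the workarounds mentioned above. The idea, sketched here with sqlite3 against a hypothetical `messages` table, is to apply the offset to a narrow subquery that selects only the indexed Sid, then join back to fetch full rows for just one page. This is an illustration of the general technique, not the query Twilio ran; it makes each skipped row cheaper, but the offset rows are still stepped over, so it helps without removing the cost of a large offset.

```python
import sqlite3


def make_db(n):
    """Build an in-memory table of n hypothetical message records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (sid TEXT PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO messages (sid, body) VALUES (?, ?)",
        [(f"SM{i:06d}", f"message {i}") for i in range(n)],
    )
    return conn


def fetch_page_deferred(conn, page, page_size):
    """Offset paging via a deferred join (zero-based page numbers)."""
    # The inner query needs only Sids, so the database can walk the Sid index
    # past the offset; full rows are then fetched for just page_size Sids.
    rows = conn.execute(
        """
        SELECT m.sid, m.body
        FROM messages AS m
        JOIN (SELECT sid FROM messages ORDER BY sid LIMIT ? OFFSET ?) AS p
          ON m.sid = p.sid
        ORDER BY m.sid
        """,
        (page_size, page * page_size),
    )
    return rows.fetchall()


conn = make_db(120)
print(fetch_page_deferred(conn, 1, 50)[0])  # ('SM000050', 'message 50')
```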

The next problem we hit was counting. It seemed natural and obvious that if you were going to let people ask for a page number, they needed to know how many pages there were. The only way to calculate that was to divide the total by the page size and round up. Easy peasy. As totals grew larger and larger, though, API requests started to slow down. It took 1ms to pull the first 50 SMS records, but 400ms to count how many SMS records there were for an account (or 4,000ms, or 40,000ms, or worse, depending on the result set).

Phase 2: Relative Paging and Cached Counts

To solve the first problem, we introduced the concept of Relative Paging using AfterSid and BeforeSid, and changed the next page URI and previous page URI to use them. Most people didn't want to jump around their dataset, getting Page 17 and then Page 44 and then Page 3. People just wanted to get the next page, and the next, and the next, until they got through them all or hit something they'd already seen. With AfterSid and BeforeSid we could change our queries to:

SELECT ...
WHERE Sid > AfterSid
ORDER BY Sid
LIMIT PageSize

With appropriate indices, our queries were lightning-fast again.
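Relative paging replaces the offset with a comparison on an indexed column, an approach often called keyset or seek pagination. This is a minimal sqlite3 sketch, assuming a hypothetical `messages` table with `sid` as its primary key; `fetch_after` and `iterate_all` are illustrative names, not Twilio APIs.

```python
import sqlite3


def make_db(n):
    """Build an in-memory table of n hypothetical message records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (sid TEXT PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO messages (sid, body) VALUES (?, ?)",
        [(f"SM{i:06d}", f"message {i}") for i in range(n)],
    )
    return conn


def fetch_after(conn, after_sid, page_size):
    """Relative paging: up to page_size Sids strictly after after_sid."""
    if after_sid is None:  # first page
        rows = conn.execute(
            "SELECT sid FROM messages ORDER BY sid LIMIT ?", (page_size,)
        )
    else:
        # The primary-key index on sid lets the database seek straight to
        # after_sid instead of stepping past an offset.
        rows = conn.execute(
            "SELECT sid FROM messages WHERE sid > ? ORDER BY sid LIMIT ?",
            (after_sid, page_size),
        )
    return [sid for (sid,) in rows]


def iterate_all(conn, page_size):
    """Walk every record by asking for the page after the last Sid seen."""
    after_sid = None
    while True:
        page = fetch_after(conn, after_sid, page_size)
        if not page:
            return
        yield from page
        after_sid = page[-1]  # the last Sid seen becomes the next AfterSid


conn = make_db(120)
print(fetch_after(conn, "SM000049", 3))  # ['SM000050', 'SM000051', 'SM000052']
```

BeforeSid would be the mirror image: `WHERE sid < ? ORDER BY sid DESC LIMIT ?`, with the rows reversed before returning them. Because an index seek replaces the offset scan, each page should cost roughly the same regardless of how deep into the list it is.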

As for the second problem, we reached for a tried-and-true tool for calculations that are expensive but whose underlying data, for the most part, never changes: caching. We could take the list of filters, generate a hash from them, and use that hash as a cache key. Now the expensive count was performed exactly once per filter combination, and every time after that we could load it from cache. We could then quickly count the records not yet included in the cached count to bring the cache up to date. In the resulting system, retrieving the first 50 SMS records took 1ms, plus 1ms for the cache lookup, 5ms for the progressive count calculation, and 1ms for the cache store. Even though there were a few more steps, every step was significantly faster: a big win for everyone.
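The cached-count scheme can be sketched as follows. This is an illustration under stated assumptions, not Twilio's implementation: records are plain dicts with hypothetical `sid` and `status` fields, Sids only ever increase, and a linear scan stands in for the indexed `WHERE Sid > ?` query a real system would run. `json.dumps(..., sort_keys=True)` makes the cache key independent of the order in which filters were supplied.

```python
import hashlib
import json


def cache_key(filters):
    """Hash a filter dict into a cache key; sort_keys makes it order-independent."""
    payload = json.dumps(filters, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def matches(record, filters):
    return all(record.get(field) == value for field, value in filters.items())


class CountCache:
    """Per-filter cached counts that are topped up incrementally.

    Assumes records are only ever appended with increasing Sids.
    """

    def __init__(self):
        self._store = {}  # cache key -> (count, highest Sid already counted)

    def count(self, records, filters):
        key = cache_key(filters)
        cached, last_sid = self._store.get(key, (0, None))
        # Progressive step: only look at records newer than the last counted Sid.
        fresh = [r for r in records if last_sid is None or r["sid"] > last_sid]
        total = cached + sum(1 for r in fresh if matches(r, filters))
        newest = max((r["sid"] for r in fresh), default=last_sid)
        self._store[key] = (total, newest)
        return total


records = [{"sid": f"SM{i:03d}", "status": "sent" if i % 2 == 0 else "failed"}
           for i in range(10)]
cache = CountCache()
print(cache.count(records, {"status": "sent"}))  # 5
records.append({"sid": "SM010", "status": "sent"})
print(cache.count(records, {"status": "sent"}))  # 6
```

The incremental top-up stays correct only while records are appended and never removed.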

These two improvements allowed us to keep providing the same responses even as our datasets grew tremendously. Absolute Paging still wouldn't work for large page numbers, but we encouraged people to page via the next page URI and previous page URI, and for the vast majority of use cases this performed well and provided all the necessary data.

One thing we realized along the way was that customers weren't just using counts to calculate Absolute Paging boundaries. Some were using counts to analyze usage patterns; to address this need, we launched the Usage Records API so customers could extract the information they need.

Phase 3: PageTokens and the end of Caching

Everything was fine: helper libraries followed the Relative Paging URIs, and the caching system for counts worked. We updated our documentation to deprecate Absolute Paging in 2012 and have advised everyone since then to use the Relative Paging URIs provided by the API. Nothing could stop us, until some developers at Twilio started listening to Rock 'n' Roll music and playing around with NoSQL gateway drugs like Memcache and Redis. Soon enough there were production services successfully running on Hazelcast and Dynamo.

These new datastores had their own ways of paging that required different information than AfterSid and BeforeSid. To support our non-relational friends, we made a change a few months ago: we ditched AfterSid and BeforeSid in favor of a PageToken. Since everyone was just following URLs, we could make this change without anyone noticing.
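An opaque page token lets each backend carry whatever cursor it needs behind one parameter. The sketch below assumes a simple hypothetical scheme (JSON, then URL-safe base64 with padding stripped); it is not the format Twilio uses, and a production token might also be signed or encrypted so clients cannot tamper with it.

```python
import base64
import json


def encode_page_token(cursor):
    """Serialize a backend-specific cursor into an opaque, URL-safe string."""
    raw = json.dumps(cursor, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_page_token(token):
    """Reverse encode_page_token, restoring the stripped base64 padding."""
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# A relational backend and a key-value backend can carry different cursors
# behind the same parameter; clients only ever copy the token around.
sql_token = encode_page_token({"after_sid": "SM000049"})
kv_token = encode_page_token({"last_key": ["AC123", 1429056000]})
next_page_uri = f"/Messages?PageSize=50&PageToken={sql_token}"
print(decode_page_token(kv_token))  # {'last_key': ['AC123', 1429056000]}
```

Because clients only copy `next_page_uri`, the server can change what goes inside the token without any client noticing.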

Counting, though, has proved a little more tricky. Twilio has traditionally provided a rich set of filters for every list. Combined with the recently launched HTTP DELETE feature, this makes caching counts impossible: a single DELETE could require updating an effectively unbounded number of cache keys. A deleted record may have been included in any number of cached counts, and updating or invalidating them was not possible because of the sheer number of filter combinations. One of the two hard problems in computer science, cache invalidation, had reared its ugly head.
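A simplified model shows why DELETE broke the count cache. Assuming only exact-match filters, a record contributes to the cached count for every subset of its filterable field values (including the unfiltered count), so k filterable fields give 2^k cache keys per record. The field names and phone numbers below are hypothetical. Range-style filters, such as date ranges, make it worse: every cached range that contains the record's timestamp is yet another key.

```python
from itertools import combinations


def affected_filter_sets(record, filterable_fields):
    """List every exact-match filter combination that a record satisfies.

    Each combination is a distinct cached count that included this record,
    so deleting the record would mean updating every one of them.
    """
    fields = [f for f in filterable_fields if f in record]
    sets = []
    for size in range(len(fields) + 1):  # size 0 is the unfiltered count
        for combo in combinations(fields, size):
            sets.append(tuple((f, record[f]) for f in combo))
    return sets


record = {"from": "+15550001", "to": "+15550002",
          "status": "delivered", "direction": "outbound-api"}
fields = ["from", "to", "status", "direction"]
print(len(affected_filter_sets(record, fields)))  # 16, i.e. 2 ** 4
```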

Result:

We were stuck with the classic problem of having three desirable properties (rich list filtering, high performance, and counts) while only being able to choose two. The options we debated were:

  • Rich Filtering + Counts, but with dreadful performance. We'd need to perform expensive count calculations on every request.
  • High Performance + Counts, but no filtering. We could eke out the performance needed with smart caching.
  • Rich Filtering + High Performance, but no counts. This is what we went with.

By not having to calculate counts on demand, we can continue to support Rich Filtering and High Performance. The only penalty is that the result set no longer includes counts, because they're too costly to calculate. The impact, though, is minimal: for over 3 years there has been a count-free way to page (via the Relative Paging URIs) and an alternative way to get your usage information (via the Usage API). Of the three options, this one had the least downside.

Lessons Learned:

  1. Paging with limit and offset works great with small offsets, but there are penalties to be paid with large offsets. There are workarounds for this, but they may not be viable in all situations.
  2. Paging with limit and offset works great in an RDBMS, but at some point you might find yourself in a datagrid or some other futuristic place that has no concept of limit and offset.
  3. Paging URIs are best provided as hypermedia in the response, either as part of the payload or via stylish link headers. Asking the client to construct the next page URI or previous page URI instead spreads your paging strategy across every client and makes it much harder to change.
  4. Information generation has a cost. Although it might be easy to provide counts and every filter you can think of for smaller datasets, it still costs something, and those costs might not scale well. Understand what happens to those costs when the dataset size changes by an order of magnitude (or three).
  5. No one is perfect. Every decision we make is based on our best understanding of the data at hand. Baked into our decisions are assumptions that might not hold and changes we may not be able to see coming. Sometimes we have to admit that a previous design decision wasn't sustainable, endeavor to understand the problem space better, and make changes that provide the best sustainable features.
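Lesson 3 is what let the switch to PageToken go unnoticed by clients that were following URLs. A client that only follows server-provided links can be sketched like this; the page shape (`items`, `next_page_uri`) and the in-memory `fake_pages` server are hypothetical stand-ins for real HTTP responses, not a Twilio helper library API.

```python
from typing import Callable, Iterator, Optional


def iterate_records(fetch: Callable[[str], dict], first_uri: str) -> Iterator[object]:
    """Yield every item by following server-provided next-page links.

    The client never builds a page URI itself, so the server can switch from
    Page numbers to AfterSid to PageToken without breaking this loop.
    """
    uri: Optional[str] = first_uri
    while uri is not None:
        page = fetch(uri)
        yield from page["items"]
        uri = page.get("next_page_uri")  # None (or absent) on the last page


# Hypothetical in-memory "server" standing in for real HTTP GET requests.
fake_pages = {
    "/Messages?PageSize=2": {"items": ["SM1", "SM2"],
                             "next_page_uri": "/Messages?PageSize=2&PageToken=abc"},
    "/Messages?PageSize=2&PageToken=abc": {"items": ["SM3"], "next_page_uri": None},
}
print(list(iterate_records(fake_pages.__getitem__, "/Messages?PageSize=2")))
# ['SM1', 'SM2', 'SM3']
```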

The Takeaway:

Maintaining a public API is a difficult but rewarding task. Developers build amazing software with Twilio every day. A large part of what makes that possible is our commitment to making sure API changes are backwards compatible.

The API Team has been considering this change carefully for over a year. We've made the series of incremental changes above to reduce the impact, and we examined our production traffic to understand how real-world customers are using this feature. Making a backwards-incompatible change is not something we do lightly, but sometimes it is necessary to move forward.

This change will place us in a position to provide accurate data quickly and reliably for the foreseeable future. Absolute Paging, which has been deprecated for years and which doesn't work reliably for large Page Numbers, will be disabled starting 9/14/2015. Totals, number of pages, and last page URI have been removed from all helper libraries today in a new major version. Starting Tuesday, 6/16/2015, list responses for all new accounts will no longer include the total, number of pages, and last page URI. On 9/14/2015 these fields will be disabled for all existing accounts.

We recommend that you use the most recent helper libraries to get your records, as shown in this sample. Furthermore, if you're using the properties above to aggregate data, you should use the Usage API instead.

If you have any questions, feel free to reach out to us via help@twilio.com.

Update:

We had previously listed 8/31/2015 as the sunset date; it has now been postponed to 9/14/2015.