What is the “Goldilocks” number of cloud-computing accounts to limit your blast radius?

March 30, 2017
Written by Brandon Sherman, Contributor
Opinions expressed by Twilio contributors are their own

When signing up for an account with an IaaS provider such as Amazon Web Services, Google Cloud, or Azure, you’ll be asked to provide or link an email address and password.  After a brief wait, and possibly a confirmation email, you are logged in and dropped at an empty console, teeming with possibilities.  Is this where the next billion-dollar unicorn starts?  What will you build today?

Fast forward a hopefully small amount of time, to the point when your application begins to grow by leaps and bounds.  Instead of just one person at the console, or even one team, many people with differing roles and responsibilities need to interact with that one cloud computing account, with its one login to rule them all.  When is it time to open a second account, with a separate all-powerful login?  Why not a third account?

There are many reasons why you may want to have multiple separate accounts:

  • Cost attribution is simpler
  • Permissions are easier to scope
  • Logical separation of services
  • Logical separation of environments
  • Limiting the blast radius when something bad happens

The first four are easily understood, but the last topic— that of blast radius— requires additional thought and planning.

Blast radius

The concept of blast radius is simple: if there were an “explosion” in your infrastructure, how far would the damage extend?  A well-architected solution should be able to contain an explosion without harming a service’s ability to function.  One way to contain an explosion, whether caused by an attacker or by your own software gone rogue, is to use multiple accounts.  In cloud computing environments, separate accounts put hard limits on what credentials such as API keys can reach.

Especially risky are the root credentials assigned to the initial user, along with those of anyone else who has been granted super-user capabilities.  The risk is greatest when a credential can provision other credentials within an account, because it can then be used to insert “back door” credentials into a service.  Oftentimes these initial super-user credentials cannot be limited or removed, but even among other service or human accounts there is the risk that someone (or something) with the appropriate level of access could wipe out everything they have been granted access to.  When everything lives in a single account, the blast radius is your entire infrastructure.

To reduce the reach of these powerful credentials, build a wall around each one.  Simply put, to reduce the damage a single API key is capable of, create another account.  Each account is like a fence, and “good fences make for good neighbors.”
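To make the fence idea concrete, here is a minimal sketch in Python.  It is a toy model, not a call to any cloud provider's API: the account names and resource names are hypothetical, and it assumes no cross-account access has been granted.  It simply compares how much a compromised super-user credential could reach before and after splitting one account into two.

```python
# Toy model: each account maps to the set of resources it contains.
# Account and resource names are hypothetical, for illustration only.
single_account = {
    "everything": {"web", "database", "billing", "dev-sandbox"},
}
split_accounts = {
    "prod": {"web", "database", "billing"},
    "dev": {"dev-sandbox"},
}


def blast_radius(accounts, compromised_account):
    """Resources a super-user credential in one account could destroy.

    Assumes no cross-account access has been granted.
    """
    return set(accounts[compromised_account])


def fraction_exposed(accounts, compromised_account):
    """Share of all resources that fall inside the blast radius."""
    total = sum(len(resources) for resources in accounts.values())
    return len(blast_radius(accounts, compromised_account)) / total


print(fraction_exposed(single_account, "everything"))  # 1.0
print(fraction_exposed(split_accounts, "dev"))         # 0.25
```

In this model a leaked credential in the single account exposes everything, while the same leak in the hypothetical dev account exposes only the sandbox.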

Benefits of your second account

There is a clear benefit to expanding from one account to two.  Development and Production workloads can be separated, turning what was a logical distinction between resources into something closer to a physical one.  This separation even improves the speed of an organization.  An intern or junior developer can now deploy experimental systems without accidentally harming day-to-day business operations, and if an experiment goes poorly (say they deployed an old version of WordPress), the Production account stays untouched.

This separation of Production and Development will work well until the first outage stemming from a difference between Production and Development.  At the post-mortem, someone will decide that it makes sense to have a Stage account that perfectly mirrors the Production account.

Benefits of your nth account

We’re on a roll now!  Deploying one production-grade service shouldn’t interfere with other services.  AWS sets account-based limits on how many EC2 instances are allowed to be running at any given time.  If one service in your production account suddenly has to scale up, it could deprive all the other services of room to grow; if one service in your production account is suddenly breached, the attacker could obtain API credentials capable of terminating the instances of any team in the same account.
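The shared-limit problem can be sketched with a small Python model.  This is an illustration only, not AWS behavior in detail: the `QuotaAccount` class, the limit of 20 instances, and the service names are all hypothetical, and the model simply grants as many instances as the remaining headroom allows.

```python
class QuotaAccount:
    """Toy account with one instance limit shared by every service in it."""

    def __init__(self, name, instance_limit):
        self.name = name
        self.instance_limit = instance_limit
        self.running = {}  # service name -> running instance count

    def launch(self, service, count):
        """Start up to `count` instances; return how many actually started."""
        used = sum(self.running.values())
        granted = max(0, min(count, self.instance_limit - used))
        self.running[service] = self.running.get(service, 0) + granted
        return granted


# One shared account: the first service to scale up eats the headroom.
shared = QuotaAccount("prod", instance_limit=20)
print(shared.launch("api", 18))      # 18
print(shared.launch("billing", 5))   # 2

# One account per service: each service has its own limit.
api_account = QuotaAccount("api-prod", instance_limit=20)
billing_account = QuotaAccount("billing-prod", instance_limit=20)
print(api_account.launch("api", 18))         # 18
print(billing_account.launch("billing", 5))  # 5
```

In the shared account, the second service gets only the two instances left over; with separate accounts, neither service can starve the other.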

It is a reasonable course of action to cut each service out of that monolithic Production account and give it its own group of Dev/Stage/Prod accounts.  This further separates resources, creates more walls, and further reduces the blast radius.  Now, the extent of the damage can be a single service.  A compromise in one small portion of the overall business is safely contained.

But wait, there’s more!  Let’s put each segment of a service into its own separate account.  This is not a new concept by any means: the three-tier web app concept, the model/view/controller paradigm, containers, virtual machines… the list goes on.  Why not apply the same segmentation used everywhere else to the infrastructure that runs a service?  This is possible within a cloud environment where Infrastructure is Code.  Before, in a datacenter world, the smallest level of segmentation possible would have been a redundant datacenter across town.  Now, we can separate even the datacenter that provides for an application.  If an attacker breaks into the web servers providing a website, the compromise can be contained: the entire account hosting the web servers can be burned down and a new one, with pristine servers, spun up automatically in minutes.
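It helps to see how quickly the number of accounts grows under each strategy.  The sketch below is plain Python arithmetic over made-up inputs: the service names, the three environments, and the three tiers are assumptions chosen for illustration, as is the naming scheme.

```python
from itertools import product


def plan_accounts(services, environments, tiers=None):
    """List one account name per service/environment (and tier, if given)."""
    if tiers:
        return [
            f"{service}-{tier}-{env}"
            for service, tier, env in product(services, tiers, environments)
        ]
    return [f"{service}-{env}" for service, env in product(services, environments)]


# Hypothetical inputs, for illustration only.
services = ["checkout", "search", "billing", "auth"]
environments = ["dev", "stage", "prod"]
tiers = ["web", "app", "data"]

per_service = plan_accounts(services, environments)
per_tier = plan_accounts(services, environments, tiers)

print(len(per_service))  # 12
print(len(per_tier))     # 36
print(per_tier[0])       # checkout-web-dev
```

Four services is a small business, and splitting by tier already means three dozen accounts, each with its own credentials to manage.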

“Goldilocks”

Let’s take a step back; just because each monolithic application can be decomposed into services which are then broken into microservices doesn’t mean each segment of a microservice has to be sharded across a half-dozen accounts.  Although Amazon and Google have both introduced a product called Organizations to ease the management of multi-account architectures, there are technical hurdles that can interfere.  As one example, managing MFA tokens for human users across many accounts simply does not scale, although solutions exist.  If extremely low-latency communications are essential to the functioning of your application, it may be difficult to achieve them while bouncing through a dozen separate accounts with separate IP space.  While there are workarounds and solutions, just because something is possible doesn’t mean it should be done.  Splitting each sub-component of a microservice into its own account may not be workable from a cost, automation, or human perspective.  There are clear tradeoffs between “one account, many services” and “one service, many accounts.”  Looking at this spectrum raises the question alluded to at the beginning of this post: What is the “Goldilocks” number of accounts?

There isn’t one.  No one formula can correctly account for the complexity of your service, the risk your business finds acceptable, or the overhead of managing multiple accounts.  However, there is a point, different for each unique service, that is sustainable and manageable and which provides a reasonable level of decomposition.

Objections about the overhead of multiple accounts can be answered, because the infrastructure is already code, and your security should be too.  DevSecOps posits that security is code.  Having the right number of accounts, not too many and not too few but just right, is an assessment that has to be made individually.  But I can promise you, the bears would have been a lot less hungry if they hadn’t kept all their porridge in the same account (er, house).