
While AWS components are generally resilient, running at Twilio's scale makes it necessary to fine-tune them to achieve the highest level of quality and availability.
In this post, we’ll examine how to tune Elastic Load Balancers (ELBs) to increase their fault tolerance. Using custom health checks and multivalued DNS records, we can obtain fine-grained metrics on the availability of each of an ELB’s constituent nodes. With these metrics, we can adjust the ELB’s self-healing behavior according to whatever criteria suit our purposes.
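To make the idea of per-node availability concrete, here is a minimal sketch in Python. It is not the Route53-based setup this post builds; it only illustrates the underlying principle: an ELB hostname resolves to several node IP addresses, and each one can be probed individually. The function names, the TCP probe, and the default port are assumptions made for illustration.

```python
import socket
from typing import Callable, Dict, Iterable, List


def resolve_nodes(hostname: str, port: int = 443) -> List[str]:
    """Return the distinct IPv4 addresses currently published for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def tcp_probe(ip: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Hypothetical health check: a node counts as healthy if a TCP connect succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_nodes(
    ips: Iterable[str], probe: Callable[[str], bool] = tcp_probe
) -> Dict[str, bool]:
    """Probe every node independently and report health per IP."""
    return {ip: probe(ip) for ip in ips}


def availability(results: Dict[str, bool]) -> float:
    """Fraction of nodes that passed the probe (0.0 when there are no nodes)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)
```

Calling `check_nodes(resolve_nodes(elb_hostname))` with your own ELB's DNS name would give a per-node health map, and `availability` reduces it to a single number you could alert on. The `probe` parameter is injectable so a stricter check (for example an HTTP request) can be swapped in.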
Requirements
To implement the fault-tolerant ELB solution, you need an AWS account with permissions to create:
- Route53 hosted zones and records
- DNS Health Checks
- Elastic Load Balancers
- CloudWatch Alarms
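As a rough illustration of what those permissions could look like, the sketch below builds an IAM policy document in Python. The action list is an assumption: it covers only the create operations for the resource types listed above and is not taken from the original setup. The wildcard resource is used for brevity, and a real policy would likely need read, update, and delete actions as well as tighter resource scoping.

```python
import json
from typing import Iterable

# Assumed minimal set of create actions; adjust to your own needs.
REQUIRED_ACTIONS = [
    "route53:CreateHostedZone",
    "route53:ChangeResourceRecordSets",
    "route53:CreateHealthCheck",
    "elasticloadbalancing:CreateLoadBalancer",
    "cloudwatch:PutMetricAlarm",
]


def build_policy(actions: Iterable[str] = REQUIRED_ACTIONS) -> dict:
    """Return an IAM policy document allowing the given actions on all resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # illustrative only; scope this down in practice
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into the IAM console.
    print(json.dumps(build_policy(), indent=2))
```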
About the Elastic Load Balancer (ELB) internals
In order to understand the solution, it’s necessary to know a little bit about the internal structure of …