In this post, we will see how Amazon Route 53 can help fail over smoothly from an unhealthy resource to a healthy one with (almost) no impact on users.
The post covers the following:
- Amazon Route 53 basics
- Steps to achieve DNS failover
- A hands-on DNS failover example
Amazon Route 53 lets customers register new domains, route Internet traffic to the resources behind those domains, and check the health of those resources.
Route 53 also lets customers choose how traffic is routed and to which resources. For example, traffic can be routed based on latency or on the weight assigned to each record set.
In addition, Route 53 can fail over between domain resources.
Multiple failover configurations are possible:
- active-active – the traffic is routed to all resources and if one of them becomes unhealthy, the traffic is not routed to it anymore
- active-passive – the traffic is routed only to the active resource and if it becomes unhealthy, all the traffic will be routed to the passive resource
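The difference between the two configurations can be sketched in a few lines of Python. This is only a simplified model of the selection logic, not Route 53 code; the `Resource` class and both function names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a record set and its health status."""
    name: str
    healthy: bool


def active_active(resources):
    """Return every healthy resource; unhealthy ones leave the rotation."""
    return [r.name for r in resources if r.healthy]


def active_passive(primary, secondary):
    """Return the primary while it is healthy, otherwise the secondary."""
    return primary.name if primary.healthy else secondary.name


azure = Resource("azure-vm", healthy=False)
s3 = Resource("s3-site", healthy=True)
print(active_active([azure, s3]))
print(active_passive(azure, s3))
```

The rest of this post uses the active-passive model, with the Azure VM as the primary and S3 as the secondary.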
Unavailable resources are detected by health checks that monitor whether a specific IP address or domain name is reachable over TCP, HTTP, or HTTPS on a chosen port. Very often this port is either 80 or 443.
Once the primary (active) resource is detected as unavailable, traffic is shifted to the secondary (passive) resource.
When the primary resource is considered healthy again, traffic is shifted back.
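The shift away from the primary and back again can be modelled as a small state machine. The sketch below assumes the status only flips after a number of consecutive contrary observations (a failure threshold). It deliberately ignores that Route 53 aggregates results from several health checkers. The function name and the default threshold of 3 are illustrative assumptions.

```python
def track_health(observations, failure_threshold=3, healthy=True):
    """Replay probe results and return the reported status after each probe.

    observations: iterable of booleans, True when a probe succeeded.
    The status flips only after `failure_threshold` consecutive probes
    that disagree with the current status.
    """
    streak = 0
    history = []
    for ok in observations:
        if ok == healthy:
            streak = 0  # probe agrees with current status, reset the streak
        else:
            streak += 1
            if streak >= failure_threshold:
                healthy = ok  # enough contrary probes in a row: flip status
                streak = 0
        history.append(healthy)
    return history


def serving_record(healthy):
    """Map the primary's status to the failover record that answers queries."""
    return "PRIMARY" if healthy else "SECONDARY"


probes = [True, False, False, False, True, True, True]
print([serving_record(h) for h in track_health(probes)])
```

In this model a single failed probe does not trigger a failover, and a single successful probe does not trigger the shift back.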
This diagram explains how Amazon Route 53 will handle the failover between the two DNS zone record sets:
The main site runs on a VM in Azure. The VM runs lighttpd, and the website can be reached directly through the VM's public IP address:
The website used in case of a failover is hosted in AWS S3. Its endpoint link can be used to access the backup website:
The two websites should have identical content so that the client will not know that something has gone wrong with the primary site.
In this demo, however, the two index.html files are slightly different so that we can observe the failover.
This is the website served by the Azure VM:
And this is the website served from AWS S3:
A client expects to use a single link, and the failover should be transparent to them. Moreover, regardless of where the data comes from, the website should be reachable through the same easily remembered link and not some cryptic URL.
The awswork.com domain will be used to point to both endpoints.
The record set pointing to the Azure VM IP address will be configured as the primary failover record, and the record set pointing to the AWS S3 endpoint will be the secondary failover record.
A Route 53 health check will then verify that the primary record is reachable, and Route 53 will fail over to the secondary record if something is wrong with the primary.
The health check verifies that the IP address or the domain is reachable over the configured protocol (HTTP, HTTPS, or TCP) on the configured port. In this case, because the website hosted on the Azure VM is the primary, the health check monitors it so that Route 53 can fail over to the secondary location when it becomes unavailable.
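For readers who prefer scripting over the console, the same health check could be described as a request for the Route 53 `CreateHealthCheck` API. The sketch below only builds the request dictionary. The IP address `203.0.113.10` is a documentation placeholder rather than the real Azure VM address, and the helper name is hypothetical.

```python
import uuid

ALLOWED_PROTOCOLS = ("HTTP", "HTTPS", "TCP")


def build_health_check_request(ip_address, port=80, protocol="HTTP", path="/"):
    """Build keyword arguments for a Route 53 CreateHealthCheck call."""
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol}")
    config = {"IPAddress": ip_address, "Port": port, "Type": protocol}
    if protocol != "TCP":
        # A request path only applies to HTTP and HTTPS checks.
        config["ResourcePath"] = path
    return {
        # CallerReference must be unique for each new health check.
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": config,
    }


# Placeholder address standing in for the Azure VM's public IP.
request = build_health_check_request("203.0.113.10", port=80, protocol="HTTP")
print(request["HealthCheckConfig"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("route53").create_health_check(**request)`.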
Additionally, for faster failover, the health check's request interval and failure threshold can be tuned:
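A rough way to reason about these timers is to multiply the request interval by the failure threshold and add the TTL of the record, since clients may keep a cached answer for that long. The helper below is a back-of-the-envelope estimate under those assumptions, not an exact model of Route 53. The 30 seconds mentioned later in this post would be consistent with a 10-second interval and a threshold of 3, although the exact values used in the screenshots are an assumption here.

```python
VALID_INTERVALS = (10, 30)  # seconds: the "fast" and "standard" intervals


def estimate_failover_seconds(request_interval, failure_threshold, record_ttl=0):
    """Estimate the time until clients stop receiving the failed primary.

    detection time  = request_interval * failure_threshold
    client-side lag = record_ttl (cached answers may live that long)
    """
    if request_interval not in VALID_INTERVALS:
        raise ValueError("request_interval must be 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("failure_threshold must be between 1 and 10")
    return request_interval * failure_threshold + record_ttl


print(estimate_failover_seconds(10, 3))      # fast interval, detection only
print(estimate_failover_seconds(30, 3))      # standard interval, detection only
print(estimate_failover_seconds(10, 3, 60))  # fast interval plus a 60 s TTL
```

A lower threshold shortens detection further, at the cost of reacting to brief glitches that would otherwise be ignored.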
Next, there is the possibility to trigger an alarm when the health check returns an error during monitoring:
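Such an alarm can also be expressed as a CloudWatch `PutMetricAlarm` request on the `HealthCheckStatus` metric, which reports 1 for healthy and 0 for unhealthy. The sketch below only builds the request. The health check ID, the SNS topic ARN and the alarm name are placeholders, and the one-minute period is an assumed value.

```python
def build_alarm_request(health_check_id, sns_topic_arn):
    """Build keyword arguments for a CloudWatch PutMetricAlarm call."""
    return {
        "AlarmName": f"route53-healthcheck-{health_check_id}",
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",
        "Period": 60,             # evaluate one-minute windows (assumed)
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        # Alarm when the minimum status in the window drops below 1.
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Both identifiers below are placeholders, not values from this setup.
alarm = build_alarm_request(
    "11111111-2222-3333-4444-555555555555",
    "arn:aws:sns:us-east-1:123456789012:healthcheck-alerts",
)
print(alarm["AlarmName"])
```

With boto3, this could be sent as `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**alarm)`. Route 53 publishes its health check metrics in the US East (N. Virginia) region, so the alarm has to be created there.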
As the website hosted on the Azure VM is up and running and the Route 53 health check does not report any error, the status is healthy:
It is time to create two record sets in the awswork.com domain.
The first one will point to the IP address of the Azure VM. Because we want to implement failover, the routing policy is set to Failover:
The Azure VM record is the primary record and is associated with the previously created health check so that it is monitored:
The second record set in the awswork.com zone will point to the S3 endpoint that stores the backup site and will be configured as the secondary failover record, using the same Failover routing policy:
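The two record sets can be summarized as a single change batch for the Route 53 `ChangeResourceRecordSets` API. This is a sketch under several assumptions: the S3 record is modelled as an alias A record, which is one way to point a zone apex at an S3 website endpoint, and the IP address, health check ID, S3 endpoint name and alias hosted zone ID are all placeholders to be replaced with real values.

```python
def build_failover_change_batch(domain, primary_ip, health_check_id,
                                s3_website_dns_name, s3_hosted_zone_id, ttl=60):
    """Build a ChangeBatch with a PRIMARY and a SECONDARY failover record."""
    primary = {
        "Name": domain,
        "Type": "A",
        "SetIdentifier": "primary-azure-vm",
        "Failover": "PRIMARY",
        "TTL": ttl,
        "ResourceRecords": [{"Value": primary_ip}],
        "HealthCheckId": health_check_id,  # ties the record to the health check
    }
    secondary = {
        "Name": domain,
        "Type": "A",
        "SetIdentifier": "secondary-s3",
        "Failover": "SECONDARY",
        # Alias records carry no TTL and no ResourceRecords.
        "AliasTarget": {
            "HostedZoneId": s3_hosted_zone_id,
            "DNSName": s3_website_dns_name,
            "EvaluateTargetHealth": False,
        },
    }
    return {
        "Comment": "Failover between Azure VM (primary) and S3 (secondary)",
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": record}
            for record in (primary, secondary)
        ],
    }


# Every argument except the domain name is a placeholder.
batch = build_failover_change_batch(
    domain="awswork.com.",
    primary_ip="203.0.113.10",
    health_check_id="11111111-2222-3333-4444-555555555555",
    s3_website_dns_name="s3-website-REGION.amazonaws.com.",
    s3_hosted_zone_id="S3-WEBSITE-ZONE-ID",
)
print([c["ResourceRecordSet"]["Failover"] for c in batch["Changes"]])
```

With boto3, the batch could be applied through `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`, where `HostedZoneId` is the ID of the awswork.com hosted zone.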
Once the DNS zone is configured, accessing awswork.com should show the website hosted on the Azure VM:
To test that the failover works, the web server on the Azure VM was stopped, which should cause the health check to fail in about 30 seconds:
Once the health check fails, Route 53 answers DNS queries with the secondary record, so users reach the S3 content, as shown below:
Starting the web server on the Azure VM again clears the alarm on the health check, and clients should again be routed to the Azure VM website:
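Besides reloading the page in a browser, the failover and the shift back can be observed by polling DNS and noting when the answer changes. The helper below is a minimal sketch with a hypothetical name. It uses the system resolver by default, so locally cached answers may delay what it reports.

```python
import socket
import time


def watch_resolution(hostname, attempts, delay_seconds=5.0,
                     resolve=socket.gethostbyname, sleep=time.sleep):
    """Resolve `hostname` repeatedly and return the sequence of distinct answers.

    Consecutive identical answers are collapsed, so a failover followed by a
    failback appears as three entries: primary, secondary, primary.
    """
    seen = []
    for attempt in range(attempts):
        try:
            address = resolve(hostname)
        except OSError:
            address = None  # resolution failed for this attempt
        if not seen or seen[-1] != address:
            seen.append(address)
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return seen


if __name__ == "__main__":
    # Poll for roughly two minutes while stopping and starting the web server.
    print(watch_resolution("awswork.com", attempts=24, delay_seconds=5.0))
```

The `resolve` and `sleep` parameters are injectable so the loop can be tried with canned answers, without waiting or touching the network.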
And this is how you can enable DNS failover on Amazon Route 53 between two endpoints, in this case, an Azure VM and Amazon S3.