On Friday afternoon, 17 July 2020, many internet users experienced connectivity issues around the same time. Sites such as Discord, Feedly, Politico, Shopify and League of Legends were inaccessible. Initially, a Denial-of-Service (DoS) attack was suspected, but we’ve since learnt that the outage was caused by issues with Cloudflare’s DNS service.
Over the years, you may have heard of cyberattacks with fancy names like “TCP SYN flood”, “smurf attack”, “session hijacking” and many more. However, hardly anyone expects that a foundational service like DNS, which has been around for over 30 years, would be a point of vulnerability. The Domain Name System (DNS) is the phonebook of the Internet. Humans access information online through domain names, like nytimes.com or espn.com, which the DNS then translates into Internet Protocol (IP) addresses (such as 126.96.36.199) so browsers can load the website or other internet resource. When the DNS goes down, websites become inaccessible.
The incident, like many similar ones in history, is a reminder of how vulnerable our internet infrastructure can be, and how important it is to have a comprehensive monitoring system in place.
Here are 3 best practices to help our IT infrastructure community stay on top of incidents like this:
- Have a customized monitoring configuration for each mission-critical service, such as DNS. Very often, mission-critical services and other application-oriented services are mixed together and monitored in a generic way, so when an incident happens it can be hard to quickly pin down the root cause. This is why Netreo defines many specialized monitoring templates for different device types.
- Leverage synthetic check capabilities to probe the performance of services periodically, and be alerted when services are trending toward potential issues.
- Deploy real-time traffic monitoring and pattern recognition capabilities. These are helpful for troubleshooting while a crisis is underway.
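To make the synthetic check idea concrete, here is a minimal Python sketch of a periodic DNS probe. It is only an illustration under stated assumptions, not how Netreo or any other product implements its checks: the function names (`timed_lookup`, `evaluate`, `run_check`), the 200 ms warning threshold and the status labels are all hypothetical. It also uses the operating system's resolver, so local caching may hide some latency.

```python
import socket
import statistics
import time
from typing import Callable, List, Optional


def timed_lookup(
    hostname: str,
    resolver: Callable[[str], object] = socket.gethostbyname,
) -> Optional[float]:
    """Return the lookup latency in milliseconds, or None if resolution failed."""
    start = time.perf_counter()
    try:
        resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
    return (time.perf_counter() - start) * 1000.0


def evaluate(samples: List[Optional[float]], warn_ms: float = 200.0) -> str:
    """Classify a window of probe results (threshold is an arbitrary assumption)."""
    if not samples:
        return "unknown"
    latencies = [s for s in samples if s is not None]
    if not latencies:
        return "critical"  # every probe in the window failed
    # Any failed probe, or a slow median, is treated as an early warning sign.
    if len(latencies) < len(samples) or statistics.median(latencies) > warn_ms:
        return "warning"
    return "ok"


def run_check(
    hostname: str,
    probes: int = 5,
    interval_s: float = 1.0,
    resolver: Callable[[str], object] = socket.gethostbyname,
) -> str:
    """Probe a hostname several times and return the status of the window."""
    window: List[Optional[float]] = []
    for _ in range(probes):
        window.append(timed_lookup(hostname, resolver))
        time.sleep(interval_s)
    return evaluate(window)
```

Calling `run_check("example.com")` from a scheduler every few minutes would return "ok", "warning" or "critical" for that window. The resolver is passed in as a parameter so that the check can be exercised with a stub instead of a live network. A real deployment would more likely query specific DNS servers directly and feed the results into an alerting system.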
The current pandemic also brings the importance of IT infrastructure monitoring to a new level. Many employees are now working from their home networks, which are used to carrying only high-volume entertainment content such as YouTube or Netflix videos. But now, work-related, mission-critical traffic such as Zoom video conferences or corporate VPN connections must also be supported by the average household network. Companies that can adapt to this new paradigm, and assure a strong user experience and service levels across their IT infrastructure, will certainly be in a better position during and beyond this global crisis.