Cloud applications have generally earned a strong reputation for reliable uptime. But it’s still critical to monitor your cloud-hosted systems and applications with continuous testing of availability and response time. These four monitoring tips can help you count on the cloud.
Cloud service providers often deliver three nines (99.9%) or better of uptime, but outages do happen. Even with all the redundancy and other protective measures the major players have built into their architecture, their systems are still taken down by a variety of glitches. Here are a few examples:
- An expired SSL certificate resulted in a 12-hour global outage of encrypted storage traffic for the majority of Microsoft Azure users (and a 24-hour outage for some).
- An internal DNS error cost Apple $2 million per hour while its App stores, iTunes, and other services were down.
- A data center virtual network breakdown caused an outage of two hours and forty minutes for Google Compute Engine.
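To put the "three nines" figure in perspective, the short Python sketch below converts an availability percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration; providers define their own measurement windows in their service agreements.

```python
def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = 365 * 24) -> float:
    """Return the downtime budget, in hours, implied by an availability percentage.

    Assumes a 365-day year by default; adjust period_hours for a month or quarter.
    """
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

Three nines still leaves room for roughly 8.76 hours of downtime per year, so a single 12-hour outage like the first example above would exceed that budget on its own.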
Silent Slowdowns Can Sap Productivity and Sales
Outages shouldn’t be your only concern regarding cloud services. To your users, a slowdown can be indistinguishable from an outage, and what’s worse, slowdowns can easily go unnoticed until users complain. Or users may never complain, and a recurring source of lost productivity or sales will go undetected.
If your sales force, for example, uses a cloud-based CRM system that takes 15 seconds to log in, you need to know. Essentially, you’re paying people to wait. If your shopping cart takes eight seconds to return a fully populated response, that system is as good as down for most potential buyers.
Don’t let these slowdowns go undetected. Monitoring cloud applications can be tricky, but your cloud applications’ ongoing performance and history deserve the same level of visibility on your network management dashboard as other critical systems. An Application Performance Management (APM) tool allows development teams to proactively monitor and improve application performance. APM tools, such as Stackify Retrace, provide code-level insights along with integrated logging to identify more issues in QA and continuously observe applications in production environments.
Monitoring the Cloud
Beyond validating your cloud applications’ current availability, your monitoring tools should be automatically tracking these elements:
- Transactional steps: If several steps need to occur in sequence (initial authentication, database calls, middleware steps, etc.), check whether any of these steps is failing or slow. If so, you should be able to determine the faulty element’s effect on overall availability.
- Latency: If network delay is seriously affecting a Web-based application, you need to track that latency and follow up to resolve the issue if possible. Latency can especially cause service delays for mobile users, so if that platform is essential to your business, you need to have the ability to see at a glance whether latency could be creating or aggravating a service issue.
- Response time alerts: Set alert levels for page load times. For many applications, responses slower than 1.5 to 2 seconds mean service has been seriously compromised.
- Server/network timings: If the data you’re seeing about the performance of your cloud environment isn’t granular enough, you probably won’t know whether service issues are related to network issues, server configuration, or even page or script design.
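The transactional-step and response-time checks in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a monitoring product: the step names, the `time_steps` and `find_alerts` helpers, and the per-step limit are all hypothetical, and the 2-second total limit simply reuses the upper figure quoted above.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class StepResult:
    name: str
    seconds: float
    ok: bool


def time_steps(steps: Sequence[Tuple[str, Callable[[], object]]],
               clock: Callable[[], float] = time.perf_counter) -> List[StepResult]:
    """Run each step in order, recording its duration and whether it raised."""
    results = []
    for name, action in steps:
        start = clock()
        try:
            action()
            ok = True
        except Exception:
            ok = False
        results.append(StepResult(name, clock() - start, ok))
        if not ok:
            break  # later steps depend on this one, so stop the sequence
    return results


def find_alerts(results: Sequence[StepResult],
                step_limit_s: float,
                total_limit_s: float) -> List[str]:
    """Return alert messages for failed steps, slow steps, and a slow total."""
    alerts = []
    for r in results:
        if not r.ok:
            alerts.append(f"{r.name}: failed")
        elif r.seconds > step_limit_s:
            alerts.append(f"{r.name}: slow ({r.seconds:.2f}s > {step_limit_s:.2f}s)")
    total = sum(r.seconds for r in results)
    if total > total_limit_s:
        alerts.append(f"total: {total:.2f}s exceeds {total_limit_s:.2f}s")
    return alerts


if __name__ == "__main__":
    # Hypothetical stand-ins; replace the sleeps with real requests or queries.
    demo_steps = [
        ("authenticate", lambda: time.sleep(0.01)),
        ("database call", lambda: time.sleep(0.02)),
        ("middleware", lambda: time.sleep(0.01)),
    ]
    results = time_steps(demo_steps)
    for r in results:
        print(f"{r.name}: {r.seconds:.3f}s ok={r.ok}")
    for alert in find_alerts(results, step_limit_s=0.5, total_limit_s=2.0):
        print("ALERT", alert)
```

Timing each step separately is what makes it possible to say which element of the sequence is responsible for a slow or failed transaction, rather than only knowing that the total was too long.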
Have the Data You Need To Hold Cloud Service Providers Accountable
Knowing the real-time status of cloud-based systems may give you time to prepare for the effects of an impending outage. You may be able to take corrective action, or at least communicate to affected users so they’re aware of the problem and can act accordingly.
The ability to see historical information at a glance, and produce reports to document it, is also important. With this data in hand, you can hold your service providers accountable. If they’re not delivering on the service level requirements they’ve committed to, you need to show them what’s happening.
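As a rough sketch of how historical check data might be turned into evidence, the Python below computes measured availability from a list of periodic check results and compares it with a committed level. The `Check` record, the helper names, and the sample data are all hypothetical, and counting successful checks is only an approximation: a provider's agreement defines how availability is officially measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Sequence


@dataclass
class Check:
    timestamp: datetime
    up: bool


def availability_pct(checks: Sequence[Check]) -> float:
    """Percentage of checks that found the service up."""
    if not checks:
        raise ValueError("no checks recorded")
    return 100 * sum(c.up for c in checks) / len(checks)


def sla_report(checks: Sequence[Check], committed_pct: float) -> Dict[str, object]:
    """Summarize measured versus committed availability for a reporting period."""
    measured = availability_pct(checks)
    return {
        "checks": len(checks),
        "failed_checks": sum(not c.up for c in checks),
        "measured_pct": measured,
        "committed_pct": committed_pct,
        "met": measured >= committed_pct,
    }


if __name__ == "__main__":
    # Made-up sample: 1,000 one-minute checks, two of which failed.
    start = datetime(2024, 1, 1)
    history = [Check(start + timedelta(minutes=i), up=i not in (400, 401))
               for i in range(1000)]
    print(sla_report(history, committed_pct=99.9))
```

Keeping the timestamps alongside each result means the failed checks can be listed by time when you raise the issue with the provider.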
If you’ve done the hard work of migrating bare metal services to the cloud, you’ve probably seen an increase in uptime, and that’s great. But the cloud’s dramatically increasing role in IT system infrastructure will likely create more complexity and more service issues.
Prepare yourself now to handle emerging cloud service issues by monitoring cloud-hosted applications thoroughly.