The cloud offers unparalleled flexibility. However, that flexibility comes at a cost. The number of moving pieces increases, and the environment becomes more heterogeneous. So, if you want to stay on top of things, you need a more comprehensive view of your cloud infrastructure. After all, you don’t want your customers to notice that something has gone awry before your people do.
In this post, I’m going to talk about cloud monitoring. I’m assuming that you’re already familiar with the domain; maybe you’ve already set it up in your organization. So, let’s go over some ways to get more value out of your investment in cloud monitoring tools.
Get Visibility Over an Increasingly Complex Ecosystem
If you’re building a cloud-native application, it’s almost certain that you’ll be using cloud monitoring tools from a wide variety of vendors. Only the biggest technology companies have the resources and the mindset to develop everything in-house. And even those companies rely on third-party software here and there.
Increasingly, that extends to the cloud provider itself: according to some reports, multi-cloud approaches are now the norm across the industry. What does that mean for somebody monitoring an application? At the very least, it suggests that the native monitoring offered by any single cloud provider might not be enough.
Every serious cloud provider gives you a host of cloud monitoring tools for their services. However, those tools don’t usually extend past the boundary of that provider. This leaves you potentially juggling multiple disparate monitoring systems. That’s a risk, because your development teams can be spread too thin trying to use all those different tools effectively.
That’s where a dedicated monitoring provider comes in. If you use one, make sure it integrates well with the tools you want to use. That’s going to save you a lot of effort. Netreo is an example of a monitoring platform that provides a vast array of integrations. Don’t underestimate the benefits of having a simpler ecosystem. Your developers will thank you for it.
Focus on Automation
Automation is one of the most crucial practices for handling infrastructure at scale. Let’s say you have a microservices-based system distributed across multiple regions. You can’t hope to handle that complexity unless you’ve automated most of it. Without automation, that complexity will grind your development to a halt sooner or later. Or worse, it’ll lead to a higher rate of mistakes that can harm your business.
What does that have to do with monitoring, you might ask? In my opinion, a lot. Monitoring is infrastructure, so the same principles apply. For instance, if you can automatically map your network topology, you’ll understand how data flows to and from your different applications. And that’s just one example. As you automate how you collect valuable metrics and how you set up new resources, you free up time to evolve and improve your products.
Dive Into More Specific Resources
The cool thing about automation is that it compounds its effect. I mentioned understanding networking in the paragraph above. Well, what about other resources?
Imagine you have a setup that provisions some EC2 instances. With a custom integration, you can peek into the details of every individual box. You can set alerts based on the data you get, as we’ll see in a bit. There are other possibilities as well, like building custom workflows that perform defined actions.
You probably don’t need this level of granularity for every resource you own. Nevertheless, it’s convenient to be able to dive deeper into selected elements that are particularly relevant to you.
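To make per-instance alerts a bit more concrete, here’s a minimal Python sketch that declares alert rules as code and evaluates them against the latest metrics for each instance. It’s an illustration under stated assumptions, not any vendor’s API: the instance IDs, the metric names (`cpu_percent`, `disk_used_percent`), and the thresholds are all hypothetical, and the metrics are hard-coded where a real integration would fetch them from your monitoring provider.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    """A hypothetical threshold rule applied to one metric on every instance."""
    metric: str              # illustrative metric name, e.g. "cpu_percent"
    threshold: float         # the rule fires when the value is strictly above this
    severity: str = "warning"


def evaluate(rules: List[AlertRule],
             samples: Dict[str, Dict[str, float]]) -> List[str]:
    """Return one message per (instance, rule) pair that breaches its threshold.

    `samples` maps an instance ID to its latest metric values. Instances are
    visited in sorted order so the output is deterministic.
    """
    alerts: List[str] = []
    for instance_id, metrics in sorted(samples.items()):
        for rule in rules:
            value = metrics.get(rule.metric)
            # Skip metrics this instance doesn't report.
            if value is not None and value > rule.threshold:
                alerts.append(
                    f"[{rule.severity}] {instance_id}: "
                    f"{rule.metric}={value} > {rule.threshold}"
                )
    return alerts


# Rules live in code, so they can be reviewed and versioned like infrastructure.
RULES = [
    AlertRule("cpu_percent", 85.0),
    AlertRule("disk_used_percent", 90.0, severity="critical"),
]

if __name__ == "__main__":
    # Hypothetical instance IDs and readings.
    latest = {
        "i-aaaa": {"cpu_percent": 92.5, "disk_used_percent": 40.0},
        "i-bbbb": {"cpu_percent": 30.0, "disk_used_percent": 95.0},
    }
    for line in evaluate(RULES, latest):
        print(line)
```

Keeping the rules in code rather than in a UI means they can go through the same review and deployment process as the rest of your infrastructure, which ties back to the point about automation.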
Consider the Business Side
It’s safe to assume that most organizations are investing in monitoring capabilities today. However, I think that many of them focus predominantly on technical metrics. Don’t get me wrong: technical metrics are the core of monitoring. Still, over-indexing on the technical side means missing an opportunity.
At the end of the day, systems exist to serve the needs of their users. Technical metrics mean nothing if the core business flows aren’t performing properly. That’s why it makes sense to use the monitoring infrastructure you’re dutifully building to track those flows as well. Business metrics depend on context, so you’ll typically implement them through custom metrics tailored to your domain.
One way of merging the technical and business aspects is to set service-level objectives (SLOs). Essentially, you define the expectations for a system based on business outcomes and codify them so that you can track and act upon them. Monitoring plays an important role here, as tracking SLOs manually is very hard. Thus, good support from your cloud monitoring tools is paramount to making the approach feasible.
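As an illustration of what codifying an SLO can look like, here’s a small sketch that compares measured success against a target and reports how much of the error budget (the share of events allowed to fail, 1 - target) is left. The function name, the event counts, and the idea of counting successful checkouts are hypothetical; real tooling usually computes this over rolling time windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    achieved: float          # fraction of good events in the window
    budget_remaining: float  # fraction of the error budget left; negative means the SLO is breached


def evaluate_slo(target: float, good_events: int, total_events: int) -> SloReport:
    """Compare measured success against an SLO target such as 0.99 or 0.999."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if total_events <= 0 or not 0 <= good_events <= total_events:
        raise ValueError("need 0 <= good_events <= total_events and total_events > 0")
    achieved = good_events / total_events
    allowed_bad = (1.0 - target) * total_events  # the error budget, in events
    actual_bad = total_events - good_events
    return SloReport(achieved=achieved, budget_remaining=1.0 - actual_bad / allowed_bad)


if __name__ == "__main__":
    # Hypothetical numbers: 9,950 successful checkouts out of 10,000 attempts.
    report = evaluate_slo(target=0.99, good_events=9_950, total_events=10_000)
    print(f"achieved={report.achieved:.3%} budget_remaining={report.budget_remaining:.0%}")
```

With a 99% target over 10,000 attempts, the budget is 100 failed events; 50 failures use half of it, so roughly 50% of the budget remains. Alerting on how fast that budget is being spent, rather than on raw error counts, is one way to tie alerts back to business outcomes.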
React to Incidents Quickly and Confidently
Monitoring isn’t just about observing what’s going on. When anomalies go past certain thresholds, it’s time to declare an incident. Proper incident management ensures that you detect problems quickly and that you act confidently to solve the issue at hand with minimal user impact.
For that, you need to convert the insights you collect through monitoring into alerts. Ideally, alerting is part of your monitoring provider, so you don’t duplicate effort. Apply the same automation mindset here, too, so that you get reproducible results.
Configuring alert thresholds isn’t easy, though. You have to strike a balance between false positives and false negatives. It’s easy to go too far in one direction and either trigger alerts too often or trigger them too rarely and miss an incident. The answer is smarter tools. With techniques like anomaly detection, the tooling uses the data points it collects to identify suspicious patterns. That way, you need less manual tuning, which can be error-prone.
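Anomaly detection covers a whole family of techniques, and commercial tools typically use more sophisticated models than this. As a deliberately simple stand-in, the sketch below flags a data point when it lies more than `threshold` standard deviations from the mean of the preceding `window` points (a rolling z-score). The function name and the parameter defaults are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List


def zscore_anomalies(values: Iterable[float], window: int = 20,
                     threshold: float = 3.0) -> List[int]:
    """Return the indices of points that deviate sharply from the recent baseline.

    Each point is compared with the mean and population standard deviation of
    the preceding `window` points and flagged when its z-score exceeds
    `threshold`. No point is flagged until a full window of history exists.
    """
    if window < 2:
        raise ValueError("window must hold at least two points")
    history: Deque[float] = deque(maxlen=window)
    anomalies: List[int] = []
    for i, x in enumerate(values):
        if len(history) == window:
            baseline = list(history)
            mu = mean(baseline)
            sigma = pstdev(baseline, mu)
            if sigma == 0:
                # A perfectly flat baseline: any change at all is unusual.
                if x != mu:
                    anomalies.append(i)
            elif abs(x - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(x)  # the baseline rolls forward, including flagged points
    return anomalies


if __name__ == "__main__":
    # Hypothetical latency samples (ms): steady noise, then a spike.
    latencies = [100.0, 102.0] * 10 + [101.0, 150.0]
    print(zscore_anomalies(latencies))  # prints [21]
```

Because the baseline moves with the data, a gradual shift in normal traffic doesn’t keep firing, while a sudden spike stands out without anyone hand-tuning a fixed threshold. The trade-off is that a sustained incident eventually becomes part of the baseline.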
Move Sideways to Different Parts of the Stack
A significant benefit of using an integrated monitoring provider is the ability to extend your monitoring to other areas with little effort. Consider synthetic transactions: scripted checks that exercise a user flow end to end. Assume you already have a rich overview of your infrastructure. Adding these high-level flows is the cherry on top that makes it less likely you’ll miss a worrying trend.
Adding other types of monitoring is only a small incremental step. That investment is worth it, and that’s when you start to reap the benefits of your tool. Another advantage of a unified tool is that it tends to make moving across these different components easier. Jumping from a synthetic check to a dashboard and then zooming in on a single resource will save your developers many headaches. With enough effort, you could build this yourself, but it’s probably going to cost more than using an existing tool.
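At its core, a synthetic transaction runs a scripted flow on a schedule and records whether, and how fast, each step succeeds. The sketch below assumes each step is a Python callable that raises an exception on failure; in practice each one would issue real requests against your application. The step names and the `run_synthetic` helper are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A step is a (name, action) pair; the action raises an exception on failure.
Step = Tuple[str, Callable[[], None]]


@dataclass
class SyntheticResult:
    ok: bool = True
    failed_step: Optional[str] = None
    error: Optional[str] = None
    # (step name, duration in seconds) for every step that completed
    timings: List[Tuple[str, float]] = field(default_factory=list)


def run_synthetic(steps: List[Step]) -> SyntheticResult:
    """Run the steps of a scripted user flow in order, stopping at the first failure."""
    result = SyntheticResult()
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
        except Exception as exc:
            result.ok = False
            result.failed_step = name
            result.error = str(exc)
            return result
        result.timings.append((name, time.perf_counter() - start))
    return result


if __name__ == "__main__":
    def check_out() -> None:
        # Hypothetical failure standing in for a real HTTP call.
        raise RuntimeError("payment service returned 503")

    flow: List[Step] = [
        ("open_home_page", lambda: None),
        ("log_in", lambda: None),
        ("check_out", check_out),
    ]
    outcome = run_synthetic(flow)
    print(outcome.ok, outcome.failed_step, outcome.error)
    # prints: False check_out payment service returned 503
```

Reporting which step failed, along with per-step timings, is what lets you jump from a failing synthetic check to the dashboard or resource behind that step.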
Get Started With Monitoring
In this post, I’ve talked about ways of leveraging great tools to monitor cloud infrastructure more effectively. In summary, these are my recommendations:
- use all the integrations available to cover as much ground as possible
- automate everything that you can automate to reduce manual work
- dive deeper into the resources where you need that extra level of visibility
- don’t stop at technical metrics, but consider business metrics as well
- use the metrics you collect to get meaningful alerts to act promptly
- use cloud monitoring as a bridge to move to other aspects of your application
Netreo is a monitoring provider that offers you all the capabilities that I’ve listed above. If that piques your interest, get a demo here.
This post was written by Mario Fernandez. Mario develops software for a living—then he goes home and continues thinking about software because he just can’t get enough. He’s passionate about tools and practices, such as continuous delivery. And, he’s been involved in frontend, backend, and infrastructure projects.