Monitoring matters more than ever. Why is that? Because applications keep getting more complicated, and so does the infrastructure they run on. Some companies are moving to the cloud; others are building hybrid infrastructure.
And when some pieces of infrastructure are in the cloud while others are on-premises, it becomes even harder to get an overview of the infrastructure as a whole.
In this post, you’ll learn how to monitor hybrid cloud infrastructure well.
What Is Hybrid Cloud Monitoring?
First, let’s clarify how hybrid cloud monitoring differs from traditional monitoring. The biggest difference is that we have two very different pieces of infrastructure that we need to monitor.
Traditional tools built for on-prem monitoring tend not to work well in a cloud environment, and vice versa. Therefore, you either need to compromise and use two different monitoring solutions, or you need to find one that works well with both on-prem and cloud infrastructures.
So let’s discuss how you normally monitor the two, and from there, we’ll see how to create good hybrid cloud monitoring.
Monitoring On-Prem Infrastructure
When you have to deal with bare metal servers and your own network equipment, you will focus on different things than you would in the cloud. You’ll have to look at all the low-level indicators like CPU temperature, hard disk health, load balancer saturation, etc. Even if you have some layer of abstraction on top of your bare metal machines (for example, virtualization or container orchestration), you’ll still have to monitor the underlying machines. You’ll probably even need to monitor things like cooling fan speeds and UPS status.
The goal of your on-premises monitoring will also be slightly different from the monitoring you do in the cloud. Scaling and capacity planning are different on-prem. Therefore, when monitoring usage of on-prem machines, you’ll more often look at long-term usage patterns. Since scaling a data center often takes weeks, you’ll have to predict the need for increased capacity much earlier.
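To make the idea of predicting capacity needs from long-term usage patterns more concrete, here is a minimal sketch. It fits a straight line to evenly spaced usage samples and estimates how many more sampling periods remain before capacity is reached. The function name, the sample data, and the assumption of linear growth are all illustrative; real capacity planning usually has to account for seasonality and step changes as well.

```python
from typing import Optional, Sequence


def periods_until_full(usage_pct: Sequence[float],
                       capacity_pct: float = 100.0) -> Optional[float]:
    """Estimate sampling periods left until usage reaches capacity.

    Fits a least-squares line to evenly spaced samples (for example, one
    per week). Returns None when usage is flat or shrinking.
    """
    n = len(usage_pct)
    if n < 2:
        raise ValueError("need at least two samples")

    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(usage_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage_pct))

    slope = sxy / sxx
    if slope <= 0:
        return None  # no growth trend, so no projected exhaustion

    intercept = mean_y - slope * mean_x
    # Sample index at which the fitted line hits capacity,
    # expressed relative to the most recent sample.
    full_at = (capacity_pct - intercept) / slope
    return max(0.0, full_at - (n - 1))


# Hypothetical weekly disk usage (percent) for one storage array.
weekly_disk_usage = [61.0, 62.5, 63.8, 65.4, 66.9, 68.1]
print(periods_until_full(weekly_disk_usage))
```

If the estimate is shorter than your hardware procurement lead time, that is a signal to start ordering now rather than when the disks fill up.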
The actual tools you typically use for on-prem monitoring also differ from those you’d use to monitor the cloud. Traditionally, these tools require you to install and manage the monitoring server somewhere in your infrastructure. This means you’ll also need to care about things like capacity monitoring and upgrades of the tool itself.
Monitoring Cloud Infrastructure
Even though the general idea of monitoring stays the same, if your applications are running in the cloud, your monitoring focus will be different. For starters, you won’t care about (and you won’t even have access to) things like temperatures and fan speeds.
Generally, unlike with on-prem, you’ll only monitor one or two layers of the stack. For example, if your application is running in a virtual machine, then in the cloud, you’ll only have to worry about the CPU and memory usage of that virtual machine. On-prem, however, you’ll have to monitor not only that virtual machine but also the underlying bare metal server.
Another thing is that the cloud gives you a lot of flexibility and fast scaling. This means that your monitoring needs to react quickly to spikes in resource consumption in order to use autoscaling capabilities effectively.
Speaking of autoscaling, what you’ll definitely want to monitor when in the cloud is cost. Unlike with on-prem, where your cost is largely fixed, in the cloud, you (usually) pay per use. Therefore, you’ll not only want to monitor the overall growing cost, but you’ll also need to find underutilized (or even unused) resources in order to scale them down (or shut them down) to save costs.
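As a rough illustration of reacting quickly to spikes, the sketch below flags a sample when it exceeds a multiple of the average of the most recent samples. The class name, window size, and threshold factor are assumptions chosen for the example; managed autoscalers typically apply their own, more elaborate rules.

```python
from collections import deque
from typing import Deque


class SpikeDetector:
    """Flag a sample that is well above the recent rolling average."""

    def __init__(self, window: int = 5, factor: float = 1.5) -> None:
        self.samples: Deque[float] = deque(maxlen=window)
        self.factor = factor

    def observe(self, value: float) -> bool:
        """Record a sample; return True if it looks like a spike.

        No spike is reported until the window has filled up, so the
        baseline is always computed from a full set of earlier samples.
        """
        is_spike = False
        if len(self.samples) == self.samples.maxlen:
            baseline = sum(self.samples) / len(self.samples)
            is_spike = value > baseline * self.factor
        self.samples.append(value)
        return is_spike


# Hypothetical CPU readings (percent), one per scrape interval.
detector = SpikeDetector(window=3, factor=1.5)
for cpu in [40, 42, 41, 43, 90]:
    if detector.observe(cpu):
        print(f"spike detected at {cpu}% CPU")
```

A short window reacts faster but is noisier; a longer one is steadier but slower to notice a real surge.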
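Finding candidates to scale down or shut off can be sketched as a simple classification over average utilization. Everything here is hypothetical: the `ResourceUsage` record, the thresholds, and the sample fleet are placeholders, and in practice you would feed in numbers exported from your provider's billing and metrics data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceUsage:
    name: str
    avg_cpu_pct: float   # average CPU utilization over the review window
    monthly_cost: float  # in whatever currency your bill uses


def classify(resources: List[ResourceUsage],
             unused_below: float = 1.0,
             underused_below: float = 20.0) -> Dict[str, List[ResourceUsage]]:
    """Split resources into 'unused' and 'underutilized' buckets.

    Resources at or above `underused_below` are left out of the report.
    """
    report: Dict[str, List[ResourceUsage]] = {"unused": [], "underutilized": []}
    for r in resources:
        if r.avg_cpu_pct < unused_below:
            report["unused"].append(r)
        elif r.avg_cpu_pct < underused_below:
            report["underutilized"].append(r)
    return report


def unused_spend(report: Dict[str, List[ResourceUsage]]) -> float:
    """Monthly cost of resources that appear to be doing nothing."""
    return sum(r.monthly_cost for r in report["unused"])


# Hypothetical fleet.
fleet = [
    ResourceUsage("batch-worker-old", 0.2, 100.0),
    ResourceUsage("staging-api", 12.0, 50.0),
    ResourceUsage("prod-api", 65.0, 80.0),
]
report = classify(fleet)
print([r.name for r in report["unused"]], unused_spend(report))
```

CPU alone is a crude signal; memory, network, and request counts are worth checking before shutting anything down.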
When it comes to the tools, those designed to monitor the cloud often come in software as a service (SaaS) form. You don’t need to install and manage the actual monitoring tool. You only need to send all your metrics to it.
How to Monitor Hybrid Cloud Infrastructure
As we mentioned before, you have two options.
One solution is to use two different tools (one better suited for on-prem and one for the cloud). Surprisingly, this is a very common approach, but for the wrong reasons. Companies don’t build hybrid infrastructure from scratch. They end up with hybrid infrastructure when they want to slowly migrate their on-prem systems to the cloud. This usually means they already have an on-prem monitoring system that has been running for many years. When they take their first steps in the cloud, they typically create a new monitoring system for it, often because a completely new team is assembled to manage the cloud environment. But monitoring both environments separately brings many disadvantages.
In a hybrid cloud, part of your system is running on-prem and part in the cloud. But at the end of the day, it’s the same system, and the parts work together. Therefore, monitoring parts of it with one tool and parts with another puts you at risk of “missing the big picture.” Some companies realize that, so what do they do? They extend the on-prem monitoring to cover some bits of the cloud, and they try to monitor some bits of on-prem with the cloud monitoring tool. This partially solves the problem of having an overview of the whole system but creates a new one: now you monitor the same systems twice. That’s not efficient.
So what’s the solution? Unified monitoring!
Don’t treat a hybrid cloud as two separate environments. Treat them as two parts of the same system because, in fact, they are. Use a tool that can monitor both on-prem and cloud systems well.
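One way to picture what "two parts of the same system" means for the data is a single metric record that both environments are normalized into. The sketch below is only an illustration: the two input payload shapes and every field name are invented for the example and do not correspond to any particular agent or cloud API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class Metric:
    environment: str  # "on-prem" or "cloud"
    host: str
    name: str
    value: float
    timestamp: int    # Unix time in seconds


def from_onprem(sample: Dict[str, Any]) -> Metric:
    # Hypothetical on-prem agent payload: hostname / check / val / ts (seconds).
    return Metric("on-prem", sample["hostname"], sample["check"],
                  float(sample["val"]), int(sample["ts"]))


def from_cloud(sample: Dict[str, Any]) -> Metric:
    # Hypothetical cloud payload: instance_id / metric / value / epoch_ms.
    return Metric("cloud", sample["instance_id"], sample["metric"],
                  float(sample["value"]), int(sample["epoch_ms"]) // 1000)


def merge_timeline(onprem: Iterable[Dict[str, Any]],
                   cloud: Iterable[Dict[str, Any]]) -> List[Metric]:
    """Normalize both feeds and return one timeline ordered by timestamp."""
    metrics = [from_onprem(s) for s in onprem] + [from_cloud(s) for s in cloud]
    return sorted(metrics, key=lambda m: m.timestamp)


timeline = merge_timeline(
    [{"hostname": "db01", "check": "cpu_pct", "val": "71.5", "ts": 1700000060}],
    [{"instance_id": "i-abc", "metric": "cpu_pct", "value": 88.0,
      "epoch_ms": 1700000000500}],
)
for m in timeline:
    print(m.environment, m.host, m.name, m.value, m.timestamp)
```

Once everything shares one shape and one clock, dashboards and alerts can be written once instead of once per environment.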
Even if you have separate teams managing each side, you’ll still benefit from unifying your monitoring. Your on-prem team will be relieved of the management tasks of the monitoring tool itself. At the same time, your cloud team will have great visibility into the on-prem part of the infrastructure. In hybrid environments, a client’s request usually needs to touch a few systems, both on-prem and in the cloud, before it can be fulfilled and the response sent.
Without visibility into both systems, cloud teams can only guess: “If these errors are not coming from us, then something must be wrong on the on-prem side.” And the same guessing will happen on the on-prem side. This leads to a long debugging process, which also means that in case of a real disaster, your recovery time will be much longer. With unified monitoring, there is no need for guessing.
Combining all the data from all the sources in one monitoring system brings another advantage: the ability to correlate data from different sources. Increased CPU usage in the cloud doesn’t necessarily mean there’s something happening in the cloud. It can mean that the on-prem systems are responding more slowly to cloud requests, so the cloud systems need to put in extra effort to keep up with demand (more CPU used for caching, or more CPU time spent waiting on I/O). Unified monitoring helps you uncover such situations easily.
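A simple way to check whether two series from different environments move together is the Pearson correlation coefficient. The sketch below computes it in plain Python over two time-aligned series; the sample numbers are made up, and a high coefficient is only a hint about where to look, not proof of cause.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same timestamps.
cloud_cpu_pct = [35, 38, 52, 70, 88, 61]
onprem_latency_ms = [20, 22, 45, 80, 120, 60]

r = pearson(cloud_cpu_pct, onprem_latency_ms)
if r > 0.8:
    print(f"cloud CPU tracks on-prem latency (r={r:.2f}); check the on-prem side")
```

This only works when both series live in the same place and share timestamps, which is exactly what a unified monitoring system gives you.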
Netreo to the Rescue
One such tool that can effectively monitor both on-prem and cloud infrastructures is Netreo. In fact, not only can it monitor your hybrid cloud environment, but it can also monitor the applications running on that infrastructure. Therefore, it’s a complete full-stack monitoring solution for all your needs. But don’t take my word for it. Register for a Demo here and check it out for yourself.
Summary
The general idea of monitoring a hybrid cloud is simple. You need to know what’s happening in both your on-prem and cloud environments. In practice, this is often achieved by running two separate monitoring solutions. And while this is not completely wrong, in this post, we tried to show you the benefits of a different approach.
Monitoring your hybrid cloud with one monitoring tool is simply better. It not only helps to decrease debugging time for both on-prem and cloud teams but also allows you to correlate data from different parts of the system. On top of that, if you add application monitoring to the same solution, you’ll find yourself having great visibility into any piece of your system.
If you want to learn more about how to increase the efficiency of your hybrid cloud monitoring by adding application monitoring to it, check out this blog post. And if you want to know how to squeeze every bit of performance out of your data center, check out this blog post too.
This post was written by Dawid Ziolkowski. Dawid has 10 years of experience, first as a network/system engineer, then in DevOps, and most recently as a cloud native engineer. He’s worked for an IT outsourcing company, a research institute, a telco, a hosting company, and a consultancy company, so he’s gathered a lot of knowledge from different perspectives. Nowadays he’s helping companies move to the cloud and/or redesign their infrastructure for a more cloud native approach.