Luke Skywalker: Ben! Why didn’t you tell me? You told me that Darth Vader betrayed and murdered my father.
Obi-Wan Kenobi: Your father was seduced by the Dark Side of the Force. He ceased to be the Jedi Anakin Skywalker and “became” Darth Vader. When that happened, the good man who was your father was destroyed. So, what I told you was true… from a certain point of view.
I’m going to go out on a limb and assume that everybody reading this blog recognizes the above characters and movie dialog. The instructive phrase is “from a certain point of view”. It got me thinking about the idea of application monitoring within a Network Management System (NMS) tool. One of the core functions of such a system is to provide this kind of visibility. A customer might call up and say, “We just deployed Okram ERP version 10 and want to monitor it. Can you help us?” As usual, the answer is “It depends”. Depends on what, you may ask? Perspective and point of view. Ultimately, I want to know from what perspective visibility is needed. I’ve written previously about the necessity of taking a more holistic approach to monitoring, and that advice still holds. However, that doesn’t absolve otherwise capable IT personnel from their duty to understand the details and how they all fit together to make things run smoothly. Let’s investigate the various points of view you want to consider when monitoring your critical applications.
Peek-a-Boo, I See You
The first perspective to consider when monitoring applications is that of the user experience components. These are, arguably, the most important pieces because when they break, not only will your monitoring systems let you know, but so will your users and customers. Anybody who’s been an IT professional for more than 15 minutes surely knows that Hades hath no fury like an end user scorned. For web applications, this type of visibility translates to running web-based checks from inside and outside your network and validating response times.
Availability monitoring is often the first piece of the end user’s experience to cover, as it tracks your site’s uptime and response times. Most application performance management (APM) tools test both by periodically requesting predefined routes and reporting back on them. An APM tool, like Stackify Retrace, provides a dashboard of uptime checks where you can see how your site is performing. The number of spikes throughout the day shows you whether your site is experiencing unreliable performance. Evaluating uptime checks over time can also help you identify trends across releases and aid in zero-downtime deployments.
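To make the idea of a web-based check concrete, here is a minimal sketch in Python using only the standard library. It is not how any particular NMS or APM product does it. The names `check_url` and `CheckResult`, the 2000 ms response budget, and the definition of "ok" (a successful response within that budget) are all assumptions made for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    """Outcome of a single availability check (hypothetical structure)."""
    url: str
    ok: bool
    status: Optional[int]
    elapsed_ms: float
    error: Optional[str] = None


def check_url(url: str, timeout: float = 5.0, max_ms: float = 2000.0) -> CheckResult:
    """Fetch url once and time it.

    "ok" here means: the request succeeded (urlopen raises on 4xx/5xx)
    and the full response arrived within max_ms. Both rules are assumptions.
    """
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include body transfer in the measured time
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        elapsed = (time.perf_counter() - start) * 1000
        return CheckResult(url, False, exc.code, elapsed, str(exc))
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # No usable answer at all: DNS failure, refused connection, timeout.
        elapsed = (time.perf_counter() - start) * 1000
        return CheckResult(url, False, None, elapsed, str(exc))
    elapsed = (time.perf_counter() - start) * 1000
    return CheckResult(url, elapsed <= max_ms, status, elapsed)
```

Run the same function on a schedule from a host inside your network and from one outside it, and you have the two vantage points described above. Comparing `elapsed_ms` between the two tells you whether slowness comes from the application or from the path to it.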
If the application in question isn’t web-based, then some kind of end user validation is still advisable. Perhaps you can script the client component to simulate user behavior? PowerShell, AppleScript, and Bash are valuable allies in this endeavor.
Packets, Packets Everywhere, but Not a Frame to Link
Once we’ve defended ourselves from (or at least mitigated the threat of) a pitchfork-wielding user base, there are additional angles to consider when validating application health. Network connectivity is one such perspective. Obviously, if the network isn’t working properly, then there’s a high chance there will be other problems as well. The first big variable that comes to mind is network latency. Hopefully, in your network, it’s standard operating procedure for the network engineers to have their fingers on the pulse of site-to-site latency. However, the real power of measuring latency with regard to application health lies in correlating latency spikes with application-specific variables. Does latency jump when a user in Madison executes a query for a part stored in a database located in Denver? More importantly, what does that spike tell you about your application’s status and its design?
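Here is one rough way to think about that correlation, sketched in Python. It assumes you can export two lists from your tools: latency samples as (timestamp, milliseconds) pairs and application events as (timestamp, label) pairs. The function names, the "three times the median" definition of a spike, and the five-second window are all illustrative assumptions, not a standard.

```python
from statistics import median
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, latency in ms)
Event = Tuple[float, str]     # (timestamp in seconds, event label)


def find_spikes(samples: List[Sample], factor: float = 3.0) -> List[Sample]:
    """Return samples whose latency exceeds factor times the median latency."""
    if not samples:
        return []
    baseline = median(latency for _, latency in samples)
    return [(ts, latency) for ts, latency in samples if latency > factor * baseline]


def events_near_spikes(spikes: List[Sample], events: List[Event],
                       window: float = 5.0) -> List[Tuple[str, float, float]]:
    """Pair each spike with the events that occurred within window seconds of it."""
    return [
        (label, event_ts, latency)
        for spike_ts, latency in spikes
        for event_ts, label in events
        if abs(event_ts - spike_ts) <= window
    ]
```

If the same event label keeps showing up next to spikes, such as the Madison-to-Denver part query, that is a strong hint about where to look in the application’s design.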
Another aspect of application monitoring that falls squarely in the network connectivity camp is OSI Layer 4 monitoring. Want to make sure an application is listening for connections properly? Set your NMS to make either a UDP or TCP port check. This method of monitoring is an excellent stand-in when you lack the ability or tools to monitor the full application stack. I’ll often take advantage of it when the application in question isn’t web-based. Another benefit of monitoring at this level is that it can quickly tell you when your Information Security people are playing fast and loose with network access control lists.
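For the curious, a TCP port check boils down to very little code. The sketch below uses Python’s standard `socket` module, and the function name `tcp_port_open` is my own. It covers TCP only: UDP is connectionless, so a meaningful UDP check has to send a probe the application understands and wait for a reply, which is specific to each protocol.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A `False` result for a service you know is running is exactly the signal that hints at a changed firewall rule or access control list somewhere along the path.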
Poppin’ the Hood
The last point of view you absolutely want to consider when monitoring your applications is that of the backend systems themselves. These are components that your end users can’t see and your network engineers likely have little interest in. What specific variables are we talking about here? Obviously, the “UP”ness of all the major systems that make up the application is your first target. That means database servers, virtualization hypervisors, load balancers, and many others are in play. However, there are individual elements of each of these servers that need monitoring too. Remember, the ultimate goal is to choose variables to monitor that have a direct impact on the behavior of your application. For example, are there specific processes on your middleware server that should never consume more than 100 MB of RAM? Do certain levels of disk utilization lead to slow database queries and a bevy of user complaints? If yes, then these are items you’ll want set up in your NMS.
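Threshold checks like these are simple to reason about once you write them down. The following Python sketch shows the general shape under a few assumptions: the `Threshold` structure, the `evaluate` function, and the OK/WARNING/CRITICAL labels are hypothetical, and the memory reading is assumed to come from whatever agent or SNMP poll your NMS already uses. Only `shutil.disk_usage` is a real standard library call.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits for one metric (hypothetical structure)."""
    metric: str
    warn: float
    crit: float


def evaluate(value: float, threshold: Threshold) -> str:
    """Map a reading to OK, WARNING, or CRITICAL; higher values are worse."""
    if value >= threshold.crit:
        return "CRITICAL"
    if value >= threshold.warn:
        return "WARNING"
    return "OK"


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding path that is currently used."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100


# Example limits; the numbers are placeholders to tune for your application.
MIDDLEWARE_RSS_MB = Threshold("middleware_rss_mb", warn=80, crit=100)
DB_DISK_PERCENT = Threshold("db_disk_used_percent", warn=80, crit=90)
```

Calling `evaluate(disk_used_percent("/"), DB_DISK_PERCENT)` gives you a state you can alert on. The hard part is not the code but picking limits that actually line up with slow queries and user complaints.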
Tying it All Together
Now what? Presumably you’ve got the ability within your NMS to logically group together similar “things” like HOST status, threshold exceptions, availability commands, and others. In this way you can create a logical group that represents the application to monitor. From that point it likely isn’t an arduous task to configure very targeted alerts from your NMS informing administrators and application owners precisely what’s happening with their systems. It’s also straightforward to get high-level, business-centric reports on application health that managers can use to assist them in their decision-making process. When all is said and done, and unlike what Luke Skywalker had to deal with, considering all the perspectives of your mission-critical applications is easy … kinda like using a lightsaber to cut through butter.
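If you’re wondering what such a logical group looks like under the hood, here is a small Python sketch. It assumes a "worst status wins" rollup, which is a common convention but not the only one, and the check names in the usage note are made up for the fictional Okram ERP deployment.

```python
from enum import IntEnum
from typing import List, Mapping


class Status(IntEnum):
    """Check states ordered by severity so that max() picks the worst."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def rollup(checks: Mapping[str, Status]) -> Status:
    """Application status is the worst status among its member checks."""
    return max(checks.values(), default=Status.OK)


def failing(checks: Mapping[str, Status]) -> List[str]:
    """Names of the checks that are not OK, sorted, for a targeted alert."""
    return sorted(name for name, status in checks.items() if status != Status.OK)
```

Feed it something like `{"web front end": Status.OK, "db tcp 1521": Status.CRITICAL, "middleware memory": Status.WARNING}` and `rollup` gives the single state a manager’s report needs, while `failing` gives administrators the list of what to go fix.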