My favorite show on television is “The Big Bang Theory” (I’m a self-admitted IT geek; what did you expect?). In a rerun I watched a few weeks ago, Amy gives a monkey she has deliberately addicted to nicotine a cigarette as a reward for some behavior. Sure, it’s only a TV show, but it got me thinking: what else could you train a monkey to do? One task came to mind recently after a fairly unfruitful pre-sales consultation with an IT manager. This guy was undoubtedly the inspiration for the Pointy-Haired Boss of Dilbert fame. He said to us, “All I’m really interested in doing is pinging my devices to see if they’re up. What you guys offer is overkill and way too expensive for my needs.” I thought to myself, “Maybe he’s right.” Any monkey can be trained to ping a list of devices to see whether they’re up. What’s the point of putting in a big, complex monitoring tool when it’s really a simple problem to solve?
Take a look at the image above. The code on the left took me all of 90 seconds to write and implement; the snippet on the right took just a little longer. Both are simple ping utilities that can be run repeatedly from a task scheduler on any operating system to give you UP/DOWN status for a list of devices. Monitoring problem solved, right? Not quite. Full-featured network management systems can aggregate multiple types of information, including availability (up/down status), time-series data (usage-over-time statistics), pushed data, and others. However, for the sake of argument, let’s assume you care only about availability and nothing else, just like our aforementioned Pointy-Haired Boss. Even then, a number of elements that any useful monitoring tool needs still haven’t been considered.
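A utility along those lines might look like the following minimal Python sketch. It assumes the operating system's own `ping` command is on the PATH (`-n` sets the packet count on Windows, `-c` elsewhere) and treats an exit status of 0 as UP. That is a simplification: some Windows versions of `ping` also exit with 0 when a router answers "Destination host unreachable." The function names `ping_once` and `check_devices` are illustrative and not taken from the original snippets.

```python
import platform
import subprocess
import sys
from typing import Callable, Dict, Iterable


def ping_once(host: str, timeout_s: float = 5.0) -> bool:
    """Send a single echo request with the system ping; True means exit status 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # kill ping if it runs longer than this many seconds
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or the ping binary is missing/not runnable: count as DOWN.
        return False
    return result.returncode == 0


def check_devices(
    hosts: Iterable[str], probe: Callable[[str], bool] = ping_once
) -> Dict[str, str]:
    """Map each host to "UP" or "DOWN" using the given probe function."""
    return {host: "UP" if probe(host) else "DOWN" for host in hosts}


if __name__ == "__main__":
    # Hosts come from the command line, e.g.:
    #   python ping_sweep.py 192.0.2.10 router.example.net
    for host, status in check_devices(sys.argv[1:]).items():
        print(f"{host}: {status}")
```

Scheduled every few minutes with cron or Windows Task Scheduler, this gives exactly the UP/DOWN snapshot described above and nothing more. The `probe` parameter is there so the checking logic can be exercised without touching the network.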
Looking in the Rear View Mirror
First, the results from the repeated ping attempts need to be logged. One or two missed pings tell you nothing about a device except that it didn’t respond. Was the missed ping due to extreme packet loss? Or perhaps the network link between the remote device and the monitoring tool has high latency? Unless your output is logged, effective troubleshooting and proper resolution of the problem are usually out of the question.
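One low-tech way to keep that history is to append every check to a CSV file and compute each device's failure rate from it. The sketch below assumes results arrive as a dictionary mapping host to "UP" or "DOWN"; the file layout and the function names `append_results` and `failure_rate` are hypothetical. A failure rate alone still can't tell packet loss from latency (that would need round-trip times in the log as well), but it does separate a device that missed one ping last Tuesday from one that drops a third of them.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def append_results(results: Dict[str, str], log_path: Path) -> None:
    """Append one row per device: ISO-8601 UTC timestamp, host, UP/DOWN."""
    stamp = datetime.now(timezone.utc).isoformat()
    new_file = not log_path.exists()
    with log_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "host", "status"])  # header on first write only
        for host, status in results.items():
            writer.writerow([stamp, host, status])


def failure_rate(log_path: Path) -> Dict[str, float]:
    """Fraction of logged checks per host that came back DOWN."""
    totals: Dict[str, int] = {}
    downs: Dict[str, int] = {}
    with log_path.open(newline="") as f:
        for row in csv.DictReader(f):
            host = row["host"]
            totals[host] = totals.get(host, 0) + 1
            if row["status"] == "DOWN":
                downs[host] = downs.get(host, 0) + 1
    return {host: downs.get(host, 0) / n for host, n in totals.items()}
```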
Of course, logging results is just the first step. What happens when a ping is missed? What action should be taken next? Should we try sending the ping again, since there might have been a temporary glitch in the network? Should our tool retry three times? Four? At what interval should those retries occur? Should an email be sent out after a series of failed retries? Those are only a handful of questions, and there are numerous others that should be factored into pinging your devices for availability.
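A retry-then-alert policy along the lines of those questions could be sketched as follows. The retry count, the interval, the SMTP relay, and the addresses are placeholder assumptions (`smtp.example.com` and the `example.com` mailboxes are hypothetical), and `notify` and `sleep` are parameters so the policy can be tested without sending real email or actually waiting. Here `retries=3` means one initial attempt plus three retries.

```python
import smtplib
import time
from email.message import EmailMessage
from typing import Callable


def is_up_with_retries(
    host: str,
    probe: Callable[[str], bool],
    retries: int = 3,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """One initial probe plus up to `retries` retries, pausing `interval_s` between attempts."""
    for attempt in range(retries + 1):
        if probe(host):
            return True
        if attempt < retries:
            sleep(interval_s)  # no pause after the final failed attempt
    return False


def email_alert(
    host: str,
    smtp_host: str = "smtp.example.com",  # hypothetical mail relay
    sender: str = "monitor@example.com",  # hypothetical addresses
    recipient: str = "oncall@example.com",
) -> None:
    """Send a plain-text DOWN notice through an SMTP relay."""
    msg = EmailMessage()
    msg["Subject"] = f"DOWN: {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{host} did not answer ping after all retries.")
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


def check_and_alert(
    host: str,
    probe: Callable[[str], bool],
    notify: Callable[[str], None] = email_alert,
    retries: int = 3,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the host answered; otherwise call `notify` once and return False."""
    up = is_up_with_retries(host, probe, retries, interval_s, sleep)
    if not up:
        notify(host)
    return up
```

Even this leaves some of those other questions open; for example, a scheduled job built on it would re-send the same alert on every run for as long as a device stays down.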
Using the CLI is so 1980s
Last, but certainly not least, is the need to put a functional, friendly user interface around your ping tool. Is a quality UI a requirement? No. However, if your IT staff is made up mostly of folks born after the Reagan Administration (or your IT management team is staffed entirely by Pointy-Haired Bosses), chances are they’ll be much more comfortable in the point ‘n click world of Windows than in the command-line-driven operating systems of yesteryear.
Implementing an enterprise IT systems monitoring solution is a complex beast. Yes, tools like OmniCenter and its brethren aim to make the task more manageable. However, no one should go into such an endeavor without fully appreciating the details involved. That said, if my legion of fans or loyal readers of this blog know of a case where a monkey has done a full NMS implementation, kindly let me know. I’ll get my resume spruced up a bit.