Clean Your Room! – Addressing NMS Tool Care and Feeding Burdens

Part VI – 7 Habits of Highly Effective Network and Systems Administrators


I mentioned I get peculiar inspiration for these blog topics, right? Before I sat down to continue our “7 Habits” series, I was tucking my daughter in for bed. Predictably, two things happened. First, I stepped on something small, sharp, and plastic, which elicited from my mouth a series of muffled curse words not suitable for print in a family-friendly blog. Second, I asked my little girl, “Didn’t we just clean this room a couple hours ago? How can it already be a disaster zone?” And then it hit me. This scenario plays itself out repeatedly with a lot of network management systems. They require constant care and feeding to remain relevant and useful in most environments.

The most effective systems and network administrators understand that NMS tools with a high care and feeding burden bring a bevy of adverse consequences. First and foremost is monetary cost, in the form of extra IT staff who must be retained to maintain and administer the system. Realistically, this issue is a bigger concern for management than for day-to-day engineers, but it is a topic everybody should be cognizant of. The second consequence is best described by the oft-used idiom “Time is Money.” Let’s say, for the sake of argument, management has hired a high-priced administrator into the IT department. Is it more cost-effective for that administrator to spend their time maintaining the NMS or solving the problems that the NMS brings to light? A third consequence could be considered a hybrid of the first two: service level costs. Time spent caring for and feeding an organization’s NMS tool is time taken from some other task. Perhaps that task is responding to end-user requests? Or, perhaps, addressing a customer’s technical needs? That’s the definition of reduced service levels: not serving the customer or end user. Failure to hit SLAs directly impacts an organization’s bottom line.

We’ve delineated why picking tools with a lower care and feeding burden makes administrators more effective, but let’s dig a little deeper. What does a typical administrator need in order to become more effective? To answer that question, we need to analyze which elements of typical monitoring tools engineers spend the most time tweaking and tuning.

Personnel changes are one such example. With a revolving door of IT staffers, a monitoring tool can get messy in a hurry unless there are processes in place to administer and control alert contacts, logins/privileges, and report recipients (just to name a few). Even with processes in place, consider the amount of time spent policing the configuration of your NMS.
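One way to keep that policing from becoming a manual chore is to reconcile the NMS configuration against an authoritative staff list on a schedule. The sketch below assumes you can export contacts, report recipients, and logins from your NMS and pull active staff e-mail addresses from a directory; `NmsContact` and `find_stale_contacts` are hypothetical names, not any product’s API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NmsContact:
    """A hypothetical alert contact, report recipient, or login exported from the NMS."""
    name: str
    email: str
    kind: str  # e.g. "alert", "report", "login"


def find_stale_contacts(contacts: Iterable[NmsContact],
                        active_staff_emails: Iterable[str]) -> List[NmsContact]:
    """Return NMS entries whose e-mail address is not on the active staff list."""
    # Normalize case and whitespace so "BOB@example.com" matches "bob@example.com".
    active = {email.strip().lower() for email in active_staff_emails}
    return [c for c in contacts if c.email.strip().lower() not in active]
```

Run as a nightly job, a check like this turns “who still gets paged?” from an audit project into a short review list.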

An even bigger issue than personnel changes is the state of current infrastructure technology. Areas like virtualization, SD-WAN, and SaaS/PaaS/IaaS are dynamic in nature. Translation: services based on those technologies are constantly changing. What do you suppose that does to an NMS, whose sole reason for existence is to monitor and alert on those dynamic resources? Unless an organization has a dedicated person (or team of people) making sure the dashboards of its monitoring applications match the current state of the infrastructure, the result will be confusion. And today it is increasingly rare to find an environment that is completely static and doesn’t take advantage of offerings from AWS, Meraki, and their ilk.

We’ve looked at what NMS administrators likely spend their time on and why we should strive for low care and feeding requirements in our tools. The $64,000 question is: how do we get there? Follow these suggestions:

Codify It

I’ve written on this topic in the past, but priority number 1 in making any NMS application work well is to establish an IT alerting policy for your organization.
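What that policy contains is up to your organization. As a minimal sketch, assuming a simple severity-based model, a policy can be written down as data so that every alert is routed the same way no matter who configured the device. The severity names, contact groups, and timings below are placeholders, not recommendations, and `AlertRule`, `ALERT_POLICY`, and `route_alert` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AlertRule:
    """How alerts of one severity are handled under the (hypothetical) policy."""
    notify: Tuple[str, ...]            # contact groups to page or e-mail
    escalate_after_min: Optional[int]  # minutes before escalation; None = never
    open_ticket: bool                  # whether to open an ITSM ticket


# Placeholder policy: severity names, groups, and timings are illustrative only.
ALERT_POLICY = {
    "critical": AlertRule(("noc-oncall", "network-team"), 15, True),
    "major": AlertRule(("network-team",), 60, True),
    "minor": AlertRule(("network-team",), None, False),
    "info": AlertRule((), None, False),
}


def route_alert(severity: str) -> AlertRule:
    """Return the policy rule for a severity; an unknown severity is a policy gap."""
    try:
        return ALERT_POLICY[severity.lower()]
    except KeyError:
        raise ValueError(f"severity {severity!r} is not covered by the alerting policy") from None
```

Keeping the policy in one place means an alert with an unlisted severity fails loudly instead of silently going nowhere.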

Less is More

Processes are only part of the equation. You also want to seek out tools that get you as close to all-in-one capabilities as possible. There is no magic bullet here. What you’ll likely find is that the wider the capabilities “net” a tool casts, the more depth you sacrifice compared with point solutions for specific applications. That isn’t necessarily a bad thing, because the issue we’re addressing here is how administrators can be more effective. What better way than to reduce the overall number of tools to maintain?

How Low Can You Go?

When analyzing solutions to meet your NMS needs, seek out tools with low “costs” associated with them. The costs I’m speaking of here, however, are the investments of time necessary to get up and running quickly. For example, can end users customize the targeted solution for their own needs, such as polling their own variables or easily creating their own single pane of glass (SPOG) views?

Wrapped Up in a Nice Package

Enterprise IT monitoring doesn’t have to be difficult or complex. How many systems (UI, middleware, database, etc.) do you have to deploy to get the visibility you need? Seek out tools that are self-contained and deployed as a single virtual machine. Where that’s not possible due to environment size or security requirements, make sure that as the solution scales, it is still administered through a single, unified UI.

Connect the Pieces

One of the original philosophies to come out of the development of the Unix operating system many moons ago is a very simple one: “Do one thing and do it well.” Effective administrators apply the same philosophy to enterprise monitoring in their organizations. Your NMS tool should be separate and distinct from your ticketing system, which should in turn be separate and distinct from your asset management application. Look for monitoring tools that expose both inbound and outbound API access so you can integrate them with your other systems. It might sound complex, but which is the bigger hassle when deploying a new router at a remote site: adding a record manually to your NMS, CMDB, and ITSM tools, or entering it in one place and letting integrations automatically propagate the necessary changes elsewhere? Not only does the latter save time, it also reduces human error.
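The integration pattern amounts to entering a record once and fanning it out. The sketch below assumes each downstream tool exposes some API that can create or update a device; `DeviceRecord`, `Integration`, `InMemorySystem`, and `propagate` are hypothetical names, and `InMemorySystem` merely stands in for a real API client so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Protocol


@dataclass(frozen=True)
class DeviceRecord:
    """The single source record entered once for a new device."""
    hostname: str
    ip_address: str
    site: str


class Integration(Protocol):
    """Anything with a name that can create or update a device (hypothetical interface)."""
    name: str

    def upsert_device(self, device: DeviceRecord) -> None: ...


@dataclass
class InMemorySystem:
    """Stand-in for an NMS, CMDB, or ITSM API client; stores records in a dict."""
    name: str
    records: Dict[str, DeviceRecord] = field(default_factory=dict)

    def upsert_device(self, device: DeviceRecord) -> None:
        self.records[device.hostname] = device


def propagate(device: DeviceRecord, integrations: Iterable[Integration]) -> Dict[str, str]:
    """Push one record to every connected system; report failures instead of stopping."""
    failures: Dict[str, str] = {}
    for system in integrations:
        try:
            system.upsert_device(device)
        except Exception as exc:  # one unreachable API should not block the others
            failures[system.name] = str(exc)
    return failures
```

Collecting failures rather than aborting keeps the other systems in sync and leaves a short list of targets to retry.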

Lather, Rinse, Repeat

Seek out tools that allow you to templatize as much of your NMS configuration as possible. What does that actually mean? Think through your monitoring strategy from start to finish. Come up with a blueprint for how you would like everything configured (contacts, thresholds, log monitoring, etc.). Effective network and systems administrators can then apply that blueprint to hundreds or even thousands of devices in their NMS application with only a handful of mouse clicks.
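In code terms, a blueprint is a template: define the settings once, then stamp them onto every device in a group, overriding only what genuinely differs. This is a minimal sketch assuming a template holds contacts, thresholds, and log sources; `MonitoringTemplate`, `DeviceConfig`, and `apply_template` are hypothetical and not tied to any particular NMS.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class MonitoringTemplate:
    """A reusable blueprint for one class of device (hypothetical structure)."""
    name: str
    contacts: Tuple[str, ...]
    thresholds: Dict[str, float]  # metric name -> alert threshold
    log_sources: Tuple[str, ...]


@dataclass
class DeviceConfig:
    """The effective monitoring configuration for a single device."""
    hostname: str
    contacts: Tuple[str, ...]
    thresholds: Dict[str, float]
    log_sources: Tuple[str, ...]


def apply_template(template: MonitoringTemplate,
                   hostnames: Iterable[str],
                   overrides: Optional[Dict[str, Dict[str, float]]] = None) -> List[DeviceConfig]:
    """Build one config per device, applying optional per-device threshold overrides."""
    overrides = overrides or {}
    configs = []
    for host in hostnames:
        thresholds = dict(template.thresholds)  # copy so devices never share one dict
        thresholds.update(overrides.get(host, {}))
        configs.append(DeviceConfig(host, template.contacts, thresholds, template.log_sources))
    return configs
```

Changing the template and re-applying it updates every device at once, which is where the handful-of-clicks payoff comes from.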

See a Change, Make a Change

Last, but certainly not least, find an NMS application that can make configuration changes to itself based on information it learns or is presented with. Your tool just discovered a new router with a hostname of “LA-CORE-4506”? Great. Automatically place it in the functional category “Core Routers”. Virtualization monitoring detected that an administrator intentionally shut down a guest OS? There is no use opening a trouble ticket, since the action was taken on purpose. Just suppress all alerts for that device until we detect it’s back “UP”. Earlier in this article I mentioned the dynamic nature of many of today’s infrastructure technologies. These self-adjusting capabilities are “must have” features for reducing the care and feeding burden an NMS places on your IT staff.
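Both behaviors boil down to simple rules evaluated as information arrives. The sketch below assumes the NMS can hand you a discovered hostname and a stream of state-change events flagged as intentional or not; the `-CORE-` naming convention, `CATEGORY_RULES`, `categorize`, and `AlertSuppressor` are hypothetical illustrations, not a specific product’s feature.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical naming-convention rules: (hostname pattern, functional category).
# Add more (pattern, category) pairs to match your own conventions.
CATEGORY_RULES = [
    (re.compile(r"-CORE-", re.IGNORECASE), "Core Routers"),
]


def categorize(hostname: str) -> Optional[str]:
    """Return the first category whose pattern matches the hostname, or None."""
    for pattern, category in CATEGORY_RULES:
        if pattern.search(hostname):
            return category
    return None


@dataclass
class AlertSuppressor:
    """Suppress alerts for devices shut down on purpose until they report UP again."""
    suppressed: Set[str] = field(default_factory=set)

    def on_event(self, device: str, event: str, intentional: bool = False) -> None:
        if event == "shutdown" and intentional:
            self.suppressed.add(device)      # planned action: no ticket needed
        elif event == "up":
            self.suppressed.discard(device)  # back UP: resume normal alerting

    def should_alert(self, device: str) -> bool:
        return device not in self.suppressed
```

Note that an unplanned shutdown is deliberately left unsuppressed, so real outages still raise alerts.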

Ideally, each of these features and suggestions is present in your chosen solution set, and they play nicely with one another. Then, once you decide on your monitoring strategy, it’s just a matter of “flipping the switch,” and your enterprise IT monitoring becomes a largely hands-off endeavor.

Now, if I could only find a software application to keep my daughter’s room as neat and tidy as the day we moved in, I’d be sitting pretty.
