The operational success of any large-scale enterprise IT environment begins and ends with the Network Operations Center (NOC). In spite of that, the NOC is often riddled with inefficiency and chaos.
A NOC is, by its nature, a chaotic place. It’s an interwoven, interdependent orchestra of people, processes, and tools: banks of flat-screen monitors (sometimes 15 or more), flashing real-time notifications, and critical alerts, all presented in various cryptic ways by a passel of non-integrated tools. Already overworked NOC personnel struggle to keep their eyes trained on this dizzying array of content in the hope of spotting outages and anomalies as they occur. At the same time, they must manage the flood of informational and related data streaming into their inboxes and SMS feeds. Rarely is there any intelligent filtering or aggregation of this inbound tsunami, and industry studies suggest that a high percentage of the deluge is erroneous, extraneous, or simply unnecessary.
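To make the missing "intelligent filtering" concrete, here is a minimal sketch of the idea in Python. The alert fields, severity taxonomy, and thresholds are illustrative assumptions, not any particular monitoring product's schema: the point is simply that dropping informational noise and collapsing duplicates can shrink the inbound stream dramatically.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    source: str      # device or service that raised the alert (assumed field)
    severity: str    # "info", "warning", "critical" (assumed taxonomy)
    message: str

def filter_alerts(alerts, min_severity="warning", max_repeats=1):
    """Suppress informational noise and collapse duplicate alerts."""
    rank = {"info": 0, "warning": 1, "critical": 2}
    seen = defaultdict(int)
    actionable = []
    for a in alerts:
        if rank.get(a.severity, 0) < rank[min_severity]:
            continue                      # drop informational noise outright
        key = (a.source, a.message)
        seen[key] += 1
        if seen[key] <= max_repeats:      # keep only the first occurrence
            actionable.append(a)
    return actionable
```

In practice a real NOC pipeline would add time windows, maintenance suppression, and escalation rules, but even this crude pass turns a raw feed into something a human can watch.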
In most cases, scripted incident-management protocols must be strictly followed as problematic issues are distilled out of the fray and isolated for remediation; Service Level Agreements (SLAs) depend on such a regimented approach. In the midst of this chaos, user, customer, and partner communities rely on the efficient and secure operation of the overall environment, and SLA attainment requires that remediation be completed transparently and as close to instantaneously as possible.
In spite of these seemingly insurmountable challenges, success is attainable. When reviewing the efficiency of your NOC, consider the following points:
- Adopt a fully integrated or consolidated tool suite, optimally designed by a single manufacturer or, at minimum, integrated via APIs with the other ‘cogs’ in the machine. This integration minimizes visual touch-points, eliminates finger-pointing between products, and ensures a consistent interface to collected data.
- Evaluate technologies that can intelligently filter, sort, and aggregate large volumes of data; the goal is to surface the elements that genuinely need attention. Examples include root-cause analysis, event correlation, and automatic anomaly detection. It’s imperative, however, that these technologies avoid the trap of taking more time to implement than they save during troubleshooting. They must be fully automatic.
- Look toward dynamic products that can roll up enterprise-wide visibility into a concise visual presentation. You can always drill down into the weeds later (and when you do, the drill-down should be equally intelligent).
- Stretch your NOC personnel by making them subject-matter experts on the new, integrated tool set. Radically improve their individual efficiency by distilling the content they receive. Remove the noise.
- Where practicable, electronically embed process information across the fabric of the NOC. Hard-copy backup documentation is fine, but the more you can intelligently and automatically embed required information (workflows, procedures, NOC wikis) for anytime, anywhere access, the more manageable crisis scenarios become.
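As one concrete illustration of the event-correlation and roll-up ideas in the list above, the sketch below groups raw events from the same source within a time window into a single candidate incident. The event shape and the 60-second window are hypothetical assumptions for illustration; production correlation engines use far richer topology- and dependency-aware rules.

```python
def correlate_events(events, window_seconds=60):
    """Naive correlation sketch: fold raw events that share a source
    within a time window into one candidate incident."""
    events = sorted(events, key=lambda e: e["timestamp"])
    incidents = []
    open_by_source = {}   # most recent open incident per source
    for e in events:
        inc = open_by_source.get(e["source"])
        if inc and e["timestamp"] - inc["last_seen"] <= window_seconds:
            inc["events"].append(e)       # same burst: merge into incident
            inc["last_seen"] = e["timestamp"]
        else:
            inc = {"source": e["source"], "events": [e],
                   "last_seen": e["timestamp"]}
            open_by_source[e["source"]] = inc
            incidents.append(inc)
    return incidents
```

The payoff is exactly the roll-up the bullets describe: the wall of monitors shows a handful of incidents rather than hundreds of raw events, and drilling down into an incident’s `events` list recovers the detail when you choose to look.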
A chaotic NOC environment doesn’t have to be the status quo. There is a better way. In our experience, with the right combination of tools, processes, and personnel, a NOC can run like a well-oiled machine.