While this question is hardly as timeless as the words spoken by Shakespeare’s Hamlet, it’s still relevant to those of us who live in the day-to-day world of systems and network management. What is the best way to get statistics and status from all the disparate elements that make up the systems we’re supposed to be monitoring? Is one method better than the other? As is so often the case in I.T., the answer is “It Depends.” You’re probably thinking, “Thanks, Wise-Guy … depends on what?” I’ll explain, but first a little background.
When I speak of “pulling” data into a monitoring system, I mean our tools are actively querying targeted elements for some kind of statistic or status at a regular interval, bringing the data back, and doing something with it. Common mechanisms here are SNMP, WMI, and custom-created REST APIs. In contrast, when data is “pushed,” our tools are passively waiting and listening for information to arrive. Once that information shows up it can be processed, analyzed, and acted upon. NetFlow/sFlow, SNMP traps, and syslog are the protocols most often identified as “push” technologies.
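To make the distinction concrete, here is a minimal sketch of the two styles. It is a toy model, not real SNMP or syslog code: the Monitor class, the device names, and the lambda "devices" are all made up for illustration. The only point is who starts the conversation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Monitor:
    """Toy monitoring system (hypothetical) supporting both collection styles."""
    samples: List[Tuple[str, str, float]] = field(default_factory=list)
    events: List[Tuple[str, str]] = field(default_factory=list)

    def poll(self, targets: Dict[str, Callable[[], float]], metric: str) -> None:
        """Pull: the monitor initiates each request and stores the reply."""
        for name, query in targets.items():
            self.samples.append((name, metric, query()))

    def receive(self, source: str, message: str) -> None:
        """Push: the device initiates; the monitor only listens and records."""
        self.events.append((source, message))


monitor = Monitor()

# Pull: the monitor asks each (made-up) device for its CPU load.
devices = {"switch-1": lambda: 12.5, "server-1": lambda: 73.0}
monitor.poll(devices, "cpu_percent")

# Push: a device reports an event on its own schedule.
monitor.receive("switch-1", "user admin entered configuration mode")

print(monitor.samples)
print(monitor.events)
```

In a real deployment, poll() would be driven by a scheduler and would speak SNMP, WMI, or HTTP, while receive() would sit behind a trap or syslog listener.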
There are pros and cons for each method. On the “Pull” side of the ledger you’ve got the advantage of being able to centrally configure alerting and monitoring for all devices. In an SNMP-trap-only environment, receiving meaningful alerts means individually touching every device you care about to set the trap destination, alert contact, and threshold level. That’s not trivial in a large infrastructure. Additionally, your monitoring system controls the pace of monitoring, which allows NOC personnel to be proactive in problem resolution. In a trap-only environment, unless a threshold is set just right, by the time the NOC is notified your end-users will already have begun their pitchfork-wielding march to your office with blood lust on their minds.
All of the above said, there are still advantages to getting data pushed to your monitoring system. First, in terms of scale, an NMS can handle considerably more data when it is pushed than when it all has to be pulled. The bottleneck for pull-only environments is the scheduling and processing power needed to get *all* the data from an infrastructure. Oftentimes there isn’t enough to go around. However, when data is pushed, the NMS is effectively opening the front door and yelling “C’mon In, I’m Open For Business!” This “Open for Business” scenario leads into another advantage, which is that when something bad happens a trap can be fired off immediately. With pull-only monitoring, unless that “bad thing” was still happening when a scheduled check came around, you may miss it.
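The timing problem is easy to see with a little arithmetic. The sketch below assumes a poller that checks every 60 seconds starting at time zero, plus a made-up fault window; the function names are mine, and network delay and dropped UDP packets are ignored.

```python
from typing import List


def polled_detections(fault_start: int, fault_end: int,
                      interval: int, horizon: int) -> List[int]:
    """Times (seconds) at which a poll running every `interval` seconds
    sees a fault that is active during [fault_start, fault_end)."""
    return [t for t in range(0, horizon + 1, interval)
            if fault_start <= t < fault_end]


def pushed_detection(fault_start: int) -> int:
    """A pushed trap is sent the moment the fault begins
    (idealized: no network delay, no lost packets)."""
    return fault_start


# A 30-second fault that falls entirely between two 60-second polls.
short_fault = polled_detections(fault_start=70, fault_end=100,
                                interval=60, horizon=300)
# A 60-second fault that is still active when the next poll runs.
long_fault = polled_detections(fault_start=70, fault_end=130,
                               interval=60, horizon=300)

print(short_fault)           # [] : the poller never sees it
print(long_fault)            # [120] : seen 50 seconds after it began
print(pushed_detection(70))  # 70 : reported as soon as it happens
```

Shortening the polling interval narrows the blind spot, but it also increases the scheduling and processing load that limits pull-only environments in the first place.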
So, getting back to the original point of this tome, what does the “push” or “pull” question really depend on? There are actually two more things we need to ask. Stick with me here, because now we’re getting to the meat of the issue. The first is “What problem are we trying to solve?” Looking to get trend information on CPU usage for a given process running on a Windows server? We’ll want to pull that data down from the target device. Need to get a notification when a user logs into configuration mode on a remote switch? This type of event is best handled by either a syslog message or an SNMP trap getting pushed to you. The second question to ask ourselves is “What technology do we have available for the task at hand?” Some vendors (I’m lookin’ at you, EMC) don’t play nice with SNMP and don’t support any kind of active querying of their devices. All information from them must be gathered via trap processing. In other cases, such as with VMware, vendors intentionally obfuscate SNMP setup and thereby push customers towards built-in APIs for access to rich diagnostic information.
Finally, at the end of our Matryoshka-inspired journey into network and systems monitoring, I think we know enough to decide To Pull or Not to Pull. However, you’d better make up your mind in a hurry. I think I hear your end-users coming down the hallway.