7 Habits of Highly Effective Network and System Administrators – Part II
It’s no secret that Network and System Administrators are being asked to manage more resources than ever before, including networks, servers, apps, virtual and cloud resources, and more. But what does management mean?
Looking at server workload as an example, as a Network or System Administrator, you may have to:
- Monitor the server for health and performance.
- Set up and triage alerts.
- Set up event notifications.
- Debug and “root cause” issues.
- Fix issues (or engage the right people to do so).
- Report on the status of the system.
Individually, none of these tasks is too difficult.
What makes them more complex than they need to be is that each technology comes with its own tool, and each tool comes with its own dashboard. For example, to monitor apps, you may be using New Relic. For virtual hosts, vCenter may be your tool of choice.
In effect, the more technologies/resources you manage, the more dashboards you need. This is one of the biggest issues for administrators. All these disparate monitoring tools add to the complexity of their work because:
- Too many dashboards lead to too many statuses.
- It’s hard to determine the state of the end-to-end system.
- Conflicting statuses often lead to the blame game.
- It’s difficult to do a quick root cause analysis.
The solution: Deploy a SPOG (Single Pane of Glass).
SPOG is a phrase used in IT to describe a management tool—such as a unified console or dashboard—that integrates information from varied sources across multiple applications and environments into a single display.
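One way to picture that integration is a roll-up that takes each tool’s status for the same service and reports the worst one. The sketch below is only an illustration: the three-level status scale, the tool names, and the readings are all made up, and a real SPOG would first have to map each tool’s own statuses onto a shared scale.

```python
from enum import IntEnum


class Status(IntEnum):
    """Shared status scale (an assumption; every real tool defines its own)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def roll_up(reports: dict[str, Status]) -> Status:
    """Return the worst status reported by any source.

    Raises ValueError if no source reported anything.
    """
    return max(reports.values())


# Hypothetical readings for one service, as seen by three separate tools.
reports = {
    "app_monitor": Status.OK,
    "virtual_host_monitor": Status.WARNING,
    "network_monitor": Status.OK,
}

overall = roll_up(reports)
# Keep track of which tools reported the worst status, so drill-down has a start.
culprits = sorted(name for name, status in reports.items() if status == overall)
print(f"end-to-end status: {overall.name} (reported by {', '.join(culprits)})")
```

Taking the worst status is a deliberately pessimistic choice. It trades some false alarms for a single answer that nobody can dispute, which is the opposite of the conflicting-dashboards situation described above.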
But SPOG solutions have been elusive because the promises of vendors often don’t match the expectations of the users. As a result, users often find themselves owning as many as 10 SPOG products because every vendor sold them “the only one they’ll ever need!”
To be clear, the vendors are not entirely wrong. For example, vCenter is a Single Pane of Glass for vSphere-based virtual resources.
But are you also using AMIs or OpenStack-based workflows? Then vCenter may not work for you.
What about apps running on bare metal? Well, vCenter isn’t the right solution for those, either.
So, the question becomes: How can you determine if you have a real SPOG?
The Promise of the Single Pane of Glass
To get an understanding of what makes a good SPOG, let’s start by agreeing on a definition. Here’s the one we recommend (from Webopedia):
“A Single Pane of Glass is a dashboard that shows Brian the real-time status of his eight most important systems. It lets him zoom in/out to debug issues and perform root cause analysis. It also allows him to generate reports for all the relevant stakeholders. Additionally, as tools get added/removed, the SPOG update takes no more than a few minutes.”
To understand and internalize this definition, one must ask:
- Who is “Brian?” – Is he a Network or System Admin, a DBA, an application owner, a DevOps manager?
- What are his eight most important systems?
- What is “real time?” Is it live, or does it have a 5-minute delay?
- To what degree can you zoom in/out?
- Can the SPOG be reused across many teams with very little reconfiguration?
- Can you train all IT personnel using a single SPOG?
- Is your SPOG a single source for “Tear Away” reporting?
The Problem of a Single Pane of Glass
The biggest challenge of setting up a workable SPOG is that one little word: “everything.”
“Our tool can show you the status of everything!” most vendors selling their SPOG would say.
Nobody can manage everything. If it were possible, the entire IT department could be shrunk down to two people: the CIO and the person who manages everything.
To see why, consider what it takes to build a SPOG yourself:
- A SPOG by itself is a complex, enterprise-grade application. Think of it as a Java app with an Oracle backend and WebLogic/WebSphere middleware.
- The alert mechanism required for the SPOG is a separate application that you either buy or build. A homegrown alerting app will need a new messaging bus supported by an events database.
- An agent will poll the backend of every tool in use to collect all the data. Be aware that constant polling may impact system performance.
- Scrub the data to take useless data out (and there’s a lot of useless data).
- Normalize the data to remove redundancies.
- Agree on a visualization approach across all stakeholders.
- Develop a presentation layer (you’ll need stakeholder inputs and approvals).
- Identify the owners of every type of alert.
- Agree on thresholds.
- Set up automated alerts.
And that was the simple approach!
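To make the data-handling steps in that list concrete, here is a minimal sketch of the scrub, normalize, and alert stages. Everything in it is hypothetical: the `Reading` record, the metric names, the thresholds, and the owning teams are placeholders, and the polling agent is replaced by a hard-coded list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    source: str             # tool that produced the reading
    resource: str           # server, VM, or app the reading describes
    metric: str
    value: Optional[float]  # None when the tool returned nothing usable


# Hypothetical thresholds and alert owners, agreed with stakeholders up front.
THRESHOLDS = {"cpu_percent": 90.0, "disk_percent": 85.0}
OWNERS = {"cpu_percent": "sysadmin-team", "disk_percent": "storage-team"}


def scrub(readings):
    """Drop readings with no value or with a metric that has no threshold."""
    return [r for r in readings if r.value is not None and r.metric in THRESHOLDS]


def normalize(readings):
    """Remove redundancy: keep the highest value per (resource, metric)."""
    worst = {}
    for r in readings:
        key = (r.resource.lower(), r.metric)
        if key not in worst or r.value > worst[key].value:
            worst[key] = r
    return worst


def alerts(normalized):
    """Yield (owner, message) for every reading above its threshold."""
    for (resource, metric), r in sorted(normalized.items()):
        if r.value > THRESHOLDS[metric]:
            yield OWNERS[metric], f"{resource} {metric}={r.value} (from {r.source})"


# Stand-in for what a polling agent might collect from two tools.
polled = [
    Reading("tool_a", "web01", "cpu_percent", 97.0),
    Reading("tool_b", "WEB01", "cpu_percent", 95.5),   # same server, second tool
    Reading("tool_a", "web01", "fan_rpm", 4200.0),     # no threshold: scrubbed
    Reading("tool_b", "db01", "disk_percent", None),   # missing value: scrubbed
    Reading("tool_b", "db01", "disk_percent", 70.0),   # below threshold
]

fired = list(alerts(normalize(scrub(polled))))
for owner, message in fired:
    print(f"[{owner}] {message}")
```

Even this toy version needs decisions that have to be negotiated across teams: which metrics count as useless, which tool wins when two disagree, and who owns each alert.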
If you go this route, plan to fund a team of 10 developers working for 9-12 months. A team of 3-5 engineers can maintain it after the initial development and testing.
The cost plan for a homegrown system should also include:
- The cost of Oracle licenses, visualization tool and alerting solution.
- The time spent (3-6 months) to get all the teams to agree on this approach and give you access to their data. This is by far one of the hardest things to achieve.
Back-of-the-envelope calculations come to about $2 million in development cost. Add approximately $750k in annualized care and feeding of the system.
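As a rough illustration of how those two figures add up over time, the sketch below totals the one-time development cost and the yearly upkeep. It assumes upkeep is paid for every year the system is in operation and that both figures stay flat, neither of which is stated above.

```python
DEVELOPMENT_COST = 2_000_000  # one-time build cost, from the estimate above
ANNUAL_UPKEEP = 750_000       # yearly care and feeding, from the estimate above


def homegrown_cost(years: int) -> int:
    """Total cost after `years` of operation (assumes flat yearly upkeep)."""
    return DEVELOPMENT_COST + ANNUAL_UPKEEP * years


for years in (1, 3, 5):
    print(f"{years} year(s) in operation: ${homegrown_cost(years):,}")
```

On these assumptions, upkeep overtakes the original development cost before the end of the third year of operation.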
Cost is not the only problem with homegrown solutions, or even the biggest. In all of the cases we’ve seen, a homegrown SPOG is a compromise that makes everybody equally unhappy.
Now that you understand what a SPOG is (and what it isn’t), it’s time to find the one that works for you and your team. Mind you, we’re not claiming it will be perfect. But it should be good enough for you to use and get value from every day.
Here’s how to start:
- Define your outcome. What do you want from your SPOG? What does “good” look like? Be as specific as possible. Even develop wireframes if it helps you visualize the solution of your dreams.
- Define your “user.” Is it you? The Network or System Administrator? DevOps manager? Developer? All of these?
- Outline what will be seen on the dashboard.
- Define the range of statuses the SPOG should show.
- Articulate your customization needs.
- Consolidate as many tools as you can. A SPOG can sit on top of many tools. But, if you can, merge as many tools as possible into a single platform. Consolidation is not the same as buying all the tools from a single vendor. It means that a single tool can do all the things you need. For example, instead of buying an NPM and a server monitoring tool, look for one that can do both.
- Develop a “build vs buy” strategy. Do a cost/benefit analysis. Is it worth developing something that everyone will hate anyway?
If you decide to buy a solution, make sure that it is complete and painless. A complete solution is the one that meets your needs as discussed above. A painless solution is one that:
- Has a low deployment cost and a quick ROI (think minutes/hours, not months).
- Needs low/no customization for initial usage.
- Displays statuses clearly.
- Has easy (one-click) drill down for root cause analysis.
- Has no care and feeding costs.
- Can integrate with other tools in your environment.
Single Pane of Glass is a complex and very contentious topic. It is also very hard to achieve, with some calling it the biggest myth of IT. But, as this post hopefully made clear, you have at least two alternatives: develop a homegrown solution or buy something that works for you. Depending upon your requirements, budget and time, either strategy can work. Whichever option you choose, make sure that you have a complete and painless solution for all of your target personas.
For Network and System Administrators, here is one option that works right out of the box. It consolidates multiple tools into one platform, is easily customizable, needs no complex infrastructure, and has little to no care and feeding costs.