What is Full-Stack Observability and Do I Really Need It?

Monitoring and visibility are dead. If you don’t have full-stack observability (FSO), you may as well just pack up and go home. Your business will fail, and you will be unemployed with no hope for the future. At least, that is what vendors currently pitching FSO would have you believe. But what is full-stack observability?

Observability is the current buzzword in the monitoring industry, and full-stack observability is what vendors are currently focusing on. The good news is that whether you have FSO or not has very little to do with your business succeeding and your future employment opportunities. Let’s take a look at why that is the case.

Traditional Observability

The term observability has been used in engineering and control theory circles for a long time. Only recently has it been introduced to the world of IT visibility and monitoring. When the term was first used in an IT context, it made some sense. If we take a look at the definition of “observability” from Wikipedia, we see the following:

Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. In control theory, the observability and controllability of a linear system are mathematical duals. The concept of observability was introduced by the Hungarian-American engineer Rudolf E. Kálmán for linear dynamic systems.[1][2] A dynamical system designed to estimate the state of a system from measurements of the outputs is called a state observer or simply an observer for that system.

The first sentence is really what we need to consider. Observability describes how well a system reports its current internal state. It has nothing to do with the state of the system itself, only with how well the system can let you know how it is doing.

Measuring Observability in IT

This leads to the question of how you measure how observable a system is. Fortunately, Wikipedia can help us here too, with this definition from its Observability article:

Consider a physical system modeled in state-space representation. A system is said to be observable if, for every possible evolution of state and control vectors, the current state can be estimated using only the information from outputs (physically, this generally corresponds to information obtained by sensors). In other words, one can determine the behavior of the entire system from the system’s outputs. On the other hand, if the system is not observable, there are state trajectories that are not distinguishable by only measuring the outputs.

Great. As long as information on every possible state for a system is provided, we can say it is completely observable. When the amount of information available drops, the “observability” of the system drops as well. A system that is unable to report any information on its current state would be considered not at all observable. There is no specific, standardized scale for observability, so these somewhat vague terms are the best we can do.
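For readers who want to see the control theory version of "completely observable" in action, here is a minimal sketch. It assumes the standard Kalman rank condition for linear time-invariant systems, which the quoted passage does not spell out: a system with state matrix A and output matrix C is observable when the stacked matrix of C, CA, and so on up to CA^(n-1) has full rank n. The function names and the two-state example are illustrative assumptions, not anything taken from the Wikipedia article.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(matrix):
    """Rank by Gaussian elimination, using exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a row at or below the pivot position with a nonzero entry.
        pivot = next(
            (r for r in range(rank_found, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank_found], rows[pivot] = rows[pivot], rows[rank_found]
        for r in range(rank_found + 1, len(rows)):
            factor = rows[r][col] / rows[rank_found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank_found])]
        rank_found += 1
    return rank_found


def is_observable(A, C):
    """Kalman rank test: stack C, CA, ..., CA^(n-1) and check for rank n."""
    n = len(A)
    block, stacked = C, []
    for _ in range(n):
        stacked.extend(block)
        block = mat_mul(block, A)
    return rank(stacked) == n


# Hypothetical two-state system: state = (position, velocity).
A = [[0, 1], [0, 0]]
print(is_observable(A, [[1, 0]]))  # sensor reports position
print(is_observable(A, [[0, 1]]))  # sensor reports velocity only
```

With a position sensor, velocity can be inferred from how position changes, so the test passes. With only a velocity sensor, the starting position can never be recovered from the outputs, which is exactly the "state trajectories that are not distinguishable" case in the quote.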

Observability is really just how well a system reports its internal state information. Let’s take a look at some of the different areas within IT Operations Monitoring (ITOM).

Network Traffic

For the purposes of this post, I am going to assume that network traffic refers to the actual packet data traversing the network, not the infrastructure that makes the traversal possible. Network traffic is, and almost always has been, extremely observable. I don’t know if it’s possible for a system to be completely observable, but network traffic comes pretty close.

So why is network traffic so observable? In order for something to be observable, as we just discussed, it has to be able to provide its own state information to an external collector of some sort. That is exactly what networks do. You may need a tool to aggregate the data (e.g. Wireshark or one of a variety of tools that collect and analyze packets), but the packets themselves, the very things that make up network traffic, are providing the state information. Networks are incredibly observable by their very nature.
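To make the point about packets carrying their own state concrete, here is a small sketch that decodes the fixed 20-byte portion of an IPv4 header using only the Python standard library. The sample header is built by hand for illustration, its addresses come from the ranges reserved for documentation, and the function name is an assumption of this example; a real capture tool would read these bytes off the wire.

```python
import ipaddress
import struct

# Hypothetical 20-byte IPv4 header, built by hand for illustration.
SAMPLE_HEADER = struct.pack(
    "!BBHHHBBH4s4s",
    (4 << 4) | 5,   # version 4, header length of 5 32-bit words
    0,              # DSCP/ECN
    40,             # total length in bytes
    0x1C46,         # identification
    0x4000,         # flags (Don't Fragment) and fragment offset
    64,             # time to live
    6,              # protocol number 6 = TCP
    0,              # checksum left as zero in this sketch
    ipaddress.IPv4Address("192.0.2.1").packed,
    ipaddress.IPv4Address("198.51.100.7").packed,
)


def parse_ipv4_header(raw):
    """Decode the fixed 20-byte portion of an IPv4 header into a dict."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }


print(parse_ipv4_header(SAMPLE_HEADER))
```

Nothing outside the packet was consulted to learn where it came from, where it is going, what protocol it carries or how many hops it has left. That is what "natively observable" looks like in practice.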

Verdict: Natively observable

Network Infrastructure

Following network traffic, it makes sense to take a look at the hardware that allows the network to operate: devices such as routers, switches, wireless access points, firewalls and more. All of these devices are required for a network to function, and knowing how well they are operating is critical to the success of your network. So, is my network infrastructure observable?

Fortunately, network infrastructure is also highly observable. Through technologies like Simple Network Management Protocol (SNMP) and streaming network telemetry, devices are able to provide significant state information. This information can include everything from how physical devices are operating to how well traffic traversing the network is performing. In short, your network devices are all perfectly capable of providing extensive information about their state to external tools.
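As a rough illustration of what a collector does with that state information, the sketch below turns two polls of an interface's inbound octet counter into a utilization percentage. It assumes the values have already been fetched, for example the 32-bit ifInOctets counter and the ifSpeed value (in bits per second) from the standard interfaces MIB. No SNMP library is called here, and the function names and sample numbers are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero at this value


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets received between two polls, allowing for a single counter wrap."""
    return (current - previous) % modulus


def utilization_percent(previous, current, interval_seconds, if_speed_bps):
    """Inbound utilization as a percentage of the interface's nominal speed."""
    bits = counter_delta(previous, current) * 8
    return 100.0 * bits / (interval_seconds * if_speed_bps)


# Hypothetical polls taken 60 seconds apart on a 100 Mbps interface.
print(utilization_percent(0, 375_000_000, 60, 100_000_000))

# The counter wrapped between polls; the modulo arithmetic still gives the delta.
print(counter_delta(4_294_967_000, 704))
```

The modulo step only handles a single wrap between polls, so on fast links a collector would either poll more often or read the 64-bit counters instead.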

Verdict: Natively observable

Physical and Virtual Server Infrastructure

We talked about the network infrastructure, but what about the servers on which our applications run? These can be physical or virtual, run a variety of operating systems and are critical to the operation and performance of the applications that businesses depend on. But are they observable?

Fortunately for us, servers (whether physical or virtual) are generally considered pretty darned observable. Whether leveraging SNMP, Windows Management Instrumentation (WMI) or something else, servers are able to provide a fairly large amount of information on their operation and performance.
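A quick way to see how much a server will tell you about itself is to ask it. The sketch below is not SNMP or WMI; it uses the Python standard library as a stand-in, and the function name and the choice of fields are assumptions made for illustration.

```python
import os
import platform
import shutil
import time


def server_snapshot(path="/"):
    """Collect a few self-reported state values from the local machine."""
    disk = shutil.disk_usage(path)
    used_percent = round(100.0 * disk.used / disk.total, 1) if disk.total else 0.0
    snapshot = {
        "host": platform.node(),
        "os": platform.system(),
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_used_percent": used_percent,
        "collected_at": time.time(),
    }
    # Load average is only exposed on Unix-like systems and may be unavailable.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return snapshot


print(server_snapshot())
```

A monitoring agent does essentially this on a schedule, with far more metrics, and ships the results to a central collector.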

Verdict: Natively observable

Application Code

When we start discussing application code and how the application performs (at the code level, having little to do with the physical hardware it is running on), things start getting really interesting. The concept of observability in ITOM really started on the application side of things, and for very good reasons. Whereas the network (both infrastructure and traffic) has almost always been observable, applications are a very different scenario.

Within the network, I could look at packets, flow data, SNMP or other sources of state information. Applications are much closer to black boxes. Unless a developer intentionally has the application emit state information (e.g. via error logs), there are very few ways to determine what is happening within it.

This challenge is one of the things that gave rise to Application Performance Monitoring (APM) solutions and vendors. These vendors (including Netreo with its Retrace APM solution) allow you to instrument your application (traditionally using proprietary agents) and determine state information for the application. The amount of information gathered is dependent on the type of application, functionality of the agent and many other factors.
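To show what "instrumenting" an application means at its simplest, here is a minimal sketch using a decorator and the standard logging module. It is a hand-rolled illustration, not how Retrace or any other APM agent works internally; the logger name, the decorator and the sample function are all hypothetical.

```python
import functools
import logging
import time

logger = logging.getLogger("app.instrumentation")


def instrumented(func):
    """Log the duration and outcome of every call to the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # logger.exception records the traceback alongside the message.
            logger.exception("%s failed after %.2f ms", func.__name__, elapsed_ms)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s succeeded in %.2f ms", func.__name__, elapsed_ms)
        return result
    return wrapper


@instrumented
def apply_discount(price, percent):
    """Hypothetical business function used only to demonstrate the decorator."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100
```

Calling logging.basicConfig(level=logging.INFO) before apply_discount(200, 25) would print a success line with the elapsed time, while an invalid percentage would log the failure and its traceback before re-raising. Without the decorator, the function would report nothing about its own behavior, which is the black box problem described above.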

Verdict: Natively not observable

Those are four different aspects of the IT environment and, while not an exhaustive list, they give us a good starting point. There will always be scenarios where my analysis is not accurate (for example, some applications do an excellent job of outputting their state, and some network devices lack support for common monitoring protocols such as SNMP).

Of the four areas we looked at, only one is not natively observable. That makes sense, since the whole idea of IT observability really started around the application. Vendors provided the tools to allow the black box that is the application to provide its own state information.

While many of the original APM tools still use proprietary agents, new technologies like OpenTelemetry are starting to change things. Instead of having to purchase an APM tool from a vendor and use that vendor’s likely proprietary agent, OpenTelemetry allows you to install an open-standards-based agent that can interoperate with numerous vendors’ APM solutions. Instead of vendor lock-in (via the agent), you have the flexibility to deal with any vendor that supports OpenTelemetry, which is something more and more vendors are doing.

So, What is Full-Stack Observability?

We have discussed observability and how it relates to some of the pillars of ITOM. Let’s take a look at full-stack observability and where it does or does not fit.

Lots of vendors are offering what they call full-stack observability solutions. A quick Google search on the term brings up links from several vendors, including Cisco, AppDynamics (owned by Cisco) and New Relic. What doesn’t come up is an easy definition of what FSO is (though that question was listed by Google under “People also ask,” and the answer is helpful to us). The definition of full-stack observability, according to dzone.com:

Full-stack observability is the ability to understand at any time what is happening across a technology stack.

If we compare this definition for FSO to the definition of observability we discussed earlier, we can see some differences. Observability focuses on the ability of a system to express its internal state to external interested parties. Full-stack observability is focused on the ability to understand what is happening across an entire technology stack.

This is important because it means that FSO really has nothing to do with observability. Instead it is trying to take advantage of the excitement around observability and OpenTelemetry to convince people that they need something newer, better, shinier. Such claims try to convince people that their “old” monitoring and visibility solutions are just not as good as an “observability” solution would be. But is that the case?

Do I Need Full-Stack Observability?

As with many things in technology, the evidence does not point to any significant benefit of a solution pitched as full-stack observability over one pitched as full-stack monitoring or full-stack visibility. The “observability” piece is not actually bringing you anything that you either don’t already have or couldn’t already get.

For example, let’s take a look at Netreo’s full-stack visibility offering and compare it to full-stack observability offerings. Netreo offers products in the IT infrastructure monitoring (ITIM), network performance monitoring (NPM) and APM spaces, plus others. For the purposes of this discussion, let’s just focus on these three areas. If you were to purchase Netreo for full-stack visibility, you would get insights into three layers, each gathered with the same techniques that FSO vendors use: your network, at both the traffic and infrastructure level (via flow data and SNMP); your servers (via SNMP or WMI); and your applications (via code instrumentation). All the data is collected, correlated, aggregated and analyzed to provide insights into how your entire environment is functioning.

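Pulling the next sentence from the full-stack observability information at DZone, we see the following:

By collecting, correlating, and aggregating all telemetry in the components, it provides insight into the behavior, performance, and health of a system.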

That sounds almost exactly like what Netreo is doing across the various systems. Amazing that a tool that is not FSO can still provide similar functionality.

In fact, if we look at the entire paragraph DZone put together around full-stack observability, we see something interesting.

Full-stack observability is the ability to understand at any time what is happening across a technology stack. By collecting, correlating, and aggregating all telemetry in the components, it provides insight into the behavior, performance, and health of a system. Through full-stack observability, teams can deeply understand the dependencies across domains and system topology. Contrary to traditional monitoring systems, full-stack observability enables IT teams to proactively react, predict, and prevent problems using artificial intelligence and machine learning, which is all but a requirement considering that the amount of data collected would be nearly impossible otherwise. Presenting this data in the form of unified analytics and dashboards can give the observer a complete picture into the health of the system, for example, where issues occurred and solutions were introduced.

Apparently the difference between FSO and traditional monitoring systems is that FSO allows you to “proactively react, predict, and prevent problems using artificial intelligence and machine learning.” But what if a traditional monitoring system allows you to do that as well? Does that mean it is not a traditional monitoring system but instead FSO? Or does it mean that FSO isn’t really anything but a bunch of technologies put together to provide a more functional solution for those responsible for the performance and operation of complex IT systems?

The point of all this is simple. Choose the best technology to meet your organization’s specific needs and requirements irrespective of the buzzword of the day. If you are getting the tools you need to do your job and ensure the operation of your systems, whether they are called full-stack observability or full-stack visibility is irrelevant. What matters most is how those tools work, what they do for your organization and how they improve your job. Or better still, schedule a Netreo demo today!
