What is Network Telemetry?
Telemetry is the collection of measurements or other data at remote or inaccessible points and their automatic transmission to receiving equipment for monitoring. In network telemetry, network devices such as routers, firewalls and switches push real-time data continuously to one or more centralized locations for storage, processing and analysis.
Network telemetry is a critical component of any modern IT data center operation. Having visibility into what a network is doing and how it is being utilized is the basis for effective network automation and analytics. Telemetry, or streaming telemetry to be precise, has the potential to help accelerate network troubleshooting, anticipate network capacity growth and baseline network performance. The right kind of telemetry data enables network operators to address network blind spots proactively and keep their business systems operating efficiently.
How does Telemetry work?
A typical Network Telemetry setup consists of three key functional components:
- Data Exporter: The data exporter can be any type of network device that generates data. Common data exporters aggregate packet metadata into what are known as flows. A flow is a unidirectional sequence of packets that share some common attributes, such as source and destination IP addresses, IP protocol, etc. In many modern telemetry systems, the user is able to configure the exact type of flow data that needs to be exported.
- Data Collector: The data collector is a part of a network management system that gathers data from one or more data sources, processes the data and stores it in a suitable format.
- Data Analyzer: The data analyzer processes data from one or more data collectors and provides actionable insight. This can range from generating simple statistical reports to assessing network capacity to predicting future network problems.
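To make the exporter's job concrete, here is a minimal sketch of flow aggregation: packet metadata is folded into per-flow counters keyed by the classic 5-tuple. The field names and sample packets are invented for illustration; real exporters work on live interface traffic.

```python
from dataclasses import dataclass
from collections import defaultdict

# Hypothetical minimal exporter: aggregate packet metadata into flows
# keyed by the 5-tuple (src/dst IP, src/dst port, IP protocol).
@dataclass(frozen=True)
class FlowKey:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP

def aggregate(packets):
    """Fold individual packets into per-flow packet/byte counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = FlowKey(pkt["src_ip"], pkt["dst_ip"],
                      pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += pkt["length"]
    return dict(flows)

# Two packets of the same unidirectional conversation become one flow.
packets = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 51000,
     "dst_port": 443, "proto": 6, "length": 1500},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 51000,
     "dst_port": 443, "proto": 6, "length": 600},
]
flows = aggregate(packets)
```

Because both sample packets share the same 5-tuple, they collapse into a single flow with combined packet and byte counts.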
Network telemetry offers several benefits over the traditional OAM (Operations, Administration and Maintenance) techniques used in network diagnostics and analytics.
Conventional data collection mechanisms do not scale well. Many of them use a pull model, such as SNMP polling, in which a collector must request data at regular intervals. This technique is resource intensive and can result in information that is not granular enough.
Network telemetry uses a push approach, achieving more efficient data collection. The data collector 'subscribes' to data from one or more data sources and receives a stream of network health and performance data in near real-time. Standards-based data models (using YANG/NETCONF) allow the user to subscribe only to the specific data items they need, further improving the data collector's efficiency.
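The push-and-subscribe pattern can be illustrated with a toy publisher: subscribers register only the data paths they care about (loosely mirroring YANG path subscriptions), and the device side pushes samples out as they occur. The class and path names here are invented for illustration, not part of any real telemetry API.

```python
# Toy illustration of push-based telemetry: a device streams samples to
# subscribers, each of which registered only the data paths it wants.
class TelemetryPublisher:
    def __init__(self):
        self._subs = []  # list of (path_prefix, callback) pairs

    def subscribe(self, path_prefix, callback):
        # Collector-side: express interest in a subtree of the data model.
        self._subs.append((path_prefix, callback))

    def push(self, path, value):
        # Device-side: deliver each sample to every matching subscriber.
        for prefix, cb in self._subs:
            if path.startswith(prefix):
                cb(path, value)

received = []
pub = TelemetryPublisher()
pub.subscribe("interfaces/eth0/", lambda p, v: received.append((p, v)))
pub.push("interfaces/eth0/in-octets", 1024)   # delivered
pub.push("interfaces/eth1/in-octets", 2048)   # not subscribed; filtered out
```

Note the inversion versus polling: the collector never asks for data, it simply receives what it subscribed to.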
Traditional OAM based data collection methods are targeted towards human operators. They enable human operators to monitor and diagnose the networks and initiate manual corrective actions. Network telemetry, on the other hand, leverages its JSON/XML data models to drive automated actions. Since the telemetry data is intended to be consumed by machines, enormous amounts of data can be processed and analyzed quickly. By pairing the telemetry system with an appropriate downstream AI/ML algorithm, insightful analytics can be generated and automated actions can be performed on the fly.
Finally, the telemetry system can be enhanced with data from multiple data sources. For instance, data from multiple network, storage and compute devices as well as other environmental data can be collected as time series data and correlated to provide system wide insights and analytics. This is not an easy task with conventional OAM systems, with myriad unstructured data formats and non-granular data collection.
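As a small illustration of cross-source correlation, the sketch below compares two time series collected at the same interval, say switch interface throughput and server CPU load, using a Pearson correlation coefficient. The data points are invented; in practice the series would come from the telemetry store.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative samples taken at the same timestamps from two sources.
iface_mbps = [100, 220, 310, 405, 500]   # switch interface throughput
cpu_pct    = [10,  21,  30,  41,  50]    # server CPU utilization
r = pearson(iface_mbps, cpu_pct)         # close to 1.0: strongly correlated
```

A correlation near 1.0 across device types is exactly the kind of system-wide signal that siloed, unstructured OAM data makes hard to surface.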
Telemetry Protocols and Standards
NetFlow, originally developed by Cisco, is the original network telemetry technology: devices collect IP traffic statistics on enabled interfaces and export those statistics as NetFlow records toward one or more collectors. Many other vendors support NetFlow as a de facto standard, although there are also vendor-specific implementations such as JFlow, RFlow and NetStream. NetFlow v5 is one of the most commonly deployed versions, although it supports only IPv4 flows. NetFlow v9 supports IPv6 and MPLS flows as well as template-based records.
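A NetFlow v5 export packet is a fixed binary layout: a 24-byte header followed by up to 30 flow records of 48 bytes each. The sketch below unpacks a single v5 record with Python's `struct` module, following the widely published v5 record layout; the sample record itself is fabricated for illustration.

```python
import struct

# One 48-byte NetFlow v5 flow record, per the published v5 field layout.
V5_RECORD = struct.Struct("!IIIHHIIIIHHBBBBHHBBH")

def parse_v5_record(buf):
    (srcaddr, dstaddr, nexthop, input_if, output_if,
     d_pkts, d_octets, first, last, srcport, dstport,
     _pad1, tcp_flags, proto, tos, src_as, dst_as,
     src_mask, dst_mask, _pad2) = V5_RECORD.unpack(buf)
    to_ip = lambda n: ".".join(str((n >> s) & 0xFF) for s in (24, 16, 8, 0))
    return {"src": to_ip(srcaddr), "dst": to_ip(dstaddr),
            "sport": srcport, "dport": dstport,
            "proto": proto, "packets": d_pkts, "bytes": d_octets}

# Fabricated sample: 10.0.0.1:51000 -> 10.0.0.2:443, TCP, 10 pkts, 4200 B.
raw = V5_RECORD.pack(0x0A000001, 0x0A000002, 0, 1, 2,
                     10, 4200, 0, 0, 51000, 443,
                     0, 0x18, 6, 0, 0, 0, 24, 24, 0)
rec = parse_v5_record(raw)
```

The rigid 48-byte layout is also v5's limitation: there is no room for IPv6 addresses or new fields, which is what v9's template-based records were designed to solve.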
As interface speeds grew from 10 Mbps and 100 Mbps to 10 Gbps and beyond, the processing power required to collect and export packet metadata statistics from high-speed interfaces became expensive. sFlow, or sampled flow, was invented to address this problem by sampling and streaming only 1 out of every n packets to the collector. This makes sFlow scalable and better suited than NetFlow to applications where real-time traffic visibility is important. However, since the measurements provided by sFlow are only an approximation of the real traffic, it is not suitable for digital forensics and network troubleshooting use cases that rely on the accuracy of NetFlow.
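The sampling trade-off is easy to see in code: with 1-in-N sampling the collector scales its counters back up, so totals are statistical estimates rather than exact counts. The helper below also applies the rule-of-thumb accuracy bound published by sFlow.org (percentage error at 95% confidence is roughly 196/sqrt(samples)); the traffic numbers are illustrative.

```python
import math

def estimate_from_samples(sampled_packets, sampled_bytes, sampling_rate):
    """Scale sampled counters back up by the 1-in-N sampling rate."""
    return sampled_packets * sampling_rate, sampled_bytes * sampling_rate

def percent_error_95(num_samples):
    # sFlow.org rule of thumb: %error <= 196 * sqrt(1 / samples)
    # at 95% confidence, for the traffic class being estimated.
    return 196 * math.sqrt(1 / num_samples)

# Illustrative: 250 samples at a 1-in-1000 rate imply ~250k real packets,
# but with a fairly wide error bound; more samples tighten the estimate.
pkts_est, bytes_est = estimate_from_samples(250, 312_500, 1000)
err_few  = percent_error_95(250)     # ~12% error with few samples
err_many = percent_error_95(10_000)  # ~2% error with many samples
```

This is why sFlow shines for real-time traffic visibility (where trends matter) but not for forensics (where every packet may matter).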
Internet Protocol Flow Information eXport (IPFIX), based on NetFlow v9, is an IETF standard specification that consolidates many of the capabilities of NetFlow. IPFIX is backward-compatible with NetFlow, but extends it with numerous additional data types as well as capabilities such as variable-length field support, allowing for significant flexibility and extensibility. IPFIX also adds support for SCTP as the underlying transport, enabling congestion avoidance and bandwidth optimization. IPFIX enjoys wide acceptance in the industry, with support from most major networking vendors.
Cloud providers such as AWS, Azure and Google Cloud offer their own versions of telemetry in the form of VPC flow logs, which provide similar information as NetFlow records, but in a proprietary format.
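Even though the formats are proprietary, cloud flow logs are plain text and easy to normalize into NetFlow-like records. The sketch below parses a line in AWS's documented default (version 2) VPC flow log field order; the account ID, interface ID and traffic values are made up.

```python
# Field order per AWS's default version-2 VPC flow log format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_vpc_flow_log(line):
    """Split one space-delimited flow log line into a flow record dict."""
    rec = dict(zip(FIELDS, line.split()))
    for f in ("srcport", "dstport", "protocol", "packets", "bytes"):
        rec[f] = int(rec[f])
    return rec

# Fabricated sample line: an accepted HTTPS flow between two VPC hosts.
line = ("2 123456789012 eni-0a1b2c3d 10.0.1.5 10.0.2.7 "
        "49152 443 6 20 4249 1620000000 1620000060 ACCEPT OK")
rec = parse_vpc_flow_log(line)
```

Once normalized this way, cloud flows and on-premises NetFlow/IPFIX records can land in the same analysis pipeline.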
Telemetry & AI
Thanks to the advances in Artificial Intelligence (AI) and Machine Learning (ML) technologies and affordable compute and storage platforms, retention of large volumes of flow records and real-time analysis of the data is within the grasp of many of today’s enterprises. Software tools can now quickly analyze telemetry data and provide unprecedented insights into the operation of the network, automatically correct routine issues and even predict future events. Deep forensic analysis of enterprise network data; correlation of a multitude of data series, including exogenous data; detection of hard-to-find security threats, trends and anomalies; and eventual realization of many of the touted benefits of Intent based networking will all become a reality in the near future. For network operators, it is important to be at the forefront of this revolution in order to be strategic contributors to their organizations.
Network telemetry solutions range from simple single-protocol flow exporters to full-featured ASIC-based packet-level (rather than flow-level) capture and analysis systems with inline AI/ML capabilities. IT managers will need to employ the right-sized solution based on their enterprise’s requirements and strategic needs.
Netreo offers a full-stack monitoring solution with an intuitively easy way to collect and analyze flow records from devices supporting all of the major telemetry protocols. Netreo’s solution is light-weight, plug-and-play easy, and offers insightful analysis of the key application traffic flows in the network in a single pane of glass approach. Netreo’s recently introduced AIOps engine delivers simple answers from 20 years’ worth of historical data and trends.