IT operations data grows every year. Some estimates suggest that the average IT operations team sees its operational data volume double or triple annually. Faced with this flood, IT teams are grasping for any method that helps them make sense of it all, and many are landing on AIOps as a way to parse and categorize these events. AIOps isn’t a perfect fit for every organization, but it is a great fit for many. In this post, we’ll explain what AIOps is and what it promises, then discuss how to formulate a strategy for using AIOps in your organization.
What Is AIOps?
Before you can build an AIOps strategy, you need to know what AIOps is. At a high level, AIOps is the use of artificial intelligence to enhance IT operations. You may have heard of DevOps, a philosophy that aims to unlock velocity and quality by melding development and operations teams together. The goal of a DevOps team is to ship better software more quickly. That’s great! But it comes with a cost. As we noted above, the volume of operations events is growing at an increasing pace. More releases of more applications, running on more servers, pile up ever more data about your applications, services, and servers.
AIOps is an answer to that growth. Instead of trying to reduce the number of events, AIOps treats operations events as a big-data problem and applies machine learning to cut through the noise, so you can focus on the events that matter most to your business.
What Does It Mean to Have an AI Strategy?
If your team doesn’t have an AI strategy today, you’ll likely be developing one in the next couple of years. AI unlocks solutions to many computing problems that have traditionally been considered quite difficult. New uses of AI emerge every day, but classic applications include image recognition and text parsing. While traditional computation focuses on strict logic, modern AI skews more toward pattern matching.
However, AI applications come with a downside: they’re computationally expensive. While there are a variety of AI approaches, the cornerstone of most is training a model on large amounts of data. AI computation often requires dedicated, specialized hardware and piles of computational resources like CPU cores and memory. All major cloud providers now offer dedicated AI hardware, but that capacity isn’t free, so you need a strategy for how you’ll approach AI adoption. Developing an AI strategy means answering, at a minimum, the following three questions:
- What business value does our AI adoption generate?
- What data will we use to train our AI models?
- How will we know when our AI models are producing quality results?
How Do We Develop an AIOps Strategy?
As we noted, every AI strategy needs to answer at least three questions, and your AIOps strategy is no different. Fortunately, limiting the scope of this AI effort to IT operations makes those questions easier to answer. Let’s walk through them one at a time and explore each a little further. By the end, you should have a good idea of how to answer these questions yourself, putting you well on the way to developing your own AIOps strategy.
What Business Value Does AIOps Adoption Generate?
This question comes first because it’s by far the most important one to answer. As with any other new technology, you should think long and hard about what value AIOps brings your company. As we noted, AI hardware and adoption aren’t cheap. While it’s unquestionably cool tech, AI is like any other software application: it only answers the questions you know how to ask, so make sure the questions you’re asking are valuable.
Teams that adopt an AIOps strategy are most often trying to cut through the noise of piles of operational events and logs. These efforts are usually led by operations teams who have far too much noise and not nearly enough signal. They want to identify which events point to problems like crashing servers or malicious intruders compromising applications. If you’re dealing with that kind of problem, AIOps might be a great fit for you. But AIOps isn’t just a fancier monitoring system. A well-trained AIOps system recognizes the patterns that precede failures, so it can flag problems before they happen. The same system can also spot unusual behavior that someone might need to investigate. And AIOps models can learn which events should be routed to which team, cutting down on redundancy and noise.
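To make “spotting unusual behavior” a bit more concrete, here is a minimal sketch of the idea using a simple statistical baseline: a rolling z-score over per-minute event counts. This is an assumption-laden stand-in, not a trained model and not how Netreo’s Autopilot works; the function name, the window size, the threshold, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_unusual_counts(counts, window=5, threshold=3.0):
    """Return indices of intervals whose event count is unusually high or low.

    Each count is compared against the mean and standard deviation of the
    preceding `window` intervals (a rolling z-score). `window` and
    `threshold` are illustrative defaults, not tuned values.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if counts[i] != mu:
                flagged.append(i)
            continue
        z = (counts[i] - mu) / sigma
        if abs(z) >= threshold:
            flagged.append(i)
    return flagged


# Hypothetical per-minute error-event counts from one service.
events_per_minute = [12, 14, 11, 13, 12, 15, 13, 96, 14, 12]
print(flag_unusual_counts(events_per_minute))  # → [7]
```

Note that the spike at minute 7 inflates the baseline for the next few windows, which would mask a second anomaly close behind it. That weakness is one reason production systems tend to use more robust statistics or learned models rather than a raw rolling average.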
In short, an AIOps strategy is designed to make your operations teams more effective and more efficient. If that’s something your operations teams could use, then formulating an AIOps strategy is likely a good move for your company.
What Data Will We Use to Train Our AI Models?
This is where a lot of AI strategies fall down. AI is unquestionably cool tech, but it’s only as good as the data you use to train it. You can try to train machine learning models without a labeled data set by using a technique like unsupervised learning, but you’ll likely find it difficult to draw high-quality conclusions that way. In practice, you’ll need to work hard to identify the data you’ll use to train your AI models. That means spending time manually poring over operations events and logs and categorizing them. This work is incredibly time-consuming, but it’s necessary to build a good AI platform.
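To show why hand-categorized events matter, here is a minimal sketch of a supervised model that learns which team should receive an event from a handful of manually labeled examples. It uses multinomial naive Bayes with add-one smoothing, a deliberately simple technique chosen for illustration rather than whatever a production AIOps platform uses. The event messages, team names, and the `train`/`route` helpers are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(message):
    """Lowercase an event message and split it into words."""
    return message.lower().split()


def train(labeled_events):
    """Count teams and per-team word frequencies from (message, team) pairs."""
    team_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for message, team in labeled_events:
        team_counts[team] += 1
        for word in tokenize(message):
            word_counts[team][word] += 1
            vocabulary.add(word)
    return team_counts, word_counts, vocabulary


def route(message, model):
    """Return the team with the highest naive Bayes log-probability."""
    team_counts, word_counts, vocabulary = model
    total_events = sum(team_counts.values())
    best_team, best_score = None, -math.inf
    for team, count in team_counts.items():
        score = math.log(count / total_events)  # log prior for this team
        team_total = sum(word_counts[team].values())
        for word in tokenize(message):
            # Add-one (Laplace) smoothing so an unseen word doesn't zero out a team.
            numerator = word_counts[team][word] + 1
            score += math.log(numerator / (team_total + len(vocabulary)))
        if score > best_score:
            best_team, best_score = team, score
    return best_team


# Hypothetical events that someone categorized by hand.
labeled = [
    ("disk usage above 95 percent on db01", "database"),
    ("replication lag exceeded threshold on db02", "database"),
    ("failed ssh login attempts from unknown host", "security"),
    ("multiple failed login attempts for admin account", "security"),
    ("http 502 errors from load balancer", "network"),
    ("packet loss detected on uplink", "network"),
]
model = train(labeled)
print(route("failed login attempts from new host", model))  # → security
```

With six examples this is a toy, but it makes the dependency plain: the model can only route events as well as the hand-labeled examples allow, which is why the categorization work is worth the time.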
This is a place where Netreo can help. Netreo’s AIOps: Autopilot model works off two decades’ worth of existing operations data. The hard work of categorizing events is something Netreo has already done, making it easy for new customers to pop in and start getting the benefits of an AIOps integration on day one.
If you choose not to build your AIOps platform on an existing model, remember that you’ll need a lot of data to train one effectively. The training phase itself also carries considerable overhead, so expect some time to pass before your new AIOps platform is ready to use.
How Will We Know When Our AI Models Are Producing Quality Results?
A classic problem in machine learning is overfitting: training a model that works well on your sample data but poorly on anything else. When you’re developing an AIOps strategy, you need a model that works on more than just the data you already have. After all, you’re not building your AIOps strategy just for the next 6 or 12 months. You’re thinking on a much longer timeline, and you need your tech to evolve as your organization grows.
The way to tune an AI model is to train it on one set of data, then turn it loose on novel data it hasn’t seen before. You’ll quickly notice that the model does some things very well and others very poorly. In operations terms, it might correctly identify some key events but miss others entirely. So you do what any software team does: you iterate. As we noted, iterating on an AI model means collecting mountains of diverse data, then spending CPU time analyzing it to refine the model. Put that way, continual iteration sounds painful. The good news is that as your model matures, you’ll spend less time iterating on it.
You can shift some of this work left by identifying sources of operational data early in your project, but you won’t catch everything. Your AIOps strategy should include time for iterating on your models and identifying overlooked data sources. It should also include regular testing of your models against novel data to ensure they return the results you expect. Testing regularly is the only way to know your models are producing quality results, and without quality results, you’re just throwing CPU cycles away.
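Here is a minimal sketch of that regular check: score a model’s predictions on events it never saw during training, and compare that score with its score on the training data. It measures recall for a hypothetical “incident” label (the share of real incidents the model caught), which matches the failure mode of catching some key events but missing others. The labels, the 0.1 tolerance, and the function names are assumptions for illustration.

```python
def recall(predicted, actual, positive="incident"):
    """Fraction of truly positive events that the model also labeled positive."""
    true_positives = sum(
        1 for p, a in zip(predicted, actual) if a == positive and p == positive
    )
    actual_positives = sum(1 for a in actual if a == positive)
    if actual_positives == 0:
        return 1.0  # nothing to catch; treated as perfect to avoid dividing by zero
    return true_positives / actual_positives


def looks_overfit(train_score, novel_score, tolerance=0.1):
    """Flag a model whose novel-data score trails its training score by more than `tolerance`."""
    return train_score - novel_score > tolerance


# Hypothetical labels: what the model predicted vs. what an engineer confirmed.
train_pred = ["incident", "noise", "incident", "noise", "incident"]
train_actual = ["incident", "noise", "incident", "noise", "incident"]
novel_pred = ["incident", "noise", "noise", "noise", "noise"]
novel_actual = ["incident", "noise", "incident", "noise", "incident"]

train_recall = recall(train_pred, train_actual)  # 3 of 3 incidents caught
novel_recall = recall(novel_pred, novel_actual)  # 1 of 3 incidents caught
print(train_recall, round(novel_recall, 2), looks_overfit(train_recall, novel_recall))
# → 1.0 0.33 True
```

Rerunning a check like this on each fresh batch of labeled events is one way to put “test regularly” into practice. Recall alone ignores false alarms, so in practice you would likely track precision alongside it.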
What’s Your AIOps Strategy?
By now, you should have a good idea of your basic AIOps strategy outline. You might not know every detail. You might need to consult with your team to figure out if AIOps is the right fit for your operations. But if you can answer the three big questions, you’re well on your way. If some of those questions seem hard to answer, Netreo is here to help. Netreo’s AIOps: Autopilot platform simplifies data collection and model training. We have decades of data, and we’ve already iterated our models many times. We’ve seen just about everything, so if you need help building your AIOps strategy, we’d love to talk.
This post was written by Eric Boersma. Eric is a software developer and development manager who’s done everything from IT security in pharmaceuticals to writing intelligence software for the US government to building international development teams for non-profits. He loves to talk about the things he’s learned along the way, and he enjoys listening to and learning from others as well.