Discover how to improve your Dev Environment
DevOps automation solves common challenges that stem from a lack of visibility into the entire environment. Poor visibility, overlapping tools, and a lack of hard data for capacity planning or for assessing success in your dev environment often lead to juggling several tools that tend to contradict each other.
To help you avoid slow response times, over-saturated infrastructure, and unnecessary costs, this second article in our “Eliminating DevOps Monitoring Challenges” blog series looks at how to improve your DevOps with automation.
Leveraging DevOps Automation to Improve Monitoring
DevOps automation is the only way to make sure that your visibility is an asset to all the teams. A key part of that is making sure your monitoring system stays up to date, whether you’re doing 50 deployments a day or a thousand. Netreo has easy-to-use web-based APIs that let you link your provisioning process directly into your monitoring and automatically discover newly deployed resources. From there, rules define the category, business workflows, and templates that control every aspect of monitoring configuration with minimal overhead. You can even link your deployments with automatic maintenance windows. Less time spent putting new systems into monitoring and checking that the right settings are applied means more time to focus on moving the organization forward.
Netreo uses a rule-based auto-configuration system to automate your configuration. Auto-configuration comes with a set of rules to help you get started, and you can easily customize them or create new ones to fit your environment.
You can use these rules to set device attributes, like categories, site, and application groups, based on any device criteria, including which processes are running, what ports are open, what the device is named, or SNMP values. This removes the manual step from your provisioning process, so devices end up in the correct reports and always get the right settings applied. Rules are applied automatically as devices are discovered and can be re-applied whenever you like, so you can enforce the configuration you want over time.
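To make the idea of criteria-based rules concrete, here is a minimal sketch in Python. It is not Netreo's rule engine or rule syntax: the `Device` and `Rule` classes, the `apply_rules` function, and the sample rules are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Device:
    """Hypothetical view of a discovered device (not a Netreo data model)."""
    name: str
    open_ports: set[int] = field(default_factory=set)
    processes: set[str] = field(default_factory=set)
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    """A match condition plus the attributes to set when it matches."""
    description: str
    matches: Callable[[Device], bool]
    set_attributes: dict[str, str]


def apply_rules(device: Device, rules: list[Rule]) -> Device:
    """Apply every matching rule in order; safe to re-run at any time."""
    for rule in rules:
        if rule.matches(device):
            device.attributes.update(rule.set_attributes)
    return device


# Illustrative rules only; the criteria and attribute values are assumptions.
RULES = [
    Rule(
        "SQL Server hosts go in the Database category",
        lambda d: "sqlservr.exe" in d.processes or 1433 in d.open_ports,
        {"category": "Database"},
    ),
    Rule(
        "Names starting with nyc- belong to the New York site",
        lambda d: d.name.startswith("nyc-"),
        {"site": "New York"},
    ),
]

device = apply_rules(Device("nyc-sql-01", open_ports={1433}), RULES)
print(device.attributes)
```

Because the rules only set attributes and never depend on earlier runs, re-applying them to a device that has drifted simply puts the attributes back, which is the enforcement behavior described above.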
Netreo can dynamically apply all the relevant templates to your devices based on any device criteria: name, location, running processes, listening ports, or just about anything else you can detect on the device. Multiple templates will be automatically applied, and the settings from each of them will intelligently roll down onto the devices. This way, a new SQL database coming online would get not only the basic Windows server settings you want but also the SQL-specific application checks and settings. It might even get some special settings, such as different latency thresholds due to its geographic location.
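One way to picture settings from several templates rolling down onto a device is a layered merge, sketched below in Python. The precedence rule used here (later templates override earlier ones, nested settings are combined) is an assumption made for the example, not a description of how Netreo resolves templates, and every template name and setting key is hypothetical.

```python
import copy


def merge_templates(*templates: dict) -> dict:
    """Combine templates left to right; later values override earlier ones.

    Nested dictionaries are merged key by key instead of being replaced.
    """
    merged: dict = {}
    for template in templates:
        for key, value in template.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_templates(merged[key], value)
            else:
                # Copy so the source templates are never mutated.
                merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical templates for the SQL server example above.
WINDOWS_BASE = {
    "checks": {"cpu": True, "disk": True},
    "thresholds": {"latency_ms": 100},
}
SQL_APP = {"checks": {"sql_service": True}}
REMOTE_SITE = {"thresholds": {"latency_ms": 250}}

effective = merge_templates(WINDOWS_BASE, SQL_APP, REMOTE_SITE)
print(effective)
```

In this sketch the device keeps the base Windows checks, gains the SQL-specific check, and picks up the looser latency threshold from the location template.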
You can align your templates with your monitoring plan so that they set the appropriate escalation timings and define the correct authentication, escalation, and automated actions to take. The system is designed to be flexible enough to meet the needs and scale of a fully deployed Dev environment, while still being simple enough for a small enterprise IT department to use without a lot of training or dedicated personnel.
Detecting unusual behavior, instead of just relying on static alarm settings, is a key way to get proactive with your monitoring: to get out of firefighting mode and start preventing the fires in the first place.
With Netreo, you can easily use anomaly detection to find changes in application behavior, and it can be applied almost anywhere – CPU, memory use, running processes, event log messages. If your application goes from 10 login failures an hour to 1000, that last deployment may not have gone as smoothly as you expected, and now you have a starting place to troubleshoot.
Netreo automatically generates a baseline behavior model from the large volume of historical data we retain, and adapts the baseline as your environment changes and evolves. Anomalies are detected as deviations from that baseline, compared by time of day, day of the week, or even hour by hour. This allows you to find unexpected impacts, like a software change causing unusual CPU behavior on a back-end SQL server.
One of our customers discovered an issue where the database server, which normally runs at 50-60% at 10 am on a Wednesday, was suddenly running at 15%. It turned out that a change to the user interface had made it confusing for customers to complete their transactions. That kind of anomaly would never trigger a static threshold, but in this case it revealed the problem long before the customer would have noticed the resulting sharp drop in orders.
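The general technique of comparing a reading against a baseline for the same day and hour can be sketched in a few lines of Python. This is a simple mean and standard deviation model chosen for illustration, not Netreo's actual algorithm. The function names, the `tolerance` parameter, and the sample readings (invented values in the 50-60% range from the example above) are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def build_baseline(samples):
    """Group (timestamp, value) samples by (weekday, hour) and summarize each slot."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[(timestamp.weekday(), timestamp.hour)].append(value)
    # A slot needs at least two samples before a spread can be computed.
    return {
        slot: (mean(values), stdev(values))
        for slot, values in buckets.items()
        if len(values) >= 2
    }


def is_anomalous(baseline, timestamp, value, tolerance=3.0):
    """Flag a value that is far from the norm for its weekday and hour."""
    slot = (timestamp.weekday(), timestamp.hour)
    if slot not in baseline:
        return False  # no history for this slot, so nothing to compare against
    average, spread = baseline[slot]
    # Floor the spread at 1.0 so a very steady history does not flag tiny changes.
    return abs(value - average) > tolerance * max(spread, 1.0)


# Six invented Wednesday 10 am readings; 2024-01-03 is a Wednesday.
first_wednesday = datetime(2024, 1, 3, 10, 0)
readings = [52, 55, 58, 60, 50, 57]
history = [(first_wednesday + timedelta(weeks=i), v) for i, v in enumerate(readings)]

baseline = build_baseline(history)
next_wednesday = first_wednesday + timedelta(weeks=len(readings))
print(is_anomalous(baseline, next_wednesday, 15))
print(is_anomalous(baseline, next_wednesday, 54))
```

A static "alert above 90%" threshold would stay silent for a reading of 15, while the baseline comparison flags it because it is far below what that slot normally looks like.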
Another great way to get control of your alerting is to integrate automation. Automation comes in a few flavors, and the first place to look is automating the response to alerts, which in many cases eliminates the need to send a notification at all.
You can link in CLI commands run over SSH or PowerShell, or use APIs like webhooks, so your monitoring system can restart ports, deploy additional containers, retest applications, or even dump real-time diagnostics in response to an issue.
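As a rough illustration of the webhook style of response, the Python sketch below builds an HTTP POST that a receiving service could act on. The URL, the payload fields, and the `restart_service` action name are all hypothetical; this is not Netreo's webhook format, and the receiving endpoint is assumed to exist in your own environment.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with a receiver you control.
WEBHOOK_URL = "https://hooks.example.com/restart-service"


def build_webhook_request(alert: dict, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Turn an alert into a JSON POST request describing the requested action."""
    body = json.dumps(
        {
            "device": alert["device"],
            "check": alert["check"],
            "action": "restart_service",  # assumed action name
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Build (but do not send) a request for a sample alert.
request = build_webhook_request({"device": "web-01", "check": "http"})
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the action easy to review or log before anything is actually executed.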
Some people are uncomfortable with automatically executing commands, so Netreo allows you to control those commands manually through operator intervention if you prefer. That way, if you want to add a ‘click to restart server’ function directly into the web interface of the monitoring system, and limit access to just administrators, there’s an easy way to do so.
One customer even set the system up to restart the server automatically after hours, but only notify the NOC during business hours, so they can make the decision when they’re available.
However, keep in mind that if you’re automating responses, maintenance windows become absolutely critical. Otherwise, a scheduled software upgrade may not go as you expect when your monitoring system starts taking action in the background. Nothing is more frustrating than pausing an application service for a deployment and having the server suddenly reboot.
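The interplay between maintenance windows and time-of-day policies can be sketched as a small decision function. The Python below is a hypothetical illustration, not Netreo configuration: the `MaintenanceWindow` class, the business hours of 08:00 to 18:00 on weekdays, and the three outcome names are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class MaintenanceWindow:
    """A planned period during which a device should be left alone."""
    device: str
    start: datetime
    end: datetime

    def covers(self, device: str, now: datetime) -> bool:
        return device == self.device and self.start <= now < self.end


# Assumed business hours: weekdays, 08:00 to 18:00 local time.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def choose_response(device: str, now: datetime, windows: list[MaintenanceWindow]) -> str:
    """Decide what to do when a device alert fires."""
    # Maintenance always wins: never act on a device that is being worked on.
    if any(window.covers(device, now) for window in windows):
        return "suppress"
    in_business_hours = now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END
    # During the day the NOC decides; after hours the restart is automatic.
    return "notify_noc" if in_business_hours else "auto_restart"


# 2024-03-05 is a Tuesday; app-01 has an upgrade scheduled that evening.
windows = [
    MaintenanceWindow("app-01", datetime(2024, 3, 5, 22, 0), datetime(2024, 3, 5, 23, 0))
]
print(choose_response("app-01", datetime(2024, 3, 5, 22, 30), windows))
print(choose_response("app-02", datetime(2024, 3, 5, 22, 30), windows))
print(choose_response("app-02", datetime(2024, 3, 5, 10, 0), windows))
```

Checking the maintenance window first is the important design choice: it guarantees that an automated restart cannot interrupt a deployment, regardless of what the time-of-day policy would otherwise do.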
Make sure to keep up to date with our blogs for the next parts of this series on how to improve your DevOps Automation.