Windows Services are critical components of many IaaS systems, whether they run in the Azure cloud or on-premises. Processes such as SQL Server and IIS run as Windows Services, and organizations often rely on custom-developed Windows Services to perform important background tasks.
In this article, we’ll show you how to monitor and automatically restart crashed Windows Services using Netreo. If you are not familiar with Netreo, it is a cloud monitoring service that provides sophisticated automation and monitoring capabilities for the Azure platform and on-premises Windows servers.
To start monitoring Windows Services and configure self-healing automation for them, follow these four steps:
1. Run the Netreo Setup Wizard to connect to an Azure environment (optional)
If you aren’t using Netreo yet, request a demo, and our expert team can walk you through the process.
Because crashed Windows Services are monitored and restarted through a manually deployed Netreo agent, the Azure authorization step is not required. In fact, you can use the same procedure to restart Windows Services on any Windows VM, not just those deployed to Azure.
2. Deploy the Netreo agent to the VM
You can download the Netreo Windows Agent from the Resources or Dashboard screens in the Netreo portal. The agent is automatically configured to send data to your Netreo account.
After downloading, follow the installation instructions in the archive or learn more here.
Once the agent is installed, it is automatically registered with Netreo. Simply refresh the Netreo portal to see it in the dashboard.
3. Define a metric tracking Windows Service status
Once the VM resource has been brought into Netreo, define a new metric that tracks the status of the chosen Windows Service:
- Open the configuration dialog for the VM resource.
- In the Metrics tab, add a new metric of type “WindowsServiceState” and select the service from the dropdown.
- Specify a metric name, e.g. “MSDTCstatus”, and save.
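To make concrete what a service-status metric reports, here is a minimal Python sketch that reads a service's state from the output of the standard Windows `sc query` command. It is not a description of how the Netreo agent collects the “WindowsServiceState” metric, which this article does not cover. The function names `parse_service_state` and `query_service_state` are hypothetical, and the parser assumes English-language `sc` output.

```python
import re
import subprocess

# Matches the STATE line of `sc query <service>` output, for example:
#         STATE              : 4  RUNNING
# Assumption: English-language output; other locales may label it differently.
_STATE_LINE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_service_state(sc_output: str) -> str:
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = _STATE_LINE.search(sc_output)
    if match is None:
        raise ValueError("no STATE line found in sc query output")
    return match.group(1).upper()


def query_service_state(service_name: str) -> str:
    """Run `sc query` for one service (Windows only) and return its state."""
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the service is unknown
    )
    return parse_service_state(result.stdout)
```

On a Windows machine, `query_service_state("MSDTC")` would return the current state of the MSDTC service used as the example above. The parsing step is kept separate so it can be checked against sample text on any platform.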
4. Define a self-healing action
To define self-healing actions for a Windows Service:
- In the “Actions” tab, add a new action that executes the “PowershellRestartService” command based on a custom expression. The action should run whenever the value of the previously defined metric is anything other than “Running”.
- Specify a meaningful Suspended period for the action, e.g. 20 min. This prevents the action from running again within that period and gives the service status time to stabilize.
- Give the action a name and save.
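The rule configured in these steps (act when the metric is not “Running”, then hold off for the Suspended period) can be sketched as follows. This is an illustrative Python model of the decision logic, not Netreo's implementation. The class `RestartPolicy` and its method name are hypothetical, and the 20-minute default simply mirrors the example value above.

```python
from datetime import datetime, timedelta
from typing import Optional


class RestartPolicy:
    """Decide whether to fire a restart, honoring a suspended (cooldown) period."""

    def __init__(self, suspended: timedelta = timedelta(minutes=20)) -> None:
        self.suspended = suspended
        self.last_fired: Optional[datetime] = None

    def should_restart(self, state: str, now: datetime) -> bool:
        # A healthy service never triggers the action.
        if state == "Running":
            return False
        # Still inside the suspended period: skip so the service can stabilize.
        if self.last_fired is not None and now - self.last_fired < self.suspended:
            return False
        # Record the firing time so the next check starts a new suspended period.
        self.last_fired = now
        return True
```

With the default settings, a service reported as stopped at 12:00 triggers a restart, a second report at 12:05 is ignored, and a report at 12:20 triggers again. Without the suspended period, a service that takes a minute to start could be restarted repeatedly before it ever reaches “Running”.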
The same approach can be used to automate other recovery procedures or maintenance tasks. Netreo can execute actions based on any metric captured anywhere in your Azure environment or according to a schedule.
You can learn more about Netreo automation here.