Keep Up: How to configure SCOM to monitor the running state of services and restart them when they stop

Windows runs on services. Don’t believe me? Open your Services console and count just how many are running at any given time. Of course, some of them are more important than others… especially when you are talking about servers that are critical to your organization.

A new customer recently called me for a DEAR Call (emergency visit) because their business critical application was not working, and they couldn’t figure it out. I logged into the server, and at first glance there didn’t appear to be anything wrong on the application server. However I knew that the application used SQL Server, and I did not see any SQL instances on the machine. A quick investigation revealed that there was an external SQL Server running on another server, and it only took a few seconds to see why the application was failing.

Very simply put, the service was not started. I selected it, clicked Start the service, and in a few seconds the state changed:

A quick look showed that their business critical application (in this case SharePoint 2010) was working properly again.

My customer, who was thrilled to be back in business, was also angry with me. ‘We spent tens of thousands of dollars on System Center Operations Manager so that we could monitor our environment, and what good does it do me? I have to call you in when things stop working!’

Yell as much as you like I told him, but please remember the old truism… if you think it is expensive hiring professionals, try hiring amateurs. After he had learned about the benefits of implementing a proper monitoring solution he told his IT guy to install it… and that is exactly what he did.

System Center Operations Manager (SCOM) is a monitoring framework, and really quite a good one. In fact, if Microsoft included the tools within the product itself to monitor every component that it is capable of monitoring, it would have to come in a much bigger box. Instead, what it gives you is the ability to import or create Management Packs (MPs) to monitor aspects of your IT environment. It is up to you to then implement those MPs so that SCOM can monitor the many components of your infrastructure… and take the appropriate action when things go wrong.

Of course, there are much more in-depth MPs for monitoring Microsoft SQL Server, but for those IT generalists who do not need the in-depth knowledge of what their SQL is doing, simply knowing that the services are running is often good enough… and monitoring those services is the exact same step you would take to monitor the DNS Server service.

Although it is long, following these relatively simple steps will do exactly what you need.

1) Open the Operations Manager console.

2) In the Operations Manager console open the Authoring context.

3) In the navigation pane expand Management Pack Objects and click on Monitors.

4) Rick-click on Monitors and select Create a Monitor – Unit Monitor…

5) At the bottom of the Create a unit monitor window select the Management Pack you are going to save this to. I never save to the default management packs – create your own, it is safer (and easier to recover when you hork something up).

6) In the Select the type of monitor to create section of the screen expand Windows Services and select Basic Service Monitor. Click Next.

7) In the General Properties window name your monitor. Make sure you name it something that you will recognize and remember easily.

8) Under Monitor target click Select… From the list select the target that corresponds to the service you will be monitoring. Click OK.

9) Back in the General Properties window uncheck the Monitor is enabled checkbox. Leaving this enabled will try to monitor this service on every server, not just the one where it resides. Click Next.

10) In the Service Details window click the ellipsis button (…) next to Service Name.

11) In the Select Windows Service window either type the name of the target server, or click the ellipsis button and select the computer from the list. Then select the service you wish to monitor from the list under Select service. Click OK.

12) Back in the Service Details window the Service name window should be populated. Click Next.

13) In the Map monitor conditions to health states window accept the defaults… unless of course you want to make sure that a service is NEVER started, at which point you can change that here. Click Next.

14) In the Alert settings window select the Generate alerts for this monitor checkbox. You can also put in a useful description of the alert in the appropriate box. Click Create.

The saving process may take a minute or two, but when it is done search for it in the Monitors list.

14) Right-click on your custom monitor. select Overrides – Override the Monitor – For a specific object of class: <Name of the product group>

15) In the Select Object window select the service you are monitoring and click OK

16) In the Override Properties window, under the Override-controlled parameters list, scroll for the parameter named Enabled and make the following changes:

a) Select the Override checkbox.

b) Change the Override Value to True.

c) Click Apply

d) Click Show Monitor Properties…

17) In the Monitor Properties window click the Diagnostic and Recovery tab.

18) Under Configure recovery tasks click Add… and when it appears click Recovery for critical health state.

19) Under the Create Recovery Task Wizard click Run Command and click Next.

20) In the Recovery Task Name and Description window

a) enter a Recovery name (Re-Start Service works for me!).

b) Select the checkbox Recalculate monitor state after recovery finishes.

c) Click Next.

21) In the Configure Command Line Execution Settings window enter the following information:

Full path to file: %windir%\System32\Net.exe

Parameters: start <service name>

Working directory: %windir%

Timeout (in seconds): 120

22) Click Create.

23) Close the Monitor Properties window.

24) In the Override Properties window click Apply then OK.

—

The doing is done, but before you pat yourself on the back, you have to test it. I always recommend running these tests during off-hours for non-redundant servers.

1) Open the services.msc console.

2) Right-click on Services (Local) and click Connect to another computer…

3) Connect to the server where your monitored service is running.

4) Right-click on the service and click Stop Service.

It may take a couple of minutes, but if you get up and go for a walk, maybe make a cup of coffee or tea… by the time you get back, the service should be restarted.

—

There seems to be a reality in the world of IT that the more expensive something costs, the less it is likely to do out of the box. It is great to have a monitoring infrastructure in place, but without configuring it to properly monitor the systems you have it can be a dangerous tool, because you will have a false sense that your systems are protected when they really aren’t. Make sure that the solution you have is properly configured and tested, so that when something does go wrong you will know about it immediately… otherwise it will just end up costing you more.

Rate this:

Oakville, ON, Canada

The World According to Mitch