Pleased to inform you that I am starting a new job soon with the awesome team @ PowerOn Platforms and blogging will be heavily on the menu 🙂
Really happy to big up Steve Beaumont for getting his MVP renewed for the 4th Year – well done mate:)
Have a few days spare until I meet up with them, so I thought what better use of my time than to stand up a local & portable Lab which I can use to recreate SCOM/OMS/Azure Portal scenarios and get properly friendly with it all. Traditionally, I have had a VMware Workstation LAB in order to nest the Hyper-V boxes, but with the onset of Windows 10 I am going to be transitioning this over very soon once I let go of my 8.1 environment. Have always run it in NAT mode for the networking and not had any issues until now 🙂
Up until this point I have run the AD and SQL Assessments a few times now and NEVER had a problem with them. Until now 🙁
Was running SQL locally, using the monitoring Run As Accounts methodology with Kev Holman’s brill new SID Monitoring system, but only in “Low Privilege” mode.
This loaded up the SQL Assessment solution but returned no data after 3 hours, and even adjusting the SQL permissions of the account and removing the Solution / OMS Workspace didn’t seem to help.
I stood up a new independent SCOM environment (LAB2 – as pictured in the diagrams) on the Laptop and tried again – still no joy 🙁
These assessments only run once a week – yes I know, and I think this is where my problem is – it’s just not running on the right box, or not seeming to run at all once it has failed for this environment / Management Group / OMS Workspace… Running the Troubleshooting Query showed that it wasn’t even running against the box, or so I thought.
No errors, no logs to go by – was starting to get really mad at this one….
Like a good SCOM admin I flushed and cleaned the health service of the SQL box, but to no avail: the darn thing just will not run, and seems stuck once “No Data” results are returned for both the AD & SQL Assessments. Both were refusing to run, and the Lab was running under a VMware Workstation environment. I suspected issues with the networking, as I was getting a ton of HTTP Module alerts and Internet resolution was taking far too long, so… I recreated a SCOM/OMS hybrid environment (again) and tried it on a “proper” LAB server at home, which runs Hyper-V & 2012 R2 on an old PowerEdge 1950 gen II. No networking issues this time: I recreated a SCOM Server and OMS workspace and it worked after an hour or so. Go figure. Must be something with either the SCOM environment or the Agents themselves.
Returned to my Mobile lab, which I need to run on a laptop when working away. To remove the VMware angle from the diagnosis I migrated the VMware server to a Hyper-V VHD (thanks StarWind) and retried with the same machines. No luck with the SQL Assessment and again no data 🙁 Even tried a direct agent – no joy still 🙂
So, logged onto the SQL Instance to see if there were any failed logons etc., but to no avail, apart from some memory pressure issues reported. Shut down the box and gave 6GB to the VM. Restarted and continued.
So like a good consultant I then threw caution to the wind and threw the other Solutions onto the workspace to start getting a lot more familiar with them – including the AD Assessment. After an hour or so, and a couple of “No Data” panes on the Antimalware, Change Tracking etc., I noticed the AD Assessment then populated – success. Went to bed and expected to see the SQL Assessment follow suit but sadly no….
– but the others that had previously indicated “NO Data” were now populated:
Returned to the problem next day and started digging, attacking it as a normal SCOM issue which was probably permissions / discovery related.
This is a lab server, so SQL is running locally on the box – so the default action account for the agent will be the MSA account, which in our case is lab2\SVC-SCOM
Ran the script from the OMS Assessment page to ensure that the account had the necessary SQL Server Instance permissions (i.e. not SA) and restarted the HealthService
-- Replace LAB2\SVC-SCOM with the actual user name being used as Run As Account.
-- Create login for the user, comment this line if login is already created.
-- CREATE LOGIN [LAB2\SVC-SCOM] FROM WINDOWS
-- Grant permissions to user.
GRANT VIEW SERVER STATE TO [LAB2\SVC-SCOM]
GRANT VIEW ANY DEFINITION TO [LAB2\SVC-SCOM]
GRANT VIEW ANY DATABASE TO [LAB2\SVC-SCOM]
-- Add database user for all the databases on SQL Server Instance, this is required for connecting to individual databases.
-- NOTE: This command must be run anytime new databases are added to SQL Server instances.
EXEC sp_msforeachdb N'USE [?]; CREATE USER [LAB2\SVC-SCOM] FOR LOGIN [LAB2\SVC-SCOM];'
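The account name appears in every statement of that script, and it is easy to miss one placeholder when editing by hand. A rough Python sketch for generating the script from a single login name follows. The function name `build_grant_script` is my own invention for illustration and not part of any OMS or SCOM tooling; it only builds text and does not connect to SQL Server.

```python
def build_grant_script(login: str) -> str:
    """Return the T-SQL permissions script for one Windows login.

    Hypothetical helper: it only formats text, nothing is executed.
    """
    # Refuse characters that would break the [bracket] or 'quote' delimiters.
    if "]" in login or "'" in login:
        raise ValueError("login contains ] or ' and cannot be quoted safely")

    # Statement run in every database by sp_msforeachdb ([?] is its placeholder).
    per_database = f"USE [?]; CREATE USER [{login}] FOR LOGIN [{login}];"

    lines = [
        "-- Create login for the user, uncomment if the login does not exist yet.",
        f"-- CREATE LOGIN [{login}] FROM WINDOWS",
        f"GRANT VIEW SERVER STATE TO [{login}]",
        f"GRANT VIEW ANY DEFINITION TO [{login}]",
        f"GRANT VIEW ANY DATABASE TO [{login}]",
        "-- Must be rerun whenever new databases are added to the instance.",
        f"EXEC sp_msforeachdb N'{per_database}'",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_grant_script(r"LAB2\SVC-SCOM"))
```

Paste the printed output into a query window against the instance; the point is simply that the login is substituted everywhere from one input.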
Had overridden the SQL Assessment Feature rule in order to reduce its interval from a week to every two hours:
The full rule to search for (for the SQL Assessment anyway) is:
Restarted the health service and still no joy 🙁
Clutching at straws here, I turned to the settings on the local box and the Health Service, in case it was a permission issue (which I had hopefully addressed by running the SQL script to give the SVC-SCOM account the right SQL Server perms), and found this in the registry under Services:
HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\<Your MG Name Here>\Solutions\SQLAssessment
Removed the last run time (copied it to Notepad just in case), hoping this would spur it into action, then flushed and restarted the HealthService on the SQL Box:
Waited for the MPs to trickle down and then refreshed:
SO IT IS RUNNING 🙂 – some confirmation at least is good….
Showing the running rules & monitors for this agent:
Is this a glimmer of hope? – I think so 🙂
So, using the tool that OMS is, I reran the query to check whether the Assessment operations were coming in 🙂
With some success – this now shows the Assessment Operations are occurring as they should:
Type=Operation and Solution=SQLAssessment | sort OperationStatus
The problem here was that the permissions when the Solution was instantiated weren’t correct for the solution (my bad), but were enough to not generate logon errors. The rule then wasn’t rerunning, even when forced (i.e. with HealthService restarts), due to the above timestamp in the registry. I can only assume that the MMA assumes a successful run at this point and then does not continue with the assessment rule – hence no further entries or assessment data. The reason the AD Assessment worked first time is that I hadn’t run it on this particular Management Group, so no timestamp had been created. Looking at the AD Box, sure enough there is a corresponding registry timestamp for the AD Assessment:
HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups\<Your MG Name Here>\Solutions\ADAssessment
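Both registry paths follow the same pattern, so here is a small Python sketch that builds the key path for any Management Group / solution pair and, on a Windows box, lists whatever values sit under it. The post does not record the exact name of the last run time value, so the sketch only reads and lists values rather than guessing a name to delete. `solution_key` and `list_solution_values` are hypothetical helper names of mine; `winreg` is the standard library module on Windows.

```python
# Key path under HKEY_LOCAL_MACHINE, as seen on the agent in this post.
BASE_KEY = r"SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Management Groups"


def solution_key(management_group: str, solution: str) -> str:
    """Build the registry key path (relative to HKLM) for an assessment solution."""
    return "\\".join([BASE_KEY, management_group, "Solutions", solution])


def list_solution_values(management_group: str, solution: str):
    """Return (name, data) pairs under the solution key. Windows only, read-only."""
    import winreg  # imported here so the path helper still works on other platforms

    path = solution_key(management_group, solution)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # (subkeys, values, last_modified)
        return [winreg.EnumValue(key, i)[:2] for i in range(value_count)]


if __name__ == "__main__":
    # "LAB2" is a stand-in for <Your MG Name Here>.
    print(solution_key("LAB2", "SQLAssessment"))
    print(solution_key("LAB2", "ADAssessment"))
```

Running `list_solution_values` from an elevated prompt on the agent should show which value holds the timestamp, which you can then note down before removing it by hand.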
Correcting the permissions, removing the timestamp & rerunning the rule (by flushing the HealthService) seemed to fix this for me, so VMware, networks and other factors were simply red herrings – but good to rule out anyhow 🙂
So the fix was:
- Fix the permissions for the Monitoring Account using the SQL Script from the OMS Assessment page (shown earlier in this post)
- Remove the last run time timestamp on the offending box’s HealthService Registry
- Stop the Health Service & Remove the HealthService Folder (i.e. flush the local cache)
- Monitor for the Operations Entries in OMS
- Enjoy your assessment 🙂
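For the "stop the Health Service and flush the cache" step in the list above, here is a rough Python sketch of the sequence. Big assumption: the cache folder path below is the usual default for the Microsoft Monitoring Agent and is not stated anywhere in this post, so check it on your own box first. `flush_health_service` is a hypothetical helper name; it defaults to a dry run that only reports what it would do.

```python
import shutil
import subprocess

# ASSUMPTION: default agent install location, not taken from this post.
DEFAULT_CACHE_DIR = r"C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State"


def flush_health_service(cache_dir: str = DEFAULT_CACHE_DIR,
                         service: str = "HealthService",
                         dry_run: bool = True) -> list:
    """Stop the service, remove the cache folder, start the service again.

    Returns the list of steps as text. With dry_run=True nothing is changed.
    """
    stop_cmd = ["net", "stop", service]
    start_cmd = ["net", "start", service]
    steps = [" ".join(stop_cmd), f"remove {cache_dir}", " ".join(start_cmd)]

    if not dry_run:
        subprocess.run(stop_cmd, check=True)          # needs an elevated prompt
        shutil.rmtree(cache_dir, ignore_errors=True)  # agent rebuilds it on start
        subprocess.run(start_cmd, check=True)

    return steps


if __name__ == "__main__":
    for step in flush_health_service():
        print(step)
```

Call it with `dry_run=False` from an elevated prompt on the offending box once you are happy with the path, after the registry timestamp has been removed.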
Hope that helps someone else and will serve to remind me of what I did to fix it.