Three quarters of 2015, my IT career and various ramblings

September is over. The first three quarters of 2015 are over.
This has been a very important year so far – difficult, but revealing. Everything has been about change, healing and renewal.

We moved back to Europe first and, more recently, you might have also read my other post about leaving Microsoft.

This was a hard choice – it took many months to reach the conclusion this is what I needed to do.

Most people have gone thru strong programming: they think you have to be 'successful' at something. Success is externally defined, anyhow (as opposed to satisfaction which we define ourselves) and therefore you are supposed to study in college a certain field, then use that at work to build your career in the same field… and keep doing the same thing.

I was never like that – I didn't go to college, I didn't study as an 'engineer'. I just saw there was a market opportunity to find a job when I started, studied on the job, eventually excelled at it. But it never was *the* road. It just was one road; it has served me well so far, but it was just one thing I tried, and it worked out.
How did it start? As a pre-teen, I had been interested in computers, then left that for a while, did 'normal' high school (in Italy at the time, this was really non-technological), then I tried to study sociology for a little bit – I really enjoyed the Cultural Anthropology lessons there, and we were smoking good weed with some folks outside of the university, but I really could not be bothered to spend the following 5 or 10 years of my life just studying and 'hanging around' – I wanted money and independence to move out of my parents' house.

So, without much fanfare, I revived my IT knowledge: upgraded my skill from the 'hobbyist' world of the Commodore 64 and Amiga scene (I had been passionate about modems and the BBS world then), looked at the PC world of the time, rode the 'Internet wave' and applied for a simple job at an IT company.

A lot of my friends were either not even searching for a job, with the excuse that there weren't any, or spending time in university, in a time of change, where all the university-level jobs were taken anyway so that would have meant waiting even more after they had finished studying… I am not even sure they realized this until much later.
But I just applied, played my cards, and got my job.

When I went to sign it, they also reminded me they expected hard work at the simplest and humblest level: I would have to fix PCs and printers, help users with networking issues and handle tasks like those – at a customer of theirs, a big company.
I was ready to roll up my sleeves and help that IT department however I would be capable of, and I did.
It all grew from there.

And that's how my IT career started. I learned all I know of IT on the job and by working my ass off and studying extra hours and watching older/more expert colleagues and gaining experience.

I am not an engineer.
I am, at most, a mechanic.
I did learn a lot about companies and the market, languages, designs, politics, the human and technical factors in software engineering and the IT marketplace/worlds, over the course of the past 18 years.

But when I started, I was just trying to lend an honest hand, to get paid some money in return – isn't that what work was about?

Over time, IT got out of control. It was like Venom in the Marvel comics, which first appeared as a costume that Spider-Man started wearing… and then slowly took over, as the 'costume' was in reality some sort of alien symbiotic organism (like a pest).

You might be wondering what I mean. From the outside I was a successful Senior Program Manager of a 'hot' Microsoft product.
Someone must have mistaken my diligence and hard work for 'talent' or 'desire for a career' – but it never was that.
I got pushed up, taught to never turn down 'opportunities'.

But I don't feel this is my path anymore.
That type of work takes too much mental energy from me, and it made me neglect myself and my family. Success at the expense of my own health and my family's isn't worth it. Some other people wrote that too – in my case, hopefully, I stopped earlier.

So what am I doing now?

First and foremost, I am taking time for myself and my family.
  • I am reading (and writing)
  • I am cooking again
  • I have been catching up on sleep – and have dreams again
  • I am helping my father-in-law to build a shed in his yard
  • We bought a 14-year-old Volkswagen van that we are turning into a camper
  • I have not stopped building guitars – in fact I am getting set up to do it 'seriously' – so I am also standing up a separate site to promote that activity
  • I am making music and discovering new music and instruments
  • I am meeting new people and new situations

There's a lot of folks out there who either think I am crazy (they might be right, but I am happy this way), or think this is some sort of lateral move – I am not searching for another IT job, thanks. Stop the noise on LinkedIn please: I don't fit in your algorithms, I just made you believe I did, all these years.

SCOM Tools

When I was working at Microsoft, I used to maintain a few tools related to System Center Operations Manager.

You can still find them at the following links, but I have not touched them in a long time:

Repost: Useful SetSPN tips

I just saw that my former colleague (PFE) Tristan has posted an interesting note about the use of SetSPN “–A” vs SetSPN “–S”. I normally don’t repost other people’s content, but I thought this would be useful as there are a few SPNs used in OpsMgr and it is not always easy to get them all right… and by reading his post you can find a few tricks I was not aware of.

Check out the original post at http://blogs.technet.com/b/tristank/archive/2011/10/10/psa-you-really-need-to-update-your-kerberos-setup-documentation.aspx

I have been chosen; Farewell my friends…

I have been in Premier Field Engineering for nearly 7 years (it was not even called PFE when I joined – it was just "another type of support"…) and I have to admit that it has been a fun, fun ride: I worked with awesome people and managed to make a difference with our products and services for many customers – directly working with some of those customers, as well as indirectly thru the OpsMgr Health Check program – the service I led for the last 3+ years, which nowadays gets delivered hundreds of times a year around the globe by my other fellow PFEs.

But it is time to move on: I have decided to go thru a big life change for me and my family, and I won't be working as a Premier Field Engineer anymore as of next week.

But don't panic – I am staying at Microsoft!

I have actually never been closer to Microsoft than now: we are packing and moving to Seattle the coming weekend, and on July 18th I will start working as a Program Manager in the Operations Manager product team, in Redmond. I am hoping this will enable me to make a difference with even more customers.

Exciting times ahead – wish me luck!

 

That said – PFE is hiring! If you are interested in working for Microsoft – we have open positions (including my vacant position in Italy) for almost all the Microsoft technologies. Simply visit http://careers.microsoft.com and search on “PFE”.

As for the OpsMgr Health Check, don't you worry: it will continue being improved – I left it in the hands of some capable colleagues: Bruno Gabrielli, Stefan Stranger and Tim McFadden – and they have a plan and commitment to update it to OpsMgr 2012.

Improved ACS Partitions Query

This has been sitting on my hard drive for a long time. Long story short, the report I posted in “Audit Collection Services Database Partitions Size Report” had a couple of bugs:

  1. it did not consider the size of the dtString_XXX tables but only the size of dtEvent_XXX tables – this would still give you an idea of the trends, but it could lead to quite different SIZE calculations
  2. the query was failing on some instances that have been installed with the wrong (unsupported) Collation settings.

I fixed both bugs, but I don’t have a machine with SQL 2005 and Visual Studio 2005 anymore… so I can’t rebuild my report – but I don’t want to distribute one that only works on SQL 2008 because I know that SQL2005 is still out there. This is partially the reason that held this post back.

So, rather than waiting any longer, I decided I’ll just give you the fixed query. Enjoy 🙂

--Query to get the Partition Table
--for each partition we launch the sp_spaceused stored procedure to determine the size and other info

--partition list
select PartitionId,Status,PartitionStartTime,PartitionCloseTime 
into #t1
from dbo.dtPartition with (nolock)
order by PartitionStartTime Desc 

--sp_spaceused holder table for dtEvent
create table #t2 (
    PartitionId nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    rows nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    reserved nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    data nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    index_size nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    unused nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS    
)

--sp_spaceused holder table for dtString
create table #t3 (
    PartitionId nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    rows nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    reserved nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    data nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    index_size nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    unused nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS    
)

set nocount on

--vars used for building Partition GUID and main table name
declare @partGUID nvarchar(MAX)
declare @tblName nvarchar(MAX)
declare @tblNameComplete nvarchar(MAX)
declare @schema nvarchar(MAX)
DECLARE @vQuery NVARCHAR(MAX)

--cursor
declare c cursor for 
    select PartitionID from #t1
open c
fetch next from c into @partGUID

--start cursor usage
while @@FETCH_STATUS = 0
begin

--tblName - first usage for dtEvent
set @tblName = 'dtEvent_' + @partGUID

--retrieve the schema name
SET @vQuery = N'SELECT @dbschema = TABLE_SCHEMA FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = @dbtblName'
EXEC sp_executesql @vQuery,N'@dbschema nvarchar(max) out, @dbtblName nvarchar(max)',@schema out, @tblname

--tblNameComplete
set @tblNameComplete = @schema + '.' + @tblName

INSERT #t2 
    EXEC sp_spaceused @tblNameComplete

--tblName - second usage for dtString
set @tblName = 'dtString_' + @partGUID

--retrieve the schema name
SET @vQuery = N'SELECT @dbschema = TABLE_SCHEMA FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = @dbtblName'
EXEC sp_executesql @vQuery,N'@dbschema nvarchar(max) out, @dbtblName nvarchar(max)',@schema out, @tblname

--tblNameComplete
set @tblNameComplete = @schema + '.' + @tblName

INSERT #t3 
    EXEC sp_spaceused @tblNameComplete

fetch next from c into @partGUID
end
close c
deallocate c

--select * from #t2
--select * from #t3

--results
select #t1.PartitionId, 
    #t1.Status, 
    #t1.PartitionStartTime, 
    #t1.PartitionCloseTime, 
    #t2.rows,
    (CAST(LEFT(#t2.reserved,LEN(#t2.reserved)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.reserved,LEN(#t3.reserved)-3) AS NUMERIC(18,0))) as 'reservedKB', 
    (CAST(LEFT(#t2.data,LEN(#t2.data)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.data,LEN(#t3.data)-3) AS NUMERIC(18,0)))as 'dataKB', 
    (CAST(LEFT(#t2.index_size,LEN(#t2.index_size)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.index_size,LEN(#t3.index_size)-3) AS NUMERIC(18,0))) as 'indexKB', 
    (CAST(LEFT(#t2.unused,LEN(#t2.unused)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.unused,LEN(#t3.unused)-3) AS NUMERIC(18,0))) as 'unusedKB'
from #t1
join #t2
on #t2.PartitionId = ('dtEvent_' + #t1.PartitionId)
join #t3
on #t3.PartitionId = ('dtString_' + #t1.PartitionId)
order by PartitionStartTime desc

--cleanup
drop table #t1
drop table #t2
drop table #t3
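
If you want to sanity-check the arithmetic of the final SELECT outside of SQL Server, here is a minimal Python sketch. It assumes, as the query does, that sp_spaceused reports sizes as text ending in " KB"; the function names and the sample figures are made up for illustration and do not come from a real ACS database.

```python
def kb_to_int(size_text):
    """Convert an sp_spaceused style size such as '1024 KB' to an integer."""
    text = size_text.strip()
    if not text.endswith(" KB"):
        raise ValueError("unexpected size format: %r" % size_text)
    # Same idea as LEFT(value, LEN(value) - 3) in the query above.
    return int(text[:-3])


def partition_totals(event_row, string_row):
    """Add the dtEvent_ and dtString_ figures for one partition."""
    columns = ("reserved", "data", "index_size", "unused")
    return {name: kb_to_int(event_row[name]) + kb_to_int(string_row[name])
            for name in columns}


# Hypothetical figures for one partition (not taken from a real database).
event_row = {"reserved": "2048 KB", "data": "1500 KB",
             "index_size": "400 KB", "unused": "148 KB"}
string_row = {"reserved": "1024 KB", "data": "800 KB",
              "index_size": "100 KB", "unused": "124 KB"}

totals = partition_totals(event_row, string_row)
print(totals)
```

Each total should include both the dtEvent_ and the dtString_ table of the same partition, which is exactly what the first version of the report was missing.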

OpsMgr Agents and Gateways Failover Queries

The following article by Jimmy Harper explains very well how to set up agents and gateways’ failover paths thru Powershell http://blogs.technet.com/b/jimmyharper/archive/2010/07/23/powershell-commands-to-configure-gateway-server-agent-failover.aspx . This is the approach I also recommend, and that article is great – I encourage you to check it out if you haven’t done it yet!

Anyhow, when checking for the actual failover paths that have been configured, the use of Powershell suggested by Jimmy is rather slow – especially if your agent count is high. In the Operations Manager Health Check tool I was also using that technique at the beginning, but eventually moved to the use of SQL queries just for performance reasons. Since then, we have been using these SQL queries quite successfully for about 3 years now.

But this is the season of giving… and I guess SQL Queries can be a gift, right? Therefore I am now donating them as a Christmas gift to the OpsMgr community 🙂

Enjoy – and Merry Christmas!

 

--GetAgentForWhichServerIsPrimary
SELECT SourceBME.DisplayName as Agent,TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication() 
AND SourceBME.DisplayName not in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND SourceBME.DisplayName not in (select DisplayName from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.ManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'


--GetAgentForWhichServerIsFailover
SELECT SourceBME.DisplayName as Agent,TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication() 
AND SourceBME.DisplayName not in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND SourceBME.DisplayName not in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.ManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'


--GetGatewayForWhichServerIsPrimary
SELECT SourceBME.DisplayName as Gateway, TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication() 
AND SourceBME.DisplayName in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'
    

--GetGatewayForWhichServerIsFailover
SELECT SourceBME.DisplayName As Gateway, TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication() 
AND SourceBME.DisplayName in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'


--xplat agents
select bme2.DisplayName as XPlatAgent, bme.DisplayName as Server
from dbo.Relationship r with (nolock) 
join dbo.RelationshipType rt with (nolock) 
on r.RelationshipTypeId = rt.RelationshipTypeId 
join dbo.BasemanagedEntity bme with (nolock) 
on bme.basemanagedentityid = r.SourceEntityId 
join dbo.BasemanagedEntity bme2 with (nolock) 
on r.TargetEntityId = bme2.BaseManagedEntityId 
where rt.RelationshipTypeName = 'Microsoft.SystemCenter.HealthServiceManagesEntity' 
and bme.IsDeleted = 0 
and r.IsDeleted = 0 
and bme2.basemanagedtypeid in (SELECT DerivedTypeId 
FROM DerivedManagedTypes with (nolock) 
WHERE BaseTypeId = (select managedtypeid 
from managedtype where typename = 'Microsoft.Unix.Computer') 
and DerivedIsAbstract = 0)

Got Orphaned OpsMgr Objects?

Have you ever wondered what would happen if, in Operations Manager, you’d delete a Management Server or Gateway that manages objects (such as network devices) or has agents pointing uniquely to it as their primary server?

The answer is simple, but not very pleasant: you get ORPHANED objects, which will linger in the database but you won’t be able to “see” or re-assign anymore from the GUI.

So the first thing I want to share is a query to determine IF you have any of those orphaned agents. Or even if you know, since you are not able to "see" them from the console, you might have to dig their name out of the database. Here's a query I got from a colleague in our reactive support team:


-- Check for orphaned health services (e.g. agent).
declare @DiscoverySourceId uniqueidentifier;
SET @DiscoverySourceId = dbo.fn_DiscoverySourceId_User();
SELECT TME.[TypedManagedEntityid], HS.PrincipalName
FROM MTV_HealthService HS
INNER JOIN dbo.[BaseManagedEntity] BHS WITH(nolock)
ON BHS.[BaseManagedEntityId] = HS.[BaseManagedEntityId]
-- get host managed computer instances
INNER JOIN dbo.[TypedManagedEntity] TME WITH(nolock)
ON TME.[BaseManagedEntityId] = BHS.[TopLevelHostEntityId]
AND TME.[IsDeleted] = 0
INNER JOIN dbo.[DerivedManagedTypes] DMT WITH(nolock)
ON DMT.[DerivedTypeId] = TME.[ManagedTypeId]
INNER JOIN dbo.[ManagedType] BT WITH(nolock)
ON DMT.[BaseTypeId] = BT.[ManagedTypeId]
AND BT.[TypeName] = N'Microsoft.Windows.Computer'
-- only with missing primary
LEFT OUTER JOIN dbo.Relationship HSC WITH(nolock)
ON HSC.[SourceEntityId] = HS.[BaseManagedEntityId]
AND HSC.[RelationshipTypeId] = dbo.fn_RelationshipTypeId_HealthServiceCommunication()
AND HSC.[IsDeleted] = 0
INNER JOIN DiscoverySourceToTypedManagedEntity DSTME WITH(nolock)
ON DSTME.[TypedManagedEntityId] = TME.[TypedManagedEntityId]
AND DSTME.[DiscoverySourceId] = @DiscoverySourceId
WHERE HS.[IsAgent] = 1
AND HSC.[RelationshipId] IS NULL;

Once you have identified the agent you need to re-assign to a new management server, this is doable from the SDK. Below is a powershell script I wrote which will re-assign it to the RMS. It has to run from within the OpsMgr Command Shell.
You still need to change the logic which chooses which agent – this is meant as a starting base… you could easily expand it into accepting parameters and/or consuming an input text file, or using a different Management Server than the RMS… you get the point.

# connect to the management group and get the agent -> server communication relationship class
$mg = (get-managementgroupconnection).managementgroup
$mrc = Get-RelationshipClass | where {$_.name -like "*Microsoft.SystemCenter.HealthServiceCommunication*"}
$cmro = new-object Microsoft.EnterpriseManagement.Monitoring.CustomMonitoringRelationshipObject($mrc)
# the new primary server will be the RMS
$rms = (get-rootmanagementserver).HostedHealthService
$deviceclass = $mg.getmonitoringclass("HealthService")
$mc = Get-connector | where {$_.Name -like "*MOM Internal Connector*"}
Foreach ($obj in $mg.GetMonitoringObjects($deviceclass))
{
    #the next line should be changed to pick the right agent to re-assign
    if ($obj.DisplayName -match 'dsxlab')
    {
        Write-host $obj.displayname
        $imdd = new-object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalMonitoringDiscoveryData
        # the agent is the source, the management server is the target
        $cmro.SetSource($obj)
        $cmro.SetTarget($rms)
        $imdd.Add($cmro)
        $imdd.Commit($mc)
    }
}

Similarly, you might get orphaned network devices. The script below is used to re-assign all Network Devices to the RMS. This script is actually something I have had even before the other one (yes, it has been sitting in my "digital drawer" for a couple of years or more…) and uses the same concept – only you might notice that the relation's source and target are "reversed", since the relationships are different:

  • the Management Server (source) "manages" the Network Device (target)
  • the Agent (source) "talks" to the Management Server (target)

With a bit of added logic it should be easy to have it work for specific devices.

# connect to the management group and get the "should manage" relationship class
$mg = (get-managementgroupconnection).managementgroup
$mrc = Get-RelationshipClass | where {$_.name -like "*Microsoft.SystemCenter.HealthServiceShouldManageEntity*"}
$cmro = new-object Microsoft.EnterpriseManagement.Monitoring.CustomMonitoringRelationshipObject($mrc)
$rms = (get-rootmanagementserver).HostedHealthService
$deviceclass = $mg.getmonitoringclass("NetworkDevice")
Foreach ($obj in $mg.GetMonitoringObjects($deviceclass))
{
    Write-host $obj.displayname
    $imdd = new-object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalMonitoringDiscoveryData
    # here the management server is the source and the device is the target
    $cmro.SetSource($rms)
    $cmro.SetTarget($obj)
    $imdd.Add($cmro)
    $mc = Get-connector | where {$_.Name -like "*MOM Internal Connector*"}
    $imdd.Commit($mc)
}

Disclaimer

The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided "AS IS" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

Does anyone have a new System Center sticker for me?

Does anyone have a new System Center sticker?

I got this sticker last APRIL at MMS2010 in JUST ONE COPY, and I waited till I got a NEW laptop in SEPTEMBER to actually use that…
It also took a while to stick it on properly (other than to re-install the PC as I wanted…),  but this week they told me that, by mistake, I was given the wrong machine (they did it all themselves, tho – I did not ask for any specific one) and this one needs to be replaced!!!!

This is WORSE than any hardware FAILure, as the machine just works very well and I was expecting to keep it for the next two years 🙁

Can anyone be so nice to send me one of those awesome stickers again? 🙂

Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond Type Mismatch

I have had the following in my notes for a while… and I have not blogged in a while (been too busy) so I decided to blog it today, before the topic gets too old and starts stinking 🙂

 

It all started when a customer showed me an Alert he was seeing in his environment from some XPlat workflow. The alert looks like the following:

Generic Performance Mapper Module Failed Execution
Alert Description Source: RLWSCOM02.domain.dom
Module was unable to convert parameter to a double value
Original parameter: '$Data///*[local-name()="BytesPerSecond"]$'
Parameter after $Data replacement: "
Error: 0x80020005
Details: Type mismatch.
One or more workflows were affected by this.
Workflow name: Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond.Collection
Instance name: /
Instance ID: {4F6FA8F5-C56F-4C9B-ED36-12DAFF4073D1}
Management group: DataCenter
Path: RLWSCOM02.domain.dom\RLWSCOM02.domain.dom Alert Rule: Generic Performance Mapper Module Runtime Failure Created: 6/28/2010 11:30:28 PM

 

First I stumbled into this forum post which mentions the same symptom http://social.technet.microsoft.com/Forums/en-US/crossplatformgeneral/thread/62e0bf3e-be6f-4218-a37b-f1e66f02aa49 – but when looking at the resolution, the locale on the customer machine was good (== set to US settings), so I concluded that it was not the same root cause.

 

Then I looked at what that rule was supposed to do, and queried the same CIM class both remotely thru WS-Man and locally via CIM, and concluded that my issue was that certain values were returning as NULL while we were expecting to see a number on the Management Server – therefore the Type Mismatch!

I have explained previously how to run CIM queries against the XPlat agent; in this case it was the following one:

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_FileSystemStatisticalInformation?__cimnamespace=root/scx -username:scomuser -password:password -r:https://rllspago01.domain.dom:1270/wsman -auth:basic -skipCACheck -skipCNCheck

 

SCX_FileSystemStatisticalInformation
AverageDiskQueueLength = null
AverageTransferTime = null
BytesPerSecond = null
Caption = File system information
Description = Performance statistics related to a logical unit of secondary storage
ElementName = null
FreeMegabytes = 4007
IsAggregate = false
IsOnline = true
Name = /
PercentBusyTime = null
PercentFreeSpace = 55
PercentIdleTime = null
PercentUsedSpace = 45
ReadBytesPerSecond = null
ReadsPerSecond = null
TransfersPerSecond = null
UsedMegabytes = 3278
WriteBytesPerSecond = null
WritesPerSecond = null

 

See the NULLs ? Those are our issue.
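
When the listing is long, the properties that came back empty can be picked out mechanically. The following is a small Python sketch, assuming the winrm output has been saved as text in the `Name = value` layout shown above; the helper name is hypothetical and this is not part of any OpsMgr tooling.

```python
def null_properties(listing):
    """Return the property names whose value is the literal 'null'."""
    missing = []
    for line in listing.splitlines():
        name, sep, value = line.partition("=")
        if sep and value.strip().lower() == "null":
            missing.append(name.strip())
    return missing


# A shortened copy of the output shown above.
sample = """SCX_FileSystemStatisticalInformation
BytesPerSecond = null
FreeMegabytes = 4007
Name = /
ReadBytesPerSecond = null
PercentFreeSpace = 55
"""

print(null_properties(sample))
```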

Now, before you continue reading, I will tell you that I have investigated this also internally, and apparently we have just (in Cumulative Update 3) changed this behaviour in our XPlat modules, so that when NULL is returned, we consider it to be ZERO. Good or bad that is, it will at least take care of the error. But if you don’t get any data from the Unix system… well, you are not getting any data – so that might cause a surprise later on when you go and look at those charts and expect to see your disk “performance counters” but in fact all you have is a bunch of ZERO’s (how very interesting!). So, basically, the fix in CU3 suppresses the symptom, but does not address the cause.
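
To make the difference between symptom and cause concrete, here is an illustrative Python sketch. It does not reproduce the actual XPlat module code, which I am not showing here; it only mimics the two behaviours described above: failing on an empty value versus silently mapping it to zero. The function names are invented for the example.

```python
def to_double_strict(raw):
    """Mimic the old behaviour: an empty value cannot be converted."""
    return float(raw)  # float("") raises ValueError, our 'type mismatch'


def to_double_lenient(raw):
    """Mimic the CU3 behaviour as described above: empty or null becomes 0."""
    if raw is None or raw.strip() in ("", "null"):
        return 0.0
    return float(raw)


for sample in ("1234.5", ""):
    try:
        print(repr(sample), "strict ->", to_double_strict(sample))
    except ValueError:
        print(repr(sample), "strict -> conversion error")
    print(repr(sample), "lenient ->", to_double_lenient(sample))
```

A zero produced this way looks exactly like a disk that is genuinely idle, which is why the missing data is so easy to overlook on a chart.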

So, let’s see what is actually causing this, as you might well want to get those statistics, or probably you would not be monitoring that server!

I looked at the Cimd.log (set to verbose), but it only says the following (basically not much: it is getting info for 3 partitions… and the provider code is working):

2010-09-01T08:38:32,796Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances()
2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] Object Path = //rllspago01.domain.dom/root/scx:SCX_FileSystemStatisticalInformation
2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - Calling DoEnumInstances()
2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider DoEnumInstances
2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider GetDiskEnumeration - type 3
2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - DoEnumInstances() returned - 3
2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - Call ReturnDone
2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() - return OK
2010-09-01T08:38:33,360Z Trace      [scx.core.provsup.cmpibase.singleprovider.DiskProvider:5964:3086830480] SingleProvider::EnumInstances() - Returning - 0

 

but it still did not give me an idea as to why we would not get data for those “counters”. At this point I stopped using complex troubleshooting techniques and simply turned intuition on, and tried with some help from a search engine: http://www.bing.com/search?q=How+do+I+find+out+Linux+Disk+utilization 

the results I got all mentioned that on Linux you would use the “iostat” command.

So I tried to use it and… lo and behold: the iostat command was NOT INSTALLED on that machine!

Guess what? We installed it (it is included in the “sysstat” package for RedHat linux, so a simple “yum install sysstat” took care of this) and the counters started working!
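
If you have several Linux machines to check, the same test can be scripted. Below is a minimal sketch in Python, assuming Python 3.3 or later is available on the monitored host (it relies on `shutil.which` from the standard library); the function name is my own and nothing here is part of the XPlat agent.

```python
import shutil


def command_available(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if command_available("iostat"):
        print("iostat found")
    else:
        print("iostat missing: on Red Hat it ships in the sysstat package")
```

Note that this only checks the PATH of the user running the script, which may differ from the environment the agent runs in.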

Hope that is useful to some.

OpsMgr Event IDs Spreadsheet

I work in support (mostly with System Center Operations Manager, as you know), and I work with event logs every day. The following are typical situations:

  1. I get a colleague or a customer telling me “I am having a problem and the SCOM agent is showing 21037 events and 20002 events.  What’s wrong with it?”   
  2. I want to tune an OpsMgr environment and reduce load on the database by turning off a few event collections, as my friend Kevin Holman suggests here http://blogs.technet.com/kevinholman/archive/2009/11/25/tuning-tip-turning-off-some-over-collection-of-events.aspx .
  3. I am analyzing, sorting and grouping Events with Powershell like I have written on my blog lately http://www.muscetta.com/2009/12/16/opsmgr-eventlog-analysis-with-powershell/ but I can’t read those long descriptions properly.
  4. I exported an EVT from a customer environment and I load it on a machine that does not have OpsMgr message DLLs installed – all I see are EventIDs and type (Warning, Error) – but no real description – and I still want to figure out what those events are trying to tell me.

Getting to the point: I, like everyone – don’t have every OpsMgr event memorized.

This is why I thought of building this spreadsheet, and I hope it might come in handy to more people.

The spreadsheet contains an “AllEvents” list, and the same events are then also broken down by event source.

When you want to search for an event (in one of the situations described above) just open up the spreadsheet, go to the “AllEvents” tab, hit CTRL+F (“Find”) and type in the Event ID you are searching for.

This will take you to the row containing the event, so you can look up its description.

The description shows the event standard text (which is in the message DLL, therefore is the part you will not see if opening an EVT on another machine that does not have OpsMgr installed), and where the event parameters are (%1, %2, etc – which will be the strings you see in the EVT anyway).

That way you can get an understanding of what the original message would have looked like on the original machine.
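
As a rough illustration of how the two halves fit together, here is a small Python sketch that fills the %1, %2… placeholders of a description with the strings found in an exported event. The template and the parameter values are invented for the example (they are not taken from the spreadsheet), and real message DLL templates support more escape sequences than this simple substitution handles.

```python
import re


def render_event(template, parameters):
    """Replace %1, %2, ... in a message template with the event parameters."""
    def substitute(match):
        index = int(match.group(1)) - 1  # %1 refers to the first parameter
        if 0 <= index < len(parameters):
            return parameters[index]
        return match.group(0)  # leave unknown placeholders untouched
    return re.sub(r"%(\d+)", substitute, template)


# Hypothetical template and parameters, invented for this example.
template = "The agent on %1 could not contact server %2."
parameters = ["web01.contoso.local", "ms01.contoso.local"]

print(render_event(template, parameters))
```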

This is just one possible usage pattern of this reference. It can also be useful to just read/study the events, learning about new ones you have never encountered, or remembering those you HAVE seen in the past but did not quite remember. And of course you can also find other creative ways to use it.

You can get it from here.

 

A few last words to give due credit: this spreadsheet has been compiled by using Eventlog Explorer (http://blogs.technet.com/momteam/archive/2008/04/02/eventlog-explorer.aspx ) to extract the event information out of the message DLLs on an OpsMgr 2007 R2 installation. That info was then copied and pasted into Excel in order to have an “offline” reference. Also I would like to thank Kevin Holman for pointing me to Eventlog Explorer first, and then for insisting I should not keep this spreadsheet in my drawer, as it could be useful to more people!