Archive for the 'OpsMgr2007' Category

RSS Feed for the 'OpsMgr2007' Category

Repost: Useful SetSPN tips

Wednesday, October 19th, 2011

I just saw that my former colleague (PFE) Tristan has posted an interesting note about the use of SetSPN “–A” vs SetSPN “–S”. I normally don’t repost other people’s content, but I thought this would be useful as there are a few SPN used in OpsMgr and it is not always easy to get them all right… and you can find a few tricks I was not aware of, by reading his post.

Check out the original post at http://blogs.technet.com/b/tristank/archive/2011/10/10/psa-you-really-need-to-update-your-kerberos-setup-documentation.aspx

I have been chosen; Farewell my friends…

Thursday, July 7th, 2011

I have been in Premier Field Engineering for nearly 7 years (it was not even called PFE when I joined – it was just "another type of support"…) and I have to admit that it has been a fun, fun ride: I worked with awesome people and managed to make a difference with our products and services for many customers – directly working with some of those customers, as well as indirectly thru the OpsMgr Health Check program – the service I led for the last 3+ years, which nowadays gets delivered hundreds of times a year around the globe by my other fellow PFEs.

But it is time to move on: I have decided to go thru a big life change for me and my family, and I won't be working as a Premier Field Engineer anymore as of next week.

But don't panic – I am staying at Microsoft!

I have actually never been closer to Microsoft than now: we are packing and moving to Seattle the coming weekend, and on July 18th I will start working as a Program Manager in the Operations Manager product team, in Redmond. I am hoping this will enable me to make a difference with even more customers.

Exciting times ahead – wish me luck!

 

That said – PFE is hiring! If you are interested in working for Microsoft – we have open positions (including my vacant position in Italy) for almost all the Microsoft technologies. Simply visit http://careers.microsoft.com and search on “PFE”.

As for the OpsMgr Health Check, don't you worry: it will continue being improved – I left it in the hands of some capable colleagues: Bruno Gabrielli, Stefan Stranger and Tim McFadden – and they have a plan and commitment to update it to OpsMgr 2012.

Improved ACS Partitions Query

Wednesday, May 4th, 2011

This has been sitting on my hard drive for a long time. Long story short, the report I posted at Permanent Link to Audit Collection Services Database Partitions Size Report had a couple of bugs:

  1. it did not consider the size of the dtString_XXX tables but only the size of dtEvent_XXX tables – this would still give you an idea of the trends, but it could lead to quite different SIZE calculations
  2. the query was failing on some instances that have been installed with the wrong (unsupported) Collation settings.

I fixed both bugs, but I don’t have a machine with SQL 2005 and Visual Studio 2005 anymore… so I can’t rebuild my report – but I don’t want to distribute one that only works on SQL 2008 because I know that SQL2005 is still out there. This is partially the reason that held this post back.

Without waiting so much longer, therefore, I decided I’ll just give you the fixed query. Enjoy Smile

--Query to get the Partition Table
--for each partition we launch the sp_spaceused stored procedure to determine the size and other info

--partition list
select PartitionId,Status,PartitionStartTime,PartitionCloseTime 
into #t1
from dbo.dtPartition with (nolock)
order by PartitionStartTime Desc 

--sp_spaceused holder table for dtEvent
create table #t2 (
    PartitionId nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    rows nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    reserved nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    data nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    index_size nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    unused nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS    
)

--sp_spaceused holder table for dtString
create table #t3 (
    PartitionId nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    rows nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    reserved nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    data nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    index_size nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    unused nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS    
)

set nocount on

--vars used for building Partition GUID and main table name
declare @partGUID nvarchar(MAX)
declare @tblName nvarchar(MAX)
declare @tblNameComplete nvarchar(MAX)
declare @schema nvarchar(MAX)
DECLARE @vQuery NVARCHAR(MAX)

--cursor
declare c cursor for 
    select PartitionID from #t1
open c
fetch next from c into @partGUID

--start cursor usage
while @@FETCH_STATUS = 0
begin

--tblName - first usage for dtEvent
set @tblName = 'dtEvent_' + @partGUID

--retrieve the schema name
SET @vQuery = 'SELECT @dbschema = TABLE_SCHEMA from INFORMATION_SCHEMA.tables where TABLE_NAME = ''' + @tblName + ''''
EXEC sp_executesql @vQuery,N'@dbschema nvarchar(max) out, @dbtblName nvarchar(max)',@schema out, @tblname

--tblNameComplete
set @tblNameComplete = @schema + '.' + @tblName

INSERT #t2 
    EXEC sp_spaceused @tblNameComplete

--tblName - second usage for dtString
set @tblName = 'dtString_' + @partGUID

--retrieve the schema name
SET @vQuery = 'SELECT @dbschema = TABLE_SCHEMA from INFORMATION_SCHEMA.tables where TABLE_NAME = ''' + @tblName + ''''
EXEC sp_executesql @vQuery,N'@dbschema nvarchar(max) out, @dbtblName nvarchar(max)',@schema out, @tblname

--tblNameComplete
set @tblNameComplete = @schema + '.' + @tblName

INSERT #t3 
    EXEC sp_spaceused @tblNameComplete

fetch next from c into @partGUID
end
close c
deallocate c

--select * from #t2
--select * from #t3

--results
select #t1.PartitionId, 
    #t1.Status, 
    #t1.PartitionStartTime, 
    #t1.PartitionCloseTime, 
    #t2.rows,
    (CAST(LEFT(#t2.reserved,LEN(#t2.reserved)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t2.reserved,LEN(#t2.reserved)-3) AS NUMERIC(18,0))) as 'reservedKB', 
    (CAST(LEFT(#t2.data,LEN(#t2.data)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.data,LEN(#t3.data)-3) AS NUMERIC(18,0)))as 'dataKB', 
    (CAST(LEFT(#t2.index_size,LEN(#t2.index_size)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.index_size,LEN(#t3.index_size)-3) AS NUMERIC(18,0))) as 'indexKB', 
    (CAST(LEFT(#t2.unused,LEN(#t2.unused)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.unused,LEN(#t3.unused)-3) AS NUMERIC(18,0))) as 'unusedKB'
from #t1
join #t2
on #t2.PartitionId = ('dtEvent_' + #t1.PartitionId)
join #t3
on #t3.PartitionId = ('dtString_' + #t1.PartitionId)
order by PartitionStartTime desc

--cleanup
drop table #t1
drop table #t2
drop table #t3

OpsMgr Agents and Gateways Failover Queries

Thursday, December 23rd, 2010

The following article by Jimmy Harper explains very well how to set up agents and gateways’ failover paths thru Powershell http://blogs.technet.com/b/jimmyharper/archive/2010/07/23/powershell-commands-to-configure-gateway-server-agent-failover.aspx . This is the approach I also recommend, and that article is great – I encourage you to check it out if you haven’t done it yet!

Anyhow, when checking for the actual failover paths that have been configured, the use of Powershell suggested by Jimmy is rather slow – especially if your agent count is high. In the Operations Manager Health Check tool I was also using that technique at the beginning, but eventually moved to the use of SQL queries just for performance reasons. Since then, we have been using these SQL queries quite successfully for about 3 years now.

But this the season of giving… and I guess SQL Queries can be a gift, right? Therefore I am now donating them as Christmas Gift to the OpsMrg community Smile

Enjoy – and Merry Christmas!

 

--GetAgentForWhichServerIsPrimary
SELECT SourceBME.DisplayName as Agent,TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication() 
AND SourceBME.DisplayName not in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND SourceBME.DisplayName not in (select DisplayName from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.ManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'


--GetAgentForWhichServerIsFailover
SELECT SourceBME.DisplayName as Agent,TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication() 
AND SourceBME.DisplayName not in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND SourceBME.DisplayName not in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.ManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'


--GetGatewayForWhichServerIsPrimary
SELECT SourceBME.DisplayName as Gateway, TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication() 
AND SourceBME.DisplayName in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'
    

--GetGatewayForWhichServerIsFailover
SELECT SourceBME.DisplayName As Gateway, TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication() 
AND SourceBME.DisplayName in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'


--xplat agents
select bme2.DisplayName as XPlatAgent, bme.DisplayName as Server
from dbo.Relationship r with (nolock) 
join dbo.RelationshipType rt with (nolock) 
on r.RelationshipTypeId = rt.RelationshipTypeId 
join dbo.BasemanagedEntity bme with (nolock) 
on bme.basemanagedentityid = r.SourceEntityId 
join dbo.BasemanagedEntity bme2 with (nolock) 
on r.TargetEntityId = bme2.BaseManagedEntityId 
where rt.RelationshipTypeName = 'Microsoft.SystemCenter.HealthServiceManagesEntity' 
and bme.IsDeleted = 0 
and r.IsDeleted = 0 
and bme2.basemanagedtypeid in (SELECT DerivedTypeId 
FROM DerivedManagedTypes with (nolock) 
WHERE BaseTypeId = (select managedtypeid 
from managedtype where typename = 'Microsoft.Unix.Computer') 
and DerivedIsAbstract = 0)

Got Orphaned OpsMgr Objects?

Friday, December 17th, 2010

Have you ever wondered what would happen if, in Operations Manager, you’d delete a Management Server or Gateway that managed objects (such as network devices) or has agents pointing uniquely to it as their primary server?

The answer is simple, but not very pleasant: you get ORPHANED objects, which will linger in the database but you won’t be able to “see” or re-assign anymore from the GUI.

So the first thing I want to share is a query to determine IF you have any of those orphaned agents. Or even if you know, since you are not able to "see" them from the console, you might have to dig their name out of the database. Here's a query I got from a colleague in our reactive support team:


-- Check for orphaned health services (e.g. agent).
declare @DiscoverySourceId uniqueidentifier;
SET @DiscoverySourceId = dbo.fn_DiscoverySourceId_User();
SELECT TME.[TypedManagedEntityid], HS.PrincipalName
FROM MTV_HealthService HS
INNER JOIN dbo.[BaseManagedEntity] BHS WITH(nolock)
ON BHS.[BaseManagedEntityId] = HS.[BaseManagedEntityId]
-- get host managed computer instances
INNER JOIN dbo.[TypedManagedEntity] TME WITH(nolock)
ON TME.[BaseManagedEntityId] = BHS.[TopLevelHostEntityId]
AND TME.[IsDeleted] = 0
INNER JOIN dbo.[DerivedManagedTypes] DMT WITH(nolock)
ON DMT.[DerivedTypeId] = TME.[ManagedTypeId]
INNER JOIN dbo.[ManagedType] BT WITH(nolock)
ON DMT.[BaseTypeId] = BT.[ManagedTypeId]
AND BT.[TypeName] = N'Microsoft.Windows.Computer'
-- only with missing primary
LEFT OUTER JOIN dbo.Relationship HSC WITH(nolock)
ON HSC.[SourceEntityId] = HS.[BaseManagedEntityId]
AND HSC.[RelationshipTypeId] = dbo.fn_RelationshipTypeId_HealthServiceCommunication()
AND HSC.[IsDeleted] = 0
INNER JOIN DiscoverySourceToTypedManagedEntity DSTME WITH(nolock)
ON DSTME.[TypedManagedEntityId] = TME.[TypedManagedEntityId]
AND DSTME.[DiscoverySourceId] = @DiscoverySourceId
WHERE HS.[IsAgent] = 1
AND HSC.[RelationshipId] IS NULL;

Once you have identified the agent you need to re-assign to a new management server, this is doable from the SDK. Below is a powershell script I wrote which will re-assign it to the RMS. It has to run from within the OpsMgr Command Shell.
You still need to change the logic which chooses which agent – this is meant as a starting base… you could easily expand it into accepting parameters and/or consuming an input text file, or using a different Management Server than the RMS… you get the point.

  1. $mg = (get-managementgroupconnection).managementgroup
  2. $mrc = Get-RelationshipClass | where {$_.name –like "*Microsoft.SystemCenter.HealthServiceCommunication*"}
  3. $cmro = new-object Microsoft.EnterpriseManagement.Monitoring.CustomMonitoringRelationshipObject($mrc)
  4. $rms = (get-rootmanagementserver).HostedHealthService
  5. $deviceclass = $mg.getmonitoringclass(“HealthService”)
  6. $mc = Get-connector | where {$_.Name –like “*MOM Internal Connector*”}
  7. Foreach ($obj in $mg.GetMonitoringObjects($deviceclass))
  8. {
  9.     #the next line should be changed to pick the right agent to re-assign
  10.     if ($obj.DisplayName -match 'dsxlab')
  11.     {
  12.                 Write-host $obj.displayname
  13.                 $imdd = new-object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalMonitoringDiscoveryData
  14.                 $cmro.SetSource($obj)
  15.                 $cmro.SetTarget($rms)
  16.                 $imdd.Add($cmro)
  17.                 $imdd.Commit($mc)
  18.     }
  19. }

Similarly, you might get orphaned network devices. The script below is used to re-assign all Network Devices to the RMS. This script is actually something I have had even before the other one (yes, it has been sitting in my "digital drawer" for a couple of years or more…) and uses the same concept – only you might notice that the relation's source and target are "reversed", since the relationships are different:

  • the Management Server (source) "manages" the Network Device (target)
  • the Agent (source) "talks" to the Management Server (target)

With a bit of added logic it should be easy to have it work for specific devices.

  1. $mg = (get-managementgroupconnection).managementgroup
  2. $mrc = Get-RelationshipClass | where {$_.name –like "*Microsoft.SystemCenter.HealthServiceShouldManageEntity*"}
  3. $cmro = new-object Microsoft.EnterpriseManagement.Monitoring.CustomMonitoringRelationshipObject($mrc)
  4. $rms = (get-rootmanagementserver).HostedHealthService
  5. $deviceclass = $mg.getmonitoringclass(“NetworkDevice”)
  6. Foreach ($obj in $mg.GetMonitoringObjects($deviceclass))
  7. {
  8.                 Write-host $obj.displayname
  9.                 $imdd = new-object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalMonitoringDiscoveryData
  10.                 $cmro.SetSource($rms)
  11.                 $cmro.SetTarget($obj)
  12.                 $imdd.Add($cmro)
  13.                 $mc = Get-connector | where {$_.Name –like “*MOM Internal Connector*”}
  14.                 $imdd.Commit($mc)
  15. }

Disclaimer

The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided "AS IS" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

Does anyone have a new System Center sticker for me?

Saturday, November 27th, 2010

Does anyone have a new System Center sticker?

I got this sticker last APRIL at MMS2010 in JUST ONE COPY, and I waited till I got a NEW laptop in SEPTEMBER to actually use that…
It also took a while to stick it on properly (other than to re-install the PC as I wanted…),  but this week they told me that, for an error, I got given the wrong machine (they did it all themselves, tho - I did not ask for any specific one) and this one needs to be replaced!!!!

This is WORSE than any hardware FAILure, as the machine just works very well and I was expecting to keep it for the next two years :-(

Can anyone be so nice to send me one of those awesome stickers again? :-)

Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond Type Mismatch

Friday, October 22nd, 2010

I have had the following in my notes for a while… and I have not blogged in a while (been too busy) so I decided to blog it today, before the topic gets too old and starts stinking Smile

 

It all started when a customer showed me an Alert he was seeing in his environment from some XPlat workflow. The alert looks like the following:

Generic Performance Mapper Module Failed Execution
Alert Description Source: RLWSCOM02.domain.dom
Module was unable to convert parameter to a double value
Original parameter: '$Data///*[local-name()="BytesPerSecond"]$'
Parameter after $Data replacement: "
Error: 0×80020005
Details: Type mismatch.
One or more workflows were affected by this.
Workflow name: Microsoft.Linux.RHEL.5.LogicalDisk.DiskBytesPerSecond.Collection
Instance name: /
Instance ID: {4F6FA8F5-C56F-4C9B-ED36-12DAFF4073D1}
Management group: DataCenter
Path: RLWSCOM02.domain.dom\RLWSCOM02.domain.dom Alert Rule: Generic Performance Mapper Module Runtime Failure Created: 6/28/2010 11:30:28 PM

 

First I stumbled into this forum post which mentions he same symptom http://social.technet.microsoft.com/Forums/en-US/crossplatformgeneral/thread/62e0bf3e-be6f-4218-a37b-f1e66f02aa49 – but when looking at the resolution, the locale on the customer machine was good (== set to US settings), so I concluded that it was not the same root cause.

 

Then I looked at what that rule was supposed to do, and queried the same CIM class both remotely thru WS-Man and locally via CIM, and concluded that my issue was that certain values were returning as NULL while we were expecting to see a number on the Management Server – therefore the Type Mismatch!

I have explained previously how to run CIM queries against the XPlat agent; in this case it was the following one:

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_FileSystemStatisticalInformation?__cimnamespace=root/scx -username:scomuser -password:password -r:https://rllspago01.domain.dom:1270/wsman -auth:basic –skipCACheck -skipCNCheck

 

SCX_FileSystemStatisticalInformation

AverageDiskQueueLength = null

AverageTransferTime = null

BytesPerSecond = null

Caption = File system information

Description = Performance statistics related to a logical unit of secondary storage

ElementName = null

FreeMegabytes = 4007

IsAggregate = false

IsOnline = true

Name = /

PercentBusyTime = null

PercentFreeSpace = 55

PercentIdleTime = null

PercentUsedSpace = 45

ReadBytesPerSecond = null

ReadsPerSecond = null

TransfersPerSecond = null

UsedMegabytes = 3278

WriteBytesPerSecond = null

WritesPerSecond = null

 

See the NULLs ? Those are our issue.

Now, before you continue reading, I will tell you that I have investigated this also internally, and apparently we have just (in Cumulative Update 3) changed this behaviour in our XPlat modules, so that when NULL is returned, we consider it to be ZERO. Good or bad that is, it will at least take care of the error. But if you don’t get any data from the Unix system… well, you are not getting any data – so that might cause a surprise later on when you go and look at those charts and expect to see your disk “performance counters” but in fact all you have is a bunch of ZERO’s (how very interesting!). So, basically, the fix in CU3 suppresses the symptom, but does not address the cause.

So, let’s see what is actually causing this, as you might well want to get those statistics, or probably you would not be monitoring that server!

I looked at the Cimd.log (set to verbose) only says the following (basically not much: is getting info for 3 partitions… and the provider code is working)

2010-09-01T08:38:32,796Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances()

2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] Object Path = //rllspago01.domain.dom/root/scx:SCX_FileSystemStatisticalInformation

2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() – Calling DoEnumInstances()

2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider DoEnumInstances

2010-09-01T08:38:33,359Z Trace      [scx.core.providers.diskprovider:5964:3086830480] DiskProvider GetDiskEnumeration – type 3

2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() – DoEnumInstances() returned – 3

2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() – Call ReturnDone

2010-09-01T08:38:33,360Z Trace      [scx.core.providers.diskprovider:5964:3086830480] BaseProvider::EnumInstances() – return OK

2010-09-01T08:38:33,360Z Trace      [scx.core.provsup.cmpibase.singleprovider.DiskProvider:5964:3086830480] SingleProvider::EnumInstances() - Returning – 0

 

but it still did not give me an idea as to why we would not get data for those “counters”. A this point I stopped using complex troubleshooting techniques and simply turned intuition on, and tried with some help from a search engine: http://www.bing.com/search?q=How+do+I+find+out+Linux+Disk+utilization 

the results I got all mentioned that on Linux you would use the “iostat” command.

So I tried to use and… lol and behold: the iostat commend was NOT INSTALLED on that machine!

Guess what? We installed it (it is included in the “sysstat” package for RedHat linux, so a simple “yum install sysstat” took care of this) and the counters started working!

Hope that is useful to some.

OpsMgr Event IDs Spreadsheet

Tuesday, June 22nd, 2010

I work in support (mostly with System Center Operations Manager, as you know), and I work with event logs every day. The following are typical situations:

  1. I get a colleague or a customer telling me “I am having a problem and the SCOM agent is showing 21037 events and 20002 events.  What’s wrong with it?”   
  2. I want to tune an OpsMgr environment and reduce load on the database by turning off a few event collections, as my friend Kevin Holman suggests here http://blogs.technet.com/kevinholman/archive/2009/11/25/tuning-tip-turning-off-some-over-collection-of-events.aspx .
  3. I am analyzing, sorting and grouping Events with Powershell like I have written on my blog lately http://www.muscetta.com/2009/12/16/opsmgr-eventlog-analysis-with-powershell/ but I can’t read those long descriptions properly.
  4. I exported an EVT from a customer environment and I load it on a machine that does not have OpsMgr message DLLs installed – all I see are EventIDs and type (Warning, Error) – but no real description – and I still want to figure out what those events are trying to tell me.

Getting to the point: I, like everyone – don’t have every OpsMgr event memorized.

This is why I thought of building this spreadsheet, and I hope it might come in handy to more people.

The spreadsheet contains an “AllEvents” list – and then the same events are broken down by event source as well:

clip_image002

When you want to search for an events (in one of the situations described above) just open up the spreadsheet, go to the “AllEvents” tab, hit CTRL+F (“Find”) and type in the Event ID you are searching for:

clip_image004

And this will take you to the row containing the event, so you can look up its description:

clip_image006

The description shows the event standard text (which is in the message DLL, therefore is the part you will not see if opening an EVT on another machine that does not have OpsMgr installed), and where the event parameters are (%1, %2, etc – which will be the strings you see in the EVT anyway).

That way you can get an understanding of what the original message would have looked like on the original machine.

This is just one possible usage pattern of this reference. It can also be useful to just read/study the events, learning about new ones you have never encountered, or remembering those you HAVE seen in the past but did not quite remember. And of course you can also find other creative ways to use it.

You can get it from here.

 

A few last words to give due credit: this spreadsheet has been compiled by using Eventlog Explorer (http://blogs.technet.com/momteam/archive/2008/04/02/eventlog-explorer.aspx ) to extract the event information out of the message DLLs on a OpsMgr2007 R2 installation. That info has been then copied and pasted in Excel in order to have an “offline” reference. Also I would like to thank Kevin Holman for pointing me to Eventlog Explorer first, and then for insisting I should not keep this spreadsheet in my drawer, as it could be useful to more people!

How to convert (and fixup) the RedHat RPM to run on Debian/Ubuntu

Monday, June 21st, 2010

In an earlier post I had shown how I got the Xplat agent running on Ubuntu. I perfected the technique over time, and what follows is a step-by-step process on how to convert and change the RedHat package to run on Debian/Ubuntu. Of course this is still a hack… but some people asked me to detail it a bit more. At the same time, the cross platform team is working to update the the source code on codeplex with extra bits that will make more straightforward to grab it, modify it and re-compile it than it is today. Until then, here is how I got it to work.

I assume you have already copied the right .RPM package off the OpsMgr server’s /AgentManagement directory to the Linux box here. The examples below refer to the 32bit package, but of course the same identical technique would work for the 64bit version.

We start by converting the RPM package to DEB format:

root# alien -k scx-1.0.4-258.rhel.5.x86.rpm –scripts

scx_1.0.4-258_i386.deb generated

 

Then we need to create a folder where we will extract the content of the package, modify stuff, and repackage it:

root# mkdir scx_1.0.4-258_i386

root# cd scx_1.0.4-258_i386

root# ar -x ../scx_1.0.4-258_i386.deb

root# mkdir debian

root# cd debian

root# mkdir DEBIAN

root# cd DEBIAN

root# cd ../..

root# rm debian-binary

root# mv control.tar.gz debian/DEBIAN/

root# mv data.tar.gz debian/

root# cd debian

root# tar -xvzf data.tar.gz

root# rm data.tar.gz

root# cd DEBIAN/

root# tar -xvzf control.tar.gz

root# rm control.tar.gz

Now we have the “skeleton” of the package easily laid out on the filesystem and we are ready to modify the package and add/change stuff to and in it.

 

First, we need to add some stuff to it, which is expected to be found on a redhat distro, but is not present in debian. In particular:

1. You should copy the file “functions” (that you can get from a redhat/centos box under /etc/init.d) under the debian/etc/init.d folder in our package folder. This file is required/included by our startup scripts, so it needs to be deployed too.

Then we need to chang some of the packacge behavior by editing files under debian/DEBIAN:

2. edit the “control” file (a file describing what the package is, and does):

clip_image002

3. edit the “preinst” file (pre-installation instructions): we need to add instructions to copy the “issue” file onto “redhat-release” (as the SCX_OperatingSystem class will look into that file, and this is hard-coded in the binary, we need to let it find it):

clip_image004

these are the actual command lines to add for both packages (DEBIAN or UBUNTU):

# symbolic links for libaries called differently on Ubuntu and Debian vs. RedHat

ln -s /usr/lib/libcrypto.so.0.9.8 /usr/lib/libcrypto.so.6

ln -s /usr/lib/libssl.so.0.9.8 /usr/lib/libssl.so.6

the following bit would be Ubuntu-specific:

#we need this file for the OS provider relies on it, so we convert what we have in /etc/issue

#this is ok for Ubuntu (“Ubuntu 9.0.4 \n \l” becomes “Ubuntu 9.0.4”)

cat /etc/issue | awk '/\\n/ {print $1, $2}' > /etc/redhat-release

while the following bit is Debian-specific:

#this is ok for Debian (“Debian GNU/Linux 5.0 \n \l” becomes “Debian GNU/Linux 5.0”)

cat /etc/issue | awk '/\\n/ {print $1, $2, $3}' > /etc/redhat-release

 

4. Then we edit/modify the “postinst” file (post-installation instructions) as follows:

a. remove the 2nd and 3rd lines which look like the following

RPM_INSTALL_PREFIX=

export RPM_INSTALL_PREFIX

as they are only useful for the RPM system, not DEB/APT, so we don’t need them.

b. change the following 2 functions which contain RedHat-specific commands:

configure_pegasus_service() {

           /usr/lib/lsb/install_initd /etc/init.d/scx-cimd

}

start_pegasus_service() {

           service scx-cimd start

}

c. We need to change in the Debian equivalents for registering a service in INIT and starting it:

configure_pegasus_service() {

               update-rc.d scx-cimd defaults

}

start_pegasus_service() {

              /etc/init.d/scx-cimd start

}

5. Modify the “prerm” file (pre-removal instructions):

a. Just like “postinst”, remove the lines

RPM_INSTALL_PREFIX=

export RPM_INSTALL_PREFIX

b. Locate the two functions stopping and un-installing the service

stop_pegasus_service() {

         service scx-cimd stop

}

unregister_pegasus_service() {

          /usr/lib/lsb/remove_initd /etc/init.d/scx-cimd

}

c. Change those two functions with the Debian-equivalent command lines

stop_pegasus_service() {

           /etc/init.d/scx-cimd stop

}

unregister_pegasus_service() {

           update-rc.d -f scx-cimd remove

}

At this point the change we needed have been put in place, and we can re-build the DEB package.

Move yourself in the main folder of the application (the scx_1.0.4-258_i386 folder):

root# cd ../..

Create the package starting from the folders

root# dpkg-deb –build debian

dpkg-deb: building package `scx' in `debian.deb'.

Rename the package (for Ubuntu)

root# mv debian.deb scx_1.0.4-258_Ubuntu_9_i386.deb

Rename the package (for Debian)

root# mv debian.deb scx_1.0.4-258_Debian_5_i386.deb

Install it

root# dpkg -i scx_1.0.4-258_Platform_Version_i386.deb

All done! It should install and work!

 

Next step would be creating a Management Pack to monitor Debian and Ubuntu. It is pretty similar to what Robert Hearn has described step by step for CentOS, but with some different replacements of strings, as you can imagine. I have done this but have not written down the procedure yet, so I will post another article on how to do this as soon as I manage to get it standardized and reliable. There is a bit more work involved for Ubuntu/Debian… as some of the daemons/services have different names, and certain files too… but nothing terribly difficult to change so you might want to try it already and have a go at it!

In the meantime, as a teaser, here’s my server’s (http://www.muscetta.com) performance, being monitored with this “hack”:

image

 

Disclaimer

The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided "AS IS" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.
THIS WORK IS NOT ENDORSED AND NOT EVEN CHECKED, AUTHORIZED, SCRUTINIZED NOR APPROVED BY MY EMPLOYER, AND IT ONLY REPRESENT SOMETHING WHICH I'VE DONE IN MY FREE TIME. NO GUARANTEE WHATSOEVER IS GIVEN ON THIS. THE AUTHOR SHALL NOT BE MADE RESPONSIBLE FOR ANY DAMAGE YOU MIGHT INCUR WHEN USING THIS INFORMATION. The solution presented here IS NOT SUPPORTED by Microsoft.

Audit Collection Services Database Partitions Size Report

Wednesday, May 5th, 2010

A number of people I have talked to liked my previous post on ACS sizing. One thing that was not extremely easy or clear to them in that post was *how* exactly I did one thing I wrote:

[…] use the dtEvent_GUID table to get the number of events for that day, and use the stored procedure “sp_spaceused”  against that same table to get an overall idea of how much space that day is taking in the database […]

To be completely honest, I do not expect people to do this manually a hundred times if they have a hundred partitions. In fact, I have been doing this for a while with a script which will do the looping for me and run that sp_spaceused for me a number of time. I cannot share that script, but I do realize that this automation is very useful, therefore I wrote a “stand-alone” SQL query which, using a couple of temporary tables, produces a similar type of output. I also went a step further and packaged it into a SQL Server Reporting Services Report for everyone’s consumption. The report should look like the following screenshot, featuring a chart and the table with the numerical information about each and every partition in the database:

ACS Partitions Report

You can download the report from here.

You need to upload it to your report server, and change the data source to the shared Data Source that also the built-in ACS Reports use, and it should work.

[NOTE/UPDATE May 4th 2011: This report has a few bugs. I have posted the updated query on http://www.muscetta.com/2011/05/04/improved-acs-partitions-query/ . I am sorry I can't provide a ready made report with the fix right now. Make sure you understand this and don't implement it without testing.]

Enjoy!