I just saw that my former colleague (PFE) Tristan has posted an interesting note about the use of SetSPN -A vs. SetSPN -S. I normally don't repost other people's content, but I thought this would be useful, as there are a few SPNs used in OpsMgr and it is not always easy to get them all right… and you can find a few tricks I was not aware of by reading his post.
Anyhow, when checking the actual failover paths that have been configured, the PowerShell technique suggested by Jimmy is rather slow – especially if your agent count is high. In the Operations Manager Health Check tool I also used that technique at the beginning, but eventually moved to SQL queries purely for performance reasons. Since then, we have been using these SQL queries quite successfully for about three years.
But this is the season of giving… and I guess SQL queries can be a gift, right? Therefore I am now donating them as a Christmas gift to the OpsMgr community.
Enjoy – and Merry Christmas!
-- GetAgentForWhichServerIsPrimary
SELECT SourceBME.DisplayName AS Agent, TargetBME.DisplayName AS Server
FROM Relationship R WITH (NOLOCK)
JOIN BaseManagedEntity SourceBME ON R.SourceEntityID = SourceBME.BaseManagedEntityID
JOIN BaseManagedEntity TargetBME ON R.TargetEntityID = TargetBME.BaseManagedEntityID
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication()
AND SourceBME.DisplayName NOT IN (
    SELECT DisplayName FROM dbo.ManagedEntityGenericView WITH (NOLOCK)
    WHERE MonitoringClassId IN (
        SELECT ManagedTypeId FROM dbo.ManagedType WITH (NOLOCK)
        WHERE TypeName = 'Microsoft.SystemCenter.GatewayManagementServer')
    AND IsDeleted = '0')
AND SourceBME.DisplayName NOT IN (
    SELECT DisplayName FROM dbo.ManagedEntityGenericView WITH (NOLOCK)
    WHERE MonitoringClassId IN (
        SELECT ManagedTypeId FROM dbo.ManagedType WITH (NOLOCK)
        WHERE TypeName = 'Microsoft.SystemCenter.ManagementServer')
    AND IsDeleted = '0')
AND R.IsDeleted = '0'

-- GetAgentForWhichServerIsFailover
SELECT SourceBME.DisplayName AS Agent, TargetBME.DisplayName AS Server
FROM Relationship R WITH (NOLOCK)
JOIN BaseManagedEntity SourceBME ON R.SourceEntityID = SourceBME.BaseManagedEntityID
JOIN BaseManagedEntity TargetBME ON R.TargetEntityID = TargetBME.BaseManagedEntityID
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication()
AND SourceBME.DisplayName NOT IN (
    SELECT DisplayName FROM dbo.ManagedEntityGenericView WITH (NOLOCK)
    WHERE MonitoringClassId IN (
        SELECT ManagedTypeId FROM dbo.ManagedType WITH (NOLOCK)
        WHERE TypeName = 'Microsoft.SystemCenter.GatewayManagementServer')
    AND IsDeleted = '0')
AND SourceBME.DisplayName NOT IN (
    SELECT DisplayName FROM dbo.ManagedEntityGenericView WITH (NOLOCK)
    WHERE MonitoringClassId IN (
        SELECT ManagedTypeId FROM dbo.ManagedType WITH (NOLOCK)
        WHERE TypeName = 'Microsoft.SystemCenter.ManagementServer')
    AND IsDeleted = '0')
AND R.IsDeleted = '0'

-- GetGatewayForWhichServerIsPrimary
SELECT SourceBME.DisplayName AS Gateway, TargetBME.DisplayName AS Server
FROM Relationship R WITH (NOLOCK)
JOIN BaseManagedEntity SourceBME ON R.SourceEntityID = SourceBME.BaseManagedEntityID
JOIN BaseManagedEntity TargetBME ON R.TargetEntityID = TargetBME.BaseManagedEntityID
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication()
AND SourceBME.DisplayName IN (
    SELECT DisplayName FROM dbo.ManagedEntityGenericView WITH (NOLOCK)
    WHERE MonitoringClassId IN (
        SELECT ManagedTypeId FROM dbo.ManagedType WITH (NOLOCK)
        WHERE TypeName = 'Microsoft.SystemCenter.GatewayManagementServer')
    AND IsDeleted = '0')
AND R.IsDeleted = '0'

-- GetGatewayForWhichServerIsFailover
SELECT SourceBME.DisplayName AS Gateway, TargetBME.DisplayName AS Server
FROM Relationship R WITH (NOLOCK)
JOIN BaseManagedEntity SourceBME ON R.SourceEntityID = SourceBME.BaseManagedEntityID
JOIN BaseManagedEntity TargetBME ON R.TargetEntityID = TargetBME.BaseManagedEntityID
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication()
AND SourceBME.DisplayName IN (
    SELECT DisplayName FROM dbo.ManagedEntityGenericView WITH (NOLOCK)
    WHERE MonitoringClassId IN (
        SELECT ManagedTypeId FROM dbo.ManagedType WITH (NOLOCK)
        WHERE TypeName = 'Microsoft.SystemCenter.GatewayManagementServer')
    AND IsDeleted = '0')
AND R.IsDeleted = '0'

-- Xplat agents
SELECT bme2.DisplayName AS XPlatAgent, bme.DisplayName AS Server
FROM dbo.Relationship r WITH (NOLOCK)
JOIN dbo.RelationshipType rt WITH (NOLOCK) ON r.RelationshipTypeId = rt.RelationshipTypeId
JOIN dbo.BaseManagedEntity bme WITH (NOLOCK) ON bme.BaseManagedEntityId = r.SourceEntityId
JOIN dbo.BaseManagedEntity bme2 WITH (NOLOCK) ON r.TargetEntityId = bme2.BaseManagedEntityId
WHERE rt.RelationshipTypeName = 'Microsoft.SystemCenter.HealthServiceManagesEntity'
AND bme.IsDeleted = 0 AND r.IsDeleted = 0
AND bme2.BaseManagedTypeId IN (
    SELECT DerivedTypeId FROM DerivedManagedTypes WITH (NOLOCK)
    WHERE BaseTypeId = (
        SELECT ManagedTypeId FROM ManagedType WHERE TypeName = 'Microsoft.Unix.Computer')
    AND DerivedIsAbstract = 0)
I work in support (mostly with System Center Operations Manager, as you know), and I work with event logs every day. The following are typical situations:
I get a colleague or a customer telling me “I am having a problem and the SCOM agent is showing 21037 events and 20002 events. What’s wrong with it?”
I export an EVT from a customer environment and load it on a machine that does not have the OpsMgr message DLLs installed – all I see are Event IDs and types (Warning, Error) but no real description – and I still want to figure out what those events are trying to tell me.
Getting to the point: I – like everyone else – don't have every OpsMgr event memorized.
This is why I thought of building this spreadsheet, and I hope it might come in handy to more people.
The spreadsheet contains an “AllEvents” list – and then the same events are broken down by event source as well:
When you want to search for an event (in one of the situations described above), just open up the spreadsheet, go to the "AllEvents" tab, hit CTRL+F ("Find") and type in the Event ID you are searching for:
And this will take you to the row containing the event, so you can look up its description:
The description shows the event's standard text (which lives in the message DLL, and is therefore the part you will not see when opening an EVT on another machine that does not have OpsMgr installed), and where the event parameters go (%1, %2, etc. – which are the strings you do see in the EVT anyway).
That way you can get an understanding of what the original message would have looked like on the original machine.
This is just one possible usage pattern of this reference. It can also be useful to just read/study the events, learning about new ones you have never encountered, or remembering those you HAVE seen in the past but did not quite remember. And of course you can also find other creative ways to use it.
A few last words to give due credit: this spreadsheet was compiled by using Eventlog Explorer (http://blogs.technet.com/momteam/archive/2008/04/02/eventlog-explorer.aspx) to extract the event information out of the message DLLs on an OpsMgr 2007 R2 installation. That information was then copied and pasted into Excel to have an "offline" reference. I would also like to thank Kevin Holman for pointing me to Eventlog Explorer in the first place, and then for insisting that I should not keep this spreadsheet in my drawer, as it could be useful to more people!
A number of people I have talked to liked my previous post on ACS sizing. One thing that was not entirely easy or clear to them was *how* exactly I did one thing I wrote:
[…] use the dtEvent_GUID table to get the number of events for that day, and use the stored procedure “sp_spaceused” against that same table to get an overall idea of how much space that day is taking in the database […]
To be completely honest, I do not expect people to do this manually a hundred times if they have a hundred partitions. In fact, I have been doing this for a while with a script which does the looping for me and runs sp_spaceused a number of times. I cannot share that script, but I do realize this automation is very useful; therefore I wrote a "stand-alone" SQL query which, using a couple of temporary tables, produces a similar type of output. I also went a step further and packaged it into a SQL Server Reporting Services report for everyone's consumption. The report should look like the following screenshot, featuring a chart and the table with the numerical information about each and every partition in the database:
You need to upload it to your report server and change the data source to the shared data source that the built-in ACS reports also use, and it should work.
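For reference, the core of such a query is simply a loop over the partitions. The following is a minimal sketch, not the exact query behind the report (see the link in the note below for the real one): it assumes dtPartition exposes a PartitionId column, as shown later in this post, and relies on the standard six-column output of sp_spaceused.

-- Sketch: collect sp_spaceused output for every dtEvent_<GUID> partition table
create table #spaceused (
    name sysname, rows varchar(20), reserved varchar(20),
    data varchar(20), index_size varchar(20), unused varchar(20))

declare @guid varchar(64), @stmt nvarchar(400)
declare part_cursor cursor for select PartitionId from dtPartition
open part_cursor
fetch next from part_cursor into @guid
while @@FETCH_STATUS = 0
begin
    -- partition table names use underscores; replace dashes in case
    -- PartitionId is stored in dashed GUID form in your database
    set @stmt = N'exec sp_spaceused ''dtEvent_' + replace(@guid, '-', '_') + ''''
    insert into #spaceused exec sp_executesql @stmt
    fetch next from part_cursor into @guid
end
close part_cursor
deallocate part_cursor

select * from #spaceused order by name
drop table #spaceused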
[NOTE/UPDATE May 4th 2011: This report has a few bugs. I have posted the updated query on http://www.muscetta.com/2011/05/04/improved-acs-partitions-query/ . I am sorry I can’t provide a ready made report with the fix right now. Make sure you understand this and don’t implement it without testing.]
People were already collecting logs with MOM, so why not the security log? Some people were doing that, but it did not scale well enough; for this reason, a few years ago Eric Fitzgerald announced that he was working on Microsoft Audit Collection System. The tool as it was had no interface… and the rest is history: it has been integrated into System Center Operations Manager. Even so, ACS remains a lesser-known component of OpsMgr.
There are a number of resources on the web that are worth mentioning and linking to:
and, of course, many more; I cannot link them all.
As for myself, I have been playing with ACS since those early beta days (before I joined Microsoft and before going back to MOM, when I was working in Security), but I never really blogged about this piece.
Since I have again been doing quite a lot of work around ACS lately, I thought it might be worth consolidating some thoughts about it – hence this post.
Anatomy of an “Online” Sizing Calculation
What I would like to explain here is the strategy and process I go through when analyzing the data stored in an ACS database, in order to determine a filtering strategy: what to keep and what not to keep, by applying a filter on the ACS Collector.
The first thing I usually start with is one of the many "ACS sizer" Excel spreadsheets around… which usually tell you that you need more space than is really necessary, basically giving you a "worst case" scenario. I don't know how some people can actually do this from a purely theoretical point of view; I usually prefer a bottom-up approach: I look at the actual data that ACS is collecting without filters, and start from there for a better, more accurate sizing.
In the case of a new install this is easy – you just turn ACS on, set the retention to a few days (one or two weeks maximum), give the DB plenty of space to make sure it will make it, add all your forwarders… sit back and wait.
Then you come back 2 weeks later and start looking at the data that has been collected.
What/How much data are we collecting?
First of all, if we have not changed the default settings, the grooming and partitioning algorithm will create new partitioned tables every day. So my first step is to see how big each “partition” is.
But… what is a partition, anyway? A partition is a set of four tables joined together:
dtEvent_GUID
dtEventData_GUID
dtPrincipal_GUID
dtStrings_GUID
where GUID is a new GUID every day, and of course the 4 tables that make up a daily partition will have the same GUID.
The dtPartition table contains a list of all partitions and their GUIDs, together with their start and closing time.
Just to get a rough estimate we can ignore the space used by the last three tables – which are usually very small – and only use the dtEvent_GUID table to get the number of events for that day, and use the stored procedure “sp_spaceused” against that same table to get an overall idea of how much space that day is taking in the database.
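For example, using the first partition GUID from the listing below:

-- number of events for that day
select count(*) from dtEvent_9b45a567_c848_4a32_9c35_39b402ea0ee2
-- overall space used by that day's partition table
exec sp_spaceused 'dtEvent_9b45a567_c848_4a32_9c35_39b402ea0ee2'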
By following this process, I come up with something like the following:
Partition ID | Status | Partition Start Time | Partition Close Time | Rows | Reserved KB | Total MB
9b45a567_c848_4a32_9c35_39b402ea0ee2 | 0 | 2/1/2010 2:00 | 2/1/2010 2:00 | 29,749,366 | 7,663,488 | 7,484
8d8c8ee1_4c5c_4dea_b6df_82233c52e346 | 2 | 1/31/2010 2:00 | 2/1/2010 2:00 | 28,067,438 | 9,076,904 | 8,864
34ce995b_689b_46ae_b9d3_c644cfb66e01 | 2 | 1/30/2010 2:00 | 1/31/2010 2:00 | 30,485,110 | 9,857,896 | 9,627
bb7ea5d3_f751_473a_a835_1d1d42683039 | 2 | 1/29/2010 2:00 | 1/30/2010 2:00 | 48,464,952 | 15,670,792 | 15,304
ee262692_beae_4d81_8079_470a54567946 | 2 | 1/28/2010 2:00 | 1/29/2010 2:00 | 48,980,178 | 15,836,416 | 15,465
7984b5b8_ddea_4e9c_9e51_0ee7a413b4c9 | 2 | 1/27/2010 2:00 | 1/28/2010 2:00 | 51,295,777 | 16,585,408 | 16,197
d93b9f0e_2ec3_4f61_b5e0_b600bbe173d2 | 2 | 1/26/2010 2:00 | 1/27/2010 2:00 | 53,385,239 | 17,262,232 | 16,858
8ce1b69a_7839_4a05_8785_29fd6bfeda5f | 2 | 1/25/2010 2:00 | 1/26/2010 2:00 | 55,997,546 | 18,105,840 | 17,681
19aeb336_252d_4099_9a55_81895bfe5860 | 2 | 1/24/2010 2:00 | 1/24/2010 2:00 | 28,525,304 | 7,345,120 | 7,173
1cf70e01_3465_44dc_9d5c_4f3700dc408a | 2 | 1/23/2010 2:00 | 1/23/2010 2:00 | 26,046,092 | 6,673,472 | 6,517
f5ec207f_158c_47a8_b15f_8aab177a6305 | 2 | 1/22/2010 2:00 | 1/22/2010 2:00 | 47,818,322 | 12,302,208 | 12,014
b48dabe6_a483_4c60_bb4d_93b7d3549b3e | 2 | 1/21/2010 2:00 | 1/21/2010 2:00 | 55,060,150 | 14,155,392 | 13,824
efe66c10_0cf2_4327_adbf_bebb97551c93 | 2 | 1/20/2010 2:00 | 1/20/2010 2:00 | 58,322,217 | 15,029,216 | 14,677
0231463e_8d50_4a42_a834_baf55e6b4dcd | 2 | 1/19/2010 2:00 | 1/19/2010 2:00 | 61,257,393 | 15,741,248 | 15,372
510acc08_dc59_482e_a353_bfae1f85e648 | 2 | 1/18/2010 2:00 | 1/18/2010 2:00 | 64,579,122 | 16,612,512 | 16,223
If you have just installed ACS and let it run without filters with your agents for a couple of weeks, you should get some numbers like those above for your “couple of weeks” of analysis. If you graph your numbers in Excel (both size and number of rows/events per day) you should get some similar lines that show a pattern or trend:
So, in my example above, we can clearly observe a "weekly" pattern (Monday to Friday being busier than the weekend) and we can see that – for that environment – the biggest partition is roughly 17GB. If we round this up to 20GB – and also considering the weekends are much quieter – we can forecast 20*7 = 140GB per week. This includes an excess "buffer" which will let the system survive event storms, should they happen. We also always recommend keeping some free space to allow for re-indexing operations.
In fact, especially when collecting everything without filters, the daily size is a lot less predictable: imagine worms “trying out” administrator account’s passwords, and so on… those things can easily create event storms.
Anyway, in the example above, the customer would have liked to keep 6 MONTHS (180 days) of data online, which would become 20*180 = 3600GB – THREE AND A HALF TERABYTES! Therefore we need a filtering strategy – and badly – to reduce this size.
[edited on May 7th 2010 – if you want to automate the above analysis and produce a table and graphs like those just shown, you should look at my following post.]
Filtering Strategies
Ok, then we need to look at WHAT actually comprises that amount of events we are collecting without filters. As I wrote above, I usually run queries to get this type of information.
I will not get into HOW to write a filter here – a collector's filter is a WMI notification query, and how to configure it is already described pretty well elsewhere.
Here, instead, I want to walk through the process and the queries I use to understand where the noise comes from and what could be filtered – and to get an estimate of how much space we could save if we filter one way or another.
Number of Events per User
-- event count by User (with percentages)
declare @total float
select @total = count(HeaderUser) from AdtServer.dvHeader
select count(HeaderUser), HeaderUser,
    cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by HeaderUser
order by count(HeaderUser) desc
In our example above, over the 14 days we were observing, we obtained percentages like the following ones:
#evt | HeaderUser Account | Percent
204,904,332 | SYSTEM | 40.79 %
18,811,139 | LOCAL SERVICE | 3.74 %
14,883,946 | ANONYMOUS LOGON | 2.96 %
10,536,317 | appintrauser | 2.09 %
5,590,434 | mossfarmusr | …
Just by looking at this, it is pretty clear that by filtering out events tracked by the accounts "SYSTEM", "LOCAL SERVICE" and "ANONYMOUS LOGON" we would save over 45% of the disk space!
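As a sketch only – the collector name is a placeholder, and you should verify the WQL field names against your ACS version before using this – such a filter could be applied on the collector with AdtAdmin along these lines:

AdtAdmin.exe /setquery /collector:"<CollectorName>" /query:"SELECT * FROM AdtsEvent WHERE NOT (HeaderUser='SYSTEM' OR HeaderUser='LOCAL SERVICE' OR HeaderUser='ANONYMOUS LOGON')"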
Number of Events by EventID
Similarly, we can look at how different Event IDs have different weights on the total amount of events tracked in the database:
-- event count by ID (with percentages)
declare @total float
select @total = count(EventId) from AdtServer.dvHeader
select count(EventId), EventId,
    cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc
We would get some similar information here:
Event ID | Meaning | Sum of events | Percent
538 | A user logged off | 99,494,648 | 27.63
540 | Successful Network Logon | 97,819,640 | 27.16
672 | Authentication Ticket Request | 52,281,129 | 14.52
680 | Account Used for Logon by (Windows 2000) | 35,141,235 | 9.76
576 | Specified privileges were added to a user's access token | 26,154,761 | 7.26
8086 | Custom Application ID | 18,789,599 | 5.21
673 | Service Ticket Request | 10,641,090 | 2.95
675 | Pre-Authentication Failed | 7,890,823 | 2.19
552 | Logon attempt using explicit credentials | 4,143,741 | 1.15
539 | Logon Failure – Account locked out | 2,383,809 | 0.66
528 | Successful Logon | 1,764,697 | 0.49
Also, do not forget that ACS provides some reports to do this type of analysis out of the box, even if, in my experience, they are generally slower – on large datasets – than the queries provided here. A number of those reports have also been buggy over time, so I just prefer to run the queries and be on the safe side.
Below is an example of such a report (run against a different environment – just in case you were wondering why the numbers are not the same ones :-)):
The numbers and percentages we got from the two queries above should already point us in the right direction about what we might want to adjust, either in our auditing policy directly in Windows, and/or by deciding whether there is something we want to filter out at the collector level (here you should ask yourself: "if they aren't worth collecting, are they worth generating?" – but I digress).
Also, a permutation of the above two queries lets you see which user is generating the most "noise" for some events and not others… for example:
-- event distribution for a specific user (change @user) – with percentages
-- for the user and compared with the total number of events in the DB
declare @user varchar(255)
set @user = 'SYSTEM'
declare @total float
select @total = count(Id) from AdtServer.dvHeader
declare @totalforuser float
select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
select count(Id), EventID,
    cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser,
    cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
from AdtServer.dvHeader
where HeaderUser = @user
group by EventID
order by count(Id) desc
The above is particularly important, as we might want to filter out a number of events for the SYSTEM account (i.e. logons that occur when starting and stopping services) but keep other events that are also tracked by the SYSTEM account, such as an administrator having wiped the Security Log clean (EventID 517) – which might be something you want to keep.
Of course the number of those 517 events will be tiny compared with the total tracked by the SYSTEM account, and we can still filter the other ones out.
Number of Events by EventID and by User
We could also combine the two approaches above – by EventID and by User:
select count(Id),HeaderUser, EventId
from AdtServer.dvHeader
group by HeaderUser, EventId
order by count(Id) desc
This will produce a table like the following one, which can easily be copied and pasted into Excel in order to produce a PivotTable:
Cluster EventLog Replication
One more aspect that is less widely known, but which I think is worth showing, is the way clusters behave with ACS. I don't mean all clusters… but if you keep the "eventlog replication" feature of clusters enabled (you should disable it, also from a monitoring perspective, but I digress), each cluster node's security eventlog will contain events not just for itself, but for all the other nodes as well.
I have not found a reliable way to filter these duplicates out, other than disabling eventlog replication altogether.
Anyway, just to get an idea of how much this type of "duplicate" event weighs on the total, I use the following query, which tells you how many events for each machine are tracked by another machine:
-- to spot machines that are cluster nodes with eventlog replication and write duplicate events (slow)
select Count(Id) as Total,
    replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%', AgentMachine))), '$', '') as ForwarderMachine,
    EventMachine
from AdtServer.dvHeader
--where ForwarderMachine <> EventMachine
group by EventMachine, replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%', AgentMachine))), '$', '')
order by ForwarderMachine, EventMachine
Those presented above are just some of the approaches I usually look into at first. Of course there are a number more. Here I am including the same queries already shown in action, plus a few more that can be useful in this process.
I have even considered building a page with all these queries – a bit like those that Kevin is collecting for OpsMgr (we actually wrote some of them together when building the OpsMgr Health Check)… shall I move the queries below to such a page? For now, I thought I'd list them here and give some background on how I normally use them.
Some more Useful Queries
-- top event ids
select count(EventId), EventId
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc

-- event count by ID (with percentages)
declare @total float
select @total = count(EventId) from AdtServer.dvHeader
select count(EventId), EventId,
    cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc

-- which machines have ever written event 538
select distinct EventMachine, count(EventId) as total
from AdtServer.dvHeader
where EventID = 538
group by EventMachine

-- machines
select * from dtMachine

-- machines (more readable)
select replace(right(Description, (len(Description) - patindex('%\%', Description))), '$', '')
from dtMachine

-- events by machine
select count(EventMachine), EventMachine
from AdtServer.dvHeader
group by EventMachine

-- rows where the EventMachine field is not available (typically events written by ACS itself for checkpointing)
select * from AdtServer.dvHeader where EventMachine = 'n/a'

-- event count by day
select convert(varchar(20), CreationTime, 102) as Date, count(EventMachine) as total
from AdtServer.dvHeader
group by convert(varchar(20), CreationTime, 102)
order by convert(varchar(20), CreationTime, 102)

-- event count by day and by machine
select convert(varchar(20), CreationTime, 102) as Date, EventMachine, count(EventMachine) as total
from AdtServer.dvHeader
group by EventMachine, convert(varchar(20), CreationTime, 102)
order by convert(varchar(20), CreationTime, 102)

-- event count by machine and by date (distinguishes between AgentMachine and EventMachine)
select convert(varchar(10), CreationTime, 102), Count(Id), EventMachine, AgentMachine
from AdtServer.dvHeader
group by convert(varchar(10), CreationTime, 102), EventMachine, AgentMachine
order by convert(varchar(10), CreationTime, 102) desc, EventMachine

-- event count by User
select count(Id), HeaderUser
from AdtServer.dvHeader
group by HeaderUser
order by count(Id) desc

-- event count by User (with percentages)
declare @total float
select @total = count(HeaderUser) from AdtServer.dvHeader
select count(HeaderUser), HeaderUser,
    cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by HeaderUser
order by count(HeaderUser) desc

-- event distribution for a specific user (change @user) – with percentages
-- for the user and compared with the total number of events in the DB
declare @user varchar(255)
set @user = 'SYSTEM'
declare @total float
select @total = count(Id) from AdtServer.dvHeader
declare @totalforuser float
select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
select count(Id), EventID,
    cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser,
    cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
from AdtServer.dvHeader
where HeaderUser = @user
group by EventID
order by count(Id) desc

-- to spot machines that write duplicate events (such as cluster nodes with eventlog replication enabled)
select Count(Id), EventMachine, AgentMachine
from AdtServer.dvHeader
group by EventMachine, AgentMachine
order by EventMachine

-- to spot machines that are cluster nodes with eventlog replication and write duplicate events (better but slower)
select Count(Id) as Total,
    replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%', AgentMachine))), '$', '') as ForwarderMachine,
    EventMachine
from AdtServer.dvHeader
--where ForwarderMachine <> EventMachine
group by EventMachine, replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%', AgentMachine))), '$', '')
order by ForwarderMachine, EventMachine

-- which user and from which machine is the target of elevation
-- (network service doing "runas" is a 552 event)
select count(Id), EventMachine, TargetUser
from AdtServer.dvHeader
where HeaderUser = 'NETWORK SERVICE' and EventID = 552
group by EventMachine, TargetUser
order by count(Id) desc

-- by hour, minute and user (change the timestamps)… useful to search which users are active in a given time period…
-- helpful to spot "peaks" of activity such as password brute force attacks, or other activities limited in time
select datepart(hour, CreationTime) as Hours, datepart(minute, CreationTime) as Minutes, HeaderUser, count(Id) as total
from AdtServer.dvHeader
where CreationTime < '2010-02-22T16:00:00.000' and CreationTime > '2010-02-22T15:00:00.000'
group by datepart(hour, CreationTime), datepart(minute, CreationTime), HeaderUser
order by datepart(hour, CreationTime), datepart(minute, CreationTime), HeaderUser
cmdlet Get-Credential at command pipeline position 1
Supply values for the following parameters:
Credential
But we do get this error:
Test-WSMan : The server certificate on the destination computer (virtubuntu.huis.dom:1270) has the following errors:
The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.
The SSL certificate is signed by an unknown certificate authority.
At line:1 char:11
+ test-wsman <<<< -computer virtubuntu.huis.dom -port 1270 -authentication basic -credential (get-credential) -usessl
    + CategoryInfo          : InvalidOperation: (:) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand
The credentials above have to be a Unix login, which we typed correctly. But we still can't get through, as the certificate used by the agent is not trusted by our workstation. This seems to be the "usual" issue I first faced when testing SCX with WinRM in beta 1. At the time I simply dismissed it with the following sentence:
[…] Of course you have to solve some other things such as DNS resolution AND trusting the self-issued certificates that the agent uses, first. Once you have done that, you can run test queries from the Windows box towards the Unix ones by using WinRM. […]
and I sincerely thought that it would explain things pretty well… but eventually a lot of people got confused by this and did not know what to do, especially about the part on trusting the certificate. Anyway, in the following posts I figured out you could pass the -skipCACheck parameter to WinRM… which solved the issue of having to trust the certificate (which is fine for testing, but I would not use it for automation and scripts running in production, as it might expose your credentials to man-in-the-middle attacks).
So it seems that with the Powershell cmdlets we are back to that issue, as I can’t find a parameter to skip the CA check. Maybe it is there, but with PSv2 not having been released yet, I don’t know everything about it, and the CTP documentation is not yet complete. Therefore, back to trusting the certificate.
Trusting the certificate is actually very simple, but it can be a bit tricky when passing those certs back and forth from unix to windows. So let’s make the process a bit clearer.
All of the SCX agents' certificates are ultimately signed by a key on the Management Server that has discovered them, but I don't currently know where that certificate/key is stored on the Management Server. Anyway, you can get it from the agent certificate, as you only really need the public key, not the private signing key.
Use WinSCP or any other utility to copy the certificate off one of the agents. You can find that in the /etc/opt/microsoft/scx/ssl location:
that scx-host-computername.pem is your agent certificate.
Copy it to the Management server and change its extension from .pem to .cer. Now Windows will be happy to show it to you with the usual Certificate interface:
We need to go to the “Certification Path” tab, select the ISSUER certificate (the one called “SCX-Certificate”):
then go to the “Details” tab, and use the “Copy to File” button to export the certificate.
After you have the certificate in a .CER file, you can add it to the “trusted root certification authorities” store on the computer you are running your powershell tests from.
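If you prefer scripting that import over clicking through the Certificates MMC, certutil can do it from an elevated prompt (the file name here is hypothetical – use whatever name you gave the exported issuer certificate):

certutil -addstore Root SCX-Certificate.cer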
So after you have trusted it, the same command as above actually works now:
cmdlet Get-Credential at command pipeline position 1
Supply values for the following parameters:
Credential

wsmid           : http://schemas.dmtf.org/wbem/wsman/identify/1/wsmanidentity.xsd
lang            :
ProtocolVersion : http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor   : Microsoft System Center Cross Platform
ProductVersion  : 1.0.4-248
OK, we can talk to it! Now we can do something more fun, like actually returning instances and/or calling methods:
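As a minimal sketch (same host, port and credentials as above; SCX_OperatingSystem is one of the classes the management packs use), enumerating instances over WS-Man from PowerShell v2 looks like this:

Get-WSManInstance -Enumerate `
    -ResourceURI 'http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx' `
    -ComputerName virtubuntu.huis.dom -Port 1270 -Authentication basic -Credential (Get-Credential) -UseSSL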
This is far from exhaustive, but should get you started on a world of possibilities about automating diagnostics and responses with Powershell v2 towards the OpsMgr 2007 R2 Cross-Platform machines. Enjoy!
Disclaimer
The information in this weblog is provided “AS IS” with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided “AS IS” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose. THIS WORK IS NOT ENDORSED AND NOT EVEN CHECKED, AUTHORIZED, SCRUTINIZED NOR APPROVED BY MY EMPLOYER, AND IT ONLY REPRESENTS SOMETHING WHICH I’VE DONE IN MY FREE TIME. NO GUARANTEE WHATSOEVER IS GIVEN ON THIS. THE AUTHOR SHALL NOT BE HELD RESPONSIBLE FOR ANY DAMAGE YOU MIGHT INCUR WHEN USING THIS INFORMATION. The solution presented here IS NOT SUPPORTED by Microsoft.
As is becoming a tradition for me with each new release, I want to take a look at the Unix monitoring stuff, as I have done since beta 1 of Xplat and through beta 2. I am an integration freak and I have always insisted that interoperability is key. I will leave the most obvious "release notes" kind of things out of here, such as saying that there are now agents for the x64 versions of Linux distros, and so on… you can read this stuff in the release notes already, and in a zillion other places.
Let's instead look at my first impression (= I am amazed: this product is really getting awesome) and do a bit of digging, mostly to note what has changed since my previous posts on Xplat (one of which, by the way, is the MOST visited post I have ever published on this blog). Of course A LOT more has changed under the hood… but those are code changes, improvements, polishing of the product itself. While that would be interesting from a code perspective, here I am more interested in what the final user (the system administrator) will ultimately interact with directly, and what he might need to troubleshoot and understand about how the pieces fit together to realize Unix monitoring in OpsMgr.
After having hacked the RedHat MP to work on my CentOS box (as usual), I started to take a look at what is installed on the Linux box. Here are the new services:
You will notice the daemons have changed names and get launched with new parameters.
Of course when you see who uses port 1270 everything becomes clearer:
Therefore I can place the two new names and understand that SCXCIMSERVER is the WSMAN implementation, while SCXCIMPROVAGT is the CIM/WBEM implementation.
There is one more difference at the “service” (or “daemon”) level: the fact that there is only ONE init script now: /etc/init.d/scx-cimd
So basically the SCX “Agent” will start and stop as a single thing, even if it is composed of multiple executables that will spawn various processes.
But that is because, I have been told, we now have a nice new tool called scxadmin under /opt/microsoft/scx/bin/tools/, which will let you configure those things:
Therefore you would enable VERBOSE logging for all components by issuing the command
./scxadmin -log-set all verbose
and you will bring it back to a less noisy setting of logging only errors with
./scxadmin -log-set all errors
the logs will be written under /var/opt/microsoft/scx/log just like they did before.
Other than this, a lot of the troubleshooting techniques I showed in my previous posts – such as how to query CIM classes directly, or remotely through WS-Man by using winrm – should really stay the same. I will mention them again here for reference.
SCXCIMCLI is a useful and simple tool to query CIM directly. You can roughly compare it to wbemtest.exe in the Windows world (other than not having a UI). This utility can also be found in /opt/microsoft/scx/bin/tools.
A couple of examples of the most common/useful things you would do with scxcimcli:
1) Enumerate all Classes whose name contains “SCX_” in the root/scx namespace (the classes our Management packs use):
./scxcimcli nc -n root/scx -di |grep SCX_ | sort
2) Execute a Query
./scxcimcli xq "select * from SCX_OperatingSystem" -n root/scx
Another thing that you might want to test when troubleshooting discoveries is running the same queries through WS-Man (possibly from the same Management Server that will or should be managing that Unix box). I already showed this in the past; it is the following command:
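(A typical form of that command – the host name and credentials below are placeholders, and -skipCACheck is the switch discussed earlier:)

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:root -password:<password> -r:https://virtubuntu.huis.dom:1270/wsman -auth:basic -skipCACheck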
but if you launch it that way it will now return an error like the following (or at least it did in my test lab):
Fault
    Code
        Value = SOAP-ENV:Sender
        Subcode
            Value = wsman:EncodingLimit
    Reason
        Text = UTF-16 is not supported; Please use UTF-8
    Detail
        FaultDetail = http://schemas.dmtf.org/wbem/wsman/1/wsman/faultDetail/CharacterSet

Error number: -2144108468 0x8033804C
The WS-Management service does not support the character set used in the request. Change the request to use UTF-8 or UTF-16.
The error message is pretty self-explanatory: you need to specify the UTF-8 character set. You can do that by adding the "-encoding" qualifier:
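For example, the same enumeration as above (same placeholders) with the encoding specified:

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:root -password:<password> -r:https://virtubuntu.huis.dom:1270/wsman -auth:basic -skipCACheck -encoding:utf-8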
Hope the above is useful to figure out the differences between the earlier beta releases of the System Center CrossPlatform extensions and the version built in OpsMgr 2007 R2 Release Candidate.
There are obviously a million other things in R2 worth writing about (either related to Unix monitoring or to everything else) and I am sure posts will start to appear on the many, more active, blogs out there (they have already started appearing, actually). I have not had time to dig further, but will likely do so AFTER Easter – for the next couple of weeks I will be travelling, working some of the time (but without my test environment and good connectivity) AND visiting relatives the rest of the time.
One last thing I noticed about the Unix/Cross Platform Management Packs in R2 Release Candidate… their current “release date” exposed by the MP Catalog Web Service is the 20th of March…
…which happens to be my Birthday – therefore they must be a present for me! 🙂