Improved ACS Partitions Query

This has been sitting on my hard drive for a long time. Long story short, the report I posted at Permanent Link to Audit Collection Services Database Partitions Size Report had a couple of bugs:

  1. it did not consider the size of the dtString_XXX tables but only the size of dtEvent_XXX tables – this would still give you an idea of the trends, but it could lead to quite different SIZE calculations
  2. the query was failing on some instances that have been installed with the wrong (unsupported) Collation settings.

I fixed both bugs, but I don’t have a machine with SQL 2005 and Visual Studio 2005 anymore… so I can’t rebuild my report – but I don’t want to distribute one that only works on SQL 2008 because I know that SQL2005 is still out there. This is partially the reason that held this post back.

Without waiting so much longer, therefore, I decided I’ll just give you the fixed query. Enjoy Smile

--Query to get the Partition Table
--for each partition we launch the sp_spaceused stored procedure to determine the size and other info

--partition list
select PartitionId,Status,PartitionStartTime,PartitionCloseTime 
into #t1
from dbo.dtPartition with (nolock)
order by PartitionStartTime Desc 

--sp_spaceused holder table for dtEvent
create table #t2 (
    PartitionId nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    rows nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    reserved nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    data nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    index_size nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    unused nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS    
)

--sp_spaceused holder table for dtString
create table #t3 (
    PartitionId nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    rows nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    reserved nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    data nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    index_size nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS,
    unused nvarchar(MAX) Collate SQL_Latin1_General_CP1_CI_AS    
)

set nocount on

--vars used for building Partition GUID and main table name
declare @partGUID nvarchar(MAX)
declare @tblName nvarchar(MAX)
declare @tblNameComplete nvarchar(MAX)
declare @schema nvarchar(MAX)
DECLARE @vQuery NVARCHAR(MAX)

--cursor
declare c cursor for 
    select PartitionID from #t1
open c
fetch next from c into @partGUID

--start cursor usage
while @@FETCH_STATUS = 0
begin

--tblName - first usage for dtEvent
set @tblName = 'dtEvent_' + @partGUID

--retrieve the schema name
SET @vQuery = 'SELECT @dbschema = TABLE_SCHEMA from INFORMATION_SCHEMA.tables where TABLE_NAME = ''' + @tblName + ''''
EXEC sp_executesql @vQuery,N'@dbschema nvarchar(max) out, @dbtblName nvarchar(max)',@schema out, @tblname

--tblNameComplete
set @tblNameComplete = @schema + '.' + @tblName

INSERT #t2 
    EXEC sp_spaceused @tblNameComplete

--tblName - second usage for dtString
set @tblName = 'dtString_' + @partGUID

--retrieve the schema name
SET @vQuery = 'SELECT @dbschema = TABLE_SCHEMA from INFORMATION_SCHEMA.tables where TABLE_NAME = ''' + @tblName + ''''
EXEC sp_executesql @vQuery,N'@dbschema nvarchar(max) out, @dbtblName nvarchar(max)',@schema out, @tblname

--tblNameComplete
set @tblNameComplete = @schema + '.' + @tblName

INSERT #t3 
    EXEC sp_spaceused @tblNameComplete

fetch next from c into @partGUID
end
close c
deallocate c

--select * from #t2
--select * from #t3

--results
select #t1.PartitionId, 
    #t1.Status, 
    #t1.PartitionStartTime, 
    #t1.PartitionCloseTime, 
    #t2.rows,
    (CAST(LEFT(#t2.reserved,LEN(#t2.reserved)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t2.reserved,LEN(#t2.reserved)-3) AS NUMERIC(18,0))) as 'reservedKB', 
    (CAST(LEFT(#t2.data,LEN(#t2.data)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.data,LEN(#t3.data)-3) AS NUMERIC(18,0)))as 'dataKB', 
    (CAST(LEFT(#t2.index_size,LEN(#t2.index_size)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.index_size,LEN(#t3.index_size)-3) AS NUMERIC(18,0))) as 'indexKB', 
    (CAST(LEFT(#t2.unused,LEN(#t2.unused)-3) AS NUMERIC(18,0)) + CAST(LEFT(#t3.unused,LEN(#t3.unused)-3) AS NUMERIC(18,0))) as 'unusedKB'
from #t1
join #t2
on #t2.PartitionId = ('dtEvent_' + #t1.PartitionId)
join #t3
on #t3.PartitionId = ('dtString_' + #t1.PartitionId)
order by PartitionStartTime desc

--cleanup
drop table #t1
drop table #t2
drop table #t3

OpsMgr Agents and Gateways Failover Queries

The following article by Jimmy Harper explains very well how to set up agents and gateways’ failover paths thru Powershell http://blogs.technet.com/b/jimmyharper/archive/2010/07/23/powershell-commands-to-configure-gateway-server-agent-failover.aspx . This is the approach I also recommend, and that article is great – I encourage you to check it out if you haven’t done it yet!

Anyhow, when checking for the actual failover paths that have been configured, the use of Powershell suggested by Jimmy is rather slow – especially if your agent count is high. In the Operations Manager Health Check tool I was also using that technique at the beginning, but eventually moved to the use of SQL queries just for performance reasons. Since then, we have been using these SQL queries quite successfully for about 3 years now.

But this the season of giving… and I guess SQL Queries can be a gift, right? Therefore I am now donating them as Christmas Gift to the OpsMrg community Smile

Enjoy – and Merry Christmas!

 

--GetAgentForWhichServerIsPrimary
SELECT SourceBME.DisplayName as Agent,TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication() 
AND SourceBME.DisplayName not in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND SourceBME.DisplayName not in (select DisplayName from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.ManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'


--GetAgentForWhichServerIsFailover
SELECT SourceBME.DisplayName as Agent,TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication() 
AND SourceBME.DisplayName not in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND SourceBME.DisplayName not in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.ManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'


--GetGatewayForWhichServerIsPrimary
SELECT SourceBME.DisplayName as Gateway, TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceCommunication() 
AND SourceBME.DisplayName in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'
    

--GetGatewayForWhichServerIsFailover
SELECT SourceBME.DisplayName As Gateway, TargetBME.DisplayName as Server
FROM Relationship R WITH (NOLOCK) 
JOIN BaseManagedEntity SourceBME 
ON R.SourceEntityID = SourceBME.BaseManagedEntityID 
JOIN BaseManagedEntity TargetBME 
ON R.TargetEntityID = TargetBME.BaseManagedEntityID 
WHERE R.RelationshipTypeId = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceSecondaryCommunication() 
AND SourceBME.DisplayName in (select DisplayName 
from dbo.ManagedEntityGenericView WITH (NOLOCK) 
where MonitoringClassId in (select ManagedTypeId 
from dbo.ManagedType WITH (NOLOCK) 
where TypeName = 'Microsoft.SystemCenter.GatewayManagementServer') 
and IsDeleted ='0') 
AND R.IsDeleted = '0'


--xplat agents
select bme2.DisplayName as XPlatAgent, bme.DisplayName as Server
from dbo.Relationship r with (nolock) 
join dbo.RelationshipType rt with (nolock) 
on r.RelationshipTypeId = rt.RelationshipTypeId 
join dbo.BasemanagedEntity bme with (nolock) 
on bme.basemanagedentityid = r.SourceEntityId 
join dbo.BasemanagedEntity bme2 with (nolock) 
on r.TargetEntityId = bme2.BaseManagedEntityId 
where rt.RelationshipTypeName = 'Microsoft.SystemCenter.HealthServiceManagesEntity' 
and bme.IsDeleted = 0 
and r.IsDeleted = 0 
and bme2.basemanagedtypeid in (SELECT DerivedTypeId 
FROM DerivedManagedTypes with (nolock) 
WHERE BaseTypeId = (select managedtypeid 
from managedtype where typename = 'Microsoft.Unix.Computer') 
and DerivedIsAbstract = 0)

Got Orphaned OpsMgr Objects?

Have you ever wondered what would happen if, in Operations Manager, you’d delete a Management Server or Gateway that managed objects (such as network devices) or has agents pointing uniquely to it as their primary server?

The answer is simple, but not very pleasant: you get ORPHANED objects, which will linger in the database but you won’t be able to “see” or re-assign anymore from the GUI.

So the first thing I want to share is a query to determine IF you have any of those orphaned agents. Or even if you know, since you are not able to “see” them from the console, you might have to dig their name out of the database. Here’s a query I got from a colleague in our reactive support team:


-- Check for orphaned health services (e.g. agent).
declare @DiscoverySourceId uniqueidentifier;
SET @DiscoverySourceId = dbo.fn_DiscoverySourceId_User();
SELECT TME.[TypedManagedEntityid], HS.PrincipalName
FROM MTV_HealthService HS
INNER JOIN dbo.[BaseManagedEntity] BHS WITH(nolock)
ON BHS.[BaseManagedEntityId] = HS.[BaseManagedEntityId]
-- get host managed computer instances
INNER JOIN dbo.[TypedManagedEntity] TME WITH(nolock)
ON TME.[BaseManagedEntityId] = BHS.[TopLevelHostEntityId]
AND TME.[IsDeleted] = 0
INNER JOIN dbo.[DerivedManagedTypes] DMT WITH(nolock)
ON DMT.[DerivedTypeId] = TME.[ManagedTypeId]
INNER JOIN dbo.[ManagedType] BT WITH(nolock)
ON DMT.[BaseTypeId] = BT.[ManagedTypeId]
AND BT.[TypeName] = N'Microsoft.Windows.Computer'
-- only with missing primary
LEFT OUTER JOIN dbo.Relationship HSC WITH(nolock)
ON HSC.[SourceEntityId] = HS.[BaseManagedEntityId]
AND HSC.[RelationshipTypeId] = dbo.fn_RelationshipTypeId_HealthServiceCommunication()
AND HSC.[IsDeleted] = 0
INNER JOIN DiscoverySourceToTypedManagedEntity DSTME WITH(nolock)
ON DSTME.[TypedManagedEntityId] = TME.[TypedManagedEntityId]
AND DSTME.[DiscoverySourceId] = @DiscoverySourceId
WHERE HS.[IsAgent] = 1
AND HSC.[RelationshipId] IS NULL;

Once you have identified the agent you need to re-assign to a new management server, this is doable from the SDK. Below is a powershell script I wrote which will re-assign it to the RMS. It has to run from within the OpsMgr Command Shell.
You still need to change the logic which chooses which agent – this is meant as a starting base… you could easily expand it into accepting parameters and/or consuming an input text file, or using a different Management Server than the RMS… you get the point.

  1. $mg = (get-managementgroupconnection).managementgroup
  2. $mrc = Get-RelationshipClass | where {$_.name –like “*Microsoft.SystemCenter.HealthServiceCommunication*”}
  3. $cmro = new-object Microsoft.EnterpriseManagement.Monitoring.CustomMonitoringRelationshipObject($mrc)
  4. $rms = (get-rootmanagementserver).HostedHealthService
  5. $deviceclass = $mg.getmonitoringclass(“HealthService”)
  6. $mc = Get-connector | where {$_.Name –like “*MOM Internal Connector*”}
  7. Foreach ($obj in $mg.GetMonitoringObjects($deviceclass))
  8. {
  9.     #the next line should be changed to pick the right agent to re-assign
  10.     if ($obj.DisplayName -match ‘dsxlab’)
  11.     {
  12.                 Write-host $obj.displayname
  13.                 $imdd = new-object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalMonitoringDiscoveryData
  14.                 $cmro.SetSource($obj)
  15.                 $cmro.SetTarget($rms)
  16.                 $imdd.Add($cmro)
  17.                 $imdd.Commit($mc)
  18.     }
  19. }

Similarly, you might get orphaned network devices. The script below is used to re-assign all Network Devices to the RMS. This script is actually something I have had even before the other one (yes, it has been sitting in my “digital drawer” for a couple of years or more…) and uses the same concept – only you might notice that the relation’s source and target are “reversed”, since the relationships are different:

  • the Management Server (source) “manages” the Network Device (target)
  • the Agent (source) “talks” to the Management Server (target)

With a bit of added logic it should be easy to have it work for specific devices.

  1. $mg = (get-managementgroupconnection).managementgroup
  2. $mrc = Get-RelationshipClass | where {$_.name –like “*Microsoft.SystemCenter.HealthServiceShouldManageEntity*”}
  3. $cmro = new-object Microsoft.EnterpriseManagement.Monitoring.CustomMonitoringRelationshipObject($mrc)
  4. $rms = (get-rootmanagementserver).HostedHealthService
  5. $deviceclass = $mg.getmonitoringclass(“NetworkDevice”)
  6. Foreach ($obj in $mg.GetMonitoringObjects($deviceclass))
  7. {
  8.                 Write-host $obj.displayname
  9.                 $imdd = new-object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalMonitoringDiscoveryData
  10.                 $cmro.SetSource($rms)
  11.                 $cmro.SetTarget($obj)
  12.                 $imdd.Add($cmro)
  13.                 $mc = Get-connector | where {$_.Name –like “*MOM Internal Connector*”}
  14.                 $imdd.Commit($mc)
  15. }

Disclaimer

The information in this weblog is provided “AS IS” with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided “AS IS” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

Audit Collection Services Database Partitions Size Report

A number of people I have talked to liked my previous post on ACS sizing. One thing that was not extremely easy or clear to them in that post was *how* exactly I did one thing I wrote:

[…] use the dtEvent_GUID table to get the number of events for that day, and use the stored procedure “sp_spaceused”  against that same table to get an overall idea of how much space that day is taking in the database […]

To be completely honest, I do not expect people to do this manually a hundred times if they have a hundred partitions. In fact, I have been doing this for a while with a script which will do the looping for me and run that sp_spaceused for me a number of time. I cannot share that script, but I do realize that this automation is very useful, therefore I wrote a “stand-alone” SQL query which, using a couple of temporary tables, produces a similar type of output. I also went a step further and packaged it into a SQL Server Reporting Services Report for everyone’s consumption. The report should look like the following screenshot, featuring a chart and the table with the numerical information about each and every partition in the database:

ACS Partitions Report

You can download the report from here.

You need to upload it to your report server, and change the data source to the shared Data Source that also the built-in ACS Reports use, and it should work.

[NOTE/UPDATE May 4th 2011: This report has a few bugs. I have posted the updated query on http://www.muscetta.com/2011/05/04/improved-acs-partitions-query/ . I am sorry I can’t provide a ready made report with the fix right now. Make sure you understand this and don’t implement it without testing.]

Enjoy!

A few thoughts on sizing Audit Collection System

People were already collecting logs with MOM, so why not the security log? Some people were doing that, but it did not scale enough; for this reason, a few years ago Eric Fitzgerald announced that he was working on Microsoft Audit Collection System. Anyhow, the tool as it was had no interface… and the rest is history: it has been integrated into System Center Operations Manager. Anyhow, ACS remains a lesser-known component of OpsMgr.

There are a number of resources on the web that is worth mentioning and linking to:

and, of course, many more, I cannot link them all.

As for myself, I have been playing with ACS since those early beta days (before I joined Microsoft and before going back to MOM, when I was working in Security), but I never really blogged about this piece.

Since I have been doing quite a lot of work around ACS lately, again, I thought it might be worth consolidating some thoughts about it, hence this post.

Anatomy of an “Online” Sizing Calculation

What I would like to explain here is the strategy and process I go thru when analyzing the data stored in a ACS database, in order to determine a filtering strategy: what to keep and what not to keep, by applying a filter on the ACS Collector.

So, the first thing I usually start with is using one of the many “ACS sizer” Excel spreadsheets around… which usually tell you that you need more space than it really is necessary… basically giving you a “worst case” scenario. I don’t know how some people can actually do this from a purely theoretical point of view, but I usually prefer a bottom up approach: I look at the actual data that the ACS is collecting without filters, and start from there for a better/more accurate sizing.

In the case of a new install this is easy – you just turn ACS on, set the retention to a few days (one or two weeks maximum), give the DB plenty of space to make sure it will make it, add all your forwarders… sit back and wait.

Then you come back 2 weeks later and start looking at the data that has been collected.

What/How much data are we collecting?

First of all, if we have not changed the default settings, the grooming and partitioning algorithm will create new partitioned tables every day. So my first step is to see how big each “partition” is.

But… what is a partition, anyway? A partition is a set of 4 tables joint together:

  1. dtEvent_GUID
  2. dtEventData_GUID
  3. dtPrincipal_GUID
  4. dtSTrings_GUID

where GUID is a new GUID every day, and of course the 4 tables that make up a daily partition will have the same GUID.

The dtPartition table contains a list of all partitions and their GUIDs, together with their start and closing time.

Just to get a rough estimate we can ignore the space used by the last three tables – which are usually very small – and only use the dtEvent_GUID table to get the number of events for that day, and use the stored procedure “sp_spaceused”  against that same table to get an overall idea of how much space that day is taking in the database.

By following this process, I come up with something like the following:

Partition ID Status Partition Start Time Partition Close Time Rows Reserved  KB Total GB
9b45a567_c848_4a32_9c35_39b402ea0ee202/1/2010 2:002/1/2010 2:0029,749,3667,663,4887,484
8d8c8ee1_4c5c_4dea_b6df_82233c52e34621/31/2010 2:002/1/2010 2:0028,067,4389,076,9048,864
34ce995b_689b_46ae_b9d3_c644cfb66e0121/30/2010 2:001/31/2010 2:0030,485,1109,857,8969,627
bb7ea5d3_f751_473a_a835_1d1d4268303921/29/2010 2:001/30/2010 2:0048,464,95215,670,79215,304
ee262692_beae_4d81_8079_470a5456794621/28/2010 2:001/29/2010 2:0048,980,17815,836,41615,465
7984b5b8_ddea_4e9c_9e51_0ee7a413b4c921/27/2010 2:001/28/2010 2:0051,295,77716,585,40816,197
d93b9f0e_2ec3_4f61_b5e0_b600bbe173d221/26/2010 2:001/27/2010 2:0053,385,23917,262,23216,858
8ce1b69a_7839_4a05_8785_29fd6bfeda5f21/25/2010 2:001/26/2010 2:0055,997,54618,105,84017,681
19aeb336_252d_4099_9a55_81895bfe586021/24/2010 2:001/24/2010 2:0028,525,3047,345,1207,173
1cf70e01_3465_44dc_9d5c_4f3700dc408a21/23/2010 2:001/23/2010 2:0026,046,0926,673,4726,517
f5ec207f_158c_47a8_b15f_8aab177a630521/22/2010 2:001/22/2010 2:0047,818,32212,302,20812,014
b48dabe6_a483_4c60_bb4d_93b7d3549b3e21/21/2010 2:001/21/2010 2:0055,060,15014,155,39213,824
efe66c10_0cf2_4327_adbf_bebb97551c9321/20/2010 2:001/20/2010 2:0058,322,21715,029,21614,677
0231463e_8d50_4a42_a834_baf55e6b4dcd21/19/2010 2:001/19/2010 2:0061,257,39315,741,24815,372
510acc08_dc59_482e_a353_bfae1f85e64821/18/2010 2:001/18/2010 2:0064,579,12216,612,51216,223

If you have just installed ACS and let it run without filters with your agents for a couple of weeks, you should get some numbers like those above for your “couple of weeks” of analysis. If you graph your numbers in Excel (both size and number of rows/events per day) you should get some similar lines that show a pattern or trend:

Trend: Space user by day

Trend: Number of events by day

So, in my example above, we can clearly observe a “weekly” pattern (monday-to-friday being busier than the weekend) and we can see that – for that environment – the biggest partition is roughly 17GB. If we round this up to 20GB – and also considering the weekends are much quieter – we can forecast 20*7 = 140GB per week. This has an excess “buffer” which will let the system survive event storms, should they happen. We also always recommend having some free space to allow for re-indexing operations.

In fact, especially when collecting everything without filters, the daily size is a lot less predictable: imagine worms “trying out” administrator account’s passwords, and so on… those things can easily create event storms.

Anyway, in the example above, the customer would have liked to keep 6 MONTHS (180days) of data online, which would become 20*180 = 3600GB = THREE TERABYTE and a HALF! Therefore we need a filtering strategy – and badly – to reduce this size.

[edited on May 7th 2010 – if you want to automate the above analysis and produce a table and graphs like those just shown, you should look at my following post.]

Filtering Strategies

Ok, then we need to look at WHAT actually comprises that amount of events we are collecting without filters. As I wrote above, I usually run queries to get this type of information.

I will not get into HOW TO write a filter here – a collector’s filter is a WMI notification query and it is already described pretty well elsewhere how to configure it.

Here, instead, I want to walk thru the process and the queries I use to understand where the noise comes from and what could be filtered – and get an estimate of how much space we could be saving if filter one way or another.

Number of Events per User

–event count by User (with Percentages)
declare @total float
select @total = count(HeaderUser) from AdtServer.dvHeader
select count(HeaderUser),HeaderUser, cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by HeaderUser
order by count(HeaderUser) desc

In our example above, over the 14 days we were observing, we obtained percentages like the following ones:

#evt HeaderUser AccountPercent
204,904,332SYSTEM40.79 %
18,811,139LOCAL SERVICE3.74 %
14,883,946ANONYMOUS LOGON2.96 %
10,536,317appintrauser2.09 %
5,590,434mossfarmusr

Just by looking at this, it is pretty clear that filtering out events tracked by the accounts “SYSTEM”, “LOCAL SERVICE” and “ANONYMOUS”, we would save over 45% of the disk space!

Number of Events by EventID

Similarly, we can look at how different Event IDs have different weights on the total amount of events tracked in the database:

–event count by ID (with Percentages)
declare @total float
select @total = count(EventId) from AdtServer.dvHeader
select count(EventId),EventId, cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc

We would get some similar information here:

Event ID Meaning Sum of events Percent
538A user logged off99,494,64827.63
540Successful Network Logon97,819,64027.16
672Authentication Ticket Request52,281,12914.52
680Account Used for Logon by (Windows 2000)35,141,2359.76
576Specified privileges were added to a user’s access token.26,154,7617.26
8086Custom Application ID18,789,5995.21
673Service Ticket Request10,641,0902.95
675Pre-Authentication Failed7,890,8232.19
552Logon attempt using explicit credentials4,143,7411.15
539Logon Failure – Account locked out2,383,8090.66
528Successful Logon1,764,6970.49

Also, do not forget that ACS provides some report to do this type of analysis out of the box, even if for my experience they are generally slower – on large datasets – than the queries provided here. Also, a number of reports have been buggy over time, so I just prefer to run queries and be on the safe side.

Below an example of such report (even if run against a different environment – just in case you were wondering why the numbers were not the same ones :-)):Event Counts ACS Default Report

The numbers and percentages we got from the two queries above should already point us in the right direction about what we might want to adjust in either our auditing policy directly on Windows and/or decide if there is something we want to filter out at the collector level (here you should ask yourself the question: “if they aren’t worth collecting are they worth generating?” – but I digress).

Also, a permutation of the above two queries should let you see which user is generating the most “noise” in regards to some events and not other ones… for example:

–event distribution for a specific user (change the @user) – with percentages for the user and compared with the total #events in the DB
declare @user varchar(255)
set @user = ‘SYSTEM’
declare @total float
select @total = count(Id) from AdtServer.dvHeader
declare @totalforuser float
select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
select count(Id), EventID, cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser, cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
from AdtServer.dvHeader
where HeaderUser = @user
group by EventID
order by count(Id) desc

The above is particularly important, as we might want to filter out a number of events for the SYSTEM account (i.e. logons that occur when starting and stopping services) but we might want to keep other events that are tracked by the SYSTEM account too, such as an administrator having wiped the Security Log clean – which might be something you want to keep:

Event ID 517 Audit Log was cleared

of course the amount of EventIDs 517 over the total of events tracked by the SYSTEM account will not be as many, and we can still filter the other ones out.

Number of Events by EventID and by User

We could also combine the two approaches above – by EventID and by User:

select count(Id),HeaderUser, EventId

from AdtServer.dvHeader

group by HeaderUser, EventId

order by count(Id) desc

This will produce a table like the following one

SQL Query: Events by EventID and by User

which can be easily copied/pasted into Excel in order to produce a pivot Table:

Pivot Table

Cluster EventLog Replication

One more aspect that is less widely known, but I think is worth showing, is the way that clusters behave when in ACS. I don’t mean all clusters… but if you keep the “eventlog replication” feature of clusters enabled (you should disable it also from a monitoring perspective, but I digress), each cluster node’s security eventlog will have events not just for itself, but for all other nodes as well.

Albeit I have not found a reliable way to filter out – other than disabling eventlog replication altogether.

Anyway, just to get an idea of how much this type of “duplicate” events weights on the total, I use the following query, that tells you how many events for each machine are tracked by another machine:

–to spot machines that are cluster nodes with eventlog repliation and write duplicate events (slow)

select Count(Id) as Total,replace(right(AgentMachine, (len(AgentMachine) – patindex(‘%\%’,AgentMachine))),’$’,”) as ForwarderMachine, EventMachine

from AdtServer.dvHeader

–where ForwarderMachine <> EventMachine

group by EventMachine,replace(right(AgentMachine, (len(AgentMachine) – patindex(‘%\%’,AgentMachine))),’$’,”)

order by ForwarderMachine,EventMachine

Cluster Events

Those presented above are just some of the approaches I usually look into at first. Of course there are a number more. Here I am including the same queries already shown in action, plus a few more that can be useful in this process.

I have even considered building a page with all these queries – a bit like those that Kevin is collecting for OpsMgr (we actually wrote some of them together when building the OpsMgr Health Check)… shall I move the below queries on such a page? I though I’d list them here and give some background on how I normally use them, to start off with.

Some more Useful Queries

–top event ids
select count(EventId), EventId
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc

–event count by ID (with Percentages)
declare @total float
select @total = count(EventId) from AdtServer.dvHeader
select count(EventId),EventId, cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc

–which machines have ever written event 538
select distinct EventMachine, count(EventId) as total
from AdtServer.dvHeader
where EventID = 538
group by EventMachine

–machines
select * from dtMachine

–machines (more readable)
select replace(right(Description, (len(Description) – patindex(‘%\%’,Description))),’$’,”)
from dtMachine

–events by machine
select count(EventMachine), EventMachine
from AdtServer.dvHeader
group by EventMachine

–rows where EventMachine field not available (typically events written by ACS itself for chekpointing)
select *
from AdtServer.dvHeader
where EventMachine = ‘n/a’

–event count by day
select convert(varchar(20), CreationTime, 102) as Date, count(EventMachine) as total
from AdtServer.dvHeader
group by convert(varchar(20), CreationTime, 102)
order by convert(varchar(20), CreationTime, 102)

–event count by day and by machine
select convert(varchar(20), CreationTime, 102) as Date, EventMachine, count(EventMachine) as total
from AdtServer.dvHeader
group by EventMachine, convert(varchar(20), CreationTime, 102)
order by convert(varchar(20), CreationTime, 102)

–event count by machine and by date (distinuishes between AgentMachine and EventMachine
select convert(varchar(10),CreationTime,102),Count(Id),EventMachine,AgentMachine
from AdtServer.dvHeader
group by convert(varchar(10),CreationTime,102),EventMachine,AgentMachine
order by convert(varchar(10),CreationTime,102) desc ,EventMachine

–event count by User
select count(Id),HeaderUser
from AdtServer.dvHeader
group by HeaderUser
order by count(Id) desc

–event count by User (with Percentages)
declare @total float
select @total = count(HeaderUser) from AdtServer.dvHeader
select count(HeaderUser),HeaderUser, cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by HeaderUser
order by count(HeaderUser) desc

–event distribution for a specific user (change the @user) – with percentages for the user and compared with the total #events in the DB
declare @user varchar(255)
set @user = ‘SYSTEM’
declare @total float
select @total = count(Id) from AdtServer.dvHeader
declare @totalforuser float
select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
select count(Id), EventID, cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser, cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
from AdtServer.dvHeader
where HeaderUser = @user
group by EventID
order by count(Id) desc

–to spot machines that write duplicate events (such as cluster nodes with eventlog replication enabled)
select Count(Id),EventMachine,AgentMachine
from AdtServer.dvHeader
group by EventMachine,AgentMachine
order by EventMachine

–to spot machines that are cluster nodes with eventlog repliation and write duplicate events (better but slower)
select Count(Id) as Total,replace(right(AgentMachine, (len(AgentMachine) – patindex(‘%\%’,AgentMachine))),’$’,”) as ForwarderMachine, EventMachine
from AdtServer.dvHeader
–where ForwarderMachine <> EventMachine
group by EventMachine,replace(right(AgentMachine, (len(AgentMachine) – patindex(‘%\%’,AgentMachine))),’$’,”)
order by ForwarderMachine,EventMachine

–which user and from which machine is target of elevation (network service doing “runas” is a 552 event)
select count(Id),EventMachine, TargetUser
from AdtServer.dvHeader
where HeaderUser = ‘NETWORK SERVICE’
and EventID = 552
group by EventMachine, TargetUser
order by count(Id) desc

–by hour, minute and user
–(change the timestamp)… this query is useful to search which users are active in a given time period…
–helpful to spot “peaks” of activities such as password brute force attacks, or other activities limited in time.
select datepart(hour,CreationTime) as Hours, datepart(minute,CreationTime) as Minutes, HeaderUser, count(Id) as total
from AdtServer.dvHeader
where CreationTime < ‘2010-02-22T16:00:00.000’
and CreationTime > ‘2010-02-22T15:00:00.000’
group by datepart(hour,CreationTime), datepart(minute,CreationTime),HeaderUser
order by datepart(hour,CreationTime), datepart(minute,CreationTime),HeaderUser

Protecting custom Resolution State in OpsMgr 2007

In System Center Operations Manager 2007, you can add and remove resolution states for your alerts at will. Other than states “0” (“New”) and “255” (“Closed”) you can create other 254 resolution states to suit your needs. This is a simple feature that was already present in previous MOM versions, and it is very useful to do a kind of tricks with your alerts. The amount of possible states you can create should be able to satisfy any kind of alert and incident management process you might have in place, and any kind of filtering or forwarding or escalation need you might want to perform by using resolution states.

image

By default, only OpsMgr Administrators can change these settings, with the exception of the two built-in states of “New” and “Closed”: those two states are REQUIRED if you want the product to continue working, therefore the GUI won’t let you change, edit or delete them. Which is good.

This is not true for your own resolution states, which can be edited or even deleted any time. All that is really saved in an alert when you change an alert’s resolution state is the NUMBER associated with it. In fact you even use that number when querying for alerts in the Command Shell:

get-alert | where {$_.resolutionstate -eq 0}

That means that if by accident you delete a resolution state you have defined, you won’t see its description anymore in the GUI. Also, if you try to re-organize your resolution state, you can easily change the IDs for existing ones… Sure, you need to have the permissions in order to change or delete them, but what if you have implemented your important Alert and Incident management process by using resolution states and you want a bit of extra protection from mistakes or unintended deletion for them?

Then you can protect them by making the product think they were “built-in” too, just like “New” and “Closed”.

How would you do this? In an UNSUPPORTED WAY: editing the database 🙂 In fact, those resolution states are written in a table in the database, called “ResolutionState” (who would have guessed it?), that looks like the following picture:

dbo.ResolutionState

Can you see the “IsPredefined” column? That can be set to “True” or “False” and that value is used by the SDK service to tell the GUI if that Resolution State can be edited/deleted or not.

Of course changing the database directly IS NOT SUPPORTED by Microsoft. You do this at your own risk, and if it was me, I would *NEVER* touch, change or remove the default two states (“New” and “Closed”) as THAT really would BREAK the product. For example, Alerts that are not set to “Closed” (255) won’t be ever groomed. And that is VERY BAD. NEVER, NEVER DO THAT.

On the other end, changing a custom Resolution State to make the product believe it is Predefined/Built-in has not had any negative impact in my (limited) testing so far, and has added the advantage of “protecting” my resolution state from unintended deletion, as shown below:

image

As usual, do this at your own risk. Remember what’s written in my Disclaimer:

The information in this weblog is provided “AS IS” with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided “AS IS” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.
THIS WORK IS NOT ENDORSED AND NOT EVEN CHECKED, AUTHORIZED, SCRUTINIZED NOR APPROVED BY MICROSOFT, AND IT ONLY REPRESENT SOMETHING WHICH I’VE DONE IN MY FREE TIME. NO GUARANTEE WHATSOEVER IS GIVEN ON THIS. THE AUTHOR SHALL NOT BE MADE RESPONSIBLE FOR ANY DAMAGE YOU MIGHT INCUR WHEN USING THIS HACK.

On this website we use first or third-party tools that store small files (cookie) on your device. Cookies are normally used to allow the site to run properly (technical cookies), to generate navigation usage reports (statistics cookies) and to suitable advertise our services/products (profiling cookies). We can directly use technical cookies, but you have the right to choose whether or not to enable statistical and profiling cookies. Enabling these cookies, you help us to offer you a better experience. Cookie and Privacy policy