September is over. The first three quarters of 2015 are over. This has been a very important year so far – difficult, but revealing. Everything has been about change, healing and renewal.
This was a hard choice – it took many months to reach the conclusion this is what I needed to do.
Most people have gone through strong programming: they think you have to be 'successful' at something. Success is externally defined, anyhow (as opposed to satisfaction, which we define ourselves), and therefore you are supposed to study a certain field in college, then use it at work to build your career in that same field… and keep doing the same thing.
I was never like that – I didn't go to college, I didn't study as an 'engineer'. I just saw there was a market opportunity to find a job when I started, studied on the job, and eventually excelled at it. But it never was *the* road. It was just one road; it has served me well so far, but it was just one thing I tried, and it worked out. How did it start? As a pre-teen I had been interested in computers, then left that for a while, did a 'normal' high school (in Italy at the time, this was really non-technological), then tried to study sociology for a little bit – I really enjoyed the Cultural Anthropology lessons there, and we were smoking good weed with some folks outside of the university, but I really could not bring myself to spend the following 5 or 10 years of my life just studying and 'hanging around' – I wanted money and independence to move out of my parents' house.
So, without much fanfare, I revived my IT knowledge: I upgraded my skills from the 'hobbyist' world of the Commodore 64 and Amiga scene (I had been passionate about modems and the BBS world back then), looked at the PC world of the time, rode the 'Internet wave' and applied for a simple job at an IT company.
A lot of my friends were either not even searching for a job, with the excuse that there weren’t any, or spending time in university, in a time of change, where all the university-level jobs were taken anyway so that would have meant waiting even more after they had finished studying… I am not even sure they realized this until much later. But I just applied, played my cards, and got my job.
When I went to sign it, they also reminded me they expected hard work at the simplest and humblest level: I would have to fix PCs and printers and help users with networking issues and similar tasks – at a customer of theirs, a big company. I was ready to roll up my sleeves and help that IT department in whatever way I could, and I did. It all grew from there.
And that's how my IT career started. I learned all I know of IT on the job, by working my ass off, studying extra hours, watching older and more expert colleagues, and gaining experience.
I am not an engineer. I am, at most, a mechanic. But over the course of the past 18 years I did learn a lot about companies and the market, languages, designs, politics, and the human and technical factors in software engineering and the IT marketplace.
But when I started, I was just trying to lend an honest hand and get paid some money in return – isn't that what work was about?
Over time, IT got out of control. Like Venom in the Marvel comics, which made its appearance as a costume that SpiderMan started wearing… and slowly took over, as the 'costume' was in reality some sort of alien symbiotic organism (like a pest).
You might be wondering what I mean. From the outside I was a successful Senior Program Manager of a 'hot' Microsoft product. Someone must have mistaken my diligence and hard work for 'talent' or 'career ambition' – but it never was that. I got pushed up, taught never to turn down 'opportunities'.
First and foremost, I am taking time for myself and my family:
I am reading (and writing)
I am cooking again
I have been catching up on sleep – and have dreams again
I am helping my father-in-law build a shed in his yard
We bought a 14-year-old Volkswagen van that we are turning into a camper
I have not stopped building guitars – in fact I am getting set up to do it 'seriously', so I am also standing up a separate site to promote that activity
I am making music and discovering new music and instruments
I am meeting new people and new situations
There’s a lot of folks out there who either think I am crazy (they might be right, but I am happy this way), or think this is some sort of lateral move – I am not searching for another IT job, thanks. Stop the noise on LinkedIn please: I don’t fit in your algorithms, I just made you believe I did, all these years.
Lately this blog has been very personal. This post, instead, is about stuff I do at work, so if you are not one of my IT readers, don't worry and skip it.
For my IT readers: an interruption from guitars and music on this blog to share some personal reflections on OpInsights and SCOM.
SCOM is very powerful. You know I have always been a huge fan of 2007, and I worked on the 2012 release myself. But, compared to its predecessor – MOM – authoring management packs in SCOM has always been very hard: multiple tools, a lot of documentation… here we are, more than 6 years later, and the first 2 comments on an old post on the momteam blog still strike me hard every time I read them:
You would think that things have changed, but SCOM is fundamentally complex, and even with the advances in tooling (VSAE, MPAuthor, etc) writing MPs is still black magic, if you ask some users.
Well, writing those alerting rules in SCOM requires a lot of complex XML – you might not need to know how to write it (though you often have to attempt deciphering it), and even if you create rules with a wizard, it will produce a lot of complex XML for you.
The screenshot below shows the large XML chunk that is needed to pick up a specific EventID from a specific log and a specific source: the key information is only a small fraction of it, while the rest is 'packaging':
I want OpInsights to be SIMPLE.
If there is one thing I want most for this project, it is this.
That's why the same rule can now be expressed with a simple filter search in OpInsights, where all you need is just that key information,
and you essentially don't have to care about any sort of packaging nor mess with XML.
Click, click – filters/facets in the UI let you refine your criteria, and so do your saved searches. They execute right away; there is not even a 'Done' button to press. You might just be watching those searches pinned as tiles on your dashboard. All it took was identifying the three key pieces of info, no complex XML wrapping needed!
Ok, granted – there ARE legitimate, more complex scenarios for which you need richer data sources/collectors and specialized, well-thought-out data shaping, not just events – and we use those powerful capabilities of the MMA agent in intelligence packs. But at its core, the simple search language and the explorability of the data are meant to bring SIMPLE back to the modern monitoring world. Help us prioritize what data sources you need first!
PS – if you have no idea what I was talking about – thanks for making it till here, but don’t worry: either you are not an IT person, which means simply ignore this; or – if you are an IT person – go check out Azure Operational Insights!
This is the first public release since I became part of the team (I started in this role the day after the team had shipped Beta) and the first release that contains some direct output of my work. It feels so good!
I got this sticker last APRIL at MMS2010 in JUST ONE COPY, and I waited till I got a NEW laptop in SEPTEMBER to actually use it… It also took a while to stick it on properly (other than to re-install the PC the way I wanted…), but this week they told me that, by mistake, I had been given the wrong machine (they did it all themselves, though – I did not ask for any specific one) and this one needs to be replaced!!!!
This is WORSE than any hardware FAILure, as the machine just works very well and I was expecting to keep it for the next two years 🙁
Can anyone be so nice to send me one of those awesome stickers again? 🙂
People were already collecting logs with MOM, so why not the security log? Some people were doing that, but it did not scale enough; for this reason, a few years ago Eric Fitzgerald announced that he was working on Microsoft Audit Collection System. At the time, though, the tool had no interface… and the rest is history: it has been integrated into System Center Operations Manager. Still, ACS remains a lesser-known component of OpsMgr.
There are a number of resources on the web that are worth mentioning and linking to:
and, of course, many more, I cannot link them all.
As for myself, I have been playing with ACS since those early beta days (before I joined Microsoft and before going back to MOM, when I was working in Security), but I never really blogged about this piece.
Since I have been doing quite a lot of work around ACS again lately, I thought it might be worth consolidating some thoughts about it, hence this post.
Anatomy of an “Online” Sizing Calculation
What I would like to explain here is the strategy and process I go through when analyzing the data stored in an ACS database, in order to determine a filtering strategy: what to keep and what not to keep, by applying a filter on the ACS Collector.
So, the first thing I usually start with is one of the many "ACS sizer" Excel spreadsheets around… which usually tell you that you need more space than is really necessary, basically giving you a "worst case" scenario. I don't know how some people can actually do this from a purely theoretical point of view; I usually prefer a bottom-up approach: I look at the actual data that ACS is collecting without filters, and start from there for a better, more accurate sizing.
In the case of a new install this is easy – you just turn ACS on, set the retention to a few days (one or two weeks maximum), give the DB plenty of space to make sure it will cope, add all your forwarders… sit back and wait.
Then you come back 2 weeks later and start looking at the data that has been collected.
What/How much data are we collecting?
First of all, if we have not changed the default settings, the grooming and partitioning algorithm will create new partitioned tables every day. So my first step is to see how big each “partition” is.
But… what is a partition, anyway? A partition is a set of 4 tables joined together:
dtEvent_GUID
dtEventData_GUID
dtPrincipal_GUID
dtStrings_GUID
where GUID is a new GUID every day, and of course the 4 tables that make up a daily partition will have the same GUID.
The dtPartition table contains a list of all partitions and their GUIDs, together with their start and closing time.
Just to get a rough estimate we can ignore the space used by the last three tables – which are usually very small – and only use the dtEvent_GUID table to get the number of events for that day, and use the stored procedure “sp_spaceused” against that same table to get an overall idea of how much space that day is taking in the database.
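If you would rather not run sp_spaceused by hand against every daily table, the same check can be scripted. What follows is a minimal PowerShell sketch (not the original tooling of this post; the server name is a placeholder, and it assumes Windows authentication against the default ACS database name, OperationsManagerAC):

# Minimal sketch: rough per-day size of the ACS partitions via sp_spaceused.
# "MYSQLSERVER" is a placeholder; OperationsManagerAC is the default ACS database name.
$connection = New-Object System.Data.SqlClient.SqlConnection("Server=MYSQLSERVER;Database=OperationsManagerAC;Integrated Security=True")
$connection.Open()

# Find all the daily dtEvent_<GUID> tables, newest first
$cmd = $connection.CreateCommand()
$cmd.CommandText = "SELECT name FROM sys.tables WHERE name LIKE 'dtEvent[_]%' ORDER BY create_date DESC"
$reader = $cmd.ExecuteReader()
$tables = @()
while ($reader.Read()) { $tables += $reader["name"] }
$reader.Close()

# Ask SQL Server how many rows and how much reserved space each daily table uses
foreach ($table in $tables) {
    $cmd.CommandText = "EXEC sp_spaceused '$table'"
    $r = $cmd.ExecuteReader()
    while ($r.Read()) { "{0}`t{1} rows`t{2} reserved" -f $table, $r["rows"], $r["reserved"] }
    $r.Close()
}
$connection.Close()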
By following this process, I come up with something like the following:
| Partition ID | Status | Partition Start Time | Partition Close Time | Rows | Reserved KB | Total MB (Reserved KB / 1024) |
| --- | --- | --- | --- | --- | --- | --- |
| 9b45a567_c848_4a32_9c35_39b402ea0ee2 | 0 | 2/1/2010 2:00 | 2/1/2010 2:00 | 29,749,366 | 7,663,488 | 7,484 |
| 8d8c8ee1_4c5c_4dea_b6df_82233c52e346 | 2 | 1/31/2010 2:00 | 2/1/2010 2:00 | 28,067,438 | 9,076,904 | 8,864 |
| 34ce995b_689b_46ae_b9d3_c644cfb66e01 | 2 | 1/30/2010 2:00 | 1/31/2010 2:00 | 30,485,110 | 9,857,896 | 9,627 |
| bb7ea5d3_f751_473a_a835_1d1d42683039 | 2 | 1/29/2010 2:00 | 1/30/2010 2:00 | 48,464,952 | 15,670,792 | 15,304 |
| ee262692_beae_4d81_8079_470a54567946 | 2 | 1/28/2010 2:00 | 1/29/2010 2:00 | 48,980,178 | 15,836,416 | 15,465 |
| 7984b5b8_ddea_4e9c_9e51_0ee7a413b4c9 | 2 | 1/27/2010 2:00 | 1/28/2010 2:00 | 51,295,777 | 16,585,408 | 16,197 |
| d93b9f0e_2ec3_4f61_b5e0_b600bbe173d2 | 2 | 1/26/2010 2:00 | 1/27/2010 2:00 | 53,385,239 | 17,262,232 | 16,858 |
| 8ce1b69a_7839_4a05_8785_29fd6bfeda5f | 2 | 1/25/2010 2:00 | 1/26/2010 2:00 | 55,997,546 | 18,105,840 | 17,681 |
| 19aeb336_252d_4099_9a55_81895bfe5860 | 2 | 1/24/2010 2:00 | 1/24/2010 2:00 | 28,525,304 | 7,345,120 | 7,173 |
| 1cf70e01_3465_44dc_9d5c_4f3700dc408a | 2 | 1/23/2010 2:00 | 1/23/2010 2:00 | 26,046,092 | 6,673,472 | 6,517 |
| f5ec207f_158c_47a8_b15f_8aab177a6305 | 2 | 1/22/2010 2:00 | 1/22/2010 2:00 | 47,818,322 | 12,302,208 | 12,014 |
| b48dabe6_a483_4c60_bb4d_93b7d3549b3e | 2 | 1/21/2010 2:00 | 1/21/2010 2:00 | 55,060,150 | 14,155,392 | 13,824 |
| efe66c10_0cf2_4327_adbf_bebb97551c93 | 2 | 1/20/2010 2:00 | 1/20/2010 2:00 | 58,322,217 | 15,029,216 | 14,677 |
| 0231463e_8d50_4a42_a834_baf55e6b4dcd | 2 | 1/19/2010 2:00 | 1/19/2010 2:00 | 61,257,393 | 15,741,248 | 15,372 |
| 510acc08_dc59_482e_a353_bfae1f85e648 | 2 | 1/18/2010 2:00 | 1/18/2010 2:00 | 64,579,122 | 16,612,512 | 16,223 |
If you have just installed ACS and let it run without filters with your agents for a couple of weeks, you should get some numbers like those above for your “couple of weeks” of analysis. If you graph your numbers in Excel (both size and number of rows/events per day) you should get some similar lines that show a pattern or trend:
So, in my example above, we can clearly observe a “weekly” pattern (monday-to-friday being busier than the weekend) and we can see that – for that environment – the biggest partition is roughly 17GB. If we round this up to 20GB – and also considering the weekends are much quieter – we can forecast 20*7 = 140GB per week. This has an excess “buffer” which will let the system survive event storms, should they happen. We also always recommend having some free space to allow for re-indexing operations.
In fact, especially when collecting everything without filters, the daily size is a lot less predictable: imagine worms “trying out” administrator account’s passwords, and so on… those things can easily create event storms.
Anyway, in the example above, the customer would have liked to keep 6 MONTHS (180 days) of data online, which would become 20*180 = 3600GB, that is, roughly THREE AND A HALF TERABYTES! Therefore we need a filtering strategy – and badly – to reduce this size.
[edited on May 7th 2010 – if you want to automate the above analysis and produce a table and graphs like those just shown, you should look at my following post.]
Filtering Strategies
Ok, then we need to look at WHAT actually comprises that amount of events we are collecting without filters. As I wrote above, I usually run queries to get this type of information.
I will not get into HOW TO write a filter here – a collector’s filter is a WMI notification query and it is already described pretty well elsewhere how to configure it.
Here, instead, I want to walk through the process and the queries I use to understand where the noise comes from and what could be filtered – and get an estimate of how much space we could be saving if we filter one way or another.
Number of Events per User
-- event count by User (with percentages)
declare @total float
select @total = count(HeaderUser) from AdtServer.dvHeader
select count(HeaderUser), HeaderUser,
  cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by HeaderUser
order by count(HeaderUser) desc
In our example above, over the 14 days we were observing, we obtained percentages like the following ones:
| #evt | HeaderUser Account | Percent |
| --- | --- | --- |
| 204,904,332 | SYSTEM | 40.79 % |
| 18,811,139 | LOCAL SERVICE | 3.74 % |
| 14,883,946 | ANONYMOUS LOGON | 2.96 % |
| 10,536,317 | appintrauser | 2.09 % |
| 5,590,434 | mossfarmusr | … |
Just by looking at this, it is pretty clear that by filtering out the events tracked for the "SYSTEM", "LOCAL SERVICE" and "ANONYMOUS LOGON" accounts, we would save over 45% of the disk space!
Number of Events by EventID
Similarly, we can look at how different Event IDs have different weights on the total amount of events tracked in the database:
-- event count by ID (with percentages)
declare @total float
select @total = count(EventId) from AdtServer.dvHeader
select count(EventId), EventId,
  cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc
We would get some similar information here:
| Event ID | Meaning | Sum of events | Percent |
| --- | --- | --- | --- |
| 538 | A user logged off | 99,494,648 | 27.63 |
| 540 | Successful Network Logon | 97,819,640 | 27.16 |
| 672 | Authentication Ticket Request | 52,281,129 | 14.52 |
| 680 | Account Used for Logon by (Windows 2000) | 35,141,235 | 9.76 |
| 576 | Specified privileges were added to a user's access token. | 26,154,761 | 7.26 |
| 8086 | Custom Application ID | 18,789,599 | 5.21 |
| 673 | Service Ticket Request | 10,641,090 | 2.95 |
| 675 | Pre-Authentication Failed | 7,890,823 | 2.19 |
| 552 | Logon attempt using explicit credentials | 4,143,741 | 1.15 |
| 539 | Logon Failure – Account locked out | 2,383,809 | 0.66 |
| 528 | Successful Logon | 1,764,697 | 0.49 |
Also, do not forget that ACS provides some reports to do this type of analysis out of the box, even if in my experience they are generally slower – on large datasets – than the queries provided here. Also, a number of those reports have been buggy over time, so I just prefer to run queries and be on the safe side.
Below is an example of such a report (run against a different environment – just in case you were wondering why the numbers are not the same ones :-)):
The numbers and percentages we got from the two queries above should already point us in the right direction: what we might want to adjust in our auditing policy directly on Windows, and/or whether there is something we want to filter out at the collector level (here you should ask yourself the question: "if they aren't worth collecting, are they worth generating?" – but I digress).
Also, a permutation of the above two queries lets you see which user is generating the most "noise" for some events and not others… for example:
-- event distribution for a specific user (change the @user) - with percentages
-- for the user and compared with the total #events in the DB
declare @user varchar(255)
set @user = 'SYSTEM'
declare @total float
select @total = count(Id) from AdtServer.dvHeader
declare @totalforuser float
select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
select count(Id), EventID,
  cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser,
  cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
from AdtServer.dvHeader
where HeaderUser = @user
group by EventID
order by count(Id) desc
The above is particularly important, as we might want to filter out a number of events for the SYSTEM account (i.e. logons that occur when starting and stopping services) but we might want to keep other events that are tracked by the SYSTEM account too, such as an administrator having wiped the Security Log clean – which might be something you want to keep:
Of course the number of EventID 517 occurrences will be small compared to the total tracked by the SYSTEM account, and we can still filter the other ones out.
Number of Events by EventID and by User
We could also combine the two approaches above – by EventID and by User:
select count(Id),HeaderUser, EventId
from AdtServer.dvHeader
group by HeaderUser, EventId
order by count(Id) desc
This will produce a table like the following one, which can easily be copied/pasted into Excel in order to produce a pivot table:
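If you would rather skip the copy/paste step, a minimal PowerShell sketch like the one below runs the same query and dumps the result to a CSV file that Excel can pivot on (the server name is a placeholder; it assumes Windows authentication against the default ACS database name, OperationsManagerAC):

# Minimal sketch: run the user/EventId breakdown and export it to CSV for Excel.
# "MYSQLSERVER" is a placeholder; OperationsManagerAC is the default ACS database name.
$connection = New-Object System.Data.SqlClient.SqlConnection("Server=MYSQLSERVER;Database=OperationsManagerAC;Integrated Security=True")
$query = "select count(Id) as Total, HeaderUser, EventId from AdtServer.dvHeader group by HeaderUser, EventId order by count(Id) desc"
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter($query, $connection)
$table = New-Object System.Data.DataTable
[void]$adapter.Fill($table)

# One row per User/EventId combination, ready to be pivoted in Excel
$table.Rows | Select-Object Total, HeaderUser, EventId | Export-Csv -Path .\EventsByUserAndId.csv -NoTypeInformation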
Cluster EventLog Replication
One more aspect that is less widely known, but which I think is worth showing, is the way clusters behave with ACS. I don't mean all clusters… but if you keep the "eventlog replication" feature of clusters enabled (you should disable it anyway, also from a monitoring perspective, but I digress), each cluster node's security eventlog will contain events not just for itself, but for all the other nodes as well.
I have not found a reliable way to filter these duplicates out, other than disabling eventlog replication altogether.
Anyway, just to get an idea of how much this type of "duplicate" event weighs on the total, I use the following query, which tells you how many events for each machine are tracked by another machine:
-- to spot machines that are cluster nodes with eventlog replication and write duplicate events (slow)
select Count(Id) as Total,
  replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','') as ForwarderMachine,
  EventMachine
from AdtServer.dvHeader
--where ForwarderMachine <> EventMachine
group by EventMachine, replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','')
order by ForwarderMachine, EventMachine
Those presented above are just some of the approaches I usually look into first. Of course there are many more. Here I am including the same queries already shown in action, plus a few more that can be useful in this process.
I have even considered building a page with all these queries – a bit like those that Kevin is collecting for OpsMgr (we actually wrote some of them together when building the OpsMgr Health Check)… shall I move the queries below to such a page? I thought I'd list them here and give some background on how I normally use them, to start off with.
Some more Useful Queries
-- top event ids
select count(EventId), EventId
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc

-- event count by ID (with percentages)
declare @total float
select @total = count(EventId) from AdtServer.dvHeader
select count(EventId), EventId,
  cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc

-- which machines have ever written event 538
select distinct EventMachine, count(EventId) as total
from AdtServer.dvHeader
where EventID = 538
group by EventMachine

-- machines
select * from dtMachine

-- machines (more readable)
select replace(right(Description, (len(Description) - patindex('%\%',Description))),'$','')
from dtMachine

-- events by machine
select count(EventMachine), EventMachine
from AdtServer.dvHeader
group by EventMachine

-- rows where the EventMachine field is not available (typically events written by ACS itself for checkpointing)
select * from AdtServer.dvHeader where EventMachine = 'n/a'

-- event count by day
select convert(varchar(20), CreationTime, 102) as Date, count(EventMachine) as total
from AdtServer.dvHeader
group by convert(varchar(20), CreationTime, 102)
order by convert(varchar(20), CreationTime, 102)

-- event count by day and by machine
select convert(varchar(20), CreationTime, 102) as Date, EventMachine, count(EventMachine) as total
from AdtServer.dvHeader
group by EventMachine, convert(varchar(20), CreationTime, 102)
order by convert(varchar(20), CreationTime, 102)

-- event count by machine and by date (distinguishes between AgentMachine and EventMachine)
select convert(varchar(10),CreationTime,102), Count(Id), EventMachine, AgentMachine
from AdtServer.dvHeader
group by convert(varchar(10),CreationTime,102), EventMachine, AgentMachine
order by convert(varchar(10),CreationTime,102) desc, EventMachine

-- event count by User
select count(Id), HeaderUser
from AdtServer.dvHeader
group by HeaderUser
order by count(Id) desc

-- event count by User (with percentages)
declare @total float
select @total = count(HeaderUser) from AdtServer.dvHeader
select count(HeaderUser), HeaderUser,
  cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by HeaderUser
order by count(HeaderUser) desc

-- event distribution for a specific user (change the @user) - with percentages
-- for the user and compared with the total #events in the DB
declare @user varchar(255)
set @user = 'SYSTEM'
declare @total float
select @total = count(Id) from AdtServer.dvHeader
declare @totalforuser float
select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
select count(Id), EventID,
  cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser,
  cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
from AdtServer.dvHeader
where HeaderUser = @user
group by EventID
order by count(Id) desc

-- to spot machines that write duplicate events (such as cluster nodes with eventlog replication enabled)
select Count(Id), EventMachine, AgentMachine
from AdtServer.dvHeader
group by EventMachine, AgentMachine
order by EventMachine

-- to spot machines that are cluster nodes with eventlog replication and write duplicate events (better but slower)
select Count(Id) as Total,
  replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','') as ForwarderMachine,
  EventMachine
from AdtServer.dvHeader
--where ForwarderMachine <> EventMachine
group by EventMachine, replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','')
order by ForwarderMachine, EventMachine

-- which user and from which machine is target of elevation (network service doing "runas" is a 552 event)
select count(Id), EventMachine, TargetUser
from AdtServer.dvHeader
where HeaderUser = 'NETWORK SERVICE' and EventID = 552
group by EventMachine, TargetUser
order by count(Id) desc

-- by hour, minute and user (change the timestamp)… this query is useful to search which users
-- are active in a given time period… helpful to spot "peaks" of activity such as password
-- brute force attacks, or other activities limited in time.
select datepart(hour,CreationTime) as Hours, datepart(minute,CreationTime) as Minutes, HeaderUser, count(Id) as total
from AdtServer.dvHeader
where CreationTime < '2010-02-22T16:00:00.000' and CreationTime > '2010-02-22T15:00:00.000'
group by datepart(hour,CreationTime), datepart(minute,CreationTime), HeaderUser
order by datepart(hour,CreationTime), datepart(minute,CreationTime), HeaderUser
cmdlet Get-Credential at command pipeline position 1
Supply values for the following parameters:
Credential
But we do get this error:
Test-WSMan : The server certificate on the destination computer (virtubuntu.huis.dom:1270) has the following errors: The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.
The SSL certificate is signed by an unknown certificate authority.
At line:1 char:11
+ test-wsman <<<< -computer virtubuntu.huis.dom -port 1270 -authentication basic -credential (get-credential) -usessl
    + CategoryInfo          : InvalidOperation: (:) [Test-WSMan], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.TestWSManCommand
The credentials above have to be a Unix login, which we typed correctly. But we still can't get through, as the certificate used by the agent is not trusted by our workstation. This seems to be the "usual" issue I first faced when testing SCX with WinRM in beta1. At the time I simply dismissed it with the following sentence:
[…] Of course you have to solve some other things such as DNS resolution AND trusting the self-issued certificates that the agent uses, first. Once you have done that, you can run test queries from the Windows box towards the Unix ones by using WinRM. […]
and I sincerely thought that would explain it pretty well… but eventually a lot of people got confused by this and did not know what to do, especially about the part on trusting the certificate. Anyway, in the following posts I figured out you could pass the -skipCACheck parameter to WinRM… which solved the issue of having to trust the certificate (which is fine for testing, but I would not use it for automations and scripts running in production… as it might expose your credentials to man-in-the-middle attacks).
So it seems that with the Powershell cmdlets we are back to that issue, as I can’t find a parameter to skip the CA check. Maybe it is there, but with PSv2 not having been released yet, I don’t know everything about it, and the CTP documentation is not yet complete. Therefore, back to trusting the certificate.
Trusting the certificate is actually very simple, but it can be a bit tricky when passing those certs back and forth from unix to windows. So let’s make the process a bit clearer.
All of the SCX agents' certificates are ultimately signed by a key on the Management Server that has discovered them, but I don't currently know where that certificate/key is stored on the Management Server. Anyway, you can get it from the agent certificate, as you only really need the public key, not the private signing key.
Use WinSCP or any other utility to copy the certificate off one of the agents. You can find that in the /etc/opt/microsoft/scx/ssl location:
that scx-host-computername.pem is your agent certificate.
Copy it to the Management server and change its extension from .pem to .cer. Now Windows will be happy to show it to you with the usual Certificate interface:
We need to go to the “Certification Path” tab, select the ISSUER certificate (the one called “SCX-Certificate”):
then go to the “Details” tab, and use the “Copy to File” button to export the certificate.
After you have the certificate in a .CER file, you can add it to the “trusted root certification authorities” store on the computer you are running your powershell tests from.
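If you prefer scripting that step over clicking through the certificate MMC, the following is a minimal PowerShell sketch using the .NET X509Store classes (the .cer path is just an example, and writing to the LocalMachine store requires an elevated prompt):

# Minimal sketch: add the exported SCX issuer certificate to the local
# "Trusted Root Certification Authorities" store. The path is an example - adjust it.
$certPath = "C:\temp\SCX-Certificate.cer"
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2($certPath)

# LocalMachine\Root is the "Trusted Root Certification Authorities" store
$store = New-Object System.Security.Cryptography.X509Certificates.X509Store("Root", "LocalMachine")
$store.Open([System.Security.Cryptography.X509Certificates.OpenFlags]::ReadWrite)
$store.Add($cert)
$store.Close()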
So after you have trusted it, the same command as above actually works now:
cmdlet Get-Credential at command pipeline position 1
Supply values for the following parameters:
Credential

wsmid           : http://schemas.dmtf.org/wbem/wsman/identify/1/wsmanidentity.xsd
lang            :
ProtocolVersion : http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor   : Microsoft System Center Cross Platform
ProductVersion  : 1.0.4-248
Ok, we can talk to it! Now we can do something funnier, like actually returning instances and/or calling methods:
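For example, here is a minimal sketch of what an enumeration can look like with the PowerShell v2 WSMan cmdlets; the class URI and the root/scx namespace follow the pattern the SCX agent exposes, while hostname, port and credentials are the same ones used with Test-WSMan above:

# Minimal sketch: enumerate the SCX_OperatingSystem instances exposed by the SCX agent over WS-Man.
# Hostname, port and credentials are the same used with Test-WSMan above.
$cred = Get-Credential   # a Unix login on the managed box
Get-WSManInstance -Enumerate `
  -ResourceURI "http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx" `
  -ComputerName virtubuntu.huis.dom `
  -Port 1270 `
  -Authentication Basic `
  -Credential $cred `
  -UseSSL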
This is far from exhaustive, but should get you started on a world of possibilities about automating diagnostics and responses with Powershell v2 towards the OpsMgr 2007 R2 Cross-Platform machines. Enjoy!
Disclaimer
The information in this weblog is provided “AS IS” with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided “AS IS” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose. THIS WORK IS NOT ENDORSED AND NOT EVEN CHECKED, AUTHORIZED, SCRUTINIZED NOR APPROVED BY MY EMPLOYER, AND IT ONLY REPRESENT SOMETHING WHICH I’VE DONE IN MY FREE TIME. NO GUARANTEE WHATSOEVER IS GIVEN ON THIS. THE AUTHOR SHALL NOT BE MADE RESPONSIBLE FOR ANY DAMAGE YOU MIGHT INCUR WHEN USING THIS INFORMATION. The solution presented here IS NOT SUPPORTED by Microsoft.
I never went really further in my experiments, mostly due to lack of time… but then yesterday I got a comment to this older post asking about Ubuntu. Of course I know about Ubuntu, and have been using Debian-based distributions for years. I actually even prefer them over RPM-based distributions such as RedHat or SuSE (personal preference). Heck, even this weblog is running on Debian!
Anyway, I never really tried to see if one of the existing RPM packages for RedHat or SuSE could be modified to run on Ubuntu. I will eventually test this on Debian too, but for now I used Ubuntu which tends to have slightly newer packages and libraries, overall. The machine I tested on is a Ubuntu Server 8.04.2. Older/newer versions might slightly differ.
BEWARE THAT ALL THAT FOLLOWS BELOW IS NOT SUPPORTED BY MICROSOFT. It is only described here for EXPERIMENTAL (==fun) purpose. DO NOT USE THIS IN A PRODUCTION ENVIRONMENT.
So, you are warned. Now let’s hack it.
The first thing to do is to copy the Red Hat agent's RPM package off your OpsMgr 2007 R2 server, from the "usual" path "C:\Program Files\System Center Operations Manager 2007\AgentManagement\UnixAgents". Let's grab the RHEL5 agent, which is called scx-1.0.4-248.rhel.5.x86.rpm in R2 RTM.
The converted package will install… but the script execution will fail in a few places – most notably in the generation of the certificate, as it is not able to locate the right openssl libraries, as shown in the screenshot above.
If the libssl.so.6 file cannot be found, you might be missing the “libssl-dev” package, which you can install as follows:
apt-get install libssl-dev
But even if it is installed, you will find that the installer still complains about missing files. That is not really true: the files are there, but on Ubuntu they have different names than on Red Hat, that's all. You can therefore create symbolic links to the "right" files, so that they are aliased and get found afterwards:
cd /usr/lib
ln -s libcrypto.so.0.9.8 libcrypto.so.6
ln -s libssl.so.0.9.8 libssl.so.6
So now when installing the package, the certificate generation will work:
You are nearly ready to go, but you have to start the service by using the init scripts: the "service" command is Red Hat-specific and will still fail.
/etc/init.d/scx-cimd start is the “standard” way of starting daemons from init on Unix.
But it still fails: it seems that the init script provided in the Red Hat package searches for a file called "functions", which is present on Red Hat and CentOS and provides re-usable functions for startup scripts to include:
How do you fix this? I just copied the /etc/init.d/functions file from a CentOS box to my Ubuntu box.
I copied it via SCP from the CentOS box I have:
cd /etc/init.d
scp root@centos.huis.dom:/etc/init.d/functions .
You can probably also find and fetch the file from the Internet (both CentOS and Red Hat have accessible repositories with all the files in their distributions, since they are open source).
After you have the file in place, the init script will be able to include it, will find the functions it needs, and the daemon/service will now start (albeit with minor errors that I have not investigated yet, but that don't seem to be causing trouble):
But… there is a "but": not all classes actually return instances and values just yet. Most notably, the "SCX_OperatingSystem" class does not seem to return anything right away. That is a very important class, because it is the one we would use to first discover the Operating System object in the Management Packs. So we need to fix it. The reason the class does not return anything is that the SCX provider looks into the /etc/redhat-release file to determine what OS version/distribution the machine is running. And that file is obviously not there on Ubuntu.
On all Linux distributions there is a similar file, called /etc/issue… which, again, we can copy to the name the provider expects and trick it into working:
cd /etc
cp issue redhat-release
And NOW, the SCX_OperatingSystem Class also returns an instance:
The next step would be “cooking” an MP to discover Ubuntu. More on this on a later post (maybe). I did not test all classes and their implementation… you can try to poke at them by following the instructions and commands on my previous post here. But this should get you started.
Disclaimer
The information in this weblog is provided “AS IS” with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided “AS IS” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose. THIS WORK IS NOT ENDORSED AND NOT EVEN CHECKED, AUTHORIZED, SCRUTINIZED NOR APPROVED BY MY EMPLOYER, AND IT ONLY REPRESENT SOMETHING WHICH I’VE DONE IN MY FREE TIME. NO GUARANTEE WHATSOEVER IS GIVEN ON THIS. THE AUTHOR SHALL NOT BE MADE RESPONSIBLE FOR ANY DAMAGE YOU MIGHT INCUR WHEN USING THIS INFORMATION. The solution presented here IS NOT SUPPORTED by Microsoft.
Also, I started playing with Powershell early on. I posted my first (and only) cmdlet back in 2006. It was not much more than a test for myself to learn how to write one, but that's just to say that I started playing with it early. I have been using it to automate tasks, for example.
Going back to the quote above, everyone gets on the bandwagon posting examples and articles. I had been asked a few times about writing articles on OpsMgr and Powershell usage (for example by www.powershell.it) but I declined, as I was too busy using this knowledge to do stuff for work (where "work" is defined as "work that pays your mortgage"), rather than seeking personal prestige through articles and blogs. Anyway, that kind of article is now appearing all over the Internet and the blogosphere. The above examples made me think of early adoption, and the bandwagon that follows later on… but even as an early adopter, I was never very noisy or visible.
Now, going back to what I do for work (which I mentioned here and here in the past), I work in the Premier Field Engineering organization of Microsoft Services, which provides Premier services to customers. Microsoft Premier customers have a wide range of Premier agreement features and components that they can use to support their people, improve their processes, and improve the productive use of the Microsoft technology they have purchased. Some of these services are known to the world as "Health Checks", some as "Risk Assessment Programs" (or, in short, RAPs). These are basically services where one of our technology experts goes on the customer site and uses a custom, private Microsoft tool to gather a huge amount of data from the product we mean to look at (be it SQL, Exchange, AD or anything else…). The Health Check or RAP tool collects the data and outputs a draft of the report that will be delivered to the customer later on, with all the right sections and chapters. This is done so that every report of the same kind looks consistent, even if the engagement is performed by a different engineer in a different part of the world. The engineer will of course analyze the collected data and write recommendations about what is configured properly and/or about what could or should be changed and/or improved in the implementation to make it adhere to best practices. To make sure only the right people actually go onsite to do this job, we have a strict internal accreditation process that must be followed; only accredited resources that know the product well enough and know exactly how to interpret the data that the tool collects are allowed to use it, to deliver the engagement, and to present/write the findings to the customer.
So why am I telling you this here, and what have I been using my early knowledge of OpsMgr and Powershell for?
I have used that to write the Operations Manager Health Check, of course!
We had a MOM 2005 Health Check already, but since the technology has changed so much from MOM to OpsMgr, we had to write a completely new tool. Jeff (the original MOM 2005 author, who does not have a blog that I can link to) and I are the main coders of this tool… and the tool itself is A POWERSHELL script. A longish one, of course (7000 lines, more or less), but nothing more than a Powershell script at the end of the day. A few more colleagues helped shape the features and tested the tool, including Kevin Holman. Some of the database queries on Kevin's blog are in fact what we use to extract some of the data (beware that some of those queries have recently been updated, in case you saved them and are using your local copy!), while some other information uses internal and/or custom queries. Sometimes we use OpsMgr cmdlets or go to the SDK service, but a lot of the time we query the database directly (we really should use the SDK all the time, but for certain stuff direct database access is way faster). It took most of the past year to write it, test it, troubleshoot it, fix it, and deliver the first engagements as "beta" to some customers to help iron out the process… and now the delivery is available! If a year seems like a long time, you have to consider that this is all work that gets done next to what we all have to normally do with customers, not instead of it (i.e. I am not free to sit on my butt all day and just write the tool… I still have to deliver services to customers day in, day out, in the meantime).
Occasionally, during this past calendar year that is now approaching its end, I have been willing, and have found some extra time, to disclose some bits and pieces, techniques and prototypes of how to use Powershell and OpsMgr together, such as innovative ways to use Powershell in OpsMgr against beta features, but in general most of my early adopter's investment went into the private tool for this engagement, and that is one of the reasons I couldn't blog or write much about it, since it is Microsoft intellectual property.
But it is also true that I did not care to write about other stuff when I considered it too easy or when it could be found in the documentation. I like writing about ideas, thoughts, rants OR things that I discover and that are not well documented at the time I study them… so when I figure things out I like leaving a trail for some to follow. But I am not here to spoon-feed people like some in the bandwagon are doing. Now the bandwagon is busy blogging and writing continuously about some aspect of OpsMgr (known or unknown, documented or not), and the answer to Hugh's original question is, in my opinion, that it does not really matter what the bandwagon is doing right now. I was never here to do the same thing. I think that is my differentiator. I am not saying that what a bunch of colleagues and enthusiasts are doing is not useful: blogging and writing about the various things they experiment with is interesting and will be useful to people. But blogs are useful only up to a certain point. I think blogs are best suited for conversations and thoughts (rather than for "how-to's"), and what I would love to see instead is less marketing hype when new versions are announced and more real, official documentation.
But I think I should stop caring about what the bandwagon is doing, because that’s just another ego trip at the end of the day. What I should more sensibly do, would be listening to my horoscope instead:
[…] “How do you slay the dragon?” journalist Bill Moyers asked mythologist Joseph Campbell in an interview. By “dragon,” he was referring to the dangerous beast that symbolizes the most unripe and uncontrollable part of each of our lives. In reply to Moyers, Campbell didn’t suggest that you become a master warrior, nor did he recommend that you cultivate high levels of sleek, savage anger. “Follow your bliss,” he said simply. Personally, I don’t know if that’s enough to slay the dragon — I’m inclined to believe that you also have to take some defensive measures — but it’s definitely worth an extended experiment. Would you consider trying that in 2009? […]
I have to say that OpsMgr2007 R2 beta release notes explain the known issues, and I had no trouble whatsoever upgrading the windows part. It just took its time (I am running virtual machines in my test lab, that don’t have the best performance), but it went smoothly and without a glitch. In a couple of hours I had everything upgraded: databases, RMS, reporting, agents, gateway. All right then. The new purple icons in System Center look cute, and the new UI has some great stuff, such as a long-awaited way to update your management packs directly from the Internet, better display of Overrides (kind of what we used to rely on Override Explorer for)… and A LOT more new stuff that I won’t be wasting my Sunday writing about since everybody else has already done it two days ago:
Therefore let’s get back to my upgrade, which is a lot more interesting (to me) than the marketing tam-tam 🙂
As part of the upgrade to R2, I had to first uninstall the Xplat beta refresh bits, which I had installed, including all Unix Management Packs. Including my CentOS Management Pack I had improvised.
So this is the new start page of the integrated Discovery Wizard:
Looks nice and integrates the functionality of discovering and deploying Windows machines, SNMP Devices, and Unix/Linux machines.
Of course, my CentOS machine would not be discovered, and showed up as an unsupported platform. And of course the old Management Pack I had hacked together in XPlat Beta 1 did not work anymore. Therefore, I figured I had to see what had changed, and how to make it work again (it IS possible – it is NOT SUPPORTED, but I don't care, as long as it works).
Since the existing agent could not be discovered, the first step I took was logging on to the Linux box, uninstalling the old agent, and installing the new one:
There I tried to discover again, but of course it still failed.
At that point I started taking a look at the new layout of things on the unix side. Most stuff is located in the same directories where beta1 was installed, and there are a bunch of useful commands under /opt/microsoft/scx/bin/tools. You can check out the Open Pegasus version used:
[root@centos tools]# ./scxcimconfig --version
Version 2.7.0
Let’s take a look at what SCX classes we have available:
./scxcimcli nc -n root/scx -di |grep SCX | sort
Nice. That’s the stuff we will be querying over WS-Man from the Management Server.
So let’s look at the OS Discovery, and we test it from the OpsMgr 2007 box:
At first I assumed this worked like in Beta1, so I exported the RedHat management pack and made my own version of it, replacing the strings it expects to find so that it would discover CentOS instead of Red Hat.
While the MP was syntactically correct and would import fine, the Discovery wizard still didn’t work.
I took one more look at the discoveries in the MP, and I found there are two more, targeted at the Management Server, which is probably what the Discovery Wizard uses to understand what kind of agent kit needs to be deployed.
So basically this discovery checks for the returned value from the module to determine if the discovered platform is a supported one:
But how does the module get its data?
Look at the layout of the /AgentManagement/UnixAgents folder on the Management Server:
That’s it: GetOSVersion.sh – a shell script. A nice, open, clear text, hackable shell script. Let’s take a look at it:
So that's it, and this is what my modification looks like. What happens during the Discovery Wizard is that we probably copy the script to the box over SCP, execute it, look at a number of things, and return the discovery data we need.
So after modifying the script… here we go. The Wizard now thinks CentOS is Red Hat, and can install an agent on it:
Only when the Management Server discovery finally considers the CentOS machine worth managing do the other discoveries that use WS-Man queries start kicking in, like the old ones did, and find the OS objects and all the other hosted objects. For this to work you not only need to hack the shell script, but also to have a hacked MP – the "regular" Red Hat one won't find CentOS, which is and remains an UNSUPPORTED platform.
Disclaimer
The information in this weblog is provided “AS IS” with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided “AS IS” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose. THIS WORK IS NOT ENDORSED AND NOT EVEN CHECKED, AUTHORIZED, SCRUTINIZED NOR APPROVED BY MY EMPLOYER, AND IT ONLY REPRESENT SOMETHING WHICH I’VE DONE IN MY FREE TIME. NO GUARANTEE WHATSOEVER IS GIVEN ON THIS. THE AUTHOR SHALL NOT BE MADE RESPONSIBLE FOR ANY DAMAGE YOU MIGHT INCUR WHEN USING THIS INFORMATION. The solution presented here IS NOT SUPPORTED by Microsoft.
In System Center Operations Manager 2007, you can add and remove resolution states for your alerts at will. Other than states "0" ("New") and "255" ("Closed"), you can create 254 other resolution states to suit your needs. This is a simple feature that was already present in previous MOM versions, and it is very useful for doing all kinds of tricks with your alerts. The number of possible states you can create should be able to satisfy any kind of alert and incident management process you might have in place, and any kind of filtering, forwarding or escalation need you might want to address by using resolution states.
By default, only OpsMgr Administrators can change these settings, with the exception of the two built-in states of “New” and “Closed”: those two states are REQUIRED if you want the product to continue working, therefore the GUI won’t let you change, edit or delete them. Which is good.
This is not true for your own resolution states, which can be edited or even deleted any time. All that is really saved in an alert when you change an alert’s resolution state is the NUMBER associated with it. In fact you even use that number when querying for alerts in the Command Shell:
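For example, something along these lines works in the Command Shell (a sketch; the state number 10 is just an example of a custom state you might have defined):

# Minimal sketch: retrieve alerts by the numeric value of their resolution state.
# "10" is just an example of a custom state number you might have defined.
Get-Alert -Criteria "ResolutionState = 10"

# Or filter client-side, which is slower but equivalent:
Get-Alert | Where-Object { $_.ResolutionState -eq 10 }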
That means that if by accident you delete a resolution state you have defined, you won’t see its description anymore in the GUI. Also, if you try to re-organize your resolution state, you can easily change the IDs for existing ones… Sure, you need to have the permissions in order to change or delete them, but what if you have implemented your important Alert and Incident management process by using resolution states and you want a bit of extra protection from mistakes or unintended deletion for them?
Then you can protect them by making the product think they were “built-in” too, just like “New” and “Closed”.
How would you do this? In an UNSUPPORTED WAY: editing the database 🙂 In fact, those resolution states are written in a table in the database, called “ResolutionState” (who would have guessed it?), that looks like the following picture:
Can you see the “IsPredefined” column? That can be set to “True” or “False” and that value is used by the SDK service to tell the GUI if that Resolution State can be edited/deleted or not.
Of course changing the database directly IS NOT SUPPORTED by Microsoft. You do this at your own risk, and if it was me, I would *NEVER* touch, change or remove the default two states (“New” and “Closed”) as THAT really would BREAK the product. For example, Alerts that are not set to “Closed” (255) won’t be ever groomed. And that is VERY BAD. NEVER, NEVER DO THAT.
On the other hand, changing a custom Resolution State to make the product believe it is Predefined/Built-in has not had any negative impact in my (limited) testing so far, and has added the advantage of "protecting" my resolution state from unintended deletion, as shown below:
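For reference, the change itself boils down to a one-off update of that column. Below is a minimal, UNSUPPORTED sketch: the server name is a placeholder, the database name is the default one, and the WHERE clause assumes the numeric state value sits in a column named ResolutionState; double-check the actual table layout in your environment and take a backup before touching anything:

# Minimal, UNSUPPORTED sketch: flag a custom resolution state (here: 10) as predefined
# so the GUI refuses to edit or delete it. The WHERE column name is an assumption -
# verify it against your own ResolutionState table, and back up the database first.
$connection = New-Object System.Data.SqlClient.SqlConnection("Server=MYSQLSERVER;Database=OperationsManager;Integrated Security=True")
$connection.Open()
$cmd = $connection.CreateCommand()
$cmd.CommandText = "UPDATE dbo.ResolutionState SET IsPredefined = 1 WHERE ResolutionState = 10"
[void]$cmd.ExecuteNonQuery()
$connection.Close()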
As usual, do this at your own risk. Remember what’s written in my Disclaimer:
The information in this weblog is provided “AS IS” with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my own personal opinion. All code samples are provided “AS IS” without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose. THIS WORK IS NOT ENDORSED AND NOT EVEN CHECKED, AUTHORIZED, SCRUTINIZED NOR APPROVED BY MICROSOFT, AND IT ONLY REPRESENT SOMETHING WHICH I’VE DONE IN MY FREE TIME. NO GUARANTEE WHATSOEVER IS GIVEN ON THIS. THE AUTHOR SHALL NOT BE MADE RESPONSIBLE FOR ANY DAMAGE YOU MIGHT INCUR WHEN USING THIS HACK.