I am testing the beta bits of the cross-platform extensions that were released on Microsoft Connect.
This post describes my limited testing so far. I hope it helps everyone testing the beta with a few things that might not be incredibly clear right now (unless you attended the MMS class, at least :-)).
I started out with the White Paper that has been posted on the web, which describes the architecture pretty well, but from a higher level (with diagrams and the like). Then I downloaded the beta bits, which contain another document about setting the thing up. It is pretty well done, to be honest (especially if you consider that it is beta documentation for a beta product!), but it does not go very deep into troubleshooting yet. I will try to cover some of that here.
I installed the agent manually: it’s just an RPM package, so not much can go wrong with that. There is a reason why I did not use the push discovery and deployment of the agent, which you will find out later on. Once it was installed, I tried to figure out what things looked like on the Linux machine. It is all pretty understandable if you look around on the machine (documented or not, Linux and open source stuff is easy to figure out by reading configuration files and the like, and by searching the web).
Basically the “agent” is not really an "agent" the way the Windows agent is, since it does not "send" anything to the Management Server on its own. It consists of a couple of services/daemons based on existing open source projects, but configured in their own folder, with their own names, and using different ports than a standard install of those projects, so as not to conflict with instances that might already exist on those machines.
The Management Server uses these services remotely (similar to agentless monitoring of a Windows box). The two services are:
- scx-cimd which implements the CIM daemon (openpegasus.org)
- scx-wsmand which implements the WS-Man daemon (openwsman.org)
It is easy to figure out how they are laid out. Even if it is undocumented, you can look at the running processes and figure out WHERE they live (/opt/microsoft/scx/bin/….) and where their configuration files are located (/etc/opt/microsoft/scx/conf …).
The files are self-explanatory, and the documentation of the open source projects can be found on the Internet:
- at openwsman.org (for wsmand)
- at the OpenPegasus site (http://www.openpegasus.org/documents.tpl?CALLER=doc.tpl&dcat= )
- on the OpenPegasus wiki (http://wiki.opengroup.org/pegasus-wiki/doku.php?id=start )
- on the IBM Linux systems management page (http://www.ibm.com/developerworks/linux/library/os-ltc-systemsmanagement/ )
I still have to delve into them as thoroughly as I would like, but I have already figured out a bunch of interesting things by quickly looking at them.
Agent Communication
Someone must have decided to “recycle” port 1270, which was used in MOM 2005 🙂 Basically openwsman runs as an SSL listener with basic authentication, hooked via a PAM module to the “regular” Unix /etc/passwd users, so you can authenticate as those without having to define specific users for the service. All that happens is that the Management Server runs WS-Man queries and commands over this channel. Every time, the Management Server connects to the agent on port 1270 using SSL, authenticates as “root” (or as the specified "Action Account") and does its stuff, or asks the agent to do it.
So the communication goes from the Management Server to the agent, not the other way around as it does with Windows "agents". That is why it feels more like an “agentless” thing to me, at least as far as the “direction” of traffic and who does the actual querying are concerned.
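As a rough illustration of the connection described here, this is a minimal Python sketch of the two things a client needs before it can talk to the agent: an HTTPS URL on port 1270 and a Basic authentication header. The host name `centos` and the `/wsman` path come from the WinRM test command used in this post; the function names and the password are placeholders of mine, not part of the product.

```python
import base64

SCX_WSMAN_PORT = 1270  # port the agent's openwsman listener uses, per this post


def wsman_endpoint(host, port=SCX_WSMAN_PORT):
    """Build the HTTPS URL a WS-Man client would connect to (path assumed to be /wsman)."""
    return f"https://{host}:{port}/wsman"


def basic_auth_header(username, password):
    """Build a standard HTTP Basic Authorization header for the given Unix user."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder host and credentials, for illustration only
print(wsman_endpoint("centos"))
print(basic_auth_header("root", "password"))
```

Nothing here opens a connection; it only shows that the "agent" side is an ordinary HTTPS listener that any WS-Man capable client can reach with a Unix user name and password.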
For the rest, the provided Management Packs have “normal” discoveries and “normal” monitors. Much like the Windows Management Packs often discover things by querying WMI, here they use WS-Man to run CIM queries against the Unix boxes.
The Service Model is totally cool to actually *SEE* in action, don’t you think?
Some more debugging/troubleshooting information:
I searched a bit and found the openwsman.org documentation and forum useful for figuring some things out. For example, I banged my head a few times before managing to actually TEST a query from Windows to Linux using WinRM. This document helped a lot.
Of course you first have to solve some other things, such as DNS resolution AND trusting the self-issued certificates that the agent uses. Once you have done that, you can run test queries from the Windows box towards the Unix ones by using WinRM.
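One possible way to get hold of the agent's self-issued certificate, so that you can inspect it or import it wherever you manage trust, is Python's standard `ssl.get_server_certificate`. This is only a sketch under assumptions: the helper name and the output file name are hypothetical, and the `fetch` parameter exists just so the function can be tried without a real agent.

```python
import ssl
from pathlib import Path


def save_agent_certificate(host, out_path, port=1270, fetch=ssl.get_server_certificate):
    """Fetch the PEM certificate presented on host:port and write it to out_path.

    By default ssl.get_server_certificate does not validate the certificate,
    which is what makes it usable against a self-issued one.
    """
    pem = fetch((host, port))
    Path(out_path).write_text(pem)
    return pem
```

Calling `save_agent_certificate("centos", "centos-scx.pem")` from a machine that can resolve the agent's name would write the certificate to a local file (host and file name are examples).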
For example, this is how I tested what the discovery for a Linux RedHat Computer type should return (I found the query by opening the MP in the Authoring Console, as one would usually do for any MP):
winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:root -password:password -r:https://centos:1270/wsman -auth:basic
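To make the pieces of that command line easier to see, here is a small Python sketch that assembles the same arguments. The helper names are mine (hypothetical); the schema prefix, namespace, port and switches are simply copied from the command above.

```python
CIM_SCHEMA_PREFIX = "http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2"


def resource_uri(class_name, namespace="root/scx"):
    """Resource URI for a CIM class, with the CIM namespace passed as a selector."""
    return f"{CIM_SCHEMA_PREFIX}/{class_name}?__cimnamespace={namespace}"


def winrm_enumerate_args(host, class_name, username, password, port=1270):
    """Return the winrm argument list that enumerates class_name on the agent."""
    return [
        "winrm", "enumerate", resource_uri(class_name),
        f"-username:{username}",
        f"-password:{password}",
        f"-r:https://{host}:{port}/wsman",
        "-auth:basic",
    ]


# Placeholder host and credentials, same as in the command above
print(" ".join(winrm_enumerate_args("centos", "SCX_OperatingSystem", "root", "password")))
```

Swapping the class name is then all it takes to test a different discovery or monitor query, provided the class exists in the root/scx namespace.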
If you need to test the query directly *ON* the Linux box (querying the CIMD instead of WSMAND), the WBEMEXEC utility is packaged with the agent (under /opt/microsoft/scx/bin/tools ). It is not as easy as some Windows administrators (who have used WBEMTEST or WMI Tools in the past) would hope, but it is not that bad either. The tool is not really interactive, so just to run a few queries against the CIM daemon locally you need to create an XML file that looks like the following (basically you build the RAW request the way the CIMD accepts it):
<?xml version="1.0" ?>
<CIM CIMVERSION="2.0" DTDVERSION="2.0">
<MESSAGE ID="50000" PROTOCOLVERSION="1.0">
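For reference, a complete request of this kind can be generated with a few lines of Python. This is a sketch under assumptions, not necessarily the exact file I used: it builds a standard CIM-XML `EnumerateInstances` call as defined by the DMTF CIM-XML specification, and it assumes the `root/scx` namespace and the `SCX_OperatingSystem` class from the WinRM example. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_enumerate_request(class_name, namespace="root/scx", message_id="50000"):
    """Return a CIM-XML EnumerateInstances request for class_name as a string."""
    cim = ET.Element("CIM", CIMVERSION="2.0", DTDVERSION="2.0")
    message = ET.SubElement(cim, "MESSAGE", ID=message_id, PROTOCOLVERSION="1.0")
    simple_req = ET.SubElement(message, "SIMPLEREQ")
    call = ET.SubElement(simple_req, "IMETHODCALL", NAME="EnumerateInstances")
    # The namespace is spelled out one path component at a time
    ns_path = ET.SubElement(call, "LOCALNAMESPACEPATH")
    for part in namespace.split("/"):
        ET.SubElement(ns_path, "NAMESPACE", NAME=part)
    # The class to enumerate is passed as the ClassName parameter
    param = ET.SubElement(call, "IPARAMVALUE", NAME="ClassName")
    ET.SubElement(param, "CLASSNAME", NAME=class_name)
    return '<?xml version="1.0" ?>\n' + ET.tostring(cim, encoding="unicode")


print(build_enumerate_request("SCX_OperatingSystem"))
```

The printed text can be saved to a file and used as the request body.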
Once you have made such a file, you can execute the query in it with the tool, like this:
As you can see from here, CIMD already uses HTTP. This differs from Windows' WMI, which uses RPC/DCOM. In a way, this is much simpler to troubleshoot, and more firewall-friendly.
I have not found an activity or debug log for any of those components yet, but in the end they do not do anything ON THEIR OWN unless asked by the MS. So the “healthservice” logic is all on the MS anyway. Errors about failed discoveries, permissions of the Action Account user, and anything else will be logged by the HealthService on the Windows machine (the Management Server) that is actually monitoring the Unix box.
It really is *just* a matter of getting the WMI and WinRM-equivalent layer up and running on Linux/Unix; after that, everything is done from Windows anyway!
Once this common management infrastructure has been provided, third parties will be able to write *just* MPs, without having to worry about the TRANSPORT of information anymore.
As you have probably noticed from the screenshots and command lines, I don’t have a “real” RedHat Enterprise Linux or other “supported” Linux distribution. Therefore I started my testing using CentOS 5 (which is very similar to RHEL 5). The agent installed fine, as you can see, but I was not getting anything really “discovered”: the MP had only found a “linux computer” but was not finding any “RedHat” or “SuSe” or any other "Operating System" instances.
If you are somewhat familiar with the way Operations Manager targeting works, you will understand that monitors are targeted at object classes. If no instance of those classes is discovered, NO MONITORING actually happens, even if the infrastructure is in place and the pieces are talking to each other:
Therefore my machine was not being monitored.
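The targeting logic can be illustrated with a toy model in Python. This is not how Operations Manager is implemented; the class names, monitor names and data structures below are made up purely to show why a missing class instance means no monitoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monitor:
    name: str
    target_class: str  # monitors are targeted at a class, not at a machine


def monitors_that_run(monitors, discovered):
    """Return (monitor name, instance name) pairs that would actually run.

    `discovered` maps a class name to the list of instances found by discovery.
    A monitor whose target class has no discovered instance never runs.
    """
    return [
        (m.name, instance)
        for m in monitors
        for instance in discovered.get(m.target_class, [])
    ]


# Hypothetical monitors targeted at a hypothetical operating system class
MONITORS = [
    Monitor("Disk free space", "RedHat Operating System"),
    Monitor("Processor load", "RedHat Operating System"),
]

# My CentOS situation: only the generic computer object was discovered
only_computer = {"Linux Computer": ["centos"]}
# The same machine once an operating system instance is discovered too
with_os = {"Linux Computer": ["centos"], "RedHat Operating System": ["centos"]}

print(monitors_that_run(MONITORS, only_computer))  # prints []
print(monitors_that_run(MONITORS, with_os))
```

In the first case the list is empty even though the machine is known, which is exactly the "agent installed, nothing monitored" state described above.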
In the end I did get it to work, but I had to create a new Management Pack (exporting the RHEL5 one and modifying it as a base) that searches for different Property values and discovers CentOS as if it were RedHat:
After importing my hacked Management Pack the machine started to be monitored. Here you can see Health Explorer in all of its glory:
Of course this is a hack I made just to have a test setup somewhat working and to familiarize myself with the SCX components. There is no guarantee that my Management Pack actually works on CentOS the way it is supposed to, or that there aren't other, more subtle differences between RedHat and CentOS that will make it fail. I only modified a couple of Discoveries to let it discover the "Operating System" instance; everything else should follow, but not necessarily. One difference you can already see in the screenshot above is that the hardware is not being monitored yet, so my hack is only partially working. It is also definitely something that won't be supported, so I cannot provide it here. Also, this is a beta, so I think the Management Packs will be re-released with later beta versions, and this change would need to be redone all over again. Finally, the unsupported distribution is the reason why I installed the agent manually in the first place, as the "Discovery Wizard" would not really "agree" to let me install the agent remotely on an unsupported "platform"!
But I could not wait to see this working while waiting two business days (we are on a weekend!) for confirmation that I am allowed to download a 30-day unsupported trial of the "real" RedHat Enterprise Linux, so I cheated 🙂