TAG | nagios
23
Monitoring HP ESXi Hosts using Insight Remote Support
No comments · Posted by wibble in IT, monitoring
This is just a direct link to the HP Blog article itself but worth a read if you are looking at monitoring any HP server running ESX or ESXi. The main bit that I have always found is that you need to install the HP extensions for ESXi installed as this greatly improves what you can see from remote tools such as Insight Remote Support, Nagios/Opsview or from the vSphere client itself.
The link to the article can be found here – http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/6-Simple-Steps-to-Monitoring-ESXi-with-Insight-Remote-Support/ba-p/100789
I have been playing arond with the check_equallogic Nagios plugin written by Claudio Kuenzler (http://www.claudiokuenzler.com) to monitor some performance and utilisation values for a client and I came across a bug with the code in the latest release which I thought I would share.
The latest release allows you to monitor the size of a single volume as well as a single check to monitor all volumes. I setup the check in Opsview as normal and then proceeded to configure the Host Attributes for the SAN host for each volume on the SAN (there were 75 volumes to monitor). Having added all the checks and reloading Opsview I started to see a large number of OK checks for the volumes but also a number of UNKNOWN outputs from the plugin. Closer inspection showed that when you have two volumes that have the similar names (e.g. BES01-D and DR-BES01-D) the more generic name, BES01-D in this example will match for both volumes and the script will return an unknown value. The DR-BES01-D volume returned the correct stats as the volume name only matched one entry.
Looking through the code in the plugin the line that is causing the issue is:
volarray=$(snmpwalk -v 2c -c ${community} ${host} 1.3.6.1.4.1.12740.5.1.7.1.1.4 | grep -n ${volume} | cut -d : -f1)
When it grep’s the list of volumes from the SNMP walk it returns two values and the script cannot cope so exits. After some playing around (and remembering the basics of writing bash scripts) I managed to work around the problem and changed the line to the following:
volarray=$(eval snmpwalk -v 2c -c ${community} ${host} 1.3.6.1.4.1.12740.5.1.7.1.1.4 | grep -n "\"${volume}\"" | cut -d : -f1)
The change adds the quotation marks that are surrounding the string value that is returned from the SNMPwalk so GREP should only return the exact matches. Having updated the script and re-run the checks the UNKNOWN status was gone and the checks all returned the correct data.
Following on from my post last night about the Windows Updates check on MonitoringExchange a colleague reminded me that we acutally modified the script from there as we weren’t looking for the names of updates to be listed but simply to get the total number of updates that are outstanding. The modified version of the script is listed below for reference and the source for this is at the following URL: https://www.monitoringexchange.org/inventory/Check-Plugins/Operating-Systems/Windows-NRPE/Check-Windows-Updates
<job>
<script language="VBScript">
' Parse command line switches for pending updates
If Wscript.Arguments.Named.Exists("h") Then
Wscript.Echo "Usage: check_win_updates.wsf /w:1 /c:2"
Wscript.Echo "/w: - number of updates before warning status "
Wscript.Echo "/c: - number of updates before critical status "
End If
If Wscript.Arguments.Named.Exists("w") Then
intWarning = Cint(Wscript.Arguments.Named("w"))
Else
intWarning = 0
End If
If Wscript.Arguments.Named.Exists("c") Then
intCritical = Cint(Wscript.Arguments.Named("c"))
Else
intCritical = 0
End If
Set objShell = CreateObject("WScript.Shell")
Dim sysroot
sysroot = objShell.ExpandEnvironmentStrings("%systemroot%")
' Check if the Server is pending a reboot and quit with warning
Set objSysInfo = CreateObject("Microsoft.Update.SystemInfo")
If objSysInfo.RebootRequired Then
Wscript.Echo "Warning: Reboot required | updates=-1"
Wscript.quit(1)
End If
' Dump Software Dist Event log to variable for parsing
Set objExec = objShell.Exec("cmd.exe /c type " & sysroot & "\SoftwareDistribution\ReportingEvents.log")
results = LCase(objExec.StdOut.ReadAll)
res_split = Split(results, vbCrLf)
Dim regEx
Set regEx = New RegExp
regEx.Pattern = "(.)\S*\s*\S*\s*\S*\s*\d\s*(\d*)\s*\S*\s*\S*[0-9\s]*\S*\s*\S*\s*.*\t(.*)"
regEx.IgnoreCase = true
count = 1
ReDim arrDyn(1)
For Each zeile in res_split
firstsign = regEx.Replace(zeile, "$1")
If (firstsign = "{") Then
number = regEx.Replace(zeile, "$2")
finish = regEx.Replace(zeile, "$3")
If (number = 147) Then
count = count + 1
ReDim Preserve arrDyn(count + 1)
arrDyn(count + 1) = finish
End If
End If
Next
mount_updates = -1
For x = 0 to UBound(arrDyn)
If x = UBound(arrDyn) Then
end_array = Split(arrDyn(x), " ")
mount_updates = end_array(UBound(end_array) - 1)
End If
Next
' Quit the script with the appropriate performance data
mount_updates = Cint(mount_updates)
If mount_updates = 0 Then
Wscript.Echo "OK: There are no pending updates | updates=0"
Wscript.Quit(0)
ElseIf mount_updates >= intCritical Then
Wscript.Echo "Critical: There are " & mount_updates & " updates pending | updates=" & mount_updates
Wscript.Quit(2)
ElseIf mount_updates >= intWarning Then
Wscript.Echo "Warning: There are " & mount_updates & " updates pending | updates=" & mount_updates
Wscript.Quit(1)
ElseIf mount_updates < intWarning Then
Wscript.Echo "OK: There are " & mount_updates & " updates pending | updates=" & mount_updates
Wscript.Quit(0)
Else
Wscript.Echo "Unknown: There has been an error"
Wscript.Quit(3)
End If
Wscript.Echo "Unknown: There has been an error"
Wscript.Quit(3)
</script>
</job>
Microsoft · monitoring · nagios · NSClient · opsview · Windows Update
NSClient 0.3.9 was released earlier this month and from the looks of the change log should be a good replacement for 0.3.8. (http://www.nsclient.org/nscp/blog/Blog-2011-07-05). As with previous releases there are both 32-bit and 64-bit variants and the option for an MSI package or for a ZIP download.
Some things I have noticed in the new release (these may have been in 0.3.8 but I never noticed them) are two new external scripts to check Printer status and check Windows Updates. I have been using my own Windows Update script (https://www.monitoringexchange.org/inventory/Check-Plugins/Operating-Systems/Windows-NRPE/Check-Windows-Updates) as I found the ones that query WMI take longer than the default 10 seconds for the script to run without timing out. Giving the bundled script a go it did a good job of outputting some useful information about the Windows Updates however it still took too long to run so I doubt that I will be using this in its current form. The output when running it on my workstation is as follows:
OK: Number of critical updates not installed: 1 <br />Number of software updates not installed: 6 <br /> Critical updates name: Service Pack 1 for Microsoft Office 2010 (KB2510690) 32-bit Edition+
The Printer check also ran through my list of installed printers and came out with an “Unknown” status and the details listed didnt match what Windows was saying so again probably wont be using this in its current format and more likely monitor the printers individually with SNMP based checks directly to the printers.
There are some good additions to the list of modules. CheckTaskSched looks to be a good addition to make sure that those scheduled tasks you have left to run on your server are running as expected and not left stuck in a running state (or didn’t exit with error code 0). CheckFile and CheckFile2 have been amalgamated into the CheckFiles module which will allow you to check a single file but also multiple files for certain criteria. The link above gives examples on checking file versions, line counts, file sizes etc.
For a full list of changes the change log can be found here: http://www.nsclient.org/nscp/blog/Blog-2011-07-05
monitoring · nagios · NSClient · opsview
Having been an avid Nagios/Opsview user for a while I am always keen to see new features that make my life of defining and managing systems easier. I had been meaning to try out the host attributes feature of Opsview for a while to redefine the way I monitor various “generic” features on my infrastructure. Up until now I have had to create an exception for a host that I want to monitor in a slightly different way and remembering what did/didnt have exceptions was never the easiest thing to do.
This has all changed with the Host Attributes feature in Opsview. I can now define a single service check that will take a number of values (currently Opsview 3.7.2 will only let you define one however looking at the SQL database there is capacity for 9 arguments. A forum post from Ton Voon has revealed a patch to the host-attributes tab that allows you to define 4 attributes which should be released in an upcoming release – 3.7.3 maybe). This means that I can define a host attribute (e.g. DISK) and then set in this the partition/disk name and the warning/critical values in different arguments to make sure that I can reduce the number of custom service checks or exceptions that I need to define.
I have managed to abstract my Disk space checks and also some checks for Exchange Information Store sizes across my organisation. I plan to try and further abstract other generalised items of monitoring (e.g. Windows Services, Performance counters etc).
Once I had created these checks I needed to add in a viewport to display the status of my Information Stores. In the past this used to be setup individually on each host and service check manually. In the latest release its possible to create a new keyword and then add in the host/services that you want from the Keywords tab. This has made the process of making new views/displays easier and made the monitoring much simpler.
When I get some time I will put up some pictures to go with this article and expand on my ability to monitor network interfaces with the latest version of Opsview.
abstraction · attributes · monitoring · nagios · opsview · snmp
Following on from my Symantec AV check I have written a first version of a similar check for E-Trust virus definitions. The format and structure to the check is the same as this check but it should return the relevant information for Computer Assoicates E-Trust Antivirus product.
For details on installation and configuration please check out the previous post. For the source code please check out the details below. If you wish to download this from Monitoring Exchange please use this link.
' Script: check_etrust_av.vbs ' Author: Matt White ' Version: 1.0 ' Date: 12-03-2010 ' Details: Check the current definitions for E-Trust AntiVirus are within acceptable bounds ' Usage: cscript /nologo check_etrust_av.vbs -w:-c: ' Define Constants for the script exiting Const intOK = 0 Const intWarning = 1 Const intCritical = 2 Const intUnknown = 3 ' Parse Arguments to find Warning and Critical Levels If Wscript.Arguments.Named.Exists("w") Then intWarnLevel = Cint(Wscript.Arguments.Named("w")) Else intWarnLevel = 2 End If If Wscript.Arguments.Named.Exists("c") Then intCritLevel = Cint(Wscript.Arguments.Named("c")) Else intCritLevel = 4 End If ' Define Date Regular Expression Const strDateRegExp = "(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d" ' Create required objects Set objShell = CreateObject("Wscript.Shell") Set ObjProcess = ObjShell.Environment("Process") Set objRegExp = New RegExp Set objReg=GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\default:StdRegProv") const HKEY_CURRENT_USER = &H80000001 const HKEY_LOCAL_MACHINE = &H80000002 ' read the path of E-Trust Anti-Virus from the registry strKeyPath = "SOFTWARE\ComputerAssociates\ScanEngine\Path" objReg.GetStringValue HKEY_LOCAL_MACHINE,strKeyPath,"Engine",strScanEnginePath If TypeName(StrScanEnginePath) = "Null" Then WScript.Echo "UKNOWN: Cannot read registry Info. Is E-Trust installed?" Wscript.Quit(intUnknown) End If 'strScanEnginePath = ObjShell.RegRead("HKLM\SOFTWARE\ComputerAssociates\ScanEngine\Path\Engine") ' Determine CPU architecture for correct executable to run strCPUArch = objProcess("PROCESSOR_ARCHITECTURE") If InStr(1, strCPUArch, "x86") > 0 Then strExecutable = "\inocmd32.exe" ElseIf InStr(1, strCPUArch, "64") > 0 Then strExecutable = "\inocmd64.exe" End If ' If the path doesnt exist Exit with an Unknown status If Len(StrScanEnginePath) = 0 Then Wscript.Echo "UNKNOWN: Unable to read registry path" Wscript.Quit(intUnknown) End If ' Run the command and read the output into a string Set objExec = objShell.Exec(strScanEnginePath & strExecutable & " /sig") strVirusDefs = objExec.StdOut.ReadAll() ' Search the Virus definition for the date using Regular Expression objRegExp.Pattern = strDateRegExp objRegExp.Global = True objRegExp.IgnoreCase = True Set regExpMatch = objRegExp.Execute(strVirusDefs) ' If date not found in the output. Exit with a warning If regExpMatch.Count = 0 Then Wscript.Echo "UNKNOWN: Unable to read date from the output" Wscript.Quit(intUnknown) End If intDateDifference = DateDiff("d",CDate(regExpMatch(0).Value), Date) Wscript.Echo strVirusDefs If intDateDifference > intCritLevel Then Wscript.Quit(intCritical) ElseIf intDateDifference > intWarnLevel Then Wscript.Quit(intWarning) ElseIf intDateDifference <= intWarnLevel Then Wscript.Quit(intOK) End If Wscript.Quit(intUnknown)
Anti-Virus · CA · nagios · NRPE · opsview
As I start to write/modify more checks and scripts for monitoring applications in Nagios/Opsview I have decided to share these as much as possible with the community so they can enjoy, and if necessary, improve the scripts I have written. I have decided to use the MonitoringExchange.org website to host my scripts (as well as detailing them on this blog) as I have found a number of good scripts here that do what I wanted them to.
All the scripts should appear as projects under my profile (wibble) with a link back to the same script on the blog here. I will also endeavour to post the link to Monitoring Exchange in the bottom of the blog post.
1
Nagios/Opsview: Check Symantec AV Definitions
5 Comments · Posted by wibble in IT, monitoring
This morning whilst deploying a modified version of the Symantec Anti-Virus check from MonitoringExchange.org I noticed that on my 64-bit hosts that the check was not returning the correct data and instead of the expected output I was receiving the following error code:
check_av.vbs(51, 1) Microsoft VBScript runtime error: Type mismatch: 'strValue'
Initially I thought this could be a change due to the new installs being Symantec Endpoint Protection compared to the previous times I had implemented this using Symantec Anti-Virus 10.x but the SEP installs on the 32-bit systems were working fine however the 64-bit versions were not.
A quick look in the registry showed me that the value that is read by the script is not there on the 64-bit version and has been moved to another location (HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Symantec\SharedDefs\DefWatch). I sat down with the script and quickly wrote in some extra code that would allow me to change the search path depending on the Operating System Architecture. I also added in some more error checking so if the key didnt exist then rather than exiting with an OK status it returns an UNKNOWN status and a relevant error message.
As I use NSClient++ to enable me to monitor my Windows servers I simply save the script to the NSClient++\scripts folder and add the following line into my NSCI.ini under [NRPE Handlers]
check_av=cscript.exe //NoLogo scripts\check_av.vbs /W:$ARG1$ /c:$ARG2$
Then from within Nagios or Opsview define the command for check_nrpe
check_nrpe -H $HOSTADDRESS$ -c check_av -a 2 3
The full script is listed below and is also available on Monitoring Exchange (link):
' Script: check_av.vbs
' Author: Matt White
' Version: 1.1
' Date: 01-03-2010
' Details: Check the current definitions for Symantec AntiVirus are within acceptable bounds
' Usage: cscript /nologo check_av.vbs -w:<days> -c:<days>
' Define Constants for the script exiting
Const intOK = 0
Const intWarning = 1
Const intCritical = 2
Const intUnknown = 3
' Create required objects
Set ObjShell = CreateObject("WScript.Shell")
Set ObjProcess = ObjShell.Environment("Process")
const HKEY_CURRENT_USER = &H80000001
const HKEY_LOCAL_MACHINE = &H80000002
Dim strKeyPath, strSymantecVer
Dim intWarnLevel, intCritLevel, intYear, intMonth , intDay, intVer_Major, intDateDifference
Dim year, Month , Day, Ver_Major
Dim arrValue
' Parse Arguments to find Warning and Critical Levels
If Wscript.Arguments.Named.Exists("w") Then
intWarnLevel = Cint(Wscript.Arguments.Named("w"))
Else
intWarnLevel = 2
End If
If Wscript.Arguments.Named.Exists("c") Then
intCritLevel = Cint(Wscript.Arguments.Named("c"))
Else
intCritLevel = 4
End If
' Determine CPU architecture for correct location of the registry key
strCPUArch = objProcess("PROCESSOR_ARCHITECTURE")
If InStr(1, strCPUArch, "x86") > 0 Then
strKeyPath = "SOFTWARE\Symantec\SharedDefs\DefWatch"
ElseIf InStr(1, strCPUArch, "64") > 0 Then
strKeyPath = "SOFTWARE\Wow6432Node\Symantec\SharedDefs\DefWatch"
End If
' Query Registry using WMI to obtain the definition value
Set oReg=GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\default:StdRegProv")
oReg.GetBinaryValue HKEY_LOCAL_MACHINE,strKeyPath,"DefVersion",arrValue
' If the query doesnt return an array Quit - Unknown
If isArray(arrValue) = vbFalse Then
Wscript.Echo "UNKNOWN - Unable to read Definitions from the Registry"
Wscript.Quit(intUnknown)
End If
' Generate output from the registry value
intYear = CLng("&H" & hex(arrValue(1)) & hex(arrValue(0)))
intMonth = CLng("&H" & hex(arrValue(3)) & hex(arrValue(2)))
intDay = CLng("&H" & hex(arrValue(7)) & hex(arrValue(6)))
intVer_Major = CLng("&H" & hex(arrValue(17)) & hex(arrValue(16)))
strSymantecVer= intYear & "-" & intMonth & "-" & intDay & " rev. " & intVer_Major
intDateDifference = DateDiff("d", intYear & "/" & intMonth & "/" & intDay, Date)
' Output current version and definition age as Performance data
Wscript.Echo("Current Definitions: " & strSymantecVer & " Which are " & intDateDifference & " days old" & "|age=" & intDateDifference)
If intDateDifference > intCritLevel Then
Wscript.Quit(intCritical)
ElseIf intDateDifference > intWarnLevel Then
Wscript.Quit(intWarning)
ElseIf intDateDifference <= intWarnLevel Then
Wscript.Quit(intOK)
End If
Wscript.Quit(intUnknown)
Anti-Virus · nagios · NRPE · opsview · Symantec
9
Monitoring ESXi Server health using Nagios/Opsview
3 Comments · Posted by wibble in IT, monitoring
As part of a project I am currently working on I have a requirement to check that my clients’ infrastructure is working to the best of its ability. Whilst we perform regular checks to ensure the sites are running as expected we don’t currently have an easy way to check the health of the ESX hosts that the virtual servers run on. Until now.
I had spent a lot of time trying to “hack” SNMP to be enabled on the ESXi boxes which involved editing the snmp.xml file in the “unsupported” console on the host but after enabling this found that it didnt give me the data I was looking for to run my checks against. Looking a bit further I found a python script which queries the CIM service on the ESX host to find out whether the hardware is working as expected. The script uses the CIM service to check the ESX Health Status and report back to your monitoring platform what the current status of the host is.
Installation is fairly straightforward. The following details are for an Opsview install running on Ubuntu 8.04LTS server but should be easily adaptable to any installation if needs be.
First login to your server as normal and download the latest version of the pywbem module (http://archive.ubuntu.com/ubuntu/pool/universe/p/pywbem/pywbem_0.7.0.orig.tar.gz)
opsview@LON-SVR-MON1:~$ wget http://archive.ubuntu.com/ubuntu/pool/universe/p/pywbem/pywbem_0.7.0.orig.tar.gz
Once you have downloaded the module extract and run the python installer as root
opsview@LON-SVR-MON1:~$ tar -xzf pywbem_0.7.0.orig.tar.gz
opsview@LON-SVR-MON1:~$ cd pywbem-0.7.0/
opsview@LON-SVR-MON1:~/pywbem-0.7.0$ sudo python setup.py install
Next you need to download the check_esx_wbem.py script (http://communities.vmware.com/docs/DOC-7170) and place it in your libexec folder
opsview@LON-SVR-MON1:~/pywbem-0.7.0$ cd /usr/local/nagios/libexec/ opsview@LON-SVR-MON1:/usr/local/nagios/libexec# wget http://communities.vmware.com/servlet/JiveServlet/downloadBody/7170-102-5-4233/check_esx_wbem.py opsview@LON-SVR-MON1:/usr/local/nagios/libexec# sudo chown nagios:nagios check_esx_wbem.py opsview@LON-SVR-MON1:/usr/local/nagios/libexec# sudo chmod a+x check_esx_wbem.py
You can test this from the command line using the following command
opsview@LON-SVR-MON1:/usr/local/nagios/libexec# ./check_esx_wbem.py https://10.9.0.65:5989 root Password
In the case above I received the following output but if everything is working as expected the script should return “OK”
WARNING : Power Supply 3 Power Supplies<br>CRITICAL : Power Supply 2 Power Supply 2: Failure detected<br>
Now we have confirmed the script is running we need to add it to Opsview. The first step here is to reload Opsview to pickup the new plugin. Once complete goto Configuration -> Service Checks and Create New Service Check. Setup your check in a similar way to the image below (remember to substitute “root” and “Password” with a valid username and password to login to your ESX host
Save this service check and then apply this to your ESX hosts. If you have multiple ESX hosts that have different username and passwords then you don’t need to create multiple Service Checks as the later versions of Opsview let you specify exceptions when you configure the check for a host
Once you have configured this reload Opsview and wait for Opsview to start checking the ESX server(s). Below is the screenshot from my server with its disconnected PSU
This should now allow you to keep an eye on your ESX hosts alongside the rest of your network monitoring system.
dell · esx · esxi · health · hp · monitoring · nagios · opsview











