Monitoring Alarm Status on Juniper EX Switches

I am in the process of installing a number of Juniper EX2200, EX3200 and EX4200 switches for a client, and as part of the setup I need to be able to monitor the switches for any alarms (e.g. Switch Management interface down or Switch booted from Backup Partition) and have them dealt with accordingly.

Having a look at the SNMP OID tree for the EX switches I came across the following useful table

http://www.oidview.com/mibs/2636/JUNIPER-ALARM-MIB.html

Object Name                 Object Identifier
jnxAlarms                   1.3.6.1.4.1.2636.3.4
jnxCraftAlarms              1.3.6.1.4.1.2636.3.4.2
jnxAlarmRelayMode           1.3.6.1.4.1.2636.3.4.2.1
jnxYellowAlarms             1.3.6.1.4.1.2636.3.4.2.2
jnxYellowAlarmState         1.3.6.1.4.1.2636.3.4.2.2.1
jnxYellowAlarmCount         1.3.6.1.4.1.2636.3.4.2.2.2
jnxYellowAlarmLastChange    1.3.6.1.4.1.2636.3.4.2.2.3
jnxRedAlarms                1.3.6.1.4.1.2636.3.4.2.3
jnxRedAlarmState            1.3.6.1.4.1.2636.3.4.2.3.1
jnxRedAlarmCount            1.3.6.1.4.1.2636.3.4.2.3.2
jnxRedAlarmLastChange       1.3.6.1.4.1.2636.3.4.2.3.3

I have used the jnxRedAlarmCount and jnxYellowAlarmCount OID values as basic Opsview SNMP Service Checks to give me an initial overview, but in the long term I will be looking to combine this into a full service check script that can be used to check a number of different things.

The setup of the Service Check in Opsview is fairly simple and below are screenshots of the config that I have for each service check.

All you need to configure on your hosts is the SNMP community string and you can apply these checks individually or via a Host Template.
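
As a rough idea of what the combined check script could look like, here is a minimal Python sketch of the decision logic only. It assumes the two counters have already been fetched over SNMP (the ".0" suffix is the instance you query for a scalar object), and the mapping of red alarms to CRITICAL and yellow alarms to WARNING is my own assumption rather than anything Opsview or Juniper mandates. The function name alarm_status is made up for this example.

```python
# OIDs from the JUNIPER-ALARM-MIB table above; scalar objects are read with
# a trailing ".0" instance suffix.
RED_ALARM_COUNT_OID = "1.3.6.1.4.1.2636.3.4.2.3.2.0"
YELLOW_ALARM_COUNT_OID = "1.3.6.1.4.1.2636.3.4.2.2.2.0"

# Standard Nagios/Opsview plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def alarm_status(red_count, yellow_count):
    """Map the two alarm counters to a plugin exit code and message.

    Assumption: any red alarm is CRITICAL, any yellow alarm is WARNING.
    """
    if red_count < 0 or yellow_count < 0:
        return UNKNOWN, "ALARMS UNKNOWN - negative alarm count"
    perfdata = "red=%d yellow=%d" % (red_count, yellow_count)
    if red_count > 0:
        return CRITICAL, "ALARMS CRITICAL - %d red alarm(s) | %s" % (red_count, perfdata)
    if yellow_count > 0:
        return WARNING, "ALARMS WARNING - %d yellow alarm(s) | %s" % (yellow_count, perfdata)
    return OK, "ALARMS OK - no active alarms | " + perfdata


if __name__ == "__main__":
    # Example values only; a real check would read them from the switch.
    code, message = alarm_status(red_count=1, yellow_count=1)
    print(message)
    raise SystemExit(code)
```

Keeping the decision in one small function means the SNMP fetch can be swapped for whatever library or command you prefer without touching the thresholds.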

Once I performed a reload I could see the following in Opsview for one of my switches:

A bit of inspection showed that the Red alarm was for the Management Interface being down (it wasn’t being used on this switch) and the Yellow alarm was due to not having set a rescue configuration. I cleared the alarms by issuing the following commands:

edit
set chassis alarms management-interface link-down ignore
commit and-quit
request system configuration rescue save

Now when I refresh the checks in Opsview I get an OK state for both checks

Opsview – patch for check_route plugin

I was playing around with the check_route plugin and noticed a few issues with it not running. In order to get it to work on my Opsview boxes I had to install a new package, change some settings on the traceroute program and then make a patch in the script itself.

The first thing you need to do is install the traceroute package if it’s not already installed:

sudo apt-get install traceroute

Once installed you will find that the plugin will fail and show the following error:

The specified type of tracerouting is allowed for superuser only
Can't use an undefined value as an ARRAY reference at ./check_route line 129.

Googling the first line, I found that the traceroute binary needs to be setuid root:

chmod u+s /usr/sbin/traceroute

Trying the plugin again you get the following error

Use of uninitialized value $time_units in string eq at ./check_route line 114.
ROUTE UNKNOWN - Cannot cope with line 'traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets'

To get around this you need the plugin to ignore the first line of the output from the traceroute which can be done with the following patch

http://snipt.net/mattywhi/opsview-check_route-diff/
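
The plugin itself is written in Perl and the real change is in the diff linked above, so the following is only a Python illustration of the idea: skip the "traceroute to ..." banner and parse only the numbered hop lines. The function name parse_hops and the regular expression are my own and are not taken from check_route.

```python
import re

# A hop line starts with the hop number, e.g. " 1  192.168.1.254  0.431 ms".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_hops(traceroute_output):
    """Return a list of (hop_number, rest_of_line), skipping the banner."""
    hops = []
    for line in traceroute_output.splitlines():
        if line.startswith("traceroute to "):
            continue  # banner line that the unpatched plugin could not cope with
        match = HOP_LINE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2).strip()))
    return hops
```

Anything that is neither the banner nor a numbered hop line is silently ignored here; a stricter check might prefer to return UNKNOWN for unexpected lines instead.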

Now the script runs as expected and you get the following output

ROUTE OK - Time taken is 145.895 ms | total_time=145.895ms;5000;100000 hops=14;; route_change=0;;
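
Everything after the pipe in that output is performance data, which normally follows the plugin convention of label=value[unit];warn;crit;min;max. If you want to pull those numbers out yourself, a small Python sketch might look like the one below. The function name parse_perfdata is made up for this example, and it deliberately ignores quoted labels that contain spaces.

```python
import re

# label=value[unit];warn;crit;min;max  (the fields after the value are optional)
PERF_ITEM = re.compile(
    r"^(?P<label>[^=]+)=(?P<value>-?[\d.]+)(?P<unit>[^;]*)(?:;(?P<rest>.*))?$"
)


def parse_perfdata(plugin_output):
    """Split plugin output into its status text and a dict of metrics."""
    text, _, perf = plugin_output.partition("|")
    metrics = {}
    for item in perf.split():
        match = PERF_ITEM.match(item)
        if not match:
            continue  # skip anything that is not label=value
        thresholds = (match.group("rest") or "").split(";")
        thresholds += [""] * (2 - len(thresholds))  # pad missing warn/crit
        metrics[match.group("label")] = {
            "value": float(match.group("value")),
            "unit": match.group("unit"),
            "warn": thresholds[0],
            "crit": thresholds[1],
        }
    return text.strip(), metrics
```

Empty warning and critical fields (as in hops=14;;) come back as empty strings rather than numbers, so the caller can tell "no threshold" apart from a threshold of zero.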


 

Monitoring HP ESXi Hosts using Insight Remote Support

This is just a direct link to the HP Blog article itself, but it is worth a read if you are looking at monitoring any HP server running ESX or ESXi. The main thing I have always found is that you need to have the HP extensions for ESXi installed, as this greatly improves what you can see from remote tools such as Insight Remote Support, Nagios/Opsview or from the vSphere client itself.

The link to the article can be found here – http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/6-Simple-Steps-to-Monitoring-ESXi-with-Insight-Remote-Support/ba-p/100789

Nagios Windows Updates check

Following on from my post last night about the Windows Updates check on MonitoringExchange, a colleague reminded me that we actually modified the script from there, as we weren’t looking for the names of updates to be listed but simply for the total number of updates that are outstanding. The modified version of the script is listed below for reference and the source for this is at the following URL: https://www.monitoringexchange.org/inventory/Check-Plugins/Operating-Systems/Windows-NRPE/Check-Windows-Updates

<job>
  <script language="VBScript">
    ' Parse command line switches for pending updates
    If Wscript.Arguments.Named.Exists("h") Then
      Wscript.Echo "Usage: check_win_updates.wsf /w:1 /c:2"
      Wscript.Echo "/w: - number of updates before warning status "
      Wscript.Echo "/c: - number of updates before critical status "
      ' Stop here so that asking for help does not also run the check
      Wscript.Quit(3)
    End If
    If Wscript.Arguments.Named.Exists("w") Then
      intWarning = Cint(Wscript.Arguments.Named("w"))
    Else
      intWarning = 0
    End If
    If Wscript.Arguments.Named.Exists("c") Then
      intCritical = Cint(Wscript.Arguments.Named("c"))
    Else
      intCritical = 0
    End If
    Set objShell = CreateObject("WScript.Shell")
    Dim sysroot
    sysroot = objShell.ExpandEnvironmentStrings("%systemroot%")
    ' Check if the Server is pending a reboot and quit with warning
    Set objSysInfo = CreateObject("Microsoft.Update.SystemInfo")
    If objSysInfo.RebootRequired Then
      Wscript.Echo "Warning: Reboot required | updates=-1"
      Wscript.quit(1)
    End If
    ' Dump Software Dist Event log to variable for parsing
    Set objExec = objShell.Exec("cmd.exe /c type " & sysroot & "\SoftwareDistribution\ReportingEvents.log")
    results = LCase(objExec.StdOut.ReadAll)
    res_split = Split(results, vbCrLf)
    Dim regEx
    Set regEx = New RegExp
    regEx.Pattern = "(.)\S*\s*\S*\s*\S*\s*\d\s*(\d*)\s*\S*\s*\S*[0-9\s]*\S*\s*\S*\s*.*\t(.*)"
    regEx.IgnoreCase = true
    count = 1
    ReDim arrDyn(1)
    For Each zeile in res_split
      firstsign = regEx.Replace(zeile, "$1")
      If (firstsign = "{") Then
        number = regEx.Replace(zeile, "$2")
        finish = regEx.Replace(zeile, "$3")
        If (number = 147) Then
          count = count + 1
          ReDim Preserve arrDyn(count + 1)
          arrDyn(count + 1) = finish
        End If
      End If
    Next
    mount_updates = -1
    For x = 0 to UBound(arrDyn)
      If x = UBound(arrDyn) Then
        end_array = Split(arrDyn(x), " ")
        mount_updates = end_array(UBound(end_array) - 1)
      End If
    Next
    ' Quit the script with the appropriate performance data
    mount_updates = Cint(mount_updates)
    If mount_updates = 0 Then
      Wscript.Echo "OK: There are no pending updates | updates=0"
      Wscript.Quit(0)
    ElseIf mount_updates >= intCritical Then
      Wscript.Echo "Critical: There are " & mount_updates & " updates pending | updates=" & mount_updates
      Wscript.Quit(2)
    ElseIf mount_updates >= intWarning Then
      Wscript.Echo "Warning: There are " & mount_updates & " updates pending | updates=" & mount_updates
      Wscript.Quit(1)
    ElseIf mount_updates < intWarning Then
      Wscript.Echo "OK: There are " & mount_updates & " updates pending | updates=" & mount_updates
      Wscript.Quit(0)
    Else
      Wscript.Echo "Unknown: There has been an error"
      Wscript.Quit(3)
    End If
    Wscript.Echo "Unknown: There has been an error"
    Wscript.Quit(3)
  </script>
</job>
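
The threshold handling at the end of the script is easy to misread, so here is the same If/ElseIf chain restated as a small Python function (classify_updates is a name I have made up for this sketch). It mirrors the order of the tests above: zero is always OK, then critical is checked before warning.

```python
def classify_updates(pending, warning=0, critical=0):
    """Mirror the script's final If/ElseIf chain; returns (exit_code, text)."""
    perfdata = " | updates=%d" % pending
    if pending == 0:
        return 0, "OK: There are no pending updates" + perfdata
    if pending >= critical:
        return 2, "Critical: There are %d updates pending%s" % (pending, perfdata)
    if pending >= warning:
        return 1, "Warning: There are %d updates pending%s" % (pending, perfdata)
    return 0, "OK: There are %d updates pending%s" % (pending, perfdata)
```

One consequence worth knowing: because both thresholds default to 0, running the check without /w and /c reports any non-zero number of pending updates as critical.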

NSClient 0.3.9 released

NSClient 0.3.9 was released earlier this month and, from the looks of the change log (http://www.nsclient.org/nscp/blog/Blog-2011-07-05), should be a good replacement for 0.3.8. As with previous releases there are both 32-bit and 64-bit variants and the option of an MSI package or a ZIP download.

Some things I have noticed in the new release (these may have been in 0.3.8 but I never noticed them) are two new external scripts to check Printer status and check Windows Updates. I have been using my own Windows Update script (https://www.monitoringexchange.org/inventory/Check-Plugins/Operating-Systems/Windows-NRPE/Check-Windows-Updates) as I found that the ones that query WMI take longer to run than the default 10-second timeout allows. I gave the bundled script a go and it did a good job of outputting some useful information about the Windows Updates, however it still took too long to run, so I doubt that I will be using it in its current form. The output when running it on my workstation is as follows:

OK: Number of critical updates not installed: 1 <br />Number of software updates not installed: 6 <br /> Critical updates name: Service Pack 1 for Microsoft Office 2010 (KB2510690) 32-bit Edition+

The Printer check also ran through my list of installed printers and came out with an “Unknown” status, and the details listed didn’t match what Windows was saying, so again I probably won’t be using this in its current form and will more likely monitor the printers individually with SNMP-based checks directly against the printers.

There are some good additions to the list of modules. CheckTaskSched looks to be a good addition to make sure that those scheduled tasks you have left to run on your server are running as expected and not left stuck in a running state (or didn’t exit with error code 0). CheckFile and CheckFile2 have been amalgamated into the CheckFiles module which will allow you to check a single file but also multiple files for certain criteria. The link above gives examples on checking file versions, line counts, file sizes etc.

For a full list of changes the change log can be found here: http://www.nsclient.org/nscp/blog/Blog-2011-07-05

Opsview: Host Attributes and Keywords

Having been an avid Nagios/Opsview user for a while I am always keen to see new features that make my life of defining and managing systems easier. I had been meaning to try out the host attributes feature of Opsview for a while to redefine the way I monitor various “generic” features on my infrastructure. Up until now I have had to create an exception for a host that I want to monitor in a slightly different way, and remembering what did or didn’t have exceptions was never the easiest thing to do.

This has all changed with the Host Attributes feature in Opsview. I can now define a single service check that will take a number of values. Opsview 3.7.2 will currently only let you define one, although looking at the SQL database there is capacity for 9 arguments. A forum post from Ton Voon has revealed a patch to the host-attributes tab that allows you to define 4 attributes, which should appear in an upcoming release (3.7.3 maybe). This means that I can define a host attribute (e.g. DISK), set the partition/disk name in it and put the warning/critical values in different arguments, reducing the number of custom service checks or exceptions that I need to define.

I have managed to abstract my Disk space checks and also some checks for Exchange Information Store sizes across my organisation. I plan to try and further abstract other generalised items of monitoring (e.g. Windows Services, Performance counters etc).
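
To show the idea of one service check being filled in per host, here is a toy Python sketch. It is not Opsview’s code: the %NAME% and %NAME:n% placeholder style, the expand_attributes function and the check_disk_example command are all assumptions made up for illustration.

```python
import re

# Matches %NAME% (the attribute value) or %NAME:n% (argument n of the attribute).
PLACEHOLDER = re.compile(r"%([A-Z]+)(?::(\d))?%")


def expand_attributes(template, attribute_name, value, args):
    """Substitute %NAME% with the value and %NAME:n% with argument n (1-based)."""
    def replace(match):
        if match.group(1) != attribute_name:
            return match.group(0)  # leave unrelated placeholders alone
        if match.group(2) is None:
            return value
        return args[int(match.group(2)) - 1]
    return PLACEHOLDER.sub(replace, template)


if __name__ == "__main__":
    # Hypothetical check definition shared by every host.
    template = "check_disk_example -p %DISK% -w %DISK:1% -c %DISK:2%"
    # Per-host attribute: the partition plus warning/critical arguments.
    print(expand_attributes(template, "DISK", "/var", ["10", "5"]))
```

The point is simply that the check is defined once and each host supplies its own partition and thresholds, instead of every odd host needing its own exception.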

Once I had created these checks I needed to add a viewport to display the status of my Information Stores. In the past this had to be set up manually on each host and service check. In the latest release it’s possible to create a new keyword and then add the hosts/services that you want from the Keywords tab. This has made the process of creating new views/displays easier and the monitoring much simpler.

When I get some time I will put up some pictures to go with this article and expand on my ability to monitor network interfaces with the latest version of Opsview.

Rancid email notification issues

I have just spent a few days getting RANCID set up on one of my monitoring servers to back up my device configs on a daily basis. Whilst setting it up I followed a number of guides to get my config files set up and checked. The one thing I couldn’t get to work, however, was the email sent when RANCID detects a config change on one of the network devices.

Scouring the Internet I couldn’t find what I had missed. Postfix was set up correctly, and if I ran “telnet localhost 25” I could use the aliases I had set up in /etc/aliases and the mail was delivered. In the end, looking at the update logs, I could see a line saying it couldn’t find sendmail.

A quick look at control_rancid and I updated the lines that referenced sendmail to include the full path, /usr/sbin/sendmail, and lo and behold my inbox was full of config changes this morning.
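
The underlying problem is that a script run from cron often gets a much shorter PATH than your login shell, so a bare "sendmail" is not found. A quick way to see this for yourself is the Python sketch below, which uses the standard shutil.which function to look a command up in a given search path. The resolve_command name and the two example PATH strings are assumptions for illustration; check what your own cron environment actually uses.

```python
import shutil


def resolve_command(name, search_path):
    """Return the absolute path of `name` within `search_path`, or None."""
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # A cron-style PATH without the sbin directories, then one that has them.
    for path in ("/usr/bin:/bin", "/usr/sbin:/usr/bin:/sbin:/bin"):
        print(path, "->", resolve_command("sendmail", path))
```

If the first lookup prints None and the second prints a path, hard-coding the full path in the script (as I did) or extending PATH in the crontab will both fix it.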

I’m sure that if I was able to get the money to buy Opsview Enterprise I would make full use of the RANCID module within this but for the moment this works well enough for me.

My next goal is to get SNMP Trap processing setup so that if the appropriate trap is received from a monitored device it will pull the latest config down and we will always have the latest config.

Publishing scripts to Monitoring Exchange

As I start to write/modify more checks and scripts for monitoring applications in Nagios/Opsview I have decided to share these as much as possible with the community so they can enjoy, and if necessary, improve the scripts I have written. I have decided to use the MonitoringExchange.org website to host my scripts (as well as detailing them on this blog) as I have found a number of good scripts here that do what I wanted them to.

All the scripts should appear as projects under my profile (wibble) with a link back to the same script on the blog here. I will also endeavour to post the link to Monitoring Exchange at the bottom of the blog post.

ESXi enabling SNMP

Last night I wrote an article about how to monitor the health of an ESXi server (link here) and I wanted to explain a bit more about my findings with SNMP on an ESXi host.

My goal with the monitoring was to use the check_dell and check_hp commands I have found for Nagios/Opsview to monitor the hardware that ESX is running on. The ESXi installs I am working with are using the Dell and HP management agents installed so I thought that everything would work out of the box and enabling SNMP would let me query the different aspects of the hardware.

The official line from VMware was that SNMP is not enabled on ESXi and, with no console, can’t be enabled. However, having read a recent post on the TechHead blog (link here), I knew that you could see the snmp.xml file, and this shows SNMP as not enabled, which made me think it must be possible to enable it. I was right.

A quick google came up with this article and I had a look and this was a fairly simple process to run:

First you need to enter the “unsupported” console on your ESXi server. To do this press Ctrl+Alt+F1 at your ESX console, type the word unsupported (N.B. you will not see the text on your screen) and press Enter. If all goes well you should see a password prompt. Enter your root password here and you should get a warning that you are entering a mode that should only be used with VMware support, and then be presented with a console.

Type the following command to open the vi text editor and start modifying the snmp.xml file:

vi /etc/vmware/snmp.xml

You should see a single line of text at the top of the screen which is the contents of the xml file. Press i to enter Insert mode and change

<enabled>false</enabled>

to

<enabled>true</enabled>

Then scroll across and add the community name you want the SNMP agent to respond on and place this between the following tags

<communities></communities>

so it should look like

<communities>public</communities>

I wasn’t interested in setting up SNMP traps so I left this blank and quit the vi editor by pressing Esc to exit insert mode and then typing :wq to write the file and quit the editor.

Finally we need to restart the services on the esx host which can be done with the following command

/sbin/services.sh restart
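
If you would rather script the edit than do it by hand in vi, the change itself is tiny. The Python sketch below makes the same two changes to a string of XML using the standard library. It is only an illustration: I am assuming nothing about the file beyond the <enabled> and <communities> tags shown above, the enable_snmp function name is made up, and the sample wrapper elements in the example are hypothetical rather than the real layout of snmp.xml.

```python
import xml.etree.ElementTree as ET


def enable_snmp(xml_text, community):
    """Return xml_text with <enabled> set to true and <communities> filled in."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag == "enabled":
            element.text = "true"
        elif element.tag == "communities":
            element.text = community
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical wrapper elements; only the two inner tags come from the post.
    sample = ("<config><snmpSettings><communities></communities>"
              "<enabled>false</enabled></snmpSettings></config>")
    print(enable_snmp(sample, "public"))
```

You would still need to copy the result back to the host and restart the services as described above.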

Great, SNMP is now enabled so I should be able to get the information from the HP/Dell management agents that I want. Wrong. My snmpwalk of the host provided little to no useful information about what I was trying to unlock.

opsview@LON-SVR-MON1:~$ snmpwalk -v 2c -c public 10.9.0.65
SNMPv2-MIB::sysDescr.0 = STRING: VMware ESX 4.0.0 build-219382 VMware, Inc. x86_64
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.6876.4.1
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (6061646) 16:50:16.46
SNMPv2-MIB::sysContact.0 = STRING: not set
SNMPv2-MIB::sysName.0 = STRING: lon-svr-esx2.domain.local
SNMPv2-MIB::sysLocation.0 = STRING: not set
SNMPv2-MIB::sysServices.0 = INTEGER: 72
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::sysORID.1 = OID: SNMPv2-MIB::snmpMIB
SNMPv2-MIB::sysORID.2 = OID: IF-MIB::ifMIB
SNMPv2-MIB::sysORID.3 = OID: SNMPv2-SMI::enterprises.6876.1.10
SNMPv2-MIB::sysORID.4 = OID: SNMPv2-SMI::enterprises.6876.2.10
SNMPv2-MIB::sysORID.5 = OID: SNMPv2-SMI::enterprises.6876.3.10
SNMPv2-MIB::sysORDescr.1 = STRING: SNMPv2-MIB, RFC 3418
SNMPv2-MIB::sysORDescr.2 = STRING: IF-MIB, RFC 2863
SNMPv2-MIB::sysORDescr.3 = STRING: VMWARE-SYSTEM-MIB, REVISION 200801120000Z
SNMPv2-MIB::sysORDescr.4 = STRING: VMWARE-VMINFO-MIB, REVISION 200810230000Z
SNMPv2-MIB::sysORDescr.5 = STRING: VMWARE-RESOURCES-MIB, REVISION 200810150000Z
SNMPv2-MIB::sysORUpTime.1 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::sysORUpTime.2 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::sysORUpTime.3 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::sysORUpTime.4 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::sysORUpTime.5 = Timeticks: (0) 0:00:00.00
IF-MIB::ifNumber.0 = INTEGER: 4
IF-MIB::ifDescr.1 = STRING: Device vmnic0 at 02:00.0 bnx2
IF-MIB::ifDescr.2 = STRING: Device vmnic1 at 02:00.1 bnx2
IF-MIB::ifDescr.3 = STRING: Device vmnic2 at 03:00.0 bnx2
IF-MIB::ifDescr.4 = STRING: Device vmnic3 at 03:00.1 bnx2
IF-MIB::ifType.1 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.2 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.3 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.4 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifMtu.1 = INTEGER: 1500
IF-MIB::ifMtu.2 = INTEGER: 1500
IF-MIB::ifMtu.3 = INTEGER: 1500
IF-MIB::ifMtu.4 = INTEGER: 1500
IF-MIB::ifSpeed.1 = Gauge32: 1000000000
IF-MIB::ifSpeed.2 = Gauge32: 1000000000
IF-MIB::ifSpeed.3 = Gauge32: 0
IF-MIB::ifSpeed.4 = Gauge32: 0
IF-MIB::ifPhysAddress.1 = STRING: 18:a9:5:4e:a7:1c
IF-MIB::ifPhysAddress.2 = STRING: 18:a9:5:4e:a7:1e
IF-MIB::ifPhysAddress.3 = STRING: 18:a9:5:4e:a7:20
IF-MIB::ifPhysAddress.4 = STRING: 18:a9:5:4e:a7:22
IF-MIB::ifAdminStatus.1 = INTEGER: up(1)
IF-MIB::ifAdminStatus.2 = INTEGER: up(1)
IF-MIB::ifAdminStatus.3 = INTEGER: up(1)
IF-MIB::ifAdminStatus.4 = INTEGER: up(1)
IF-MIB::ifOperStatus.1 = INTEGER: up(1)
IF-MIB::ifOperStatus.2 = INTEGER: up(1)
IF-MIB::ifOperStatus.3 = INTEGER: down(2)
IF-MIB::ifOperStatus.4 = INTEGER: down(2)
IF-MIB::ifLastChange.1 = Timeticks: (0) 0:00:00.00
IF-MIB::ifLastChange.2 = Timeticks: (0) 0:00:00.00
IF-MIB::ifLastChange.3 = Timeticks: (0) 0:00:00.00
IF-MIB::ifLastChange.4 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::snmpInPkts.0 = Counter32: 187
SNMPv2-MIB::snmpInBadVersions.0 = Counter32: 0
SNMPv2-MIB::snmpInBadCommunityNames.0 = Counter32: 0
SNMPv2-MIB::snmpInBadCommunityUses.0 = Counter32: 0
SNMPv2-MIB::snmpInASNParseErrs.0 = Counter32: 0
SNMPv2-MIB::snmpEnableAuthenTraps.0 = INTEGER: disabled(2)
SNMPv2-MIB::snmpSilentDrops.0 = Counter32: 0
SNMPv2-MIB::snmpProxyDrops.0 = Counter32: 0
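
About the only thing in that walk worth alerting on is the interface status. As a sketch, the output above can be turned into a dictionary and checked for interfaces that are not up with a few lines of Python. The parse_snmpwalk and interfaces_down names are my own, and the parsing assumes the "name = TYPE: value" layout shown in the output.

```python
def parse_snmpwalk(output):
    """Return {object_name: (type, value)} for each 'name = TYPE: value' line."""
    results = {}
    for line in output.splitlines():
        name, sep, rest = line.partition(" = ")
        if not sep:
            continue  # not a result line (e.g. the shell prompt)
        value_type, _, value = rest.partition(": ")
        results[name.strip()] = (value_type, value)
    return results


def interfaces_down(results):
    """List the ifDescr of every interface whose ifOperStatus is not up(1)."""
    down = []
    for name, (_, value) in results.items():
        if name.startswith("IF-MIB::ifOperStatus.") and value != "up(1)":
            index = name.rsplit(".", 1)[1]
            fallback = ("", "index " + index)
            down.append(results.get("IF-MIB::ifDescr." + index, fallback)[1])
    return down
```

Run against the walk above, this would flag vmnic2 and vmnic3, which are the two unconnected NICs on this host.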

My thoughts now are simple. SNMP is not enabled in ESXi for the reason that there is not much there to query and you can use the CIM queries that I mentioned in the previous post to look at this instead.

Monitoring ESXi Server health using Nagios/Opsview

As part of a project I am currently working on I have a requirement to check that my clients’ infrastructure is working to the best of its ability. Whilst we perform regular checks to ensure the sites are running as expected we don’t currently have an easy way to check the health of the ESX hosts that the virtual servers run on. Until now.

I had spent a lot of time trying to “hack” SNMP into being enabled on the ESXi boxes, which involved editing the snmp.xml file in the “unsupported” console on the host, but after enabling it I found that it didn’t give me the data I was looking for to run my checks against. Looking a bit further I found a python script which queries the CIM service on the ESX host to find out whether the hardware is working as expected. The script uses the CIM service to check the ESX Health Status and report back to your monitoring platform what the current status of the host is.

Installation is fairly straightforward. The following details are for an Opsview install running on Ubuntu 8.04 LTS server but should be easily adaptable to any installation if needs be.

First login to your server as normal and download the latest version of the pywbem module (http://archive.ubuntu.com/ubuntu/pool/universe/p/pywbem/pywbem_0.7.0.orig.tar.gz)

opsview@LON-SVR-MON1:~$ wget http://archive.ubuntu.com/ubuntu/pool/universe/p/pywbem/pywbem_0.7.0.orig.tar.gz

Once you have downloaded the module extract and run the python installer as root

opsview@LON-SVR-MON1:~$ tar -xzf pywbem_0.7.0.orig.tar.gz
opsview@LON-SVR-MON1:~$ cd pywbem-0.7.0/
opsview@LON-SVR-MON1:~/pywbem-0.7.0$ sudo python setup.py install

Next you need to download the check_esx_wbem.py script (http://communities.vmware.com/docs/DOC-7170) and place it in your libexec folder

opsview@LON-SVR-MON1:~/pywbem-0.7.0$ cd /usr/local/nagios/libexec/
opsview@LON-SVR-MON1:/usr/local/nagios/libexec# wget http://communities.vmware.com/servlet/JiveServlet/downloadBody/7170-102-5-4233/check_esx_wbem.py
opsview@LON-SVR-MON1:/usr/local/nagios/libexec# sudo chown nagios:nagios check_esx_wbem.py
opsview@LON-SVR-MON1:/usr/local/nagios/libexec# sudo chmod a+x check_esx_wbem.py

You can test this from the command line using the following command

opsview@LON-SVR-MON1:/usr/local/nagios/libexec# ./check_esx_wbem.py https://10.9.0.65:5989 root Password

In the case above I received the following output but if everything is working as expected the script should return “OK”

WARNING : Power Supply 3 Power Supplies<br>CRITICAL : Power Supply 2 Power Supply 2: Failure detected<br>
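
The script packs several findings into one line separated by <br> tags. If you want to post-process that (for example to decide on a single overall state), a Python sketch along these lines would do it. The function names are made up, the "STATUS : detail" layout is taken from the output above, and treating CRITICAL as worse than UNKNOWN is my own assumption.

```python
# Rank used to pick the most severe finding (assumption: CRITICAL beats UNKNOWN).
SEVERITY_RANK = {"OK": 0, "WARNING": 1, "UNKNOWN": 2, "CRITICAL": 3}


def split_findings(plugin_output):
    """Split '<br>'-separated output into (status, detail) pairs."""
    findings = []
    for part in plugin_output.split("<br>"):
        status, sep, detail = part.partition(" : ")
        if sep and status.strip() in SEVERITY_RANK:
            findings.append((status.strip(), detail.strip()))
    return findings


def worst_status(findings):
    """Return the most severe status found, or OK when there are no findings."""
    worst = "OK"
    for status, _ in findings:
        if SEVERITY_RANK[status] > SEVERITY_RANK[worst]:
            worst = status
    return worst
```

For the output above this gives one WARNING and one CRITICAL finding, with CRITICAL as the overall state.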

Now we have confirmed the script is running we need to add it to Opsview. The first step here is to reload Opsview to pick up the new plugin. Once complete go to Configuration -> Service Checks and Create New Service Check. Set up your check in a similar way to the image below (remember to substitute “root” and “Password” with a valid username and password to log in to your ESX host).

Save this service check and then apply it to your ESX hosts. If you have multiple ESX hosts that have different usernames and passwords then you don’t need to create multiple Service Checks, as the later versions of Opsview let you specify exceptions when you configure the check for a host.

Once you have configured this, reload Opsview and wait for Opsview to start checking the ESX server(s). Below is the screenshot from my server with its disconnected PSU.

This should now allow you to keep an eye on your ESX hosts alongside the rest of your network monitoring system.

HOWTO: Build an open source monitoring solution – Part1 Build the Server

Introduction

No matter what size of network you are responsible for you should always know what is happening with it to make sure any issues are rectified as soon as possible and hopefully with minimal disruption to your users. Obviously the needs of a small company are different to those of a large corporation and in part this guide is not aimed at people who have a single server, single switch and a few PCs but more at the sys admin who needs to keep an eye on a handful of servers and managed switches (although you can still keep an eye on that single server with this setup).

I have split the guide up into a number of sections which, for me at least, is a logical way to install the different components. All the technologies used in this guide are free to setup and if you have an old server lying around the cost to set this up is simply your time.

OK. Enough with the intro let’s start with building the server.

Part 1 – Build the Server

What you need:

  • Server to run this off – a decent PC will suffice for small setups. I am building this as a virtual host on an ESX server
  • Ubuntu 8.04 Server (Download it here). Make sure it’s the Server Edition and not 8.10, or this won’t work. N.B. you can use other Linux distributions but this guide is based around Ubuntu 8.04 Server

Installation process:

I tried to insert pictures at each step of the installation process but it made the post look untidy so I have created a list of steps that you will complete along the way as you setup your server. If you want to have a look at the screenshots check out the image gallery at the bottom of the post.

  1. Download the ISO from your nearest mirror and burn it to a CD (if you are building a virtual machine you can skip burning it to a CD). Stick the CD into your server and power it on
  2. The first thing you will see is a prompt to select your language. Select your preference from here with the arrow keys and press enter – I am going to choose English (screenshot)
  3. You will next be asked what you want to do. This should be fairly self-explanatory what each option does. We want to “Install Ubuntu Server” (screenshot)
  4. The installer will load the Kernel off the CD and you will be presented with a blue/grey screen asking which language you want to use (Yes you are asked twice). Once again use the arrow keys to select the option you want and press Enter. Again I am selecting English here. (screenshot)
  5. The next prompt asks you to choose your location, which sets your localisation. I am choosing United Kingdom.
  6. The next prompt asks you to select your keyboard layout. If you know what keyboard you have connected then select No and you will be asked to select it on the next screens, otherwise choose Yes and you will be asked to press keys on the keyboard and the installer will work out what you are using. (screenshot1 screenshot2)
  7. After this has completed the installer will load some more components for the setup and try to acquire an IP address from a DHCP server on your network. This is fine as we will be setting this statically later in the guide. (screenshot)
  8. After it has an IP address you need to set your hostname. If you have a naming convention for your site then follow it (e.g. ACME-SVR-MON1); it’s better than just leaving the default of ubuntu. (screenshot)
  9. Once this is done the installer will ask how you want to partition your disk. I am going to go with the simplest option “Guided – use entire disk” to give me a nice big partition over the whole drive to work with. If you are confident with how to partition a disk then you can choose manual but that is outside the scope of this guide. (screenshot)
  10. Having chosen the option you need to choose the disk you want to partition. If there is only one disk in the server then you should only see one option here. Select the relevant disk and press Enter. You will be asked one more time to confirm the changes that will be made so review the page and select Yes to proceed.(screenshot)
  11. Ubuntu will now partition the hard drive and start to install the basic OS. This will take a few minutes so go and brew a cuppa. (screenshot)
  12. Enjoyed your drink? Good. Now back to the setup process. You need to set up the user account that you will use to access the system. First enter your full name, then your username and finally choose a password. (screenshot1 screenshot2)
  13. The next step is to install the relevant core packages you need. Before doing this you will be asked if there is an HTTP proxy between the monitoring server and the Internet. If there is then enter the address here, otherwise leave it blank and choose Continue (screenshot)
  14. In this example we are selecting LAMP (Linux, Apache, MySQL, PHP) to provide a web interface and database functionality, OpenSSH to give us remote access and Mail to enable our monitoring server to notify us when there are issues. (screenshot)
  15. You next need to enter the password for your root MySQL account and confirm it. Please don’t leave this blank as it’s a big security hole if you do. (screenshot1 screenshot2)
  16. After this you will be prompted for how you want to configure your email. I recommend you choose the Satellite System option as this will allow you to push all email generated by the server to your mail server for delivery. After selecting this option you need to choose the system name (what appears after the @ sign) and then the smart host you are going to relay all your mail through (screenshot1 screenshot2 screenshot3)
  17. Once this is done – go away and make yourself another drink as this next step takes another 5-10 minutes to complete depending on the speed of your server. When you come back however the install is complete. Remove the CD and press Enter to reboot your server. (screenshot)

Initial Login and basic configuration

Now that the installation is complete and your server rebooted you should see a screen similar to the one below. This is your login screen, enter the username and password you setup in step 12 and login to the server.

Base Ubuntu install

Now you are logged in we need to set the IP address so that it is static and check that the correct DNS servers are listed. Because of the changes we are making we need to run the next few commands as the root account on the server. Your user account has permission to run commands as root; you just need to tell the server that you want to carry out the changes, a bit like UAC in Windows Vista.

To access the shell as the root user type the following command at the console and press enter.

sudo -s

Enter your password that you logged in with and press enter. Your command line should change from

matt@ACME-SVR-MON1:~$

to

root@ACME-SVR-MON1:~#

Anything you enter now will be run as the root user.

To set the IP address to be static we need to edit the network interfaces configuration file. This is a plain text file that tells the server what IP address, Subnet mask, gateway etc to assign to the different interfaces on your server. There are a number of text editors available but I find nano to be a simple and easy to use editor. Type the following command and press enter to open the config file:

nano /etc/network/interfaces

The file will show you the following default configuration for your server:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp

This needs to be changed so that the primary network interface (eth0) will not look to the DHCP server but will instead use a static address. The code below shows a customised interfaces file. Add in the relevant lines and substitute in the correct values for your network. (N.B. don’t use the number pad to enter the values here as it can cause issues; nano doesn’t seem to register that NumLock is turned on)

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
address 192.168.1.3
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
gateway 192.168.1.254
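
The network and broadcast values are not independent settings: they follow from the address and the netmask. If you are unsure what to enter, the sketch below works them out with plain shell arithmetic. The function names (ip_to_int, int_to_ip, subnet_lines) are made up for this example, and it only handles IPv4 addresses written without leading zeros.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1    # split "a.b.c.d" into $1 $2 $3 $4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Hypothetical helper: convert an integer back to dotted-quad form.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the network and broadcast lines that match an address and netmask.
subnet_lines() {
    addr=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    net=$(( addr & mask ))                     # keep only the network bits
    bcast=$(( net | (mask ^ 0xFFFFFFFF) ))     # set every host bit to 1
    echo "network $(int_to_ip "$net")"
    echo "broadcast $(int_to_ip "$bcast")"
}

subnet_lines 192.168.1.3 255.255.255.0
# prints:
# network 192.168.1.0
# broadcast 192.168.1.255
```

If the two lines it prints do not match what you typed into the interfaces file, one of the four values is probably wrong.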

Once this has been done press Ctrl+X to exit nano. You will be asked if you want to save the file. Press Y and then Enter to confirm the file name and exit. Your configuration will be saved and you will return to the root command line. However, your IP address will not have changed yet, as we need to restart the networking service for this to take effect. Type the following command and press enter:

/etc/init.d/networking restart

If this is successful you should see the following:

 * Reconfiguring network interfaces...                                   [ OK ]

If you do not see this, you have probably made a mistake in the config file. Open it up, check that each line is correct, and then try to restart the networking service again. To confirm your server is now listening on the correct IP address, use the ifconfig command. It is very similar to the ipconfig command in Windows and gives output similar to this:

eth0      Link encap:Ethernet  HWaddr 00:0c:29:ef:62:67
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feef:6267/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6601515 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7587624 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:997379356 (951.1 MB)  TX bytes:759778115 (724.5 MB)
          Interrupt:16 Base address:0x1424

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:966156 errors:0 dropped:0 overruns:0 frame:0
          TX packets:966156 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:106793708 (101.8 MB)  TX bytes:106793708 (101.8 MB)
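
If you only want the address rather than the whole listing, the sketch below pulls it out of the ifconfig output for one interface. It assumes the output format shown above (an "inet addr:" entry on an indented line under the interface name), and iface_addr is a name I have made up for the example.

```shell
# Hypothetical helper: read ifconfig output on standard input and print
# the IPv4 address of the interface named in the first argument.
iface_addr() {
    awk -v name="$1" '
        /^[^ ]/ { current = $1 }            # a line starting in column 1 names an interface
        current == name && /inet addr:/ {
            sub(/.*inet addr:/, "")         # drop everything up to the address
            print $1                        # the address is now the first field
            exit
        }'
}

# Usage: ifconfig | iface_addr eth0
```

With the output above, asking for eth0 should print 192.168.1.3, which you can compare against the address line in the interfaces file.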

There is one thing left to check, and that is that your DNS servers have been successfully added to the server. If your DHCP setup process was successful we shouldn’t need to change anything, but it’s good to make sure it’s all working. Type the following command and you should see a number of lines saying “nameserver” with the IP address of your DNS server listed next to them:

more /etc/resolv.conf

Running this on my server gave me:

search home.bisnet
nameserver 192.168.1.1
nameserver 192.168.1.4
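
If the file has picked up extra lines and you only care about the DNS server addresses, a small sketch like the one below will list just those. The function name list_nameservers is made up for the example, and it assumes each server is on its own "nameserver" line as in the output above.

```shell
# Hypothetical helper: print only the DNS server addresses from a
# resolv.conf-style file passed as the first argument.
list_nameservers() {
    # Each server appears as: nameserver <ip-address>
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Usage: list_nameservers /etc/resolv.conf
```

On my server this would print 192.168.1.1 and 192.168.1.4, one per line. If it prints nothing, no DNS servers have been configured.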

If you want to test DNS resolution then try to ping www.google.co.uk and you should get a reply (N.B. Unlike Windows PING this will run until you stop it. Once you are happy you are getting replies press Ctrl+C to stop the ping).

When you are happy this is working press Ctrl+D to log out of the root command line and back to your normal account.

Congratulations. You have now set up your basic server. In Part 2 of this guide I will go through installing the applications you will use as well as show you the basics of configuring them.


Giving this blog a purpose

Having spent a long time ignoring this blog or simply linking to amusing things on the net that I found through sites like stumbleupon.com, I think it’s time to try and focus what I am writing about and see if I can get a good set of useful articles written.

Having thought about it for about 5 minutes this morning I decided that it should be something related to what I do on a daily basis but also something that I have an interest in, otherwise what’s the point? Virtualization was a first thought, but I already read a good blog about VMware (http://www.techhead.co.uk) which I would probably end up plagiarising, and that isn’t the point of this blog. The other thing that I am keen on at the moment in the world of technology is network monitoring and the technologies you can use for it.

I will say now that I’m quite biased when I am looking at setting up a monitoring solution, as I don’t really want to pay for the extra hardware or software I use to monitor everything. This does mean I will look for good open source applications that carry out a task and that I can customize, rather than paying for a boxed product that does some of what I want to do but not everything.

I still like sharing interesting pages I find on the web, but I may need to split the blog into two sections to look more professional… I still haven’t decided yet, but don’t worry, the random site links will still be there!

So what’s my first entry under the new incarnation of the blog? I think I will write up the “Howto” on building an open source monitoring machine that can keep an eye on your network. Expect it in a few days.