Monitoring – Nagios

Monitoring is good and necessary, because it is bad when a service fails and the
admin doesn't notice, but instead has to be told by his users.
("Bad admin, no biscuit!")
Nagios (formerly Netsaint) is quite well suited for this, even though I found the old
Netsaint, i.e. the predecessor, easier to configure. On the other hand, the check
scripts and programs have received a number of improvements.


Nagios should preferably be installed on a separate machine, so that a complete
failure of a monitored host is noticed as well.

Homepage: http://www.nagios.org

Preparation:
Create the user nagios:
/etc/passwd: nagios:x:500:500:Nagios Ueberwachung:/usr/local/nagios:/usr/bin/false
/etc/shadow: nagios:!:12891:0:99999:7:::
/etc/group: nagios:x:500:
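Instead of editing the three files by hand, most Linux distributions can create the same accounts with `groupadd -g 500 nagios` and `useradd -u 500 -g nagios -d /usr/local/nagios -s /usr/bin/false nagios`. Either way, the GID in the passwd line has to match the group entry. A minimal sketch of a consistency check, assuming the colon-separated formats shown above; the function name `check_nagios_account` is hypothetical:

```shell
#!/bin/sh
# Sketch: check that the nagios user's primary GID (field 4 in passwd)
# matches a "nagios" group entry (field 3 in group).
# Defaults to the real files; other paths can be passed for testing.
check_nagios_account() {
    passwd_file=${1:-/etc/passwd}
    group_file=${2:-/etc/group}
    gid=$(awk -F: '$1 == "nagios" { print $4 }' "$passwd_file")
    if [ -z "$gid" ]; then
        echo "no nagios user in $passwd_file"
        return 1
    fi
    # awk exits 0 only if a nagios group with exactly this GID exists
    if awk -F: -v gid="$gid" '$1 == "nagios" && $3 == gid { found = 1 }
                              END { exit (found ? 0 : 1) }' "$group_file"; then
        echo "ok: nagios user and group share GID $gid"
    else
        echo "group nagios with GID $gid missing in $group_file"
        return 1
    fi
}

# Usage: check_nagios_account   (checks /etc/passwd and /etc/group)
```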

Installation:
Dependencies: gd library >=1.6.3; jpeg library; Net::SNMP; Apache (e.g. as described here)

Since the jpeg library is definitely needed, here is the way to a GD with JPEG support:
/usr/local/src # wget ftp://ftp.uu.net/graphics/jpeg/jpegsrc.v6b.tar.gz
/usr/local/src # wget http://www.boutell.com/gd/http/gd-2.0.33.tar.gz
/usr/local/src # tar xzf jpegsrc.v6b.tar.gz
/usr/local/src # tar xzf gd-2.0.33.tar.gz
/usr/local/src # cd jpeg-6b/
/usr/local/src/jpeg-6b # ./configure --prefix=/usr/local --enable-static --enable-shared && make && make test && make install
/usr/local/src/jpeg-6b # cd ../gd-2.0.33/
/usr/local/src/gd-2.0.33 # ./configure
/usr/local/src/gd-2.0.33 # make && make install
/usr/local/src/gd-2.0.33 # cd ../
/usr/local/src # cpan -i Net::SNMP

/usr/local/src # wget http://mesh.dl.sourceforge.net/sourceforge/nagios/nagios-1.2.tar.gz
/usr/local/src # wget http://mesh.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.tar.gz
/usr/local/src # wget http://mesh.dl.sourceforge.net/sourceforge/nagios/nrpe-2.0b5.tar.gz
/usr/local/src # tar xzf nagios-1.2.tar.gz
/usr/local/src # tar xzf nagios-plugins-1.4.tar.gz
/usr/local/src # tar xzf nrpe-2.0b5.tar.gz
/usr/local/src # cd nagios-1.2/
/usr/local/src/nagios-1.2 # ./configure --enable-embedded-perl --with-cgiurl=/cgi-bin/nagios --with-htmurl=/nagios --with-init-dir=/etc/init.d
/usr/local/src/nagios-1.2 # make all && make install && make install-init && make install-commandmode && make install-config
/usr/local/src/nagios-1.2 # cd ../nagios-plugins-1.4/
/usr/local/src/nagios-plugins-1.4 # ./configure --with-trusted-path=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin --with-cgiurl=/cgi-bin/nagios
/usr/local/src/nagios-plugins-1.4 # make && make install
/usr/local/src/nagios-plugins-1.4 # cp contrib/*.pl /usr/local/nagios/libexec/
/usr/local/src/nagios-plugins-1.4 # cp plugins-scripts/*.pl /usr/local/nagios/libexec/
/usr/local/src/nagios-plugins-1.4 # cd ../nrpe-2.0b5/
/usr/local/src/nrpe-2.0b5 # ./configure
/usr/local/src/nrpe-2.0b5 # make all
/usr/local/src/nrpe-2.0b5 # cp src/nrpe /usr/local/nagios/libexec/
/usr/local/src/nrpe-2.0b5 # cp src/check_nrpe /usr/local/nagios/libexec/
/usr/local/src/nrpe-2.0b5 # cp nrpe.cfg /usr/local/nagios/etc/

nrpe-2.0b5.tar.gz provides the remote checks and should also be installed on every
host to be monitored, so that CPU, processes, disk space and so on can be monitored
there. On the monitoring host itself, only check_nrpe is really needed.

CAUTION:
If "ps -axw | head -1" prints a "bad syntax" error message, config.h has to be
edited before the nagios-plugins are compiled with "make" (config.h is generated
by ./configure, so edit it after that step):
/usr/local/src/nagios-plugins-1.4 # vi config.h

/* Verbatim command to execute for ps in check_procs */
#define PS_COMMAND "/bin/ps axwo 'stat uid ppid vsz rss pcpu comm args'"

The "-" has to go, as done here.
(The plugin developers seem to be too good for a bugfix…)
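The check and the edit can also be scripted. A minimal sketch, assuming the unpatched config.h line contains `"/bin/ps -axwo ...` (so dropping that one dash is the whole fix); the function names `ps_needs_fix` and `fix_ps_command` are hypothetical:

```shell
#!/bin/sh
# Reads ps output on stdin and succeeds if it contains the
# "bad syntax" complaint, i.e. config.h needs patching.
ps_needs_fix() {
    grep -qi 'bad syntax'
}

# Drops the "-" in front of the ps options in PS_COMMAND.
# Assumption: the unpatched line contains "/bin/ps -axwo ...".
fix_ps_command() {
    cfg=${1:-config.h}
    # Temp file + mv instead of the non-portable "sed -i"
    sed 's|"/bin/ps -axwo|"/bin/ps axwo|' "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}

# Usage, in /usr/local/src/nagios-plugins-1.4 after ./configure:
#   ps -axw 2>&1 | ps_needs_fix && fix_ps_command config.h
```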

Configuration:

– Apache (possibly in a separate VHost; SSL is strongly recommended)
# vi /usr/local/apache/conf/httpd.conf

ScriptAlias /cgi-bin/nagios/ "/usr/local/nagios/sbin/"
<Directory "/usr/local/nagios/sbin/">
AllowOverride AuthConfig
Options ExecCGI
Order allow,deny
Allow from all
</Directory>

Alias /nagios/ /usr/local/nagios/share/
<Directory "/usr/local/nagios/share">
Options None
AllowOverride AuthConfig
Order allow,deny
Allow from all
</Directory>

– Nagios itself
# cd /usr/local/nagios/etc
/usr/local/nagios/etc # cp nagios.cfg-sample nagios.cfg
/usr/local/nagios/etc # cp timeperiods.cfg-sample timeperiods.cfg
/usr/local/nagios/etc # cp checkcommands.cfg-sample checkcommands.cfg
/usr/local/nagios/etc # cp misccommands.cfg-sample misccommands.cfg
/usr/local/nagios/etc # cp resource.cfg-sample resource.cfg
/usr/local/nagios/etc # cp cgi.cfg-sample cgi.cfg
/usr/local/nagios/etc # vi cgi.cfg

nagios_check_command=/usr/local/nagios/libexec/check_nagios /usr/local/nagios/var/status.log 5 '/usr/local/nagios/bin/nagios'
authorized_for_system_information=sysadmin
authorized_for_configuration_information=sysadmin
authorized_for_system_commands=sysadmin
authorized_for_all_services=sysadmin,admin
authorized_for_all_hosts=sysadmin,admin
authorized_for_all_service_commands=sysadmin,admin
authorized_for_all_host_commands=sysadmin,admin

(Changed lines only.)

/usr/local/nagios/etc # vi contactgroups.cfg

define contactgroup{
contactgroup_name       nagios-admins
alias                   Nagios Administrators
members                 nagios
}

define contactgroup{
contactgroup_name       linux-admins
alias                   Linux Administrators
members                 nagios,linux
}

/usr/local/nagios/etc # vi contacts.cfg

define contact{
contact_name                    nagios
alias                           Nagios Admin
service_notification_period     24x7
host_notification_period        24x7
service_notification_options    w,u,c,r
host_notification_options       d,u,r
service_notification_commands   notify-by-email
host_notification_commands      notify-by-email
email                           nagios-admin@futzelnet.de
}

define contact{
contact_name                    linux
alias                           Linux Admin
service_notification_period     24x7
host_notification_period        24x7
service_notification_options    w,u,c,r
host_notification_options       d,u,r
service_notification_commands   notify-by-email
host_notification_commands      notify-by-email
email                           linux-admin@futzelnet.de
}

/usr/local/nagios/etc # vi hostgroups.cfg

define hostgroup{
hostgroup_name  nagios-boxes
alias           Nagios Servers
contact_groups  nagios-admins
members         myself
}

define hostgroup{
hostgroup_name  linux-boxes
alias           Linux Servers
contact_groups  linux-admins
members         myself,rechner2
}

/usr/local/nagios/etc # vi hosts.cfg

define host{
name                            generic-host    ; The name of this host template - referenced in other host definitions, used for template recursion/resolution
notifications_enabled           1       ; Host notifications are enabled
event_handler_enabled           1       ; Host event handler is enabled
flap_detection_enabled          1       ; Flap detection is enabled
process_perf_data               1       ; Process performance data
retain_status_information       1       ; Retain status information across program restarts
retain_nonstatus_information    1       ; Retain non-status information across program restarts

register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

define host{
use                     generic-host            ; Name of host template to use

host_name               myself
alias                   Monitoring host
address                 127.0.0.1               ; hostnames work too
check_command           check-host-alive
max_check_attempts      10
notification_interval   480
notification_period     24x7
notification_options    d,u,r
}

define host{
use                     generic-host            ; Name of host template to use

host_name               rechner2
alias                   Rechner 2
address                 rechner2.futzelnet.de
check_command           check-host-alive
max_check_attempts      10
notification_interval   480
notification_period     24x7
notification_options    d,u,r
}
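Further hosts follow the same pattern. A minimal sketch of a helper that prints such a definition with the values used above; the function name `nagios_host_def` and the host `rechner3` in the usage line are hypothetical:

```shell
#!/bin/sh
# Prints a host definition in the style of hosts.cfg above.
# Arguments: host_name, alias, address
nagios_host_def() {
    cat <<EOF
define host{
use                     generic-host            ; Name of host template to use

host_name               $1
alias                   $2
address                 $3
check_command           check-host-alive
max_check_attempts      10
notification_interval   480
notification_period     24x7
notification_options    d,u,r
}
EOF
}

# Usage (hypothetical host; also add it to a hostgroup, then run nagios -v):
#   nagios_host_def rechner3 "Rechner 3" rechner3.futzelnet.de >> /usr/local/nagios/etc/hosts.cfg
```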

/usr/local/nagios/etc # vi services.cfg

define service{
name                            generic-service ; The 'name' of this service template, referenced in other service definitions
active_checks_enabled           1       ; Active service checks are enabled
passive_checks_enabled          1       ; Passive service checks are enabled/accepted
parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service             1       ; We should obsess over this service (if necessary)
check_freshness                 0       ; Default is to NOT check service 'freshness'
notifications_enabled           1       ; Service notifications are enabled
event_handler_enabled           1       ; Service event handler is enabled
flap_detection_enabled          1       ; Flap detection is enabled
process_perf_data               1       ; Process performance data
retain_status_information       1       ; Retain status information across program restarts
retain_nonstatus_information    1       ; Retain non-status information across program restarts

register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

define service{
use                             generic-service         ; Name of service template to use

host_name                       myself
service_description             PING
is_volatile                     0
check_period                    24x7
max_check_attempts              3
normal_check_interval           5
retry_check_interval            1
contact_groups                  linux-admins
notification_interval           240
notification_period             24x7
notification_options            c,r
check_command                   check_ping!100.0,20%!500.0,60%
}

define service{
use                             generic-service         ; Name of service template to use

host_name                       myself
service_description             HTTP
is_volatile                     0
check_period                    24x7
max_check_attempts              3
normal_check_interval           2
retry_check_interval            1
contact_groups                  nagios-admins
notification_interval           240
notification_period             24x7
notification_options            w,u,c,r
check_command                   check_http
}

define service{
use                             generic-service         ; Name of service template to use

host_name                       myself
service_description             SSH
is_volatile                     0
check_period                    24x7
max_check_attempts              3
normal_check_interval           2
retry_check_interval            1
contact_groups                  linux-admins
notification_interval           240
notification_period             24x7
notification_options            w,u,c,r
check_command                   check_ssh
}

define service{
use                             generic-service         ; Name of service template to use

host_name                       myself
service_description             Total Processes
is_volatile                     0
check_period                    24x7
max_check_attempts              3
normal_check_interval           5
retry_check_interval            2
contact_groups                  linux-admins
notification_interval           240
notification_period             24x7
notification_options            w,u,c,r
check_command                   check_local_procs!150!200!RSZDT
}

define service{
use                             generic-service         ; Name of service template to use

host_name                       myself
service_description             /dev/hda1 Free Space
is_volatile                     0
check_period                    24x7
max_check_attempts              3
normal_check_interval           5
retry_check_interval            1
contact_groups                  linux-admins
notification_interval           120
notification_period             24x7
notification_options            w,u,c,r
check_command                   check_local_disk!20%!10%!/dev/hda1
}

define service{
use                             generic-service         ; Name of service template to use

host_name                       rechner2
service_description             PING
is_volatile                     0
check_period                    24x7
max_check_attempts              3
normal_check_interval           5
retry_check_interval            1
contact_groups                  linux-admins
notification_interval           240
notification_period             24x7
notification_options            c,r
check_command                   check_ping!100.0,20%!500.0,60%
}

/usr/local/nagios/etc # vi checkcommands.cfg

define command{
command_name    check_ssh
command_line    $USER1$/check_ssh -H $HOSTADDRESS$
}

#define command{
#        command_name    check_remote_checkscript
#        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_checkscript
#}
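The `!` in a service's `check_command` (e.g. `check_ping!100.0,20%!500.0,60%` in services.cfg) separates arguments: Nagios substitutes them for `$ARG1$`, `$ARG2$`, … in the `command_line` of the named command, together with macros such as `$HOSTADDRESS$` and `$USER1$` (set in resource.cfg). The following rough imitation (not Nagios code) shows which command line results, which helps when running a check by hand as user nagios. It assumes `$USER1$` is `/usr/local/nagios/libexec`; `expand_check` is a hypothetical name:

```shell
#!/bin/sh
# Rough imitation of check_command expansion. Only $USER1$,
# $HOSTADDRESS$ and $ARGn$ are handled.
# Arguments: command_line template, check_command string, host address
expand_check() {
    printf '%s\n' "$1" | awk -v spec="$2" -v addr="$3" -v user1="/usr/local/nagios/libexec" '
    # Replace every literal occurrence of from in s by to
    function repl(s, from, to,    out, i) {
        out = ""
        while ((i = index(s, from)) > 0) {
            out = out substr(s, 1, i - 1) to
            s = substr(s, i + length(from))
        }
        return out s
    }
    {
        n = split(spec, part, "!")   # part[1] = command name, part[2..n] = ARG1..
        line = repl($0, "$USER1$", user1)
        line = repl(line, "$HOSTADDRESS$", addr)
        for (k = 2; k <= n; k++)
            line = repl(line, "$ARG" (k - 1) "$", part[k])
        print line
    }'
}
```

For example, assuming a `check_ping` definition like the one in the sample checkcommands.cfg, `expand_check '$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5' 'check_ping!100.0,20%!500.0,60%' rechner2.futzelnet.de` prints `/usr/local/nagios/libexec/check_ping -H rechner2.futzelnet.de -w 100.0,20% -c 500.0,60% -p 5`.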


Add these definitions.

/usr/local/nagios/etc # chown -R nagios:nagios /usr/local/nagios
/usr/local/nagios/etc # touch dependencies.cfg escalations.cfg
/usr/local/nagios/etc # vi ../sbin/.htaccess

AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/.htpasswd
require valid-user

/usr/local/nagios/etc # vi .htpasswd

sysadmin:gI/gzUSJ6JGkk
admin:gI/gzUSJ6JGkk
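The user names in .htpasswd must be the ones listed in the authorized_for_* lines of cgi.cfg, because the CGIs get the user name from the web server's authentication. Instead of pasting hashes, Apache's `htpasswd` tool can create the entries. A minimal sketch that lists cgi.cfg users without a .htpasswd entry, assuming one `key=user1,user2` setting per line as above; `missing_htpasswd_users` is a hypothetical name:

```shell
#!/bin/sh
# Prints every user named in an authorized_for_* line of cgi.cfg that
# has no entry in the .htpasswd file (such users could never log in).
# Entries can be created with Apache's htpasswd tool, e.g.:
#   htpasswd -c /usr/local/nagios/etc/.htpasswd sysadmin   (-c creates the file)
#   htpasswd /usr/local/nagios/etc/.htpasswd admin
missing_htpasswd_users() {
    cgi_cfg=${1:-/usr/local/nagios/etc/cgi.cfg}
    htpasswd_file=${2:-/usr/local/nagios/etc/.htpasswd}
    # Take the value after "=", split it at commas, deduplicate
    grep '^authorized_for_' "$cgi_cfg" | cut -d= -f2 | tr ',' '\n' | sort -u |
    while read -r user; do
        [ -n "$user" ] || continue
        grep -q "^$user:" "$htpasswd_file" || echo "$user"
    done
}
```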

Check whether the configuration is OK:
/usr/local/nagios/etc # /usr/local/nagios/bin/nagios -v nagios.cfg
[…]
Total Warnings: 0
Total Errors: 0
/usr/local/nagios/etc #
Do this again after every configuration change!
Better safe than sorry. 😉
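The check can be tied to every restart. A minimal sketch, assuming the summary line looks exactly like the `Total Errors: 0` shown above and that the init script installed by `make install-init` accepts `restart`; both function names are hypothetical:

```shell
#!/bin/sh
# Succeeds only if the "nagios -v" output read from stdin reports zero errors.
config_ok() {
    grep -Eq '^Total Errors:[[:space:]]*0[[:space:]]*$'
}

# Verify first, restart only if the check passed (paths as in this article).
verify_and_restart() {
    if /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | config_ok; then
        /etc/init.d/nagios restart
    else
        echo "Configuration has errors, not restarting" >&2
        return 1
    fi
}
```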

Start:
/usr/local/nagios/etc # /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
or:
/usr/local/nagios/etc # /etc/init.d/nagios start

Restart Apache so the changes take effect.
/usr/local/nagios/etc # /usr/local/apache/bin/apachectl graceful

View Nagios in the browser:
http://localhost/nagios/

The *highly* recommended documentation can be read at http://localhost/nagios/docs/toc.html,
or alternatively in /usr/local/nagios/share/docs.
The sample configs in /usr/local/nagios/etc can also be useful.