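nagios
- A network monitoring program
- Also monitors web servers and a variety of application programs
- Can monitor the services OpenStack provides (keystone, nova, neutron) as well as the resources they use (cpu, ram, disk)
- Provides notifications when a failure occurs
- All nagios features are managed through a web interface
nagios server
- The system that performs the monitoring
nagios agent or target
- The system being monitored
########### Install the nagios server on controller #########################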
[root@controller ~]# yum install nagios nagios-plugins-all
[root@controller ~]# rpm -qa | grep nagios
[root@controller ~]# vi /etc/httpd/conf.d/nagios.conf
....
<RequireAll>
Require all granted
# Require host 127.0.0.1
Require ip 127.0.0.1 192.168.100.0/24
....
Set the password
[root@controller ~]# htpasswd /etc/nagios/passwd nagiosadmin
New password: nagiosadmin
Re-type new password: nagiosadmin
Updating password for user nagiosadmin
[root@controller ~]#
[root@controller ~]# cat /etc/nagios/passwd
nagiosadmin:$apr1$mxH1.BM6$HD31nIwpUyMd2hTuwsnnz/
Start the daemons
[root@controller ~]# systemctl restart nagios httpd
[root@controller ~]# systemctl enable nagios httpd
Validate the configuration
[root@controller ~]# nagios -v /etc/nagios/nagios.cfg
Nagios Core 4.4.5
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-08-20
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 8 services.
Checked 1 hosts.
Checked 1 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 24 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 1 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
[root@controller ~]#
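The pre-flight summary above can be used to gate a restart. The following is a minimal sketch, assuming the summary keeps the `Total Errors:` wording shown in the output; `preflight_ok` is a hypothetical helper name, not something Nagios ships.

```shell
# Hypothetical helper: succeeds only if the `nagios -v` output on stdin
# contains a "Total Errors:" line whose count is 0.
preflight_ok() {
    awk '/^Total Errors:/ { found = 1; errors = $3 }
         END { exit !(found && errors == 0) }'
}

# Usage on the controller (restart only when validation is clean):
#   nagios -v /etc/nagios/nagios.cfg | preflight_ok && systemctl restart nagios
```

Missing summary lines are treated as a failure, so a crashed or truncated validation run does not trigger a restart.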
Check the logs
[root@controller ~]# ls /var/log/nagios/
archives nagios.log
[root@controller ~]#
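Entries in nagios.log start with a Unix epoch timestamp in square brackets, which is hard to read at a glance. The sketch below rewrites that prefix as a UTC date. It assumes GNU `date` (for `-d @epoch`), and `humanize_nagios_log` is a hypothetical helper name.

```shell
# Hypothetical helper: reads nagios.log lines on stdin and replaces the
# leading "[epoch]" with a UTC timestamp. Other lines pass through unchanged.
humanize_nagios_log() {
    while IFS= read -r line; do
        case "$line" in
            \[[0-9]*\]\ *)
                epoch=${line#\[}        # drop the leading "["
                epoch=${epoch%%\]*}     # keep only the digits before "]"
                rest=${line#*\] }       # message text after "] "
                printf '%s %s\n' \
                    "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
            *) printf '%s\n' "$line" ;;
        esac
    done
}

# Usage on the controller:
#   tail -n 20 /var/log/nagios/nagios.log | humanize_nagios_log
```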
Web access
Using the Nagios NRPE plugin
hostname | ip | role | program |
controller | x.x.100.110/24 | monitoring server/target | check_nrpe plugin, nrpe daemon |
compute | x.x.100.111/24 | monitoring target | nrpe daemon |
network | x.x.100.112/24 | monitoring target | nrpe daemon |
nagios uses the check_nrpe plugin to connect to the monitoring targets and collect information and logs
The nrpe daemon listens on port 5666
NRPE plugin configuration
############### Configuration on compute #####################
[root@compute ~]# yum install nrpe nagios-plugins-nrpe nagios-plugins-all
[root@compute ~]# rpm -qa | grep nrpe
nrpe-4.0.3-2.el7.x86_64
nagios-plugins-nrpe-4.0.3-2.el7.x86_64
[root@compute ~]# vi /etc/nagios/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.100.110
Daemon checks
command[check_nova_metadata]=/usr/lib64/nagios/plugins/check_procs -C nova-api-metadata -u nova
command[check_nova_compute]=/usr/lib64/nagios/plugins/check_procs -C nova-compute -u nova
[root@compute ~]# systemctl restart nrpe
[root@compute ~]# systemctl status nrpe
If you do not authenticate, checks may show CRITICAL
[root@compute ~]# source ~/keystonerc
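The nrpe daemon only answers hosts listed in `allowed_hosts`, so a missing controller address is a common reason for failed checks. A minimal sketch for verifying the setting follows; `nrpe_allows` is a hypothetical helper, and it does an exact string match only (no CIDR or hostname resolution).

```shell
# Hypothetical helper: succeeds if the IP in $1 appears in the last
# allowed_hosts line of the nrpe.cfg file given as $2.
nrpe_allows() {
    grep -E '^[[:space:]]*allowed_hosts[[:space:]]*=' "$2" \
        | tail -n 1 | cut -d= -f2 | tr ',' '\n' | tr -d ' \t' \
        | grep -Fxq "$1"
}

# Usage on a monitoring target:
#   nrpe_allows 192.168.100.110 /etc/nagios/nrpe.cfg && echo "controller allowed"
```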
############### Configuration on network #####################
[root@network ~]# yum install nrpe nagios-plugins-nrpe nagios-plugins-all
[root@network ~]# vi /etc/nagios/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.100.110
command[check_neutron_dhcp]=/usr/lib64/nagios/plugins/check_procs -C neutron-dhcp-agent -u neutron
command[check_neutron_openvswitch]=/usr/lib64/nagios/plugins/check_procs -C neutron-openvswitch-agent -u neutron
command[check_cinder_volume]=/usr/lib64/nagios/plugins/check_procs -C cinder-volume -u cinder -c 1:4
[root@network ~]# systemctl restart nrpe
[root@network ~]# systemctl status nrpe
If you do not authenticate, checks may show CRITICAL
[root@network ~]# source ~/keystonerc
############### Configuration on controller #####################
[root@controller ~]# yum -y install nrpe nagios-plugins-nrpe nagios-plugins-all
[root@controller ~]# vi /etc/nagios/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.100.110
Define the commands to run
command[check_keystone_api]=/usr/lib64/nagios/plugins/check_http localhost -p 5000
command[check_neutron_procs]=/usr/lib64/nagios/plugins/check_procs -C neutron-server -u neutron -c 1:10
command[check_glance_api_procs]=/usr/lib64/nagios/plugins/check_procs -C glance-api -u glance
command[check_glance_registry]=/usr/lib64/nagios/plugins/check_procs -C glance-registry -u glance
command[check_nova_api]=/usr/lib64/nagios/plugins/check_http localhost -p 8774
[root@controller ~]# systemctl restart nrpe
[root@controller ~]# systemctl status nrpe
check_nrpe reads back the result of running the commands defined in each server's nrpe.cfg
[root@controller ~]# mkdir /etc/nagios/conf.d
[root@controller ~]# vi /etc/nagios/conf.d/controller.cfg
define host{
use linux-server
host_name controller
alias controller
address 192.168.100.110
}
define service{
use generic-service
host_name controller
service_description Keystone API
check_command check_nrpe!check_keystone_api
notification_period 24x7
}
define service{
use generic-service
host_name controller
service_description Neutron Server Process
check_command check_nrpe!check_neutron_procs
notification_period 24x7
}
define service{
use generic-service
host_name controller
service_description Glance Process
check_command check_nrpe!check_glance_api_procs
notification_period 24x7
}
define service{
use generic-service
host_name controller
service_description Glance Registry
check_command check_nrpe!check_glance_registry
notification_period 24x7
}
define service{
use generic-service
host_name controller
service_description Nova API
check_command check_nrpe!check_nova_api
notification_period 24x7
}
[root@controller ~]# vi /etc/nagios/conf.d/compute.cfg
define host{
use linux-server
host_name compute
alias compute
address 192.168.100.111
}
define service{
use generic-service
host_name compute
service_description Nova Compute
check_command check_nrpe!check_nova_compute
notification_period 24x7
}
define service{
use generic-service
host_name compute
service_description Nova Metadata
check_command check_nrpe!check_nova_metadata
notification_period 24x7
}
[root@controller ~]# vi /etc/nagios/conf.d/network.cfg
define host{
use linux-server
host_name network
alias network
address 192.168.100.112
}
define service{
use generic-service
host_name network
service_description Neutron DHCP
check_command check_nrpe!check_neutron_dhcp
notification_period 24x7
}
define service{
use generic-service
host_name network
service_description Neutron Openvswitch
check_command check_nrpe!check_neutron_openvswitch
notification_period 24x7
}
define service{
use generic-service
host_name network
service_description Cinder Volume
check_command check_nrpe!check_cinder_volume
notification_period 24x7
}
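The service blocks above all share the same shape and differ only in host, description, and NRPE command name. As a sketch, a small generator can print one such block; `make_nrpe_service` is a hypothetical helper, and the field values mirror the definitions above.

```shell
# Hypothetical generator: prints one NRPE-backed service definition.
# $1 = host_name, $2 = service_description, $3 = command name in nrpe.cfg
make_nrpe_service() {
    cat <<EOF
define service{
use generic-service
host_name $1
service_description $2
check_command check_nrpe!$3
notification_period 24x7
}
EOF
}

# Usage on the controller (prints the block; redirect into a cfg file to use it):
#   make_nrpe_service compute "Nova Compute" check_nova_compute
```

Generating the blocks keeps the command name in `check_command` and the one in the target's nrpe.cfg easier to compare, since each appears exactly once per call.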
[root@controller ~]# vi /etc/nagios/objects/commands.cfg
define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
}
Option descriptions
use | - Refers to the linux-server template defined in /etc/nagios/objects/templates.cfg - Means the default checks for a typical Linux host are applied |
host_name | Name of the host whose service is checked |
service_description | Description of this service |
check_command | - The command this service runs - check_nrpe!check_keystone_api means the check_nrpe plugin in /usr/lib64/nagios/plugins/ takes one argument, and check_keystone_api is passed as that argument - check_keystone_api is the command newly defined in /etc/nagios/nrpe.cfg on the controller node |
notification_period | Sets the notification period to 24 hours a day, every day |
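To make the argument passing concrete, the sketch below mimics how the `check_nrpe` command definition turns `check_nrpe!check_keystone_api` into a command line. It is an illustration only, not how Nagios is implemented; `expand_check_nrpe` is a hypothetical name, and the plugin directory standing in for `$USER1$` is assumed to be /usr/lib64/nagios/plugins.

```shell
# Illustration only: fills in $HOSTADDRESS$ and $ARG1$ the way the
# check_nrpe command_line does for a check_command of the form "command!arg".
expand_check_nrpe() {
    address=$1                  # host address, e.g. 192.168.100.110
    check_command=$2            # e.g. check_nrpe!check_keystone_api
    arg1=${check_command#*!}    # text after the first "!" becomes $ARG1$
    printf '%s -H %s -t 30 -c %s\n' \
        /usr/lib64/nagios/plugins/check_nrpe "$address" "$arg1"
}

# Usage:
#   expand_check_nrpe 192.168.100.110 'check_nrpe!check_keystone_api'
```

The printed line can also be run by hand on the controller to see what Nagios would receive from a target.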
Load the cfg files in the newly created conf.d directory
[root@controller ~]# vi /etc/nagios/nagios.cfg
cfg_dir=/etc/nagios/conf.d/
[root@controller ~]# nagios -v /etc/nagios/nagios.cfg
[root@controller ~]# systemctl restart nagios
[root@controller ~]# systemctl status nagios
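With `cfg_dir` set, Nagios picks up files ending in .cfg under that directory, including subdirectories, so a file with a different extension is silently skipped. A minimal sketch for listing what would be loaded; `list_cfg_files` is a hypothetical helper.

```shell
# Hypothetical helper: lists the *.cfg files under a cfg_dir directory,
# including subdirectories, in a stable order.
list_cfg_files() {
    find "$1" -type f -name '*.cfg' | sort
}

# Usage on the controller:
#   list_cfg_files /etc/nagios/conf.d
```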
###############################
If CRITICAL or a red status shows up, run the check by hand
[root@compute ~]# /usr/lib64/nagios/plugins/check_procs -C nova-compute -u nova
PROCS OK: 1 process with command name 'nova-compute', UID = 162 (nova) | procs=1;;;0;
[root@compute ~]# /usr/lib64/nagios/plugins/check_procs -C nova-api-metadata -u nova
PROCS OK: 0 processes with command name 'nova-api-metadata', UID = 162 (nova) | procs=0;;;0;
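The `PROCS OK: 0 processes` result above shows that check_procs without a `-w` or `-c` threshold reports OK even when nothing is running. Nagios decides the state from the plugin's exit code, so a sketch like the following helps when testing by hand; `plugin_state` is a hypothetical helper name.

```shell
# Maps a Nagios plugin exit code to its state name
# (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN).
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Usage on compute: "-c 1:" makes zero matching processes CRITICAL.
#   /usr/lib64/nagios/plugins/check_procs -C nova-compute -u nova -c 1:
#   plugin_state $?
```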