ZABBIX monitoring Flume

Posted by Yancy on 2017-09-21

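Flume itself provides HTTP and Ganglia monitoring services, but we currently rely mainly on Zabbix for monitoring. We therefore added a Zabbix monitoring module to Flume so that it integrates seamlessly with the SA team's monitoring service.
We also prune Flume's metrics: only the metrics we need are sent to Zabbix, to avoid putting pressure on the Zabbix server. What we care about most right now is whether Flume can promptly write the logs sent by the applications to HDFS. The corresponding metrics are: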

Source : number of events received and number of events processed
Channel : number of events backed up in the channel
Sink : number of events already processed
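
As a rough illustration of pruning the metrics down to the ones listed above, here is a minimal sketch that reads Flume's JSON metrics on stdin and keeps only four keys. The function name `filter_flume_metrics` is hypothetical, and the key names are taken from the sample `/metrics` output shown in this post; it assumes the flat `{"COMPONENT.name":{"Key":"value"}}` layout with no spaces inside values.

```shell
# Hypothetical helper: keep only the metrics of interest from Flume's JSON on stdin.
# Assumes a flat {"COMPONENT.name":{"Key":"value",...}} layout without spaces in values.
filter_flume_metrics() {
  # Drop quotes and spaces, then put every key:value pair on its own line
  tr -d '" ' | tr '{},' '\n\n\n' \
    | grep -E '^(EventReceivedCount|EventAcceptedCount|ChannelSize|EventDrainSuccessCount):'
}
```

It could be fed from the HTTP endpoint, for example `curl -s http://localhost:34545/metrics | filter_flume_metrics`.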

Zabbix installation & JVM performance monitoring

Zabbix installation
http://my.oschina.net/yunnet/blog/173161
JDK1.8
[jollybi@countly1 conf]$ java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
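# JVM performance monitoring
# On JDK 8, jstat -gcutil prints the columns S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
# Columns 6 and 8 are CCS (a percentage, e.g. 95.16) and YGCT (seconds, e.g. 436.252), not counts
Young GC counts (YGC, column 7)
/usr/local/jdk/bin/jstat -gcutil 87007 | tail -1 | awk '{print $7}'
Full GC counts (FGC, column 9)
/usr/local/jdk/bin/jstat -gcutil 87007 | tail -1 | awk '{print $9}'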
JVM total memory usage
/usr/local/jdk/bin/jmap -histo $(pgrep java)|grep Total | sed -n '$p' | awk '{print $3}'
JVM total instances usage
/usr/local/jdk/bin/jmap -histo $(pgrep java)|grep Total | sed -n '$p' | awk '{print $2}'
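
Because the column layout of `jstat -gcutil` differs between JDK versions, picking a value by its header name may be more robust than a fixed column number. The sketch below is one way to do that; `jstat_column` is a hypothetical helper name, and it assumes the first input line is the header row.

```shell
# Hypothetical helper: print the value under a named column of `jstat -gcutil`
# output read from stdin. Assumes line 1 is the header row.
jstat_column() {
  awk -v col="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }  # locate the column
    idx     { value = $idx }                                            # remember the latest sample
    END     { if (value != "") print value }'
}
```

For example, `/usr/local/jdk/bin/jstat -gcutil 87007 | jstat_column YGC` would print the young GC count regardless of where that column sits.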

Flume application metrics monitoring

Add the JSON reporting parameters at startup so that the metrics can be accessed at http://localhost:34545/metrics
bin/flume-ng agent --conf conf --conf-file conf/flume-conf-test.properties --name agent -Dflume.root.logger=INFO,console -Dflume.monitoring.type=http -Dflume.monitoring.port=34545 &
[root@localhost apache-flume-1.8.0-bin]# curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'
SOURCE.source1:
EventReceivedCount:4871
AppendBatchAcceptedCount:52
Type:SOURCE
EventAcceptedCount:4871
AppendReceivedCount:0
StartTime:1511251310062
OpenConnectionCount:0
AppendAcceptedCount:0
AppendBatchReceivedCount:52
StopTime:0
SINK.sink1:
ConnectionCreatedCount:0
BatchCompleteCount:0
BatchEmptyCount:43
EventDrainAttemptCount:0
StartTime:1511251311047
BatchUnderflowCount:1
ConnectionFailedCount:0
ConnectionClosedCount:0
Type:SINK
RollbackCount:0
EventDrainSuccessCount:4871
KafkaEventSendTimer:24748
StopTime:0
CHANNEL.channel1:
ChannelCapacity:1000
ChannelFillPercentage:0.0
Type:CHANNEL
ChannelSize:0
EventTakeSuccessCount:4871
EventTakeAttemptCount:4915
StartTime:1511251309391
EventPutAttemptCount:4871
EventPutSuccessCount:4871
StopTime:0
/opt/jdk1.8.0_101/bin/jstat
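
Several keys in the output above, such as StartTime and Type, appear once per component, so matching on the key alone is ambiguous. A minimal sketch of a component-scoped lookup follows; `flume_metric` is a hypothetical name, and it assumes the same flat JSON layout as the output above, with values that contain no colons or spaces.

```shell
# Hypothetical helper: print one metric of one component from Flume's JSON on stdin.
# Usage: flume_metric COMPONENT KEY   e.g. flume_metric SINK.sink1 StartTime
flume_metric() {
  tr -d '" ' | tr '{},' '\n\n\n' | awk -F: -v comp="$1" -v key="$2" '
    NF == 2 && $2 == "" { current = $1; next }   # section line such as "SINK.sink1:"
    current == comp && $1 == key && !found { print $2; found = 1 }'
}
```

For example, `curl -s http://localhost:34545/metrics | flume_metric CHANNEL.channel1 ChannelSize` would print the number of events currently held in channel1.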

Configure the script that monitors Flume

vim /etc/zabbix/monitor_flume.sh
event=EventDrainSuccessCount
# Fetch the metrics and split the JSON into one key:value pair per line
function metrics {
curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'
}
# Generic alternative: metrics |grep "$1"|awk -F: '{print $2}'
function EventDrainSuccessCount {
metrics |grep "$event"|awk -F: '{print $2}'
}
# StartTime appears once per component; print the second match
function StartTime {
metrics |grep StartTime |awk -F: '{print $2}' |sed -n "2p"
}
function Total {
metrics |grep Total|awk -F: '{print $2}'
}
# Run the function whose name was passed as the first argument
$1
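
Since the last line of the script executes whatever name it receives from the Zabbix item key, one possible hardening is to dispatch only to known function names. The fragment below is a sketch of that idea, not part of the original script; `run_metric` is a hypothetical name, and it is meant to sit at the end of monitor_flume.sh where the three functions are already defined.

```shell
# Hypothetical dispatcher for the end of monitor_flume.sh:
# call the requested function only if its name is on the allow-list.
run_metric() {
  case "$1" in
    EventDrainSuccessCount|StartTime|Total)
      "$1" ;;                                   # known metric function
    *)
      echo "unsupported metric: $1" >&2         # anything else is rejected
      return 1 ;;
  esac
}
```

With this in place, the final line of the script would become `run_metric "$1"`.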

Deploy in the Zabbix agent configuration file

vim zabbix_flume_jdk.conf
UserParameter=ygc.counts,sudo /opt/jdk1.8.0_101/bin/jstat -gcutil $(pgrep java|head -1)|tail -1|awk '{print $7}'
UserParameter=fgc.counts,sudo /opt/jdk1.8.0_101/bin/jstat -gcutil $(pgrep java|head -1)|tail -1|awk '{print $9}'
UserParameter=jvm.memory.usage,sudo /opt/jdk1.8.0_101/bin/jmap -histo $(pgrep java|sed -n '$p')|grep Total | sed -n '$p' |awk '{print $3}'
UserParameter=jvm.instances.usage,sudo /opt/jdk1.8.0_101/bin/jmap -histo $(pgrep java|sed -n '$p')|grep Total | sed -n '$p' |awk '{print $2}'
UserParameter=flume.monitor[*],sudo /bin/bash /etc/zabbix/monitor_flume.sh $1