lvs+keepalived+bind does not remove a failed real server properly

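I am running lvs+keepalived+bind. During a load test from a client, as soon as I stop named on one real_server, the client stalls and name resolution becomes very slow. It takes several minutes before the down real server is really removed. My configuration is as follows: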

vrrp_instance VI_1 {
    state MASTER
    interface eth0:0
    virtual_router_id 51
    priority 150
    advert_int 3
    authentication {
        auth_type PASS
        auth_pass xxxxxx
    }
    virtual_ipaddress {
        10.11.194.221
    }
}

virtual_server 10.11.194.221 53 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    nat_mask 255.255.252.0
    protocol TCP
    ha_suspend
    persistence_timeout 0

    real_server 10.11.194.220 53 {
        weight 100
        notify_down "/sbin/ipvsadm -d -u 10.11.194.221:53 -r 10.11.194.220:53"
        notify_up "/sbin/ipvsadm -a -u 10.11.194.221:53 -r 10.11.194.220:53 -g -w 100"
        TCP_CHECK {
            connect_timeout 1
            connect_port 53
        }
    }
    real_server 10.11.194.222 53 {
        weight 100
        notify_down "/sbin/ipvsadm -d -u 10.11.194.221:53 -r 10.11.194.222:53"
        notify_up "/sbin/ipvsadm -a -u 10.11.194.221:53 -r 10.11.194.222:53 -g -w 100"
        TCP_CHECK {
            connect_timeout 1
            connect_port 53
        }
    }
}

virtual_server 10.11.194.221 53 {
    delay_loop 10
    lb_algo rr
    lb_kind DR
    nat_mask 255.255.252.0
    protocol UDP
    real_server 10.11.194.220 53 {
        weight 100
    }
    real_server 10.11.194.222 53 {
        weight 100
    }
}

I also modified /etc/sysctl.conf:
net.ipv4.vs.expire_nodest_conn = 1
net.ipv4.vs.expire_quiescent_template = 1
net.ipv4.vs.drop_packet = 1
net.ipv4.vs.drop_entry = 1

I previously used MISC_CHECK with a script and got the same result, so I switched to TCP checking. The script is as follows:
#!/bin/bash
# Health check: query ${RR} on the given host with dig.
RR=test.sum-op.cn
[ $# -le 1 ] && { echo "usage: $0 -h "; exit 126; }
while getopts "h:" OPT; do
    case $OPT in
        h) host=$OPTARG ;;
        *) echo "usage: $0 -h " && exit 1 ;;
    esac
done
#/usr/bin/dig @${host} a ${RR} +time=1 +tries=2 +fail >/dev/null
# Exit 0 if dig succeeds, 1 otherwise.
if /usr/bin/dig -b 10.11.194.223 "@${host}" a "${RR}" +time=1 +tries=1 +fail >/dev/null; then
    exit 0
else
    exit 1
fi
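
For comparison, here is a minimal sketch of a slightly stricter check. As far as I know, dig's exit status only says whether some reply arrived, so a SERVFAIL or an empty answer would still count as success in the script above. The names DIG_BIN and check_dns are my own, not part of the original script, and I am assuming that "at least one answer line from +short" is a reasonable definition of healthy.

```shell
#!/bin/bash
# Hypothetical helper: DIG_BIN and check_dns are illustrative names only.
# DIG_BIN can be overridden, which also makes the function easy to stub.
DIG_BIN=${DIG_BIN:-/usr/bin/dig}

# check_dns HOST RR [SRC]
# Returns 0 only if dig exits successfully AND prints at least one answer line.
check_dns() {
    local host=$1 rr=$2 src=${3:-}
    local answer
    local -a args=()
    # Bind to a specific source address only when one is given.
    [ -n "$src" ] && args+=(-b "$src")
    args+=("@${host}" a "$rr" +time=1 +tries=1 +short)
    # Declaring 'answer' separately keeps dig's exit status visible here.
    answer=$("$DIG_BIN" "${args[@]}" 2>/dev/null) || return 1
    # An empty answer section is treated as a failure.
    [ -n "$answer" ]
}
```

It could be called as `check_dns 10.11.194.220 test.sum-op.cn 10.11.194.223`, and a wrapper script would exit with that status.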

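When a real_server goes down, ipvsadm shows that the down real server has already been removed. But tcpdump clearly shows that every few seconds lvs probes the down real_server again, and traffic stalls for 1 to 2 s before continuing. Only after a few minutes does that real_server become fully inactive.
Has anyone run into this? Any advice would be appreciated.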
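
One thing that may be worth checking (this is only a guess on my part, not a confirmed cause) is whether UDP connection entries that still point at the removed real server linger in the IPVS connection table. The sketch below counts such entries from the output of `ipvsadm -Lnc`. The function name count_conns is made up for illustration, and I am assuming the usual column layout of pro, expire, state, source, virtual, destination.

```shell
#!/bin/bash
# Hypothetical helper: count_conns is an illustrative name only.
# Reads `ipvsadm -Lnc` output on stdin and counts UDP entries whose
# destination column (6th field) equals DEST.
count_conns() {
    awk -v dest="$1" '$1 == "UDP" && $6 == dest { n++ } END { print n + 0 }'
}

# Example usage (needs root and a running IPVS setup):
#   ipvsadm -Lnc | count_conns 10.11.194.220:53
#   ipvsadm -L --timeout    # shows the current tcp/tcpfin/udp timeouts
```

If the count stays above zero for about as long as the stall lasts, the UDP timeout shown by `ipvsadm -L --timeout` would be the next thing to compare against.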
