Red Hat LVS runtime error: the two real servers are shut down in turn

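I configured an LVS through Red Hat's piranha-gui. In actual use the load on the two servers turned out to be unbalanced, and the messages log contains many errors like these: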
Apr 17 17:08:30 lvs-dr2 nanny[2530]: READ to 10.1.8.180:80 timed out
Apr 17 17:08:30 lvs-dr2 nanny[2530]: [inactive] shutting down 10.1.8.180:80 due to connection failure
Apr 17 17:08:42 lvs-dr2 nanny[2530]: [ active ] making 10.1.8.180:80 available
Apr 17 18:11:27 lvs-dr2 nanny[2532]: READ to 10.1.8.196:80 timed out
Apr 17 18:11:27 lvs-dr2 nanny[2532]: [inactive] shutting down 10.1.8.196:80 due to connection failure
Apr 17 18:11:40 lvs-dr2 nanny[2532]: [ active ] making 10.1.8.196:80 available
Apr 17 19:08:48 lvs-dr2 nanny[2530]: READ to 10.1.8.180:80 timed out
Apr 17 19:08:48 lvs-dr2 nanny[2530]: [inactive] shutting down 10.1.8.180:80 due to connection failure
Apr 17 19:09:00 lvs-dr2 nanny[2530]: [ active ] making 10.1.8.180:80 available
Apr 17 19:54:10 lvs-dr2 nanny[2532]: READ to 10.1.8.196:80 timed out
Apr 17 19:54:10 lvs-dr2 nanny[2532]: [inactive] shutting down 10.1.8.196:80 due to connection failure
Apr 17 19:54:22 lvs-dr2 nanny[2532]: [ active ] making 10.1.8.196:80 available
Apr 17 21:09:06 lvs-dr2 nanny[2530]: READ to 10.1.8.180:80 timed out
Apr 17 21:09:06 lvs-dr2 nanny[2530]: [inactive] shutting down 10.1.8.180:80 due to connection failure
Apr 17 21:09:18 lvs-dr2 nanny[2530]: [ active ] making 10.1.8.180:80 available

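Both real servers run Tomcat under light load, so the port should not be failing to respond. The timeouts themselves are also strange: the two servers take turns, and I have never seen the same server time out twice in a row.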
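The alternation and the length of each outage can be checked mechanically rather than by eye. The following is a minimal sketch, not part of the original setup: it parses the nanny state changes quoted above and pairs each "shutting down" with the next "making ... available" for the same server. The function names and the `year` parameter are hypothetical (syslog lines carry no year, so a placeholder is used).

```python
import re
from datetime import datetime

# nanny state changes copied from the log excerpt in the post.
LOG = """\
Apr 17 17:08:30 lvs-dr2 nanny[2530]: [inactive] shutting down 10.1.8.180:80 due to connection failure
Apr 17 17:08:42 lvs-dr2 nanny[2530]: [ active ] making 10.1.8.180:80 available
Apr 17 18:11:27 lvs-dr2 nanny[2532]: [inactive] shutting down 10.1.8.196:80 due to connection failure
Apr 17 18:11:40 lvs-dr2 nanny[2532]: [ active ] making 10.1.8.196:80 available
Apr 17 19:08:48 lvs-dr2 nanny[2530]: [inactive] shutting down 10.1.8.180:80 due to connection failure
Apr 17 19:09:00 lvs-dr2 nanny[2530]: [ active ] making 10.1.8.180:80 available
Apr 17 19:54:10 lvs-dr2 nanny[2532]: [inactive] shutting down 10.1.8.196:80 due to connection failure
Apr 17 19:54:22 lvs-dr2 nanny[2532]: [ active ] making 10.1.8.196:80 available
Apr 17 21:09:06 lvs-dr2 nanny[2530]: [inactive] shutting down 10.1.8.180:80 due to connection failure
Apr 17 21:09:18 lvs-dr2 nanny[2530]: [ active ] making 10.1.8.180:80 available
"""

# Matches only the "[inactive]" / "[ active ]" lines; other lines are skipped.
EVENT = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ nanny\[\d+\]: "
    r"\[ *(inactive|active) *\] (?:shutting down|making) ([\d.]+):\d+"
)


def parse_events(text, year=2000):
    """Return (timestamp, state, server_ip) tuples for nanny state changes.

    `year` is a placeholder because syslog timestamps omit the year.
    """
    events = []
    for line in text.splitlines():
        match = EVENT.match(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, match.group(2), match.group(3)))
    return events


def outages(events):
    """Pair each 'inactive' event with the next 'active' one per server.

    Returns (server_ip, start_timestamp, seconds_down) tuples.
    """
    down_since = {}
    result = []
    for stamp, state, server in events:
        if state == "inactive":
            down_since[server] = stamp
        elif server in down_since:
            start = down_since.pop(server)
            seconds = int((stamp - start).total_seconds())
            result.append((server, start, seconds))
    return result


def strictly_alternates(servers):
    """True if no server appears twice in a row."""
    return all(a != b for a, b in zip(servers, servers[1:]))
```

Calling `outages(parse_events(LOG))` on the excerpt gives five outages of 12 to 13 seconds each, and `strictly_alternates` confirms that the failing server changes every time. Feeding it a longer stretch of the messages log would show whether the pattern holds over days.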

The lvs.cf configuration is as follows:

serial_no = 33
primary = 10.1.8.194
service = lvs
backup_active = 1
backup = 10.1.8.195
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = direct
debug_level = NONE
monitor_links = 0
syncdaemon = 0
virtual wine_lvs {
    active = 1
    address = 10.1.8.193 eth0:1
    vip_nmask = 255.255.255.255
    port = 80
    persistent = 120
    pmask = 255.255.255.255
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "HTTP"
    use_regex = 0
    load_monitor = none
    scheduler = wlc
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 0
    server RS1 {
        address = 10.1.8.180
        active = 1
        weight = 5
    }
    server RS2 {
        address = 10.1.8.196
        active = 1
        weight = 3
    }
}
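To see whether the real servers really take longer than 6 seconds to answer, the same send/expect check can be repeated by hand from the director. The sketch below is an approximation and not nanny's actual implementation: it sends the `send` string from the config, waits up to `timeout` seconds per socket operation, and treats a reply that starts with the `expect` string as healthy. The prefix match and the `probe` function name are assumptions.

```python
import socket
import time

# Values taken from the lvs.cf above.
SEND = b"GET / HTTP/1.0\r\n\r\n"
EXPECT = b"HTTP"
TIMEOUT = 6.0


def probe(host, port=80, send=SEND, expect=EXPECT, timeout=TIMEOUT):
    """Run one send/expect check and return (ok, elapsed_seconds, detail).

    The timeout applies to each connect/send/recv call separately,
    which may differ from how nanny accounts for its timeout.
    """
    start = time.monotonic()
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(send)
            # Read only as many bytes as are needed for the comparison.
            while len(data) < len(expect):
                chunk = sock.recv(4096)
                if not chunk:
                    break  # peer closed the connection early
                data += chunk
    except OSError as exc:
        # Covers refused connections and timeouts (socket.timeout is an OSError).
        elapsed = time.monotonic() - start
        return False, elapsed, f"{type(exc).__name__}: {exc}"
    elapsed = time.monotonic() - start
    if data.startswith(expect):
        return True, elapsed, "ok"
    return False, elapsed, f"unexpected reply: {data[:32]!r}"
```

Running `probe("10.1.8.180")` and `probe("10.1.8.196")` in a loop and logging the elapsed time would show whether response times creep toward the timeout before a failure or whether the failures come out of nowhere.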

Before switching to piranha I used ipvsadm + keepalived and saw the same behavior, so nanny itself can probably be ruled out as the cause.

Mar 22 17:29:06 lvs-dr2 Keepalived_healthcheckers: TCP connection to [10.1.8.180:80] failed !!!
Mar 22 17:29:06 lvs-dr2 Keepalived_healthcheckers: Removing service [10.1.8.180:80] from VS [10.1.8.193:80]
Mar 22 17:29:26 lvs-dr2 Keepalived_healthcheckers: TCP connection to [10.1.8.180:80] success.
Mar 22 17:29:26 lvs-dr2 Keepalived_healthcheckers: Adding service [10.1.8.180:80] to VS [10.1.8.193:80]
Mar 22 18:03:42 lvs-dr2 Keepalived_healthcheckers: TCP connection to [10.1.8.196:80] failed !!!
Mar 22 18:03:42 lvs-dr2 Keepalived_healthcheckers: Removing service [10.1.8.196:80] from VS [10.1.8.193:80]
Mar 22 18:04:02 lvs-dr2 Keepalived_healthcheckers: TCP connection to [10.1.8.196:80] success.
Mar 22 18:04:02 lvs-dr2 Keepalived_healthcheckers: Adding service [10.1.8.196:80] to VS [10.1.8.193:80]
