LVS + keepalived: abnormal master/backup failover

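Problem description:
One LVS + keepalived setup load-balances three services in total: one nginx service and two MySQL services.
Master/backup failover is not working properly. After running for a while, the nginx virtual service always moves from the master to the backup server on its own, and the logs show nothing abnormal. When I restart manually, the master logs a line like "Transition to MASTER STATE", but the backup produces no log output at all, nothing like "Entering BACKUP STATE". Also, if I restart the keepalived service on the backup server, it preempts the master and the backup becomes MASTER; again, the master logs nothing. A further problem is that the MySQL services are not balanced correctly: traffic only goes to one server, whereas nginx traffic is distributed normally.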

The configuration files follow.
They cover 1 application service and 2 MySQL services.
MASTER configuration:
! Configuration File for keepalived

global_defs {
    notification_email {
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        121.121.211.185
    }
}

vrrp_instance VI_2 {
    state MASTER
    interface eth1
    virtual_router_id 52
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.15
    }
}

virtual_server 121.121.211.185 80 {
    delay_loop 6
    lb_algo wrr
    lb_kind DR
    persistence_timeout 60
    protocol TCP

    real_server 121.121.211.170 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }

    real_server 121.121.211.174 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }

    real_server 121.121.211.252 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}

virtual_server 192.168.0.15 3306 {
    delay_loop 6
    lb_algo wrr
    lb_kind DR
    persistence_timeout 60
    protocol TCP

    real_server 192.168.0.11 3306 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }

    real_server 192.168.0.12 3306 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }
}

virtual_server 192.168.0.15 3307 {
    delay_loop 6
    lb_algo wrr
    lb_kind DR
    persistence_timeout 60
    protocol TCP

    real_server 192.168.0.11 3307 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3307
        }
    }

    real_server 192.168.0.12 3307 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3307
        }
    }
}

----------------------------

BACKUP configuration file:
! Configuration File for keepalived

global_defs {
    notification_email {
    }
    notification_email_from Alexandre.Cassen@firewall.loc
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 70
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        121.121.211.185
    }
}

vrrp_instance VI_2 {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 70
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.15
    }
}

virtual_server 121.121.211.185 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 60
    protocol TCP

    real_server 121.121.211.170 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }

    real_server 121.121.211.174 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }

    real_server 121.121.211.252 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}

virtual_server 192.168.0.15 3306 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 60
    protocol TCP

    real_server 192.168.0.11 3306 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }

    real_server 192.168.0.12 3306 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }
}

virtual_server 192.168.0.15 3307 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 60
    protocol TCP

    real_server 192.168.0.11 3307 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3307
        }
    }

    real_server 192.168.0.12 3307 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 3307
        }
    }
}

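The problem keeps recurring, and after searching online for a long time I still have not found a solution. I have been on edge about this for a long while...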
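
One thing that can be checked mechanically in the two files above is whether each vrrp_instance is bound the same way on both nodes. The sketch below is only an illustration, not a definitive tool: `vrrp_summary` is a made-up helper name, the file names in the usage comment are placeholders, and the parsing assumes the opening brace is on the same line as the keyword, as in the posted configs.

```shell
#!/bin/sh
# Print "instance interface virtual_router_id priority" for each
# vrrp_instance block in the keepalived config file given as $1.
# Assumption: the opening brace sits on the same line as the keyword,
# as it does in the configs posted above.
vrrp_summary() {
    awk '
    $1 == "vrrp_instance" {
        name = $2; iface = vrid = prio = "-"
        depth = 0; inside = 1
    }
    inside {
        if ($1 == "interface")         iface = $2
        if ($1 == "virtual_router_id") vrid  = $2
        if ($1 == "priority")          prio  = $2
        # Track brace depth so we know where the instance block ends.
        depth += gsub(/[{]/, "{") - gsub(/[}]/, "}")
        if (depth == 0) {
            print name, iface, vrid, prio
            inside = 0
        }
    }' "$1"
}

# Usage (file names are placeholders for copies of the two configs):
#   vrrp_summary master.conf
#   vrrp_summary backup.conf
```

Run against the two posted files, this lists VI_1 on eth0 for the MASTER but on eth1 for the BACKUP, and VI_2 the other way round. That is only a problem if the NICs are not cabled that way: the two interfaces sharing a virtual_router_id need to sit on the same network segment for the advertisements to reach the peer.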

Forums:

You can run tcpdump -nnn host 224.0.0.18 on both the master and the backup to capture packets and see how the two nodes are communicating.
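
A rough sketch of how that capture could be summarized, under stated assumptions. VRRP advertisements are IP protocol 112 sent to the multicast group 224.0.0.18, so either can be used as the filter. `summarize_vrrp` is a hypothetical helper name; it assumes tcpdump prints each advertisement with a `src > dst:` pair and `vrid N, prio N` fields, which is how recent versions format VRRP but may differ on yours. The interface name comes from the posted config.

```shell
#!/bin/sh
# Summarize VRRP advertisements read from stdin (tcpdump text output).
# Prints one line per distinct (vrid, source, priority) combination.
# Assumption: each advertisement line looks roughly like
#   IP 10.0.0.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, ...
summarize_vrrp() {
    awk '
    /vrid/ {
        src = ""; vrid = ""; prio = ""
        for (i = 1; i <= NF; i++) {
            if ($i == ">" && i > 1) src = $(i - 1)
            if ($i == "vrid") { vrid = $(i + 1); sub(/,$/, "", vrid) }
            if ($i == "prio") { prio = $(i + 1); sub(/,$/, "", prio) }
        }
        if (src != "" && vrid != "") {
            key = "vrid=" vrid " src=" src " prio=" prio
            if (!(key in seen)) { seen[key] = 1; print key }
        }
    }'
}

# Usage (as root, on BOTH nodes, once per VRRP interface):
#   tcpdump -i eth0 -nn -l -c 20 'ip proto 112' | summarize_vrrp
```

In a healthy pair, each vrid would normally show a single source (the current MASTER) on both nodes. Two sources for the same vrid suggests both nodes believe they are MASTER; no advertisements at all on the backup suggests they are not arriving, for example because of a firewall rule or an interface on the wrong segment.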

Gosh, I wish I had had that information earlier!

That's a crackerjack answer to an interesting question.

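I wonder whether you used that LVS-DR.sh script that circulates online? That script is only meant for testing LVS by hand; in a real deployment there is no need to run it. Just configure keepalived properly and start the keepalived service.
Apart from that, you only need to run realserver.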

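When I first started with LVS, I followed those tutorials and ended up running that LVS-DR script, which broke master/backup failover: the backup kept preempting. I wonder whether the same thing happened to you?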
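
One way to spot that situation, sketched under assumptions: if a manually run script (or a failed VRRP election) leaves the virtual IP configured on both directors at once, the address shows up as bound on both machines. `holds_vip` is a hypothetical helper that scans `ip -o -4 addr show` output for an exact address match; the VIP in the usage line is the one from the posted config.

```shell
#!/bin/sh
# Succeed if the IPv4 address given as $1 appears in the output of
# `ip -o -4 addr show` supplied on stdin; fail otherwise.
holds_vip() {
    awk -v vip="$1" '
    {
        for (i = 1; i <= NF; i++) {
            if ($i == "inet") {
                # The field after "inet" is ADDRESS/PREFIX; compare the address only.
                split($(i + 1), a, "/")
                if (a[1] == vip) found = 1
            }
        }
    }
    END { exit (found ? 0 : 1) }'
}

# Usage (run on each director):
#   ip -o -4 addr show | holds_vip 121.121.211.185 && echo "VIP is bound here"
```

With keepalived alone managing the address, only the current MASTER would be expected to report the VIP. Note that this check is for the directors: in DR mode the real servers also carry the VIP, typically on the loopback interface.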

I have run into a similar problem too. I wonder how the original poster eventually solved it.
