Please help diagnose a heartbeat problem

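Symptoms:
After startup, both the primary and the standby machine wait two minutes, then each decides the other one is dead and activates itself.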

Configuration of the primary machine
[root@centos100 ha.d]# grep -v -E '^#|^$' ha.cf
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0 # Linux
mcast eth0 225.0.0.1 694 1 0
ucast eth0 192.168.110.8
auto_failback on
node centos100
node acer
ping_group group1 192.168.110.22 192.168.110.2
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

[root@centos100 ha.d]# grep -v -E '^#|^$' haresources
centos100 192.168.110.7

[root@centos100 ha.d]# cat /var/log/ha-log
heartbeat[5316]: 2006/12/31_12:06:23 info: **************************
heartbeat[5316]: 2006/12/31_12:06:23 info: Configuration validated. Starting heartbeat 2.0.7
heartbeat[5317]: 2006/12/31_12:06:23 info: heartbeat: version 2.0.7
heartbeat[5317]: 2006/12/31_12:06:23 info: Heartbeat generation: 17
heartbeat[5317]: 2006/12/31_12:06:23 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5317]: 2006/12/31_12:06:23 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5317]: 2006/12/31_12:06:23 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[5317]: 2006/12/31_12:06:23 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[5317]: 2006/12/31_12:06:23 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[5317]: 2006/12/31_12:06:23 info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
heartbeat[5317]: 2006/12/31_12:06:23 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[5317]: 2006/12/31_12:06:23 info: glib: ucast: bound send socket to device: eth0
heartbeat[5317]: 2006/12/31_12:06:23 info: glib: ucast: bound receive socket to device: eth0
heartbeat[5317]: 2006/12/31_12:06:23 info: glib: ucast: started on port 694 interface eth0 to 192.168.110.8
heartbeat[5317]: 2006/12/31_12:06:23 info: glib: ping group heartbeat started.
heartbeat[5317]: 2006/12/31_12:06:23 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[5317]: 2006/12/31_12:06:23 info: Local status now set to: 'up'
heartbeat[5317]: 2006/12/31_12:06:24 info: Link group1:group1 up.
heartbeat[5317]: 2006/12/31_12:06:24 info: Status update for node group1: status ping
heartbeat[5317]: 2006/12/31_12:08:24 WARN: node acer: is dead
heartbeat[5317]: 2006/12/31_12:08:24 info: Comm_now_up(): updating status to active
heartbeat[5317]: 2006/12/31_12:08:24 info: Local status now set to: 'active'
heartbeat[5317]: 2006/12/31_12:08:24 info: Starting child client "/usr/lib/heartbeat/ipfail" (500,500)
heartbeat[5317]: 2006/12/31_12:08:24 WARN: No STONITH device configured.
heartbeat[5317]: 2006/12/31_12:08:24 WARN: Shared disks are not protected.
heartbeat[5317]: 2006/12/31_12:08:24 info: Resources being acquired from acer.
heartbeat[5332]: 2006/12/31_12:08:24 info: Starting "/usr/lib/heartbeat/ipfail" as uid 500 gid 500 (pid 5332)
harc[5333]: 2006/12/31_12:08:24 info: Running /etc/ha.d/rc.d/status status
mach_down[5344]: 2006/12/31_12:08:24 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[5344]: 2006/12/31_12:08:24 info: mach_down takeover complete for node acer.
heartbeat[5317]: 2006/12/31_12:08:24 info: mach_down takeover complete.
heartbeat[5317]: 2006/12/31_12:08:24 info: Initial resource acquisition complete (mach_down)
IPaddr[5397]: 2006/12/31_12:08:24 INFO: IPaddr Resource is stopped
heartbeat[5334]: 2006/12/31_12:08:24 info: Local Resource acquisition completed.
harc[5483]: 2006/12/31_12:08:25 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[5483]: 2006/12/31_12:08:25 received ip-request-resp 192.168.110.7 OK yes
ResourceManager[5498]: 2006/12/31_12:08:25 info: Acquiring resource group: centos100 192.168.110.7
IPaddr[5522]: 2006/12/31_12:08:25 INFO: IPaddr Resource is stopped
ResourceManager[5498]: 2006/12/31_12:08:25 info: Running /etc/ha.d/resource.d/IPaddr 192.168.110.7 start
IPaddr[5698]: 2006/12/31_12:08:26 INFO: eval /sbin/ifconfig eth0:0 192.168.110.7 netmask 255.255.255.0 broadcast 192.168.110.255
IPaddr[5698]: 2006/12/31_12:08:26 INFO: Sending Gratuitous Arp for 192.168.110.7 on eth0:0 [eth0]
IPaddr[5698]: 2006/12/31_12:08:26 INFO: /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.110.7 eth0 192.168.110.7 auto 192.168.110.7 ffffffffffff
IPaddr[5628]: 2006/12/31_12:08:26 INFO: IPaddr Success
heartbeat[5317]: 2006/12/31_12:08:34 info: Local Resource acquisition completed. (none)
heartbeat[5317]: 2006/12/31_12:08:34 info: local resource transition completed.

Configuration of the standby machine
[root@acer ha.d]# grep -v -E '^#|^$' ha.cf
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth0 # Linux
mcast eth0 225.0.0.1 694 1 0
ucast eth0 192.168.110.6
auto_failback on
node centos100
node acer
ping_group group1 192.168.110.22 192.168.110.2
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

[root@acer ha.d]# grep -v -E '^#|^$' haresources
centos100 192.168.110.7

[root@acer ha.d]# cat /var/log/ha-log
heartbeat[4999]: 2006/12/31_12:19:33 info: **************************
heartbeat[4999]: 2006/12/31_12:19:33 info: Configuration validated. Starting heartbeat 2.0.7
heartbeat[5000]: 2006/12/31_12:19:33 info: heartbeat: version 2.0.7
heartbeat[5000]: 2006/12/31_12:19:33 info: Heartbeat generation: 16
heartbeat[5000]: 2006/12/31_12:19:33 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5000]: 2006/12/31_12:19:33 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[5000]: 2006/12/31_12:19:34 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[5000]: 2006/12/31_12:19:34 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
heartbeat[5000]: 2006/12/31_12:19:34 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
heartbeat[5000]: 2006/12/31_12:19:34 info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
heartbeat[5000]: 2006/12/31_12:19:34 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[5000]: 2006/12/31_12:19:34 info: glib: ucast: bound send socket to device: eth0
heartbeat[5000]: 2006/12/31_12:19:34 info: glib: ucast: bound receive socket to device: eth0
heartbeat[5000]: 2006/12/31_12:19:34 info: glib: ucast: started on port 694 interface eth0 to 192.168.110.6
heartbeat[5000]: 2006/12/31_12:19:34 info: glib: ping group heartbeat started.
heartbeat[5000]: 2006/12/31_12:19:34 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[5000]: 2006/12/31_12:19:34 info: Local status now set to: 'up'
heartbeat[5000]: 2006/12/31_12:19:35 info: Link group1:group1 up.
heartbeat[5000]: 2006/12/31_12:19:35 info: Status update for node group1: status ping
heartbeat[5000]: 2006/12/31_12:21:34 WARN: node centos100: is dead
heartbeat[5000]: 2006/12/31_12:21:34 info: Comm_now_up(): updating status to active
heartbeat[5000]: 2006/12/31_12:21:34 info: Local status now set to: 'active'
heartbeat[5000]: 2006/12/31_12:21:34 info: Starting child client "/usr/lib/heartbeat/ipfail" (500,500)
heartbeat[5000]: 2006/12/31_12:21:34 WARN: No STONITH device configured.
heartbeat[5000]: 2006/12/31_12:21:34 WARN: Shared disks are not protected.
heartbeat[5000]: 2006/12/31_12:21:34 info: Resources being acquired from centos100.
heartbeat[5013]: 2006/12/31_12:21:34 info: Starting "/usr/lib/heartbeat/ipfail" as uid 500 gid 500 (pid 5013)
harc[5014]: 2006/12/31_12:21:34 info: Running /etc/ha.d/rc.d/status status
heartbeat[5015]: 2006/12/31_12:21:34 info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys acer] to acquire.
mach_down[5025]: 2006/12/31_12:21:34 info: Taking over resource group 192.168.110.7
ResourceManager[5054]: 2006/12/31_12:21:34 info: Acquiring resource group: centos100 192.168.110.7
IPaddr[5078]: 2006/12/31_12:21:35 INFO: IPaddr Resource is stopped
ResourceManager[5054]: 2006/12/31_12:21:35 info: Running /etc/ha.d/resource.d/IPaddr 192.168.110.7 start
IPaddr[5255]: 2006/12/31_12:21:36 INFO: eval /sbin/ifconfig eth0:0 192.168.110.7 netmask 255.255.255.0 broadcast 192.168.110.255
IPaddr[5255]: 2006/12/31_12:21:36 INFO: Sending Gratuitous Arp for 192.168.110.7 on eth0:0 [eth0]
IPaddr[5255]: 2006/12/31_12:21:36 INFO: /usr/lib/heartbeat/send_arp -i 500 -r 10 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.110.7 eth0 192.168.110.7 auto 192.168.110.7 ffffffffffff
IPaddr[5185]: 2006/12/31_12:21:36 INFO: IPaddr Success
mach_down[5025]: 2006/12/31_12:21:36 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[5025]: 2006/12/31_12:21:36 info: mach_down takeover complete for node centos100.
heartbeat[5000]: 2006/12/31_12:21:36 info: mach_down takeover complete.
heartbeat[5000]: 2006/12/31_12:21:36 info: Initial resource acquisition complete (mach_down)
heartbeat[5000]: 2006/12/31_12:21:44 info: Local Resource acquisition completed. (none)
heartbeat[5000]: 2006/12/31_12:21:44 info: local resource transition completed.
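
In both logs above, the "is dead" warning shows up roughly 120 seconds after "Local status now set to: 'up'", which lines up with the initdead 120 setting in ha.cf. That suggests neither node received a single heartbeat from its peer during the startup window. A minimal sketch for measuring that gap from a log excerpt follows; the function names are made up for illustration, and the timestamp format is assumed to be the one shown in these logs.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the ha-log excerpts, e.g. 2006/12/31_12:19:34
STAMP = re.compile(r"(\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})")


def stamp_of(line):
    """Return the timestamp found in a log line, or None if there is none."""
    match = STAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y/%m/%d_%H:%M:%S")


def seconds_until_peer_declared_dead(log_lines):
    """Seconds between local status 'up' and the first 'is dead' warning."""
    up_at = None
    for line in log_lines:
        if up_at is None and "Local status now set to: 'up'" in line:
            up_at = stamp_of(line)
        elif up_at is not None and re.search(r"WARN: node \S+: is dead", line):
            dead_at = stamp_of(line)
            return int((dead_at - up_at).total_seconds())
    return None  # one of the two events was not found


# Two lines copied from the standby machine's log above.
sample = [
    "heartbeat[5000]: 2006/12/31_12:19:34 info: Local status now set to: 'up'",
    "heartbeat[5000]: 2006/12/31_12:21:34 WARN: node centos100: is dead",
]
print(seconds_until_peer_declared_dead(sample))  # 120
```

If the printed gap matches initdead on both nodes, it is worth checking whether heartbeat packets can travel between the machines at all (cabling, interface, firewall) before looking at the resource configuration.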


Found the cause: it was the firewall.

So far I have only turned the firewall off.
I still don't know how to configure the firewall so that it meets heartbeat's requirements!

Solved.
In the iptables configuration file (/etc/sysconfig/iptables), add: -A RH-Firewall-1-INPUT -p udp --dport 694 -j ACCEPT
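
One thing to watch: iptables evaluates rules in order, so the ACCEPT rule for UDP port 694 only helps if it sits before any catch-all REJECT rule in the same chain. The sketch below shows one way to place the line in the rules text. It assumes the chain ends with a REJECT rule, as Red Hat style rulesets commonly do; the sample rules and the function name are hypothetical, and it only edits a string, so writing the file and reloading the firewall (for example with `service iptables restart`) is left to you.

```python
HEARTBEAT_RULE = "-A RH-Firewall-1-INPUT -p udp --dport 694 -j ACCEPT"


def add_rule_before_reject(rules_text, rule=HEARTBEAT_RULE):
    """Return rules_text with `rule` placed ahead of the first REJECT rule."""
    lines = rules_text.splitlines()
    if any(line.strip() == rule for line in lines):
        return rules_text  # rule already present, nothing to do
    # Prefer the slot just before the first REJECT rule;
    # fall back to just before COMMIT if no REJECT rule exists.
    for marker in ("-j REJECT", "COMMIT"):
        for i, line in enumerate(lines):
            if marker in line:
                lines.insert(i, rule)
                return "\n".join(lines) + "\n"
    # Neither marker found: append at the end.
    lines.append(rule)
    return "\n".join(lines) + "\n"


# Hypothetical excerpt of /etc/sysconfig/iptables, for illustration only.
sample_rules = """*filter
:RH-Firewall-1-INPUT - [0:0]
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
"""

print(add_rule_before_reject(sample_rules))
```

The same rule is needed on both nodes, since each one has to receive the other's heartbeats.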
