Asking Dr. Zhang and the experts here for advice: ldirectord fails to start on the backup server

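I have two machines running CentOS 5.2. The application is MySQL Cluster, and I want load balancing plus HA.
The hostnames and IPs of the two machines are: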
192.168.100.5 server5.domain
192.168.100.7 server7.domain

Both machines now have LVS + heartbeat + ldirectord installed. The configuration files are as follows.

ha.cf:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 1000ms
deadtime 30
warntime 5
initdead 120
udpport 694
bcast eth3
mcast eth3 225.0.0.1 694 1 0
ucast eth1 192.168.100.7
auto_failback on
node server5.domain
node server7.domain
respawn hacluster /usr/lib64/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

haresources:

server5.domain \
ldirectord::ldirectord.cf \
LVSSyncDaemonSwap::master \
IPaddr2::192.168.100.20/24/eth1/192.168.100.255

ldirectord.cf:

checktimeout=5
checkinterval=10
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=192.168.100.20:3306
	real=192.168.100.8:3306 gate
	real=192.168.100.9:3306 gate
	real=192.168.100.10:3306 gate
	real=192.168.100.11:3306 gate
	real=192.168.100.12:3306 gate
	real=192.168.100.13:3306 gate
	real=192.168.100.14:3306 gate
	real=192.168.100.15:3306 gate
	service=mysql
	scheduler=wrr
	protocol=tcp
	checkport=3306
	checktype=negotiate
	login="admin"
	passwd="admin"
	database="testdatabase"
	request="select * from test"
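
To rule out the health check itself, the negotiate check can be approximated by hand from each director. The sketch below is only my approximation of what ldirectord does with the settings above: it assumes the mysql command-line client is installed on the director, and check_real and DRY_RUN are names made up for this example. As I read the ldirectord man page, the check only needs the query to return at least one row. The 5-second connect timeout mirrors checktimeout=5.

```shell
#!/bin/sh
# Approximate ldirectord's MySQL negotiate check by hand (a sketch, not ldirectord's own code).
# check_real and DRY_RUN are hypothetical names used only in this example.
# DRY_RUN=1 (the default) only prints the command; DRY_RUN=0 actually runs it.
check_real() {
    host="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "mysql -h $host -P 3306 -u admin -padmin --connect-timeout=5 -N -B -e 'select * from test' testdatabase"
        return 0
    fi
    # -N drops the column header and -B gives one row per line, so wc -l counts rows.
    rows=$(mysql -h "$host" -P 3306 -u admin -padmin --connect-timeout=5 \
                 -N -B -e 'select * from test' testdatabase 2>/dev/null | wc -l | tr -d ' ')
    if [ "$rows" -gt 0 ]; then
        echo "$host: ok ($rows rows)"
    else
        echo "$host: FAILED (no rows or connection error)"
    fi
}

# Real servers taken from the real= lines above.
for i in 8 9 10 11 12 13 14 15; do
    check_real "192.168.100.$i"
done
```

Running it with DRY_RUN=0 on both server5 and server7 should show whether the standby can reach the real servers with the same credentials as the primary.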

My questions are:
(1) The primary server (server5) starts normally, but when the standby server (server7) starts, the following message appears:

ldirectord stale pid file /var/run/ldirectord.ldirectord.cf.pid for /etc/ha.d/ldirectord.cf

(2) When the primary server is shut down, the standby takes over normally, but ldirectord fails to start.
Running /usr/sbin/ldirectord ldirectord.cf status prints the following:

ldirectord stale pid file /var/run/ldirectord.ldirectord.cf.pid for /etc/ha.d/ldirectord.cf
ldirectord is stopped for /etc/ha.d/ldirectord.cf

Checking with ps at this point shows that ldirectord is not running.
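
As far as I understand it, "stale pid file" means the pid file still exists but no process with the recorded pid is alive. A minimal sketch to confirm that on server7 follows. pidfile_state is a helper name made up for this example, and the default path is the one from the message above.

```shell
#!/bin/sh
# Classify a pid file as "missing", "running", or "stale".
# pidfile_state is a hypothetical helper, not part of ldirectord or heartbeat.
pidfile_state() {
    pidfile="${1:-/var/run/ldirectord.ldirectord.cf.pid}"
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 0
    fi
    # Take the first line of the file and strip whitespace to get the pid.
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    # It can also fail for lack of permission, so run this as root.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

Calling pidfile_state with no argument checks the ldirectord pid file named in the message. A result of "stale" would match what the status command reports here.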

Attached are the logs from server7 at the time server5 was shut down:

-------------------------------server7----------------------------------
ha-log:
heartbeat[25498]: 2009/07/25_18:05:16 info: Received shutdown notice from 'server5.domain'.
heartbeat[25498]: 2009/07/25_18:05:16 info: Resources being acquired from server5.domain.
heartbeat[25691]: 2009/07/25_18:05:16 info: acquire local HA resources (standby).
heartbeat[25691]: 2009/07/25_18:05:16 info: local HA resource acquisition completed (standby).
heartbeat[25498]: 2009/07/25_18:05:16 info: Standby resource acquisition done [all].
heartbeat[25692]: 2009/07/25_18:05:16 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys server7.domain] to acquire.
harc[25717]: 2009/07/25_18:05:16 info: Running /etc/ha.d/rc.d/status status
mach_down[25732]: 2009/07/25_18:05:16 info: Taking over resource group ldirectord::ldirectord.cf
ResourceManager[25757]: 2009/07/25_18:05:16 info: Acquiring resource group: server5.domain ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.100.20/24/eth1/192.168.100.255
ResourceManager[25757]: 2009/07/25_18:05:16 info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
ResourceManager[25757]: 2009/07/25_18:05:17 info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
LVSSyncDaemonSwap[25890]: 2009/07/25_18:05:18 info: ipvs_syncbackup down
LVSSyncDaemonSwap[25890]: 2009/07/25_18:05:18 info: ipvs_syncmaster up
LVSSyncDaemonSwap[25890]: 2009/07/25_18:05:18 info: ipvs_syncmaster obtained
IPaddr2[25938]: 2009/07/25_18:05:18 INFO: Resource is stopped
ResourceManager[25757]: 2009/07/25_18:05:18 info: Running /etc/ha.d/resource.d/IPaddr2 192.168.100.20/24/eth1/192.168.100.255 start
IPaddr2[26047]: 2009/07/25_18:05:18 INFO: ip -f inet addr add 192.168.100.20/24 brd 192.168.100.255 dev eth1
IPaddr2[26047]: 2009/07/25_18:05:18 INFO: ip link set eth1 up
IPaddr2[26047]: 2009/07/25_18:05:18 INFO: /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.100.20 eth1 192.168.100.20 auto not_used not_used
IPaddr2[26018]: 2009/07/25_18:05:18 INFO: Success
mach_down[25732]: 2009/07/25_18:05:18 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[25732]: 2009/07/25_18:05:18 info: mach_down takeover complete for node server5.domain.
heartbeat[25498]: 2009/07/25_18:05:18 info: mach_down takeover complete.
heartbeat[25498]: 2009/07/25_18:05:48 WARN: node server5.domain: is dead
heartbeat[25498]: 2009/07/25_18:05:48 info: Dead node server5.domain gave up resources.
heartbeat[25498]: 2009/07/25_18:05:48 info: Link server5.domain:eth3 dead.
heartbeat[25498]: 2009/07/25_18:05:48 info: Link server5.domain:eth1 dead.
ipfail[25524]: 2009/07/25_18:05:48 info: Status update: Node server5.domain now has status dead
ipfail[25524]: 2009/07/25_18:05:48 info: NS We are dead.
ipfail[25524]: 2009/07/25_18:05:48 info: Link Status update: Link server5.domain/eth3 now has status dead
ipfail[25524]: 2009/07/25_18:05:49 info: We are dead.
ipfail[25524]: 2009/07/25_18:05:49 info: Asking other side for ping node count.
ipfail[25524]: 2009/07/25_18:05:49 info: Link Status update: Link server5.domain/eth1 now has status dead
ipfail[25524]: 2009/07/25_18:05:50 info: We are dead.
ipfail[25524]: 2009/07/25_18:05:50 info: Asking other side for ping node count.

ldirectord.log:
[Sat Jul 25 18:05:16 2009|ldirectord.cf|25783] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Sat Jul 25 18:05:16 2009|ldirectord.cf|25783] ldirectord stale pid file /var/run/ldirectord.ldirectord.cf.pid for /etc/ha.d/ldirectord.cf
[Sat Jul 25 18:05:16 2009|ldirectord.cf|25783] Exiting with exit_status 1: Exiting from ldirectord status
[Sat Jul 25 18:05:17 2009|ldirectord.cf|25803] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
[Sat Jul 25 18:05:17 2009|ldirectord.cf|25803] Starting Linux Director v1.186-ha-2.1.3 as daemon
[Sat Jul 25 18:05:17 2009|ldirectord.cf|25807] Changed virtual server: 192.168.100.20:3306

This problem has been bothering me for two weeks. I would be grateful for any advice. Thank you!

Could any of the experts here please help...

randomness