Asking Dr. Zhang and the experts here for advice: ldirectord fails to start on the backup server

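I have two machines running CentOS 5.2. The application is MySQL Cluster, and I want to achieve load balancing and HA.
The hostnames and IPs of the two machines are: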
192.168.100.5 server5.domain
192.168.100.7 server7.domain

Both machines now have LVS + heartbeat + ldirectord installed. The configuration files are as follows.

ha.cf:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 1000ms
deadtime 30
warntime 5
initdead 120
udpport 694
bcast eth3
mcast eth3 225.0.0.1 694 1 0
ucast eth1 192.168.100.7
auto_failback on
node server5.domain
node server7.domain
respawn hacluster /usr/lib64/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

haresources:

server5.domain \
ldirectord::ldirectord.cf \
LVSSyncDaemonSwap::master \
IPaddr2::192.168.100.20/24/eth1/192.168.100.255
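
For readers less familiar with this file, here is a minimal sketch of how I understand the haresources line above to be interpreted: the first token is the preferred node, each following token is a resource script with `::`-separated arguments, and the resources are started left to right and stopped right to left (the ha-log excerpts further down show the same order). The function name `parse_haresources_line` is my own and is not part of heartbeat; the sketch also ignores the bare-IP shorthand that haresources allows.

```python
def parse_haresources_line(line):
    """Split one logical haresources line into (node, [(script, [args]), ...])."""
    # Join backslash-continued physical lines into one logical line.
    tokens = line.replace("\\\n", " ").split()
    node, resources = tokens[0], tokens[1:]
    parsed = []
    for res in resources:
        # "script::arg1::arg2" -> script name plus its argument list
        script, *args = res.split("::")
        parsed.append((script, args))
    return node, parsed


line = """server5.domain \\
ldirectord::ldirectord.cf \\
LVSSyncDaemonSwap::master \\
IPaddr2::192.168.100.20/24/eth1/192.168.100.255"""

node, resources = parse_haresources_line(line)
start_order = [script for script, _ in resources]  # order used on takeover
stop_order = list(reversed(start_order))           # order used on release
print(node, start_order, stop_order)
```

With this line, ldirectord is the first resource started on takeover and the last one stopped on release.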

ldirectord.cf:

checktimeout=5
checkinterval=10
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=192.168.100.20:3306
        real=192.168.100.8:3306 gate
        real=192.168.100.9:3306 gate
        real=192.168.100.10:3306 gate
        real=192.168.100.11:3306 gate
        real=192.168.100.12:3306 gate
        real=192.168.100.13:3306 gate
        real=192.168.100.14:3306 gate
        real=192.168.100.15:3306 gate
        service=mysql
        scheduler=wrr
        protocol=tcp
        checkport=3306
        checktype=negotiate
        login="admin"
        passwd="admin"
        database="testdatabase"
        request="select * from test"
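
To rule out plain connectivity problems from the standby, a rough sketch like the following could be run on server7. It only tests whether TCP port 3306 accepts a connection, which is weaker than the `checktype=negotiate` check configured above (that one logs in and runs the query). The helper name `tcp_port_open` is my own, the 5 second timeout simply mirrors `checktimeout=5`, and the host list is taken from the `real=` lines.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Real servers from ldirectord.cf: 192.168.100.8 through 192.168.100.15
REAL_SERVERS = ["192.168.100.%d" % n for n in range(8, 16)]
```

Calling `tcp_port_open(host, 3306)` for each entry in `REAL_SERVERS` on both directors would show whether the standby sees the same real servers as the primary.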

My questions are:
(1) The primary server (server5) starts normally, but when the standby server (server7) starts, the following message appears:

ldirectord stale pid file /var/run/ldirectord.ldirectord.cf.pid for /etc/ha.d/ldirectord.cf

(2) When the primary server is shut down, the standby server takes over normally, but ldirectord fails to start.
Running /usr/sbin/ldirectord ldirectord.cf status gives the following output:

ldirectord stale pid file /var/run/ldirectord.ldirectord.cf.pid for /etc/ha.d/ldirectord.cf
ldirectord is stopped for /etc/ha.d/ldirectord.cf

At this point, checking with ps shows that no ldirectord process is running.
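
As far as I understand it, "stale pid file" means the PID file exists but the process it names is no longer alive. The sketch below illustrates that general idea only; it is not ldirectord's actual code (ldirectord is written in Perl), and the function name `pid_file_state` is my own.

```python
import os


def pid_file_state(path):
    """Return 'missing', 'stale', or 'running' for the PID file at path."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "stale"  # file exists but does not contain a number
    if pid <= 0:
        return "stale"  # not a valid process ID
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return "stale"   # no process with that PID
    except PermissionError:
        return "running"  # process exists but belongs to another user
    return "running"
```

On server7, `pid_file_state("/var/run/ldirectord.ldirectord.cf.pid")` would presumably return `'stale'` in the situation described above, which matches what ps shows.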

Attached are the logs from server5 and server7 at the time server5 was shut down:
-------------------server5 --------------------
ha-log:

heartbeat[11873]: 2009/07/25_18:06:11 info: Heartbeat shutdown in progress. (11873)
heartbeat[12569]: 2009/07/25_18:06:11 info: Giving up all HA resources.
ResourceManager[12582]: 2009/07/25_18:06:11 info: Releasing resource group: server5.domain ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.100.20/24/eth1/192.168.100.255
ResourceManager[12582]: 2009/07/25_18:06:11 info: Running /etc/ha.d/resource.d/IPaddr2 192.168.100.20/24/eth1/192.168.100.255 stop
IPaddr2[12647]: 2009/07/25_18:06:11 INFO: ip -f inet addr delete 192.168.100.20/24 dev eth1
IPaddr2[12647]: 2009/07/25_18:06:11 INFO: ip -o -f inet addr show eth1
IPaddr2[12618]: 2009/07/25_18:06:11 INFO: Success
ResourceManager[12582]: 2009/07/25_18:06:11 info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master stop
LVSSyncDaemonSwap[12714]: 2009/07/25_18:06:12 info: ipvs_syncmaster down
LVSSyncDaemonSwap[12714]: 2009/07/25_18:06:12 info: ipvs_syncbackup up
LVSSyncDaemonSwap[12714]: 2009/07/25_18:06:12 info: ipvs_syncmaster released
ResourceManager[12582]: 2009/07/25_18:06:12 info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf stop
heartbeat[12569]: 2009/07/25_18:06:12 info: All HA resources relinquished.
heartbeat[11873]: 2009/07/25_18:06:12 WARN: 1 lost packet(s) for [server7.domain] [750:752]
heartbeat[11873]: 2009/07/25_18:06:12 info: No pkts missing from server7.domain!
heartbeat[11873]: 2009/07/25_18:06:13 info: killing /usr/lib64/heartbeat/ipfail process group 11900 with signal 15
heartbeat[11873]: 2009/07/25_18:06:14 info: killing HBWRITE process 11880 with signal 15
heartbeat[11873]: 2009/07/25_18:06:14 info: killing HBREAD process 11881 with signal 15
heartbeat[11873]: 2009/07/25_18:06:14 info: killing HBFIFO process 11875 with signal 15
heartbeat[11873]: 2009/07/25_18:06:14 info: killing HBWRITE process 11876 with signal 15
heartbeat[11873]: 2009/07/25_18:06:14 info: killing HBREAD process 11877 with signal 15
heartbeat[11873]: 2009/07/25_18:06:14 info: killing HBWRITE process 11878 with signal 15
heartbeat[11873]: 2009/07/25_18:06:14 info: killing HBREAD process 11879 with signal 15
heartbeat[11873]: 2009/07/25_18:06:14 info: Core process 11879 exited. 7 remaining
heartbeat[11873]: 2009/07/25_18:06:14 info: Core process 11877 exited. 6 remaining
heartbeat[11873]: 2009/07/25_18:06:14 info: Core process 11876 exited. 5 remaining
heartbeat[11873]: 2009/07/25_18:06:14 info: Core process 11881 exited. 4 remaining
heartbeat[11873]: 2009/07/25_18:06:14 info: Core process 11875 exited. 3 remaining
heartbeat[11873]: 2009/07/25_18:06:14 info: Core process 11880 exited. 2 remaining
heartbeat[11873]: 2009/07/25_18:06:14 info: Core process 11878 exited. 1 remaining
heartbeat[11873]: 2009/07/25_18:06:14 info: server5.domain Heartbeat shutdown complete.

ldirectord.log:
[Sat Jul 25 18:06:12 2009|ldirectord.cf|12771] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf stop
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Purged real server (stop): 192.168.100.8:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] emailalert: Purged real server (stop): 192.168.100.8:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Purged real server (stop): 192.168.100.9:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] emailalert: Purged real server (stop): 192.168.100.9:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Purged real server (stop): 192.168.100.10:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] emailalert: Purged real server (stop): 192.168.100.10:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Purged real server (stop): 192.168.100.11:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] emailalert: Purged real server (stop): 192.168.100.11:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Purged real server (stop): 192.168.100.12:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] emailalert: Purged real server (stop): 192.168.100.12:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Purged real server (stop): 192.168.100.13:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] emailalert: Purged real server (stop): 192.168.100.13:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Purged real server (stop): 192.168.100.14:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] emailalert: Purged real server (stop): 192.168.100.14:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Purged real server (stop): 192.168.100.15:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] emailalert: Purged real server (stop): 192.168.100.15:3306 (192.168.100.20:3306)
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Purged virtual server (stop): 192.168.100.20:3306
[Sat Jul 25 18:06:31 2009|ldirectord.cf|12026] Linux Director Daemon terminated on signal: TERM

-------------------------------server7----------------------------------
ha-log:
heartbeat[25498]: 2009/07/25_18:05:16 info: Received shutdown notice from 'server5.domain'.
heartbeat[25498]: 2009/07/25_18:05:16 info: Resources being acquired from server5.domain.
heartbeat[25691]: 2009/07/25_18:05:16 info: acquire local HA resources (standby).
heartbeat[25691]: 2009/07/25_18:05:16 info: local HA resource acquisition completed (standby).
heartbeat[25498]: 2009/07/25_18:05:16 info: Standby resource acquisition done [all].
heartbeat[25692]: 2009/07/25_18:05:16 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys server7.domain] to acquire.
harc[25717]: 2009/07/25_18:05:16 info: Running /etc/ha.d/rc.d/status status
mach_down[25732]: 2009/07/25_18:05:16 info: Taking over resource group ldirectord::ldirectord.cf
ResourceManager[25757]: 2009/07/25_18:05:16 info: Acquiring resource group: server5.domain ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.100.20/24/eth1/192.168.100.255
ResourceManager[25757]: 2009/07/25_18:05:16 info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
ResourceManager[25757]: 2009/07/25_18:05:17 info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
LVSSyncDaemonSwap[25890]: 2009/07/25_18:05:18 info: ipvs_syncbackup down
LVSSyncDaemonSwap[25890]: 2009/07/25_18:05:18 info: ipvs_syncmaster up
LVSSyncDaemonSwap[25890]: 2009/07/25_18:05:18 info: ipvs_syncmaster obtained
IPaddr2[25938]: 2009/07/25_18:05:18 INFO: Resource is stopped
ResourceManager[25757]: 2009/07/25_18:05:18 info: Running /etc/ha.d/resource.d/IPaddr2 192.168.100.20/24/eth1/192.168.100.255 start
IPaddr2[26047]: 2009/07/25_18:05:18 INFO: ip -f inet addr add 192.168.100.20/24 brd 192.168.100.255 dev eth1
IPaddr2[26047]: 2009/07/25_18:05:18 INFO: ip link set eth1 up
IPaddr2[26047]: 2009/07/25_18:05:18 INFO: /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.100.20 eth1 192.168.100.20 auto not_used not_used
IPaddr2[26018]: 2009/07/25_18:05:18 INFO: Success
mach_down[25732]: 2009/07/25_18:05:18 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[25732]: 2009/07/25_18:05:18 info: mach_down takeover complete for node server5.domain.
heartbeat[25498]: 2009/07/25_18:05:18 info: mach_down takeover complete.
heartbeat[25498]: 2009/07/25_18:05:48 WARN: node server5.domain: is dead
heartbeat[25498]: 2009/07/25_18:05:48 info: Dead node server5.domain gave up resources.
heartbeat[25498]: 2009/07/25_18:05:48 info: Link server5.domain:eth3 dead.
heartbeat[25498]: 2009/07/25_18:05:48 info: Link server5.domain:eth1 dead.
ipfail[25524]: 2009/07/25_18:05:48 info: Status update: Node server5.domain now has status dead
ipfail[25524]: 2009/07/25_18:05:48 info: NS: We are dead. :<
ipfail[25524]: 2009/07/25_18:05:48 info: Link Status update: Link server5.domain/eth3 now has status dead
ipfail[25524]: 2009/07/25_18:05:49 info: We are dead. :<
ipfail[25524]: 2009/07/25_18:05:49 info: Asking other side for ping node count.
ipfail[25524]: 2009/07/25_18:05:49 info: Link Status update: Link server5.domain/eth1 now has status dead
ipfail[25524]: 2009/07/25_18:05:50 info: We are dead. :<
ipfail[25524]: 2009/07/25_18:05:50 info: Asking other side for ping node count.

ldirectord.log:
[Sat Jul 25 18:05:16 2009|ldirectord.cf|25783] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Sat Jul 25 18:05:16 2009|ldirectord.cf|25783] ldirectord stale pid file /var/run/ldirectord.ldirectord.cf.pid for /etc/ha.d/ldirectord.cf
[Sat Jul 25 18:05:16 2009|ldirectord.cf|25783] Exiting with exit_status 1: Exiting from ldirectord status
[Sat Jul 25 18:05:17 2009|ldirectord.cf|25803] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
[Sat Jul 25 18:05:17 2009|ldirectord.cf|25803] Starting Linux Director v1.186-ha-2.1.3 as daemon
[Sat Jul 25 18:05:17 2009|ldirectord.cf|25807] Changed virtual server: 192.168.100.20:3306
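
To make the server7 excerpt easier to read, here is a small sketch that groups ldirectord log lines by the PID in the bracketed prefix. The regular expression is my own guess at the line format, based only on the lines pasted above, and `group_by_pid` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Prefix format as seen in the excerpt: [timestamp|config name|pid] message
LOG_LINE = re.compile(r"^\[([^|]+)\|([^|]+)\|(\d+)\] (.*)$")


def group_by_pid(lines):
    """Map each PID to the list of messages it logged, in order."""
    groups = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            groups[int(m.group(3))].append(m.group(4))
    return dict(groups)


sample = [
    "[Sat Jul 25 18:05:16 2009|ldirectord.cf|25783] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status",
    "[Sat Jul 25 18:05:16 2009|ldirectord.cf|25783] ldirectord stale pid file /var/run/ldirectord.ldirectord.cf.pid for /etc/ha.d/ldirectord.cf",
    "[Sat Jul 25 18:05:16 2009|ldirectord.cf|25783] Exiting with exit_status 1: Exiting from ldirectord status",
    "[Sat Jul 25 18:05:17 2009|ldirectord.cf|25803] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start",
    "[Sat Jul 25 18:05:17 2009|ldirectord.cf|25803] Starting Linux Director v1.186-ha-2.1.3 as daemon",
    "[Sat Jul 25 18:05:17 2009|ldirectord.cf|25807] Changed virtual server: 192.168.100.20:3306",
]

groups = group_by_pid(sample)
for pid in sorted(groups):
    print(pid, len(groups[pid]), "line(s)")
```

Grouped this way, the excerpt shows three processes: the status check (25783), the start command (25803), and the daemon (25807), which logs a single line in the part pasted here.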

This problem has been bothering me for two weeks. Any advice would be greatly appreciated. Thank you!
