I have a question: I built a cluster with lvs+heartbeat+ldirectord and ran into a problem during testing

lvs+heartbeat+ldirectord

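Hello, I have a question. I am using lvs+heartbeat with these addresses:
192.168.1.12 primary node
192.168.1.15 backup node
192.168.1.14 virtual IP

During testing, when I ssh to 192.168.1.14 I am logged in to the primary node, whose IP is 192.168.1.12.
But when I shut the primary node down, eth0:0 moves to the backup node and the logs show no errors, yet ssh to 192.168.1.14 no longer connects. Even though the interface has moved (192.168.1.15 now has eth0:0), ssh still cannot connect.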

Environment:
192.168.1.12 node1
192.168.1.15 node2
VIP: 192.168.1.14

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Script /etc/rc.d/init.d/lvs_rs
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
#!/bin/sh
# chkconfig: 2345 72 08
# description: Config realserver lo:0 port and apply arp patch
VIP=192.168.1.14
. /etc/rc.d/init.d/functions
case $1 in
start)
    echo "lo:0 port starting"
    echo "0" >/proc/sys/net/ipv4/ip_forward
    /sbin/ifconfig lo:0 $VIP broadcast $VIP netmask 255.255.255.255 up
    /sbin/route add -host $VIP dev lo:0
    echo "1" > /proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "2" > /proc/sys/net/ipv4/conf/lo/arp_announce
    echo "1" > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo "2" > /proc/sys/net/ipv4/conf/all/arp_announce
    sysctl -p
    ;;
stop)
    echo "lo:0 port closing"
    ifconfig lo:0 down
    echo "1" > /proc/sys/net/ipv4/ip_forward
    echo "0" > /proc/sys/net/ipv4/conf/all/arp_announce
    ;;
*)
    echo "Usage: $0 {start|stop}"
    exit 1
esac

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Configuration files under /etc/ha.d/

ha.cf haresources ldirectord.cf authkeys
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
First, ha.cf

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
warntime 5
#initdead 120
udpport 28855
#bcast eth0
ucast eth0 192.168.1.12 (in the other node's config file this IP is 192.168.1.15)
auto_failback on
watchdog /dev/watchdog
node node1 (hostname of 192.168.1.12)
node node2 (hostname of 192.168.1.15)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
haresources configuration file

node1 lvs_switch 192.168.1.14 lvs_dr ldirectord

lvs_switch and lvs_dr are scripts defined in /etc/ha.d/resource.d/.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

ldirectord.cf configuration file
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
checktimeout=3
checkinterval=1
fallback=127.0.0.1:80
autoreload=yes
logfile="/var/log/ldirectord.log"
#logfile="local0"
#emailalert="admin@x.y.z"
#emailalertfreq=3600
#emailalertstatus=all
quiescent=yes

# Sample for an http virtual service
virtual=192.168.1.14:80
        real=192.168.1.12:80 gate
        real=192.168.1.15:80 gate
        fallback=127.0.0.1:80 gate
        service=http
        request="lvstest.html"
        receive="lvstest"
        scheduler=rr
        #persistent=600
        #netmask=255.255.255.255
        protocol=tcp
        checktype=negotiate
        checkport=80
        request="lvstest.html"
        receive="lvstest"
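
The negotiate check in this ldirectord.cf (request lvstest.html, expect the response to match lvstest) can be approximated by hand to confirm that both real servers would pass it. This is only a rough sketch, not what ldirectord runs internally: the function name `body_matches` is made up for illustration, `grep -E` stands in for ldirectord's own pattern matching, and the usage lines assume curl is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of ldirectord): succeed if the text on stdin
# matches the pattern in $1, treated as an extended regular expression.
body_matches() {
    grep -E -q -- "$1"
}

# Usage, assuming curl is available on the director:
#   for rip in 192.168.1.12 192.168.1.15; do
#       if curl -s --max-time 3 "http://$rip:80/lvstest.html" | body_matches 'lvstest'; then
#           echo "$rip: check passed"
#       else
#           echo "$rip: check FAILED"
#       fi
#   done
```

The 3 second limit mirrors checktimeout=3 from the config above.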

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

authkeys configuration file
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
auth 3
#1 crc
#2 sha1 HI!
3 md5 Hello!
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Scripts under resource.d
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
lvs_dr script

#!/bin/sh
# description: start LVS of Directorserver
VIP=192.168.1.14
RIP1=192.168.1.12
RIP2=192.168.1.15
SERVICE=80 #http is used in this case
. /etc/rc.d/init.d/functions
case $1 in
start)
    echo "start LVS of DirectorServer"
    # set ip_forward&send_redirects
    echo "0" >/proc/sys/net/ipv4/ip_forward
    echo "1" >/proc/sys/net/ipv4/conf/all/send_redirects
    echo "1" >/proc/sys/net/ipv4/conf/default/send_redirects
    echo "1" >/proc/sys/net/ipv4/conf/eth0/send_redirects
    # set the Virtual IP Address
    /sbin/ifconfig eth0:0 $VIP broadcast $VIP netmask 255.255.255.255 up
    /sbin/route add -host $VIP dev eth0:0
    #Clear IPVS table
    /sbin/ipvsadm -C
    #set LVS
    /sbin/ipvsadm -A -t $VIP:$SERVICE -s rr
    /sbin/ipvsadm -a -t $VIP:$SERVICE -r $RIP1:$SERVICE -g -w 1
    /sbin/ipvsadm -a -t $VIP:$SERVICE -r $RIP2:$SERVICE -g -w 1
    #/sbin/ipvsadm -a -t $VIP:$SERVICE -r $RIP3:$SERVICE -g -w 1
    /sbin/ipvsadm --set 30 120 300
    #Run LVS
    /sbin/ipvsadm
    #end
    ;;
stop)
    echo "close LVS Directorserver"
    /sbin/ipvsadm -C
    ;;
*)
    echo "Usage: $0 {start|stop}"
    exit 1
esac
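
To confirm that `lvs_dr start` really loaded both real servers on whichever node currently holds the VIP, the IPVS table can be inspected. A minimal sketch, assuming the usual `ipvsadm -Ln` layout in which each real server appears on a row beginning with `->` followed by its address; `count_real_servers` is a hypothetical name used only here.

```shell
#!/bin/sh
# Hypothetical helper: count real-server rows ("  -> a.b.c.d:port ...") in
# the text produced by `ipvsadm -Ln`, read from stdin. The column header row
# ("  -> RemoteAddress:Port ...") is skipped because no digit follows "->".
count_real_servers() {
    grep -c -E '^[[:space:]]*->[[:space:]]+[0-9]'
}

# Usage on the active director (needs root):
#   /sbin/ipvsadm -Ln | count_real_servers
```

With the two `-a` lines in the script above, a count other than 2 on the node that took over would suggest the script did not run to completion there.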

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

lvs_switch script

#!/bin/sh
# description: close lo:0 and arp_ignore
VIP=192.168.1.14
. /etc/rc.d/init.d/functions
case $1 in
start)
    echo "start director server and close lo:0"
    #ifconfig lo:0 down
    echo "1" > /proc/sys/net/ipv4/ip_forward
    echo "0" > /proc/sys/net/ipv4/conf/all/arp_announce
    ;;
stop)
    echo "start Real Server"
    echo "0" >/proc/sys/net/ipv4/ip_forward
    /sbin/ifconfig lo:0 $VIP broadcast $VIP netmask 255.255.255.255 up
    /sbin/route add -host $VIP dev lo:0
    echo "1" > /proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "2" > /proc/sys/net/ipv4/conf/lo/arp_announce
    echo "1" > /proc/sys/net/ipv4/conf/all/arp_ignore
    echo "2" > /proc/sys/net/ipv4/conf/all/arp_announce
    sysctl -p
    ;;
*)
    echo "Usage: lvs {start|stop}"
    exit 1
esac
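
Note that the start branch of lvs_switch writes only all/arp_announce and leaves the `ifconfig lo:0 down` line commented out, while the stop branch sets four ARP values. To see what state a node is actually in after a takeover, the values can simply be printed. A small sketch; `show_arp_sysctls` is a hypothetical name, and the optional directory argument exists only so the function can be tried against a copy of the files.

```shell
#!/bin/sh
# Hypothetical helper: print arp_ignore and arp_announce for the "all" and
# "lo" entries. $1 optionally overrides the sysctl directory; by default the
# real /proc path is read.
show_arp_sysctls() {
    base=${1:-/proc/sys/net/ipv4/conf}
    for iface in all lo; do
        for key in arp_ignore arp_announce; do
            file="$base/$iface/$key"
            if [ -r "$file" ]; then
                echo "$iface/$key=$(cat "$file")"
            else
                echo "$iface/$key=unreadable"
            fi
        done
    done
}

# Usage on node2 right after it has taken over the VIP:
#   show_arp_sysctls
#   /sbin/ifconfig lo:0
```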

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
IP address information on 192.168.1.12
[root@node1 ~]# ifconfig |more
eth0 Link encap:Ethernet HWaddr 00:0C:29:11:1D:4E
inet addr:192.168.1.12 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe11:1d4e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:15002 errors:0 dropped:0 overruns:0 frame:0
TX packets:51847 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1766034 (1.6 MiB) TX bytes:11604580 (11.0 MiB)
Interrupt:67 Base address:0x2000

eth0:0 Link encap:Ethernet HWaddr 00:0C:29:11:1D:4E
inet addr:192.168.1.14 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:67 Base address:0x2000

eth1 Link encap:Ethernet HWaddr 00:0C:29:11:1D:58
inet addr:192.168.0.12 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe11:1d58/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4841 errors:0 dropped:0 overruns:0 frame:0
TX packets:85 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:420811 (410.9 KiB) TX bytes:16341 (15.9 KiB)
Interrupt:67 Base address:0x2080

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:592 errors:0 dropped:0 overruns:0 frame:0
TX packets:592 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:69082 (67.4 KiB) TX bytes:69082 (67.4 KiB)

lo:0 Link encap:Local Loopback
inet addr:192.168.1.14 Mask:255.255.255.255
UP LOOPBACK RUNNING MTU:16436 Metric:1

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
IP address information on 192.168.1.15
[root@node2 ha.d]# ifconfig |more
eth0 Link encap:Ethernet HWaddr 00:0C:29:FB:3F:59
inet addr:192.168.1.15 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fefb:3f59/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:55377 errors:0 dropped:0 overruns:0 frame:0
TX packets:11213 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:11781076 (11.2 MiB) TX bytes:1479266 (1.4 MiB)
Interrupt:67 Base address:0x2000

eth1 Link encap:Ethernet HWaddr 00:0C:29:FB:3F:63
inet addr:192.168.0.15 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fefb:3f63/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4400 errors:0 dropped:0 overruns:0 frame:0
TX packets:83 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:368430 (359.7 KiB) TX bytes:16346 (15.9 KiB)
Interrupt:67 Base address:0x2080

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:65082 errors:0 dropped:0 overruns:0 frame:0
TX packets:65082 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:5237901 (4.9 MiB) TX bytes:5237901 (4.9 MiB)

lo:0 Link encap:Local Loopback
inet addr:192.168.1.14 Mask:255.255.255.255
UP LOOPBACK RUNNING MTU:16436 Metric:1
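
Since the eth0 hardware addresses of both nodes appear in the ifconfig output, one thing worth checking from the client after a failover is which MAC it has cached for 192.168.1.14. The sketch below only maps a MAC to a node name using the two HWaddr values from this post; `mac_owner` is a hypothetical helper, and the usage lines assume the client has iproute2's `ip` command.

```shell
#!/bin/sh
# Map a MAC address to the node that owns it, using the eth0 HWaddr values
# from the ifconfig output in this post. The comparison ignores case.
mac_owner() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case $mac in
        00:0c:29:11:1d:4e) echo node1 ;;
        00:0c:29:fb:3f:59) echo node2 ;;
        *) echo unknown ;;
    esac
}

# Usage on a client in 192.168.1.0/24, after the primary has been shut down:
#   mac=$(ip neigh show 192.168.1.14 | awk '{for (i = 1; i < NF; i++) if ($i == "lladdr") print $(i + 1)}')
#   mac_owner "$mac"
```

If the client still reports node1 while eth0:0 is up on node2, the client's ARP cache has not been updated, which would be consistent with the symptom described, though this is an assumption to verify rather than a diagnosis.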

Log output covering the primary node going down, the failover, and the primary coming back up:

[root@node2 log]# vi ha-log

heartbeat[20182]: 2009/04/06_15:53:07 WARN: node node1: is dead
heartbeat[20182]: 2009/04/06_15:53:07 WARN: No STONITH device configured.
heartbeat[20182]: 2009/04/06_15:53:07 WARN: Shared disks are not protected.
heartbeat[20182]: 2009/04/06_15:53:07 info: Resources being acquired from node1.
heartbeat[20182]: 2009/04/06_15:53:07 info: Link node1:eth0 dead.
heartbeat[20244]: 2009/04/06_15:53:07 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys node2] to acquire.
harc[20243]: 2009/04/06_15:53:07 info: Running /etc/ha.d/rc.d/status status
mach_down[20272]: 2009/04/06_15:53:07 info: Taking over resource group lvs_switch
ResourceManager[20298]: 2009/04/06_15:53:07 info: Acquiring resource group: node1 lvs_switch 192.168.1.14 lvs_dr ldirectord
ResourceManager[20298]: 2009/04/06_15:53:07 info: Running /etc/ha.d/resource.d/lvs_switch start
IPaddr[20360]: 2009/04/06_15:53:08 INFO: Running OK
ResourceManager[20298]: 2009/04/06_15:53:08 info: Running /etc/ha.d/resource.d/lvs_dr start
ResourceManager[20298]: 2009/04/06_15:53:08 info: Running /etc/ha.d/resource.d/ldirectord start
mach_down[20272]: 2009/04/06_15:53:08 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[20272]: 2009/04/06_15:53:08 info: mach_down takeover complete for node node1.
heartbeat[20182]: 2009/04/06_15:53:08 info: mach_down takeover complete.
heartbeat[20182]: 2009/04/06_15:55:01 CRIT: Cluster node node1 returning after partition.
heartbeat[20182]: 2009/04/06_15:55:01 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[20182]: 2009/04/06_15:55:01 WARN: Deadtime value may be too small.
heartbeat[20182]: 2009/04/06_15:55:01 info: See FAQ for information on tuning deadtime.
heartbeat[20182]: 2009/04/06_15:55:01 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[20182]: 2009/04/06_15:55:01 info: Link node1:eth0 up.
heartbeat[20182]: 2009/04/06_15:55:01 WARN: Late heartbeat: Node node1: interval 124510 ms
heartbeat[20182]: 2009/04/06_15:55:01 info: Status update for node node1: status active
harc[20521]: 2009/04/06_15:55:01 info: Running /etc/ha.d/rc.d/status status
heartbeat[20182]: 2009/04/06_15:55:03 WARN: Shutdown delayed until current resource activity finishes.
heartbeat[20182]: 2009/04/06_15:55:04 info: Heartbeat shutdown in progress. (20182)
heartbeat[20182]: 2009/04/06_15:55:04 info: Received shutdown notice from 'node1'.
heartbeat[20182]: 2009/04/06_15:55:04 info: Resource takeover cancelled - shutdown in progress.
heartbeat[20537]: 2009/04/06_15:55:04 info: Giving up all HA resources.
ResourceManager[20550]: 2009/04/06_15:55:04 info: Releasing resource group: node1 lvs_switch 192.168.1.14 lvs_dr ldirectord
ResourceManager[20550]: 2009/04/06_15:55:04 info: Running /etc/ha.d/resource.d/ldirectord stop
ResourceManager[20550]: 2009/04/06_15:55:04 info: Running /etc/ha.d/resource.d/lvs_dr stop
ResourceManager[20550]: 2009/04/06_15:55:04 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.14 stop
IPaddr[20654]: 2009/04/06_15:55:04 INFO: ifconfig eth0:0 down
IPaddr[20637]: 2009/04/06_15:55:04 INFO: Success
ResourceManager[20550]: 2009/04/06_15:55:04 info: Running /etc/ha.d/resource.d/lvs_switch stop
heartbeat[20537]: 2009/04/06_15:55:05 info: All HA resources relinquished.
heartbeat[20182]: 2009/04/06_15:55:07 info: killing HBWRITE process 20185 with signal 15
heartbeat[20182]: 2009/04/06_15:55:07 info: killing HBREAD process 20186 with signal 15
heartbeat[20182]: 2009/04/06_15:55:07 info: killing HBFIFO process 20184 with signal 15
heartbeat[20182]: 2009/04/06_15:55:07 info: Core process 20186 exited. 3 remaining
heartbeat[20182]: 2009/04/06_15:55:07 info: Core process 20185 exited. 2 remaining
heartbeat[20182]: 2009/04/06_15:55:07 info: Core process 20184 exited. 1 remaining
heartbeat[20182]: 2009/04/06_15:55:07 info: node2 Heartbeat shutdown complete.
heartbeat[20182]: 2009/04/06_15:55:07 info: Heartbeat restart triggered.
heartbeat[20182]: 2009/04/06_15:55:07 info: Restarting heartbeat.
heartbeat[20182]: 2009/04/06_15:55:07 info: Performing heartbeat restart exec.
heartbeat[20182]: 2009/04/06_15:55:18 info: Version 2 support: false
heartbeat[20182]: 2009/04/06_15:55:18 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[20182]: 2009/04/06_15:55:18 info: **************************
heartbeat[20182]: 2009/04/06_15:55:18 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[20711]: 2009/04/06_15:55:18 info: heartbeat: version 2.1.3
heartbeat[20711]: 2009/04/06_15:55:18 info: Heartbeat generation: 1238987688
heartbeat[20711]: 2009/04/06_15:55:18 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[20711]: 2009/04/06_15:55:18 info: glib: ucast: bound send socket to device: eth0
heartbeat[20711]: 2009/04/06_15:55:18 info: glib: ucast: bound receive socket to device: eth0
heartbeat[20711]: 2009/04/06_15:55:18 info: glib: ucast: started on port 28855 interface eth0 to 192.168.1.12
heartbeat[20711]: 2009/04/06_15:55:18 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[20711]: 2009/04/06_15:55:18 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[20711]: 2009/04/06_15:55:18 notice: Using watchdog device: /dev/watchdog
heartbeat[20711]: 2009/04/06_15:55:18 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[20711]: 2009/04/06_15:55:18 info: Local status now set to: 'up'
heartbeat[20711]: 2009/04/06_15:55:19 info: Link node1:eth0 up.
heartbeat[20711]: 2009/04/06_15:55:19 info: Status update for node node1: status up
harc[20717]: 2009/04/06_15:55:19 info: Running /etc/ha.d/rc.d/status status
heartbeat[20711]: 2009/04/06_15:55:19 info: Comm_now_up(): updating status to active
heartbeat[20711]: 2009/04/06_15:55:19 info: Local status now set to: 'active'
heartbeat[20711]: 2009/04/06_15:55:20 info: Status update for node node1: status active
harc[20736]: 2009/04/06_15:55:20 info: Running /etc/ha.d/rc.d/status status
heartbeat[20711]: 2009/04/06_15:55:30 info: remote resource transition completed.
heartbeat[20711]: 2009/04/06_15:55:30 info: remote resource transition completed.
heartbeat[20711]: 2009/04/06_15:55:30 info: Initial resource acquisition complete (T_RESOURCES(us))

heartbeat[20752]: 2009/04/06_15:55:30 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys node2] to acquire.
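
A log this long is easier to follow when reduced to the lines that mark the takeover and the later release. The filter below is a rough sketch that keys on message fragments visible in this log; `failover_events` is a hypothetical name, and the pattern list is an assumption that may need extending for other heartbeat versions.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that mark a node being declared
# dead, a node returning, resource groups changing hands, and the
# resource.d scripts being run. Reads the log on stdin.
failover_events() {
    grep -E 'is dead|returning after partition|(Acquiring|Releasing) resource group|Running /etc/ha\.d/resource\.d/|takeover complete'
}

# Usage on either node:
#   failover_events < /var/log/ha-log
```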
