Configuring NIC Bonding for the RAC Private Network
When installing and deploying RAC, it is not enough to simply get the installation finished; the whole deployment has to account for potential single points of failure, and one of the most important is the private network.
The private network is the channel for inter-node communication in RAC: the network heartbeat between nodes and the data blocks shipped by Cache Fusion both travel over it. Yet very often the private network is nothing more than a single NIC plugged into a switch, or worse, a direct NIC-to-NIC cable between the servers. Such a setup is easy to deploy, but once the RAC goes into production it carries considerable risk: the NIC, the cable, the switch port and the switch itself are all single points of failure, and a fault in almost any of them can lead to a RAC split-brain. It is therefore advisable to configure dual-NIC bonding for the private network.
Below are my configuration steps.
Environment:
OS: CentOS release 6.4 (Final)
Oracle: 11.2.0.4 RAC
Each server has four NICs: em1, em2, em3 and em4. em1 is currently the public interface and em3 the private interface, both already in use; em2 and em4 are idle.
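Before touching anything, it can be worth confirming which interfaces really are idle; a quick check along these lines (standard CentOS tools, shown purely as an illustration) works:
# list interfaces with their state and addresses
ip addr show | grep -E "^[0-9]+:|inet "
# confirm a candidate slave NIC has link, e.g. em2
ethtool em2 | grep "Link detected"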
Configure and load the bonding module (run on both nodes).
Edit /etc/modprobe.d/bonding.conf and add the following line:
[root@node2 ~]# vi /etc/modprobe.d/bonding.conf
alias bond0 bonding
[root@node2 ~]# modprobe -a bond0
Verify:
[root@node2 ~]# lsmod |grep bond
bonding 127331 0
8021q 25317 1 bonding
ipv6 321422 274 bonding,ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6
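Optionally, modinfo lists the bonding driver version and the parameters it accepts (such as mode and miimon); the output varies with the kernel, so this is only a sanity check:
[root@node2 ~]# modinfo bonding | grep -E '^(version|parm)'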
Edit the NIC configuration files so that they contain the following.
Node 1:
ifcfg-em2:
DEVICE=em2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ifcfg-em4:
DEVICE=em4
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ifcfg-bond0:
DEVICE=bond0
MASTER=yes
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 miimon=100"
IPADDR=10.10.10.105
PREFIX=24
GATEWAY=10.10.10.1
Node 2:
ifcfg-em2:
DEVICE=em2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ifcfg-em4:
DEVICE=em4
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
ifcfg-bond0:
DEVICE=bond0
MASTER=yes
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 miimon=100"
IPADDR=10.10.10.106
PREFIX=24
GATEWAY=10.10.10.1
I use mode=1 (active-backup) here: only one NIC is active at a time, and when the active NIC fails the link switches over to the backup NIC. Modes 4 and 6 are also worth considering.
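For reference, if you chose one of those other modes instead, the BONDING_OPTS line would look roughly like this (illustrative values only; mode=4 additionally requires LACP support on the switch ports, and neither variant was tested in this environment):
# mode=4: 802.3ad dynamic link aggregation (switch ports must be configured for LACP)
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"
# mode=6: balance-alb adaptive load balancing, no special switch configuration needed
BONDING_OPTS="mode=6 miimon=100"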
After editing the configuration files, bring bond0 up on both nodes with ifup bond0.
You should then see:
[root@node1 ~]# ifconfig
bond0 Link encap:Ethernet HWaddr C8:1F:66:FB:6F:CB
inet addr:10.10.10.105 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::ca1f:66ff:fefb:6fcb/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
RX packets:9844809 errors:0 dropped:0 overruns:0 frame:0
TX packets:7731078 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:9097132073 (8.4 GiB) TX bytes:6133004979 (5.7 GiB)
em2 Link encap:Ethernet HWaddr C8:1F:66:FB:6F:CB
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:9792915 errors:0 dropped:0 overruns:0 frame:0
TX packets:7731078 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:9088278883 (8.4 GiB) TX bytes:6133004979 (5.7 GiB)
Interrupt:38
em4 Link encap:Ethernet HWaddr C8:1F:66:FB:6F:CB
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:51894 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8853190 (8.4 MiB) TX bytes:0 (0.0 b)
Interrupt:36
NIC bonding is now configured.
Testing and verification
Now test it: take em2 and em4 down in turn while running a continuous ping from one node to the other node's private IP, and watch the active slave change in /proc/net/bonding/bond0. You will find that the ping does not drop when one NIC goes down; a minimal way to drive this test is sketched below.
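The test can be driven like this, assuming the interface names and addresses used above:
# on node1: keep a continuous ping running against node2's private IP
ping 10.10.10.106
# in a second session on node1, watch the active slave
watch -n1 'grep "Currently Active Slave" /proc/net/bonding/bond0'
# fail the active slave, then bring it back and fail the other one
ifdown em2
ifup em2
ifdown em4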
With bond0 in place, the next step is to make it the RAC private interconnect interface.
To guard against a failed reconfiguration, first back up the original configuration.
As the grid user, back up $GRID_HOME/gpnp/<node_name>/profiles/peer/profile.xml on both nodes:
cd /u01/app/11.2.0/grid/gpnp/noden/profiles/peer
cp profile.xml profile.xml.bk
[root@node2 peer]# ls
pending.xml profile_orig.xml profile.xml profile.xml.bk
Check the current private network configuration:
node2-> oifcfg getif
em1 192.168.10.0 global public
em3 10.10.10.0 global cluster_interconnect
First add the new private network; running this on any one node is sufficient:
node1-> oifcfg setif -global bond0/10.10.10.0:cluster_interconnect
This step may fail with the following error:
node1-> oifcfg setif -global bond0/10.10.10.0:cluster_interconnect
PRIF-33: Failed to set or delete interface because hosts could not be discovered
CRS-02307: No GPnP services on requested remote hosts.
PRIF-32: Error in checking for profile availability for host node2
CRS-02306: GPnP service on host "node2" not found.
This is caused by an abnormal gpnpd service.
Workaround: kill the gpnpd process; Grid Infrastructure will restart the gpnpd service automatically.
Run on both nodes:
[root@node2 ~]# ps -ef| grep gpnp
grid 4927 1 0 Sep22 ? 00:26:38 /u01/app/11.2.0/grid/bin/gpnpd.bin
grid 48568 46762 0 17:26 pts/3 00:00:00 tail -f /u01/app/11.2.0/grid/log/node2/gpnpd/gpnpd.log
root 48648 48623 0 17:26 pts/4 00:00:00 grep gpnp
[root@node2 ~]# kill -9 4927
[root@node2 ~]#
Refer to gpnpd.log for details.
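Before retrying oifcfg setif it is worth confirming that gpnpd has in fact been respawned; something along these lines works (the log path follows the same layout shown above, adjust the node name as needed):
[root@node2 ~]# ps -ef | grep gpnpd.bin | grep -v grep
[root@node2 ~]# tail -20 /u01/app/11.2.0/grid/log/node2/gpnpd/gpnpd.log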
After adding the new private network, remove the old one using the following steps.
First stop and disable CRS. As root, run the following on both nodes:
Stop CRS:
crsctl stop crs
Disable CRS:
crsctl disable crs
Modify the hosts file, changing the private IP entries to the new addresses (see the example entries after the ping check below).
Then verify on both nodes:
ping node1-priv
ping node2-priv
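For reference, with the addresses used above the private entries in /etc/hosts on both nodes would look something like this (host names as pinged above; adjust to your own naming):
10.10.10.105   node1-priv
10.10.10.106   node2-priv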
Then re-enable and start CRS:
[root@node2 ~]# crsctl enable crs
CRS-4622: Oracle High Availability Services autostart is enabled.
[root@node2 ~]# crsctl start crs
Delete the old private network:
node2-> oifcfg delif -global em3/10.10.10.0:cluster_interconnect
Check and verify; the configuration is successful:
node2-> oifcfg getif
em1 192.168.10.0 global public
bond0 10.10.10.0 global cluster_interconnect
node2->
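Because the interconnect definition lives in the GPnP profile that was backed up earlier, the change can also be cross-checked by grepping the current profile.xml for its adapter attributes, which should now list bond0 rather than em3 (path as used above; the attribute name is an assumption based on the 11.2 profile format):
node2-> grep -o 'Adapter="[^"]*"' /u01/app/11.2.0/grid/gpnp/noden/profiles/peer/profile.xml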
Now run a test to verify the bonding behavior.
Take em2 down with ifdown; /var/log/messages then shows:
Oct 25 22:00:32 node1 kernel: bonding: bond0: Removing slave em2
Oct 25 22:00:32 node1 kernel: bonding: bond0: Warning: the permanent HWaddr of em2 - c8:1f:66:fb:6f:cb - is still in use by bond0. Set the HWaddr of em2 to a different address to avoid conflicts.
Oct 25 22:00:32 node1 kernel: bonding: bond0: releasing active interface em2
Oct 25 22:00:32 node1 kernel: bonding: bond0: making interface em4 the new active one.
bond0 automatically fails over to em4, so pinging the private IP still works at this point.
/proc/net/bonding/bond0 shows that the active slave has become em4:
[root@node1 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: em4
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: em4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 3
Permanent HW addr: c8:1f:66:fb:6f:cd
Slave queue ID: 0
[root@node1 ~]#
There are no errors in crsd.log or ocssd.log, and CSS keeps sending a network heartbeat every 5 seconds:
2014-10-25 22:00:32.975: [ CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes
2014-10-25 22:00:37.977: [ CSSD][656893696]clssnmSendingThread: sending status msg to all nodes
2014-10-25 22:00:37.977: [ CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes
2014-10-25 22:00:42.978: [ CSSD][656893696]clssnmSendingThread: sending status msg to all nodes
This shows that bonding really does protect the private network against a single point of failure.
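If you want to watch the CSS heartbeat while pulling interfaces, tailing ocssd.log for the sending-thread messages is convenient (the log path follows the same layout as the gpnpd log shown earlier):
[root@node1 ~]# tail -f /u01/app/11.2.0/grid/log/node1/cssd/ocssd.log | grep clssnmSendingThread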
Now take em4 down as well:
[root@node1 ~]# ifdown em4
With em4 down, node1's private IP can no longer be pinged from node2, and bond0 itself is down:
[root@node1 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
messages now shows:
Oct 25 22:02:23 node1 kernel: bonding: bond0: Removing slave em4
Oct 25 22:02:23 node1 kernel: bonding: bond0: releasing active interface em4
ocssd.log shows that CSS detected the private network failure two seconds later:
2014-10-25 22:02:25.573: [GIPCHGEN][1744828160] gipchaInterfaceFail: marking interface failing 0x7f025c00c0a0 { host '', haName 'c617-7010-b72d-6c39', local (nil), ip '10.10.10.105:46469', subnet '10.10.10.0',
mask '255.255.255.0', mac 'c8-1f-66-fb-6f-cb', ifname 'bond0', numRef 1, numFail 0, idxBoot 0, flags 0xd }
2014-10-25 22:02:25.661: [GIPCHGEN][1951459072] gipchaInterfaceFail: marking interface failing 0x7f025c023d90 { host 'node2', haName 'ba2c-9227-ca29-8a21', local 0x7f025c00c0a0, ip '10.10.10.106:32369', subnet
'10.10.10.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 2, flags 0x6 }
2014-10-25 22:02:27.663: [GIPCHTHR][1951459072] gipchaWorkerCreateInterface: created remote interface for node 'node2', haName 'ba2c-9227-ca29-8a21', inf 'udp://10.10.10.106:32369'
It also shows that the local server's private network is no longer reachable:
2014-10-25 22:02:27.572: [GIPCHDEM][536868608] gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0xc99250 [0000000000000010] { gipchaContext : host 'node1', name 'CSS_node-cluster', luid 'e2a491a6-00000000',
numNode 1, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-10-25 22:02:31.012: [ CSSD][656893696]clssnmSendingThread: sending status msg to all nodes
2014-10-25 22:02:31.012: [GIPCHALO][669509376] gipchaLowerProcessNode:
no valid interfaces found to node for 5530 ms, node 0x7f0c18065e40 { host 'node2', haName 'CSS_node-cluster', srcLuid e2a491a6-d8e24a48, dstLuid ebce4a7f-2b4e4348 numInf 0, contigSeq 317864, lastAck 317717, lastValidAck 317864,
sendSeq [317718 : 317729], createTime 4294073210, sentRegister 1, localMonitor 1, flags 0x2408 }
2014-10-25 22:02:31.012: [ CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes
2014-10-25 22:02:33.573: [GIPCHDEM][536868608] gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0xc99250 [0000000000000010] { gipchaContext : host 'node1', name 'CSS_node-cluster', luid 'e2a491a6-00000000',
numNode 1, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd
2014-10-25 22:02:36.013: [ CSSD][656893696]clssnmSendingThread: sending status msg to all nodes
2014-10-25 22:02:36.013: [ CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes
2014-10-25 22:02:37.014: [GIPCHALO][669509376] gipchaLowerProcessNode: no valid interfaces found to node for 11530 ms, node 0x7f0c18065e40 { host 'node2', haName 'CSS_node-cluster', srcLuid e2a491a6-d8e24a48,
dstLuid ebce4a7f-2b4e4348 numInf 0, contigSeq 317864, lastAck 317717, lastValidAck 317864, sendSeq [317718 : 317741], createTime 4294073210, sentRegister 1, localMonitor 1, flags 0x2408 }
The log shows that node2's network heartbeat has failed and that node2 will be removed from the cluster in 14.340 seconds:
2014-10-25 22:02:39.010: [ CSSD][658470656]clssnmPollingThread: node node2 (2) at 50% heartbeat fatal, removal in 14.340 seconds
2014-10-25 22:02:39.010: [ CSSD][658470656]clssnmPollingThread: node node2 (2) is impending reconfig, flag 2228230, misstime 15660
How long a missed network heartbeat is tolerated is governed by the CSS misscount parameter; the RAC default is 30 seconds:
node2-> crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
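For completeness, the disk heartbeat timeout can be queried the same way; the 200000 ms value that appears later in the reconfiguration messages matches its default:
node2-> crsctl get css disktimeout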
CSS then starts reporting "has a disk HB, but no network HB":
2014-10-25 22:02:53.012: [ CSSD][664778496]clssnmvDHBValidateNcopy: node 2, node2, has a disk HB, but no network HB, DHB has rcfg 309527290, wrtcnt, 3048511, LATS 212362014, lastSeqNo 3048510, uniqueness
1414032467, timestamp 1414245773/212369384
2014-10-25 22:02:53.192: [ CSSD][667932416]clssnmvDiskPing: Writing with status 0x3, timestamp 1414245773/212362194
2014-10-25 22:02:53.352: [ CSSD][658470656]clssnmPollingThread: Removal started for node node2 (2), flags 0x220006, state 3, wt4c 0
2014-10-25 22:02:53.352: [ CSSD][658470656]clssnmMarkNodeForRemoval: node 2, node2 marked for removal
2014-10-25 22:02:53.352: [ CSSD][658470656]clssnmDiscHelper: node2, node(2) connection failed, endp (0x6b7), probe(0x7f0c00000000), ninf->endp 0x6b7
2014-10-25 22:02:53.352: [ CSSD][658470656]clssnmDiscHelper: node 2 clean up, endp (0x6b7), init state 5, cur state 5
2014-10-25 22:02:53.352: [GIPCXCPT][658470656] gipcInternalDissociate: obj 0x7f0bf00084c0 [00000000000006b7] { gipcEndpoint : localAddr 'gipcha://node1:c5bc-f486-c390-b48', remoteAddr 'gipcha://node2:nm2_node-cluster/b370-8934-efb4-3f2',
numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 1, wobj 0x7f0bf000a3f0, sendp (nil)flags 0x38606, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
node2 is then evicted:
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmNeedConfReq: No configuration to change
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmDoSyncUpdate: Terminating node 2, node2, misstime(30000) state(5)
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmDoSyncUpdate: Wait for 0 vote ack(s)
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmCheckDskInfo: Checking disk info...
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmCheckSplit: Node 2, node2, is alive, DHB (1414245773, 212369384) more than disk timeout of 27000 after the last NHB (1414245743, 212339734)
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmCheckDskInfo: My cohort: 1
2014-10-25 22:02:53.353: [ CSSD][891287296]clssgmQueueGrockEvent: groupName(crs_version) count(3) master(0) event(2), incarn 3, mbrc 3, to member 2, events 0x0, state 0x0
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmRemove: Start
2014-10-25 22:02:53.353: [ CSSD][655316736](:CSSNM00007:)clssnmrRemoveNode: Evicting node 2, node2, from the cluster in incarnation 309527290, node birth incarnation 309527289, death incarnation 309527290,
stateflags 0x224000 uniqueness value 1414032467
2014-10-25 22:02:53.353: [ CSSD][891287296]clssgmQueueGrockEvent: groupName(IGSSZGDBsszgdb) count(2) master(1) event(2), incarn 2, mbrc 2, to member 1, events 0x0, state 0x0
2014-10-25 22:02:53.353: [ default][655316736]kgzf_gen_node_reid2: generated reid cid=6d207e372096ef48ff1031c3298552d5,icin=309527289,nmn=2,lnid=309527289,gid=0,gin=0,gmn=0,umemid=0,opid=0,opsn=0,lvl=node
hdr=0xfece0100
node2 is fenced:
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmrFenceSage: Fenced node node2, number 2, with EXADATA, handle 0
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmSendShutdown: req to node 2, kill time 212362354
2014-10-25 22:02:53.353: [ CSSD][891287296]clssgmQueueGrockEvent: groupName(CRF-) count(4) master(0) event(2), incarn 4, mbrc 4, to member 2, events 0x38, state 0x0
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmsendmsg: not connected to node 2
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmSendShutdown: Send to node 2 failed
2014-10-25 22:02:53.353: [ CSSD][655316736]clssnmWaitOnEvictions: Start
node2 is prevented from writing to the data files:
2014-10-25 22:02:53.354: [ CSSD][664778496]clssnmvDiskEvict: Kill block write, file /dev/asm_datafile flags 0x00010004, kill block unique 1414032467, stamp 212362354/212362354
crsd.log shows the resources that had been running on node2 failing over to node1:
2014-10-25 22:03:00.946: [ CRSPE][870311680]{1:61167:7502} CRS-2676: Start of 'ora.node2.vip' on 'node1' succeeded
2014-10-25 22:03:00.946: [ CRSPE][870311680]{1:61167:7502} CRS-2676: Start of 'ora.node2.vip' on 'node1' succeeded
2014-10-25 22:03:02.358: [ CRSPE][870311680]{1:61167:7502} CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'node1' succeeded
By this point, CRS on node2 can no longer be contacted:
node2-> crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
Bring em4 back up:
ifup em4
[root@node1 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: em4
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: c8:1f:66:fb:6f:cd
Slave queue ID: 0
Check messages:
Oct 25 22:03:43 node1 kernel: bonding: bond0: setting mode to active-backup (1).
Oct 25 22:03:43 node1 kernel: bonding: bond0: Setting MII monitoring interval to 100.
Oct 25 22:03:43 node1 kernel: bonding: bond0: Adding slave em4.
Oct 25 22:03:44 node1 kernel: bonding: bond0: enslaving em4 as a backup interface with a down link.
Oct 25 22:03:47 node1 kernel: tg3 0000:02:00.1: em4: Link is up at 1000 Mbps, full duplex
Oct 25 22:03:47 node1 kernel: tg3 0000:02:00.1: em4: Flow control is on for TX and on for RX
Oct 25 22:03:47 node1 kernel: tg3 0000:02:00.1: em4: EEE is disabled
After em4 is brought up manually it is automatically re-added as a slave interface. Because bond0 itself was taken down earlier, bond0 also has to be brought up manually with ifup bond0:
[root@node1 ~]# ifup bond0
[root@node1 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: em4
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: em4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: c8:1f:66:fb:6f:cd
Slave queue ID: 0
Slave Interface: em2
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: c8:1f:66:fb:6f:cb
Slave queue ID: 0
messages now shows that bond0 is ready:
Oct 25 22:05:25 node1 kernel: bonding: bond0: Adding slave em2.
Oct 25 22:05:26 node1 kernel: bonding: bond0: enslaving em2 as a backup interface with a down link.
Oct 25 22:05:26 node1 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready
Oct 25 22:05:26 node1 kernel: 8021q: adding VLAN 0 to HW filter on device bond0
Oct 25 22:05:26 node1 kernel: bond0: link status definitely up for interface em4, 1000 Mbps full duplex.
Oct 25 22:05:26 node1 kernel: bonding: bond0: making interface em4 the new active one.
Oct 25 22:05:26 node1 kernel: bonding: bond0: first active interface up!
Oct 25 22:05:26 node1 kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
ocssd.log also shows that node2 can be reached again:
2014-10-25 22:05:28.026: [GIPCHGEN][536868608] gipchaNodeAddInterface: adding interface information for inf 0x7f0c14222be0 { host '', haName 'CSS_node-cluster', local (nil), ip '10.10.10.105', subnet '10.10.10.0',
mask '255.255.255.0', mac 'c8-1f-66-fb-6f-cd', ifname 'bond0', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }
2014-10-25 22:05:28.254: [GIPCHTHR][669509376] gipchaWorkerCreateInterface: created local interface for node 'node1', haName 'CSS_node-cluster', inf 'udp://10.10.10.105:60625'
2014-10-25 22:05:28.255: [GIPCHTHR][669509376] gipchaWorkerCreateInterface: created local bootstrap multicast interface for node 'node1', haName 'CSS_node-cluster', inf 'mcast://224.0.0.251:42424/10.10.10.105'
2014-10-25 22:05:30.560: [ CSSD][653739776]clssnmSendConnAck: connected to node 2, node2, con (0xa111d5), state 0
2014-10-25 22:05:30.560: [ CSSD][653739776]clssnmCompleteConnProtocol: node node2, 2, uniqueness 1414245779, msg uniqueness 1414245779, endp 0xa111d5 probendp 0x7f0c00000000 endp 0xa111d5
2014-10-25 22:05:31.465: [ CSSD][653739776]clssnmHandleJoin: node 2 JOINING, state 0->1 ninfendp 0x7f0c00a111d5
2014-10-25 22:05:31.940: [ CSSD][664778496]clssnmvReadDskHeartbeat: Reading DHBs to get the latest info for node(2/node2), LATSvalid(0), nodeInfoDHB uniqueness(1414032467)
2014-10-25 22:05:31.940: [ CSSD][664778496]clssnmvDHBValidateNcopy: Setting LATS valid due to uniqueness change for node(node2) number(2), nodeInfoDHB(1414032467), readInfo(1414245779)
2014-10-25 22:05:31.940: [ CSSD][664778496]clssnmvDHBValidateNcopy: Saving DHB uniqueness for node node2, number 2 latestInfo(1414245779), readInfo(1414245779), nodeInfoDHB(1414032467)
2014-10-25 22:05:32.366: [ CSSD][655316736]clssnmDoSyncUpdate: Initiating sync 309527291
2014-10-25 22:05:32.366: [ CSSD][655316736]clssscCompareSwapEventValue: changed NMReconfigInProgress val 1, from -1, changes 7
2014-10-25 22:05:32.366: [ CSSD][655316736]clssnmDoSyncUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000
2014-10-25 22:05:32.366: [ CSSD][655316736]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.
2014-10-25 22:05:32.366: [ CSSD][655316736]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 309527291
2014-10-25 22:05:32.366: [ CSSD][655316736]clssnmSetupAckWait: Ack message type (11)
2014-10-25 22:05:32.366: [ CSSD][655316736]clssnmSetupAckWait: node(1) is ALIVE
2014-10-25 22:05:32.366: [ CSSD][655316736]clssnmSetupAckWait: node(2) is ALIVE
2014-10-25 22:05:32.369: [ CSSD][655316736]clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state
2014-10-25 22:05:32.370: [ CSSD][653739776]clssnmHandleUpdate: NODE 1 (node1) IS ACTIVE MEMBER OF CLUSTER
2014-10-25 22:05:32.370: [ CSSD][653739776]clssnmHandleUpdate: NODE 2 (node2) IS ACTIVE MEMBER OF CLUSTER
2014-10-25 22:05:32.370: [ CSSD][891287296]clssgmSuspendAllGrocks: done
crsd.log shows node2's resources being stopped on node1 and restarted on node2:
2014-10-25 22:06:05.097: [ CRSPE][870311680]{2:25913:2} CRS-2677: Stop of 'ora.node2.vip' on 'node1' succeeded
2014-10-25 22:06:05.101: [ CRSPE][870311680]{2:25913:2} CRS-2672: Attempting to start 'ora.node2.vip' on 'node2'
The resources have now drifted back to node2:
node2-> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE node1
ora....N1.lsnr ora....er.type ONLINE ONLINE node2
ora....N2.lsnr ora....er.type ONLINE ONLINE node1
ora....N3.lsnr ora....er.type ONLINE ONLINE node1
ora.OCR.dg ora....up.type ONLINE ONLINE node1
ora.TEMP.dg ora....up.type ONLINE ONLINE node1
ora.UNDO.dg ora....up.type ONLINE ONLINE node1
ora.asm ora.asm.type ONLINE ONLINE node1
ora.cvu ora.cvu.type ONLINE ONLINE node1
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE node1
ora....SM1.asm application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application OFFLINE OFFLINE
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip ora....t1.type ONLINE ONLINE node1
ora....SM2.asm application ONLINE ONLINE node2
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application OFFLINE OFFLINE
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip ora....t1.type ONLINE ONLINE node2
ora.oc4j ora.oc4j.type ONLINE ONLINE node1
ora.ons ora.ons.type ONLINE ONLINE node1
ora.scan1.vip ora....ip.type ONLINE ONLINE node2
ora.scan2.vip ora....ip.type ONLINE ONLINE node1
ora.scan3.vip ora....ip.type ONLINE ONLINE node1
ora.sszgdb.db ora....se.type ONLINE ONLINE node1
We can now enable bond0's other slave interface as well:
[root@node1 ~]# ifup em2
[root@node1 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: em4
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: em4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: c8:1f:66:fb:6f:cd
Slave queue ID: 0
Slave Interface: em2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: c8:1f:66:fb:6f:cb
Slave queue ID: 0
Messages:
Oct 25 22:05:29 node1 kernel: tg3 0000:01:00.1: em2: Link is up at 1000 Mbps, full duplex
Oct 25 22:05:29 node1 kernel: tg3 0000:01:00.1: em2: Flow control is on for TX and on for RX
Oct 25 22:05:29 node1 kernel: tg3 0000:01:00.1: em2: EEE is disabled
Oct 25 22:05:29 node1 kernel: bond0: link status definitely up for interface em2, 1000 Mbps full duplex.
em2 is back in service, and the private network is once again protected by two NICs.
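As a final check on both nodes, something along these lines confirms that both slaves are up and that the cluster still sees bond0 as the interconnect (all commands as used earlier in this article):
# run on each node as root
grep -E "Currently Active Slave|MII Status" /proc/net/bonding/bond0
ping -c 3 node2-priv        # from node1; use node1-priv from node2
# as the grid user
oifcfg getif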