1. Introduction
This document covers the installation and configuration of Red Hat 6.x Cluster software on Linux VMs running in a VMware virtual environment.
2. Assumptions
1. The base OS has been installed.
2. Networking on the Data/Management Virtual NIC(s) has already been configured.
3. The servers are registered with the satellite server and subscribed to the “RHEL Server High Availability” and “RHEL Server Resilient Storage” channels. (This is required to use yum to install the cluster software.)
4. OS is patched to the current level.
5. There are only 2 nodes in the cluster.
NOTE: A quorum disk is required because only 2 nodes are used. (The quorum disk provides a third vote for fencing decisions.) If building a cluster with more than 2 nodes, the quorum disk can be omitted.
6. A fencing account has been created on the vSphere (ESX) side for the cluster to use.
7. All commands for the configuration will be run as root.
8. Root login must be enabled but can be disabled after build.
3. System Requirements
This document focuses on building a two node Red Hat Cluster on 2 Red Hat 6.5 Linux VMs.
The server names that will be used are:
Server Name | Version |
apm-bcclprmom01 | Red Hat 6.5 |
apm-bcclprmom02 | Red Hat 6.5 |
Note: You can build the servers directly on Red Hat 6.5, or build them on Red Hat 6.0 or 6.1 and then upgrade them to 6.5 from Satellite.
As an example, we will assign 2x100 GB of shared SAN storage to both servers, and then create LVM volumes to hold the Introscope application content shared between the two nodes.
We will also need 30 MB of shared storage for the quorum disk.
There are 2 network interfaces on each server: one for the public network and one for the private heartbeat. These are the interfaces used in our ESX VM configuration.
eth0 Public Data
eth1 Private heartbeat
NOTE: as of April 1, 2013, these two servers have not been configured to use the dedicated heartbeat interface (eth1). However, since the heartbeat NICs (eth1) are kept on both servers, it is easy to reconfigure them to use a dedicated heartbeat at any time if needed.
As an example, we will need the IPs below for this cluster configuration. Add/configure them in the /etc/hosts file on both servers:
# Data IPs:
10.194.18.81 intrbcclqamom03 intrbcclqamom03.bmogc.net
10.194.18.84 intrbcclqamom04 intrbcclqamom04.bmogc.net
# Heartbeat IPs:
10.10.10.11 mom03-node1
10.10.10.12 mom04-node2
# VIPs:
10.194.18.88 intrbcclqahobvp intrbcclqahobvp.bmogc.net
10.194.18.89 intrbcclqahubvp intrbcclqahubvp.bmogc.net
NOTE: since we are using GFS2, we might not need the VIPs, but this needs to be confirmed with the application team.
We will create 2x100 GB GFS2 filesystems and share them on both servers as follows:
intrbcclqamom03/04:
/opt/wily/EMHOB
/opt/wily/EMHUB
4. Shared SAN Storage Configuration
N/A
5. Install Clustering and Cluster File systems Packages
In order to install the Red Hat Cluster software suite, we need to register both Linux VMs with the Satellite server and subscribe them to the following three channels (a quick verification sketch follows this list):
- RHEL Base
- RHEL Cluster
- RHEL Cluster-Storage
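A quick way to confirm the channel subscriptions from each server (a sketch, assuming the VMs are registered through RHN Classic/Satellite tooling where the rhn-channel utility is available; channel labels vary by Satellite setup):
# List the channels this server is currently subscribed to (run on both nodes)
rhn-channel --list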
On all servers:
yum groupinstall "High Availability"
yum groupinstall "High Availability Management"
yum groupinstall "Resilient Storage"
Below is a sample of the transaction summary showing the packages and dependencies that will be installed. Accept the list and install. (If you receive a message that some packages are not available, the servers need to be added to the RHEL Cluster and RHEL Cluster-Storage channels on the Satellite server.)
Transaction Summary
===========================================================================
Install XX Package(s)
Total download size: XX.X MB
Is this ok [y/N]: y
On all servers:
getenforce
Disabled    ← The output should be "Disabled" on systems built from the standard image. If it is not disabled, do the following (a one-line fix is sketched below):
grep SELINUX= /etc/selinux/config | grep -v "#"
Ensure that the following line is in the file:
SELINUX=disabled
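If SELinux is not already disabled, a minimal sketch to set it in the stock config file (the change takes effect at the reboot performed later in this section):
# Force SELINUX=disabled in /etc/selinux/config, then confirm the setting
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
grep ^SELINUX= /etc/selinux/config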
The cluster services need to run at runlevel 3 (Multi-User Mode with Networking). The chkconfig commands below enable them at runlevels 3 and 5 and disable them elsewhere; a verification sketch follows the commands.
Note: runlevel 5 enables the services under X Windows if it is installed. In most cases it won't be.
chkconfig --level 01246 cman off
chkconfig --level 01246 clvmd off
chkconfig --level 01246 rgmanager off
chkconfig --level 01246 luci off
chkconfig --level 01246 ricci off
chkconfig --level 01246 gfs2 off
chkconfig --level 35 cman on
chkconfig --level 35 clvmd on
chkconfig --level 35 rgmanager on
chkconfig --level 35 luci on
chkconfig --level 35 ricci on
chkconfig --level 35 gfs2 on
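A quick sketch to double-check the runlevel settings on both nodes (same service names as above):
# Each cluster service should be on at runlevels 3 and 5 and off elsewhere
for svc in cman clvmd rgmanager luci ricci gfs2; do
    chkconfig --list $svc
done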
The following services will cause the cluster to malfunction and must be disabled.
Note: iptables is the Linux firewall and acpid is the power management daemon.
chkconfig --level 0123456 iptables off
chkconfig --level 0123456 ip6tables off
chkconfig --level 0123456 acpid off
Reboot both servers:
reboot
or
shutdown -r now
Once the servers have rebooted, ensure that the required services have started up (and that others have not); a scripted check is sketched after the list:
service cman status (This service may not run until the cluster is built.)
service rgmanager status
service ricci status
service luci status
service clvmd status (This service may not run until the cluster is built.)
service gfs2 status (This service will return GFS2: no entries found in /etc/fstab)
These services should be ‘stopped’:
service iptables status
service ip6tables status
service acpid status
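The same checks can be scripted rather than run one by one (a sketch; the expected states are as noted above):
# cman/clvmd may not run until the cluster is built; iptables, ip6tables and acpid should be stopped
for svc in cman rgmanager ricci luci clvmd gfs2 iptables ip6tables acpid; do
    echo "== $svc =="
    service $svc status
done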
6. Create the Cluster
We will create the cluster using LUCI (Web Interface), and all cluster configurations will be saved in “/etc/cluster/cluster.conf” on both nodes.
On both servers, reset the password for the “ricci” user:
passwd ricci
Changing password for user ricci.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Remember this password; we will use it when we create the cluster from LUCI.
Use the following URL to access the LUCI Web Interface on the first server; we can log in using the root credentials.
Type the root credentials for the first server to log in:
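The exact URL depends on your environment; on RHEL 6 the luci service normally listens on TCP port 8084, so it typically looks like the example below (hostname is a placeholder):
https://<first-node-hostname>:8084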
Click on “OK”
Click the “Manage Clusters → Create” button, and enter the following information:
Cluster Name: apm-qa-clus1
Node Name: apm-bcclq1mom01
Ricci Hostname: apm-bcclq1mom01
Password: the “ricci” password that we reset above
Select “Use Locally Installed Packages”
Check “Enable Shared Storage Support”
Click “Create Cluster”; you will see the screen below:
Click the “Nodes → Add” button, and add the second node:
Node Name: intrbcclqamom04
Ricci Hostname: intrbcclqamom04
Password: the “ricci” password that we reset above
Select “Use Locally Installed Packages”
Check “Enable Shared Storage Support”
Click the “Add Nodes” button; you will see the screen below:
We can also verify the cluster status from command line on any node:
[root@intrbcclqamom03 ~]# clustat
Cluster Status for intr-qa-clus1 @ Thu Mar 21 11:48:35 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
intrbcclqamom03 1 Online, Local
intrbcclqamom04 2 Online
[root@intrbcclqamom03 ~]#
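At this point /etc/cluster/cluster.conf on both nodes contains the two members. A rough sketch of its structure (attribute values are illustrative; luci manages this file, so avoid hand-editing it while luci is in use):
<?xml version="1.0"?>
<cluster config_version="1" name="intr-qa-clus1">
  <clusternodes>
    <clusternode name="intrbcclqamom03" nodeid="1"/>
    <clusternode name="intrbcclqamom04" nodeid="2"/>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
  <rm/>
</cluster>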
Reboot both nodes to make sure everything is ready for the cluster.
From the screen above, select “Nodes” → select both nodes → click the “Reboot” button:
Click “Proceed”
On the servers' consoles, we will see:
[root@intrbcclqamom03 ~]#
Broadcast message from root@intrbcclqamom03
(unknown) at 11:51 ...
The system is going down for reboot NOW!
[root@intrbcclqamom04 ~]#
Broadcast message from root@intrbcclqamom04
(unknown) at 11:51 ...
The system is going down for reboot NOW!
After the servers are back up, we can log in to LUCI again and will see the screen below:
7. Configuring Shared Storage on All Servers
NOTE: Perform the following steps on ALL SERVERS to verify that the shared disks are available and to identify the shared device name on each server. Check the size of each disk so that you know which to use for the application VGs and which to use for the quorum disk.
fdisk -l /dev/sdb (will be used for the EMHOB disk)
Disk /dev/sdb: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
fdisk -l /dev/sdc (will be used for EMHUB disk)
Disk /dev/sdc: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
fdisk -l /dev/sdd (will be used for quorum )
Disk /dev/sdd: 31 MB, 31457280 bytes
64 heads, 32 sectors/track, 30 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
We will create the following mount points on both servers:
mkdir -p /opt/wily/EM /opt/wily/data
Cluster locking prevents one system from updating a file that is open on another system, which prevents file corruption.
To enable cluster locking, run the command below on both nodes:
lvmconf --enable-cluster
To verify it:
grep locking_type /etc/lvm/lvm.conf | grep -v "#"
locking_type = 3
NOTE: Only perform the following steps on ONE server.
First, we need to partition the first 100 GB disk and set the partition type to Linux LVM (it will be used for the “hobvg” volume group). The fdisk command and options are as below:
fdisk /dev/mapper/mpathb → n → p → 1 → enter → enter → t → 8e → w
The output will be as below:
We can ignore the warning messages:
WARNING: Re-reading the partition table failed with error 22: Invalid argument.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xdada2e1d.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): p
Disk /dev/sdb: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xdada2e1d
Device Boot Start End Blocks Id System
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-13054, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-13054, default 13054):
Using default value 13054
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
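As an alternative to the interactive fdisk session, the same partition can be created non-interactively (a sketch, assuming parted and kpartx are installed; alignment values are illustrative):
# Create one full-size primary partition and flag it for LVM
parted -s /dev/mapper/mpathb mklabel msdos mkpart primary 1MiB 100% set 1 lvm on
# Create the /dev/mapper/mpathbp1 mapping if it does not appear automatically
kpartx -a /dev/mapper/mpathb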
Verify the newly formatted disk:
fdisk -l | grep mapper
[root@apm-bcclq1mom02 ~]# fdisk -l |grep mapper
Disk /dev/mapper/rootvg-rootlv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/rootvg-swaplv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/mpathb: 214.7 GB, 214748364800 bytes
/dev/mapper/mpathbp1 1 26108 209712478+ 8e Linux LVM
Disk /dev/mapper/systemvg-systemlv: 32.2 GB, 32178700288 bytes
Disk /dev/mapper/rootvg-tmplv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/rootvg-usrlv: 2684 MB, 2684354560 bytes
Disk /dev/mapper/rootvg-homelv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/rootvg-usrlocallv: 536 MB, 536870912 bytes
Disk /dev/mapper/rootvg-optlv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/rootvg-varlv: 2147 MB, 2147483648 bytes
Disk /dev/mapper/rootvg-bmcappslv: 2147 MB, 2147483648 bytes
Disk /dev/mapper/rootvg-crashlv: 2147 MB, 2147483648 bytes
Disk /dev/mapper/mpathbp1: 214.7 GB, 214745577984 bytes
NOTE: Only perform the following steps on ONE server.
First, we need to initialize all the SAN disks as LVM physical volumes:
pvcreate /dev/mapper/mpathbp1
Physical volume "/dev/mapper/mpathbp1" successfully created
Create the apmvg Volume Group
vgcreate -c y apmvg /dev/mapper/mpathbp1
Clustered volume group "apmvg" successfully created
Note: if you are not able to create the logical volumes, you might first need to change the clustered attribute of apmvg with “vgchange -cn apmvg” (on the same server), then try to create the volumes again.
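A quick check that the volume group carries the clustered attribute (the 6th character of the Attr field should be “c”, e.g. wz--nc):
vgs -o vg_name,vg_attr apmvg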
lvcreate -L 130G -n wilyEMlv apmvg
Logical volume "wilyEMlv" created
lvcreate -L 17G -n wilytraceslv apmvg
Logical volume "wilytraceslv" created
lvcreate -L 33G -n wilydatalv apmvg
Logical volume "wilydatalv" created
Finally, we will have the following PVs/VGs/LVs:
pvs
vgs | grep apmvg
[root@apm-bcclq1mom02 ~]# lvs | grep apmvg
wilyEMlv apmvg -wi-a----- 130.00g
wilydatalv apmvg -wi-a----- 33.00g
wilytraceslv apmvg -wi-a----- 17.00g
Note: -j 2 sets the number of journals. In this case 2 are required because it is a 2 node cluster. If your cluster has more than 2 nodes, change this number to suit. The -t value must be <clustername>:<fsname>, matching the cluster name created in section 6.
mkfs.gfs2 -t apm-qa-clus1:wilyEMlv -j 2 -p lock_dlm /dev/apmvg/wilyEMlv
The output for the first filesystem is below; the others are similar:
[root@apm-bcclq1mom02 ~]# mkfs.gfs2 -t apm-qa-clus1:wilyEMlv -j 2 -p lock_dlm /dev/apmvg/wilyEMlv
This will destroy any data on /dev/apmvg/wilyEMlv.
It appears to contain: symbolic link to `../dm-13'
Are you sure you want to proceed? [y/n] y
Device: /dev/apmvg/wilyEMlv
Blocksize: 4096
Device Size 130.00 GB (34078720 blocks)
Filesystem Size: 130.00 GB (34078718 blocks)
Journals: 2
Resource Groups: 520
Locking Protocol: "lock_dlm"
Lock Table: "apm-qa-clus1:wilyEMlv"
UUID: d1d9607e-071e-a9de-7834-d9632335c99b
Then run the same command for the other two LVs:
mkfs.gfs2 -t apm-qa-clus1:wilytraceslv -j 2 -p lock_dlm /dev/apmvg/wilytraceslv
mkfs.gfs2 -t apm-qa-clus1:wilydatalv -j 2 -p lock_dlm /dev/apmvg/wilydatalv
We need to modify the /etc/fstab file and add the lines below at the end:
### GFS2 filesystem:
/dev/mapper/apmvg-wilyEMlv /opt/wily/EM gfs2 defaults,noatime,nodiratime 0 0
/dev/mapper/apmvg-wilytraceslv /opt/wily/EM/traces gfs2 defaults,noatime,nodiratime 0 0
/dev/mapper/apmvg-wilydatalv /opt/wily/data gfs2 defaults,noatime,nodiratime 0 0
- Note: since traces is mounted on top of EM, mount /opt/wily/EM first, then create the traces directory (mkdir -p /opt/wily/EM/traces), and then mount /opt/wily/EM/traces, as in the sketch below.
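A minimal sketch of the first-time mount sequence, following the note above:
mount /opt/wily/EM                # mount the parent GFS2 filesystem first
mkdir -p /opt/wily/EM/traces      # create the nested mount point inside it
mount /opt/wily/EM/traces
mount /opt/wily/data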
We need to modify the /opt/tivoli/tsm/client/ba/bin/dsm.sys file and add the lines below at the end:
** GFS2 backup:
VIRTUALMOUNTPOINT /opt/wily/EM
VIRTUALMOUNTPOINT /opt/wily/EM/traces
VIRTUALMOUNTPOINT /opt/wily/data
NOTE: Make sure that you create your quorum on the proper sdX disk!
mkqdisk -c /dev/mapper/mpathc -l apm-quorum2
mkqdisk v3.0.12.1
Writing new quorum disk label 'intrqa-quorum' to /dev/sdd.
WARNING: About to destroy all data on /dev/mapper/mpathc; proceed [N/y] ? y
Initializing status block for node 1...
Initializing status block for node 2...
Initializing status block for node 3...
Initializing status block for node 4...
Initializing status block for node 5...
Initializing status block for node 6...
Initializing status block for node 7...
Initializing status block for node 8...
Initializing status block for node 9...
Initializing status block for node 10...
Initializing status block for node 11...
Initializing status block for node 12...
Initializing status block for node 13...
Initializing status block for node 14...
Initializing status block for node 15...
Initializing status block for node 16...
We might have to scan for the new PVs/VGs/LVs on the second cluster node; if scanning does not work, we might have to reboot it.
To scan the SAN disks on the second server, run the commands below:
pvscan
PV /dev/sdb1 VG hobvg lvm2 [100.00 GiB / 0 free]
PV /dev/sdc1 VG hubvg lvm2 [100.00 GiB / 0 free]
PV /dev/sda2 VG rootvg lvm2 [24.50 GiB / 8.50 GiB free]
Total: 2 [124.50 GiB] / in use: 2 [124.50 GiB] / in no VG: 0 [0 ]
vgscan
Reading all physical volumes. This may take a while...
Found volume group "hobvg" using metadata type lvm2
Found volume group "hubvg" using metadata type lvm2
Found volume group "rootvg" using metadata type lvm2
lvscan
ACTIVE '/dev/hobvg/hobvglv' [100.00 GiB] inherit
ACTIVE '/dev/hubvg/hubvglv' [100.00 GiB] inherit
ACTIVE '/dev/rootvg/rootlv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/tmplv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/usrlv' [2.50 GiB] inherit
ACTIVE '/dev/rootvg/homelv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/usrlocallv' [512.00 MiB] inherit
ACTIVE '/dev/rootvg/optlv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/varlv' [2.00 GiB] inherit
ACTIVE '/dev/rootvg/bmcappslv' [2.00 GiB] inherit
ACTIVE '/dev/rootvg/crashlv' [2.00 GiB] inherit
ACTIVE '/dev/rootvg/swaplv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/installlv' [2.00 GiB] inherit
To verify all PVs/VGs/LVs, we can run the commands below on both nodes:
pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 rootvg lvm2 a-- 24.50g 8.50g
/dev/sdb1 hobvg lvm2 a-- 100.00g 0
/dev/sdc1 hubvg lvm2 a-- 100.00g 0
vgs
VG #PV #LV #SN Attr VSize VFree
hobvg 1 1 0 wz--n- 100.00g 0
hubvg 1 1 0 wz--n- 100.00g 0
rootvg 1 11 0 wz--n- 24.50g 8.50g
lvs
LV VG Attr LSize Pool Origin Data%
hobvglv hobvg -wi-a--- 100.00g
hubvglv hubvg -wi-a--- 100.00g
bmcappslv rootvg -wi-ao-- 2.00g
crashlv rootvg -wi-ao-- 2.00g
homelv rootvg -wi-ao-- 1.00g
installlv rootvg -wi-a--- 2.00g
optlv rootvg -wi-ao-- 1.00g
rootlv rootvg -wi-ao-- 1.00g
swaplv rootvg -wi-ao-- 1.00g
tmplv rootvg -wi-ao-- 1.00g
usrlocallv rootvg -wi-ao-- 512.00m
usrlv rootvg -wi-ao-- 2.50g
varlv rootvg -wi-ao-- 2.00g
We can run the following command on both nodes to check the quorum disk.
mkqdisk -L
mkqdisk v3.0.12.1
/dev/block/8:48:
/dev/disk/by-id/scsi-36000c291f36429c578807c0273a2c759:
/dev/disk/by-id/wwn-0x6000c291f36429c578807c0273a2c759:
/dev/disk/by-path/pci-0000:0b:00.0-scsi-0:0:2:0:
/dev/sdd:
Magic: eb7a62c2
Label: intrqa-quorum2
Created: Tue Mar 19 16:56:14 2013
Host: intrbcclqamom03
Kernel Sector Size: 512
Recorded Sector Size: 512
8. Configure the Cluster
Use the following URL to access the LUCI Web Interface on the first server with the root credentials:
Click the “Configure → QDisk” tab in the cluster LUCI web interface, and enter the following quorum disk configuration:
Use a Quorum Disk
By Device Label: apm-quorum2
Click the “Apply” button; you will see the screen below:
From the server, we can verify that the quorum device was added:
[root@intrbcclqamom03 ~]# clustat
Cluster Status for intr-qa-clus1 @ Thu Mar 21 12:40:19 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
intrbcclqamom03 1 Online, Local
intrbcclqamom04 2 Online
/dev/block/8:48 0 Online, Quorum Disk
[root@intrbcclqamom03 ~]#
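In cluster.conf this step adds a quorumd element and adjusts the expected votes, roughly as sketched below (luci writes this for you; the vote counts shown assume the usual 1 vote per node plus 1 vote for the quorum disk and may differ in your configuration):
<quorumd label="apm-quorum2"/>
<cman expected_votes="3"/>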
We will need to create one Fence Device on each node.
We will use the “vmware_soap” fence agent for this VM cluster; since RHEL 5.7, Red Hat has full production support for fencing VMware guests in a cluster with the vmware_soap fence agent.
8.2.1 Create fence account from VMware vSphere (VC)
We need to send a request to the VMware ESX admin to create a fencing account in VMware vSphere (VC); this account needs the permissions to shut down/reboot the Linux VMs at the VMware vSphere level.
We are using the “bmilsrvc” account for this cluster, as it already exists on the Windows AD side.
8.2.2 Add fence account to cluster VM nodes
VC42 → right click “intrbcclqamom03” → Add Permission →
Click “Add…” → Domain: SYSDEV → Show Users First → BMILSrvc → Add.
Click on “OK”
Click the “Assigned Role” drop-down list and select “RHEL Soap Fencing”
Click on “OK”
Repeat the same procedure to add “intrbcclqamom04”.
8.2.3 Get the UUID for each VM cluster node
In order to configure vmware_soap properly, we have to get the UUID (Universally Unique Identifier) for each Linux VM.
We can get the UUIDs from the vCenter GUI (a command-line alternative is sketched after the list below):
Click the ESX host in the left panel, then click “Virtual Machines” in the right panel, right click → View Column → check “UUID”.
For these two nodes, we got the following UUIDs:
intrbcclqamom03 421d685d-8fe6-b7fd-bd05-6e4ba48a2b1b
intrbcclqamom04 421dad22-76ba-8fba-f02f-6ba29b274be9
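The UUIDs can also be pulled from the command line with the fence agent itself (a sketch; exact options can vary between fence-agents versions, and the password is a placeholder):
# List the VM names and UUIDs known to the vCenter (run from either node)
fence_vmware_soap -z -a 10.193.61.68 -l bmilsrvc -p '<bmilsrvc password>' -o list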
8.2.4 Create Fence Device for the cluster
Click the “Fence Devices” tab in the cluster LUCI web interface, then click the “Add” button. In the new “Add Fence Device (Instance)” window → Select a Fence Device → VMware Fencing (SOAP Interface), and enter the following information:
Name: v_fence
IP Address or Hostname: 10.193.61.68 (tem-vc42.sysdev.adroot.bmogc.net)
Login: bmilsrvc
Password: ******* (real password for “bmilsrvc”)
Click the “Submit” button; you will see the screen below:
8.2.5 Create Fence Devices for first node
Click the cluster name “apm-qa-clus1” → “Nodes” tab, then click the first node “apm-bcclq1mom01”; on the screen that appears, click “Add Fence Method”:
Method Name: v_fence_node1
Then click the “Submit” button; we can verify it in the screen below:
Click the “Add Fence Instance” button:
Then click the “Submit” button and verify that it has been added.
8.2.6 Create Fence Devices for second node
Click the cluster name “apm-qa-clus1” → “Nodes” tab, then click the second node “apm-bcclq1mom02”; on the screen that appears, click “Add Fence Method”:
Method Name: IPMI_Fence2
Then click the “Submit” button; we can verify it in the screen below:
Click the “Add Fence Instance” button:
Then click the “Submit” button and verify that it has been added.
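After these steps the fencing section of /etc/cluster/cluster.conf should look roughly like the sketch below (attribute names may differ slightly by release; the UUID placeholder stands for the value recorded in section 8.2.3, and the node 2 entry is analogous):
<clusternode name="apm-bcclq1mom01" nodeid="1">
  <fence>
    <method name="v_fence_node1">
      <device name="v_fence" uuid="<node 1 UUID>"/>
    </method>
  </fence>
</clusternode>
<fencedevices>
  <fencedevice agent="fence_vmware_soap" name="v_fence" ipaddr="10.193.61.68" login="bmilsrvc" passwd="*******"/>
</fencedevices>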
You can refer to section 6.3 to reboot both servers from the LUCI GUI.
Use the following URL to access the LUCI Web Interface on the second server:
Type the root credentials for the second server, and click the “Manage Clusters → Add” button:
Then enter the following information to add the existing cluster:
Node Hostname: intrbcclqamom03
Password: ricci’s password
Click the “Connect” button:
Check “Use the Same Password for All Nodes”, then click the “Add Cluster” button. The following screen confirms that the cluster has been added on the second server via LUCI successfully.
Click “Manage Clusters → intr-qa-clus1 → Nodes”; both cluster nodes will be shown as below:
9. Test the Cluster Failover
We need to perform all the following steps on both servers, but not at the same time.
Perform a step on one node and verify the cluster; once the cluster has been verified healthy, perform the same step on the other node. Then move on to the next test.
We can always use “clustat” or the LUCI web interface to verify the cluster state on the active node.
Before starting each step, all cluster “Members” should be Online and the cluster should be quorate.
On the first node (intrbcclqamom03), run clustat:
Cluster Status for intr-qa-clus1 @ Thu Mar 21 14:38:20 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
intrbcclqamom03 1 Online, Local
intrbcclqamom04 2 Online
/dev/block/8:48 0 Online, Quorum Disk
To monitor the cluster as the failover occurs, log on to both nodes in separate windows and tail the messages log to see the real-time activity:
tail -f /var/log/messages
To fence node2 (the currently active node), run the command below on node1:
fence_node intrbcclqamom04
fence intrbcclqamom04 success
The second node will be fenced (rebooted).
We can verify that both GFS2 volumes are still working fine from another node.
Repeat the “fence_node” test on the remaining node when both nodes are online.
Log into the vSphere VM console for the first node and power it off.
From the vSphere VM console, select the first cluster node “intrbcclqamom03” → right click → Power → Power Off
The server is powered off, but it will then be powered on automatically by the fencing account “bmilsrvc”; this is expected.
The vSphere logs are as below:
Power On virtual machine
intrbcclqamom03
Completed
SYSDEV\BMILSrvc
TEM-VC42.sysdev.adroot.bmogc.net
21/03/2013 4:03:49 PM
21/03/2013 4:03:49 PM
21/03/2013 4:03:52 PM
We can verify that both GFS2 volumes are still working fine on another node.
Repeat the “Power Off” test on the remaining node when both nodes are online again.
Log into the vSphere VM console for the first node and shut it down (do not power it off).
From the vSphere VM console, select the first cluster node “intrbcclqamom03” → right click → Power → Shut Down Guest
You can refer to section 9.1 and use “clustat” and “tail -f /var/log/messages” to watch and verify the real-time activity.
The first server goes down.
We can verify that both GFS2 volumes are still working fine on another node.
To continue with the next test, both nodes must be cluster members (Online).
Repeat the “Shut Down Guest” test on the remaining node.
On the first node (intrbcclqamom03), run:
reboot
or
shutdown -r now
This will initiate an orderly shutdown of the current node.
You can refer to section 9.1 and use “clustat” and “tail -f /var/log/messages” to watch and verify the real-time activity from the other node.
We can verify that both GFS2 volumes are still working fine on another node.
To continue with the next test, both nodes must be cluster members (Online).
Repeat the “Reboot” test on the remaining node.
We will do two kinds of “Network Down” tests: one for the public network and one for the heartbeat network.
Disable the public NIC on the first node (intrbcclqamom03):
From the vSphere VM console, select the first cluster node “intrbcclqamom03” → right click → Edit Settings → select the public NIC (here “Network adapter 1”) → uncheck “Connected” and “Connect at power on” → OK
Node1 (mom03) is fenced, and node2 (mom04) remains functional for all GFS2 filesystems.
You can refer to section 9.1 and use “clustat” and “tail -f /var/log/messages” to watch and verify the real-time activity from the other node.
To continue with the next test, both nodes must be cluster members (Online).
Repeat the “Public NIC Down” test on the remaining node.
We repeated the above steps on mom04, and it worked fine.