1. Introduction
This document covers the installation and configuration of Red Hat 6.x Cluster software on Linux VMs running in a VMware virtual environment.
2. Assumptions
1. The base OS has been installed.
2. Networking on the Data/Management Virtual NIC(s) has already been configured.
3. The servers are registered with the satellite server and subscribed to the “RHEL Server High Availability” and “RHEL Server Resilient Storage” channels. (This is required to use yum to install the cluster software.)
4. OS is patched to the current level.
5. There are only 2 nodes in the cluster.
NOTE: A quorum disk is required because only 2 nodes are used. (The quorum disk provides a third vote for fencing decisions.) If building a cluster with more than 2 nodes, the quorum disk can be omitted.
6. A fencing account has been created on the vSphere (ESX) side for the cluster to use.
7. All commands for the configuration will be run as root.
8. Root login must be enabled but can be disabled after build.
3. System Requirements
This document focuses on building a two node Red Hat Cluster on 2 Red Hat 6.5 Linux VMs.
The server names that will be used are:
Server Name | Version |
apm-bcclprmom01 | Red Hat 6.5 |
apm-bcclprmom02 | Red Hat 6.5 |
Note: You can build the servers directly on Red Hat 6.5, or build them on Red Hat 6.0 or 6.1 and then upgrade them to 6.5 from Satellite.
As an example, we will assign 2x100 GB of shared SAN storage to both servers, and then create LVM volumes to hold the Introscope application content shared between the two nodes.
We will also need 30 MB of shared storage for the quorum disk.
There are 2 network interfaces on each server: one for the public network and one for the private heartbeat. These are the interfaces used in our ESX VM configuration.
eth0 Public Data
eth1 Private heartbeat
NOTE: as of April 1, 2013, these two servers have not been configured to use the dedicated heartbeat interface (eth1). However, since the heartbeat NICs (eth1) are kept on both servers, it is easy to reconfigure them to use a dedicated heartbeat at any time if needed.
As an example, we will need the IPs below for this cluster configuration. Add/configure them in the /etc/hosts file on both servers:
# Data IPs:
10.194.18.81 intrbcclqamom03 intrbcclqamom03.bmogc.net
10.194.18.84 intrbcclqamom04 intrbcclqamom04.bmogc.net
# Heartbeat IPs:
10.10.10.11 mom03-node1
10.10.10.12 mom04-node2
# VIPs:
10.194.18.88 intrbcclqahobvp intrbcclqahobvp.bmogc.net
10.194.18.89 intrbcclqahubvp intrbcclqahubvp.bmogc.net
NOTE: since we are using GFS2, we might not need the VIPs, but this needs to be confirmed with the application team.
We will create 2x100 GB GFS2 filesystems and share them on both servers as follows:
intrbcclqamom03/04:
/opt/wily/EMHOB
/opt/wily/EMHUB
4. Shared SAN Storage Configuration
N/A
5. Install Clustering and Cluster File systems Packages
In order to install the Red Hat Cluster software suite, we need to register both Linux VMs with the Satellite server and subscribe them to the following three channels (a quick verification sketch follows this list):
- RHEL Base
- RHEL Cluster
- RHEL Cluster-Storage
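A quick way to confirm the channel subscriptions from each server (a sketch, assuming the VMs are registered through RHN Classic/Satellite tooling where the rhn-channel utility is available; channel labels vary by Satellite setup):
# List the channels this server is currently subscribed to (run on both nodes)
rhn-channel --list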
On all servers:
yum groupinstall "High Availability"
yum groupinstall "High Availability Management"
yum groupinstall "Resilient Storage"
Below is a sample of the transaction summary showing the packages and dependencies that will be installed. Accept the list and install. (If you receive a message that some packages are not available, the servers need to be added to the RHEL Cluster and RHEL Cluster-Storage channels on the Satellite server.)
Transaction Summary
===========================================================================
Install XX Package(s)
Total download size: XX.X MB
Is this ok [y/N]: y
On all servers:
getenforce
Disabled    ← The output should be "Disabled" on systems built from the standard image. If it is not disabled, do the following (a one-line fix is sketched below):
grep SELINUX= /etc/selinux/config | grep -v "#"
Ensure that the following line is in the file:
SELINUX=disabled
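If SELinux is not already disabled, a minimal sketch to set it in the stock config file (the change takes effect at the reboot performed later in this section):
# Force SELINUX=disabled in /etc/selinux/config, then confirm the setting
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
grep ^SELINUX= /etc/selinux/config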
The cluster services need to run at runlevel 3 (Multi-User Mode with Networking). The chkconfig commands below enable them at runlevels 3 and 5 and disable them elsewhere; a verification sketch follows the commands.
Note: runlevel 5 enables the services under X Windows if it is installed. In most cases it won't be.
chkconfig --level 01246 cman off
chkconfig --level 01246 clvmd off
chkconfig --level 01246 rgmanager off
chkconfig --level 01246 luci off
chkconfig --level 01246 ricci off
chkconfig --level 01246 gfs2 off
chkconfig --level 35 cman on
chkconfig --level 35 clvmd on
chkconfig --level 35 rgmanager on
chkconfig --level 35 luci on
chkconfig --level 35 ricci on
chkconfig --level 35 gfs2 on
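A quick sketch to double-check the runlevel settings on both nodes (same service names as above):
# Each cluster service should be on at runlevels 3 and 5 and off elsewhere
for svc in cman clvmd rgmanager luci ricci gfs2; do
    chkconfig --list $svc
done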
The following services will cause the cluster to malfunction and must be disabled.
Note: iptables is the Linux firewall and acpid is the power management daemon.
chkconfig --level 0123456 iptables off
chkconfig --level 0123456 ip6tables off
chkconfig --level 0123456 acpid off
Reboot both servers:
reboot
or
shutdown -r now
Once the servers have rebooted, ensure that the required services have started up (and that others have not); a scripted check is sketched after the list:
service cman status (This service may not run until the cluster is built.)
service rgmanager status
service ricci status
service luci status
service clvmd status (This service may not run until the cluster is built.)
service gfs2 status (This service will return GFS2: no entries found in /etc/fstab)
These services should be ‘stopped’:
service iptables status
service ip6tables status
service acpid status
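The same checks can be scripted rather than run one by one (a sketch; the expected states are as noted above):
# cman/clvmd may not run until the cluster is built; iptables, ip6tables and acpid should be stopped
for svc in cman rgmanager ricci luci clvmd gfs2 iptables ip6tables acpid; do
    echo "== $svc =="
    service $svc status
done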
6. Create the Cluster
We will create the cluster using LUCI (Web Interface), and all cluster configurations will be saved in “/etc/cluster/cluster.conf” on both nodes.
On both servers, reset the password for the “ricci” user:
passwd ricci
Changing password for user ricci.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Remember this password; we will use it when we create the cluster from LUCI.
Use the following URL to access the LUCI Web Interface on the first server; we can log in using the root credentials.
Type the root credentials for the first server to log in:
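The exact URL depends on your environment; on RHEL 6 the luci service normally listens on TCP port 8084, so it typically looks like the example below (hostname is a placeholder):
https://<first-node-hostname>:8084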
Click on “OK”
Click the “Manage Clusters → Create” button, and enter the following information:
Cluster Name: apm-qa-clus1
Node Name: apm-bcclq1mom01
Ricci Hostname: apm-bcclq1mom01
Password: the “ricci” password that we reset above
Select “Use Locally Installed Packages”
Check “Enable Shared Storage Support”
Click “Create Cluster”; you will see the screen below:
Click the “Nodes → Add” button, and add the second node:
Node Name: intrbcclqamom04
Ricci Hostname: intrbcclqamom04
Password: the “ricci” password that we reset above
Select “Use Locally Installed Packages”
Check “Enable Shared Storage Support”
Click the “Add Nodes” button; you will see the screen below:
We can also verify the cluster status from command line on any node:
[root@intrbcclqamom03 ~]# clustat
Cluster Status for intr-qa-clus1 @ Thu Mar 21 11:48:35 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
intrbcclqamom03 1 Online, Local
intrbcclqamom04 2 Online
[root@intrbcclqamom03 ~]#
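At this point /etc/cluster/cluster.conf on both nodes contains the two members. A rough sketch of its structure (attribute values are illustrative; luci manages this file, so avoid hand-editing it while luci is in use):
<?xml version="1.0"?>
<cluster config_version="1" name="intr-qa-clus1">
  <clusternodes>
    <clusternode name="intrbcclqamom03" nodeid="1"/>
    <clusternode name="intrbcclqamom04" nodeid="2"/>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
  <rm/>
</cluster>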
Reboot both nodes to make sure everything is ready for the cluster.
From the screen above, select “Nodes” → select both nodes → click the “Reboot” button:
Click “Proceed”
On the servers' consoles, we will see:
[root@intrbcclqamom03 ~]#
Broadcast message from root@intrbcclqamom03
(unknown) at 11:51 ...
The system is going down for reboot NOW!
[root@intrbcclqamom04 ~]#
Broadcast message from root@intrbcclqamom04
(unknown) at 11:51 ...
The system is going down for reboot NOW!
After the servers are back up, we can log in to LUCI again and will see the screen below:
7. Configuring Shared Storage on All Servers
NOTE: Perform the following steps on ALL SERVERS to verify that the shared disks are available and to identify the shared device name on each server. Check the size of each disk so that you know which to use for the application VGs and which to use for the quorum disk.
fdisk -l /dev/sdb (will be used for the EMHOB disk)
Disk /dev/sdb: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
fdisk -l /dev/sdc (will be used for EMHUB disk)
Disk /dev/sdc: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
fdisk -l /dev/sdd (will be used for quorum )
Disk /dev/sdd: 31 MB, 31457280 bytes
64 heads, 32 sectors/track, 30 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
We will create the following mount points on both servers:
mkdir -p /opt/wily/EM /opt/wily/data
Cluster locking prevents one system from updating a file that is open on another system, which prevents file corruption.
To enable cluster locking, run the command below on both nodes:
lvmconf --enable-cluster
To verify it:
grep locking_type /etc/lvm/lvm.conf | grep -v "#"
locking_type = 3
NOTE: Only perform the following steps on ONE server.
First, we need to partition the first 100 GB disk and set the partition type to Linux LVM (it will be used for the “hobvg” volume group). The fdisk command and options are as below:
fdisk /dev/mapper/mpathb → n → p → 1 → enter → enter → t → 8e → w
The output will be as below:
We can ignore the warning messages:
WARNING: Re-reading the partition table failed with error 22: Invalid argument.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xdada2e1d.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): p
Disk /dev/sdb: 107.4 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xdada2e1d
Device Boot Start End Blocks Id System
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-13054, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-13054, default 13054):
Using default value 13054
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
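As an alternative to the interactive fdisk session, the same partition can be created non-interactively (a sketch, assuming parted and kpartx are installed; alignment values are illustrative):
# Create one full-size primary partition and flag it for LVM
parted -s /dev/mapper/mpathb mklabel msdos mkpart primary 1MiB 100% set 1 lvm on
# Create the /dev/mapper/mpathbp1 mapping if it does not appear automatically
kpartx -a /dev/mapper/mpathb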
Verify the newly formatted disk:
fdisk -l | grep mapper
[root@apm-bcclq1mom02 ~]# fdisk -l |grep mapper
Disk /dev/mapper/rootvg-rootlv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/rootvg-swaplv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/mpathb: 214.7 GB, 214748364800 bytes
/dev/mapper/mpathbp1 1 26108 209712478+ 8e Linux LVM
Disk /dev/mapper/systemvg-systemlv: 32.2 GB, 32178700288 bytes
Disk /dev/mapper/rootvg-tmplv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/rootvg-usrlv: 2684 MB, 2684354560 bytes
Disk /dev/mapper/rootvg-homelv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/rootvg-usrlocallv: 536 MB, 536870912 bytes
Disk /dev/mapper/rootvg-optlv: 1073 MB, 1073741824 bytes
Disk /dev/mapper/rootvg-varlv: 2147 MB, 2147483648 bytes
Disk /dev/mapper/rootvg-bmcappslv: 2147 MB, 2147483648 bytes
Disk /dev/mapper/rootvg-crashlv: 2147 MB, 2147483648 bytes
Disk /dev/mapper/mpathbp1: 214.7 GB, 214745577984 bytes
NOTE: Only perform the following steps on ONE server.
First, we need to initialize all the SAN disks as LVM physical volumes:
pvcreate /dev/mapper/mpathbp1
Physical volume "/dev/mapper/mpathbp1" successfully created
Create the apmvg Volume Group
vgcreate -c y apmvg /dev/mapper/mpathbp1
Clustered volume group "apmvg" successfully created
Note: if you are not able to create the logical volumes, you might first need to change the clustered attribute of apmvg with “vgchange -cn apmvg” (on the same server), then try to create the volumes again.
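A quick check that the volume group carries the clustered attribute (the 6th character of the Attr field should be “c”, e.g. wz--nc):
vgs -o vg_name,vg_attr apmvg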
lvcreate -L 130G -n wilyEMlv apmvg
Logical volume "wilyEMlv" created
lvcreate -L 17G -n wilytraceslv apmvg
Logical volume "wilytraceslv" created
lvcreate -L 33G -n wilydatalv apmvg
Logical volume "wilydatalv" created
Finally, we will have the following PVs/VGs/LVs:
pvs
vgs | grep apmvg
[root@apm-bcclq1mom02 ~]# lvs | grep apmvg
wilyEMlv apmvg -wi-a----- 130.00g
wilydatalv apmvg -wi-a----- 33.00g
wilytraceslv apmvg -wi-a----- 17.00g
Note: -j 2 sets the number of journals. In this case 2 are required because it is a 2 node cluster. If your cluster has more than 2 nodes, change this number to suit. The -t value must be <clustername>:<fsname>, matching the cluster name created in section 6.
mkfs.gfs2 -t apm-qa-clus1:wilyEMlv -j 2 -p lock_dlm /dev/apmvg/wilyEMlv
The output for the first filesystem is below; the others are similar:
[root@apm-bcclq1mom02 ~]# mkfs.gfs2 -t apm-qa-clus1:wilyEMlv -j 2 -p lock_dlm /dev/apmvg/wilyEMlv
This will destroy any data on /dev/apmvg/wilyEMlv.
It appears to contain: symbolic link to `../dm-13'
Are you sure you want to proceed? [y/n] y
Device: /dev/apmvg/wilyEMlv
Blocksize: 4096
Device Size 130.00 GB (34078720 blocks)
Filesystem Size: 130.00 GB (34078718 blocks)
Journals: 2
Resource Groups: 520
Locking Protocol: "lock_dlm"
Lock Table: "apm-qa-clus1:wilyEMlv"
UUID: d1d9607e-071e-a9de-7834-d9632335c99b
Then run the same command for the other two LVs:
mkfs.gfs2 -t apm-qa-clus1:wilytraceslv -j 2 -p lock_dlm /dev/apmvg/wilytraceslv
mkfs.gfs2 -t apm-qa-clus1:wilydatalv -j 2 -p lock_dlm /dev/apmvg/wilydatalv
We need to modify the /etc/fstab file and add the lines below at the end:
### GFS2 filesystem:
/dev/mapper/apmvg-wilyEMlv /opt/wily/EM gfs2 defaults,noatime,nodiratime 0 0
/dev/mapper/apmvg-wilytraceslv /opt/wily/EM/traces gfs2 defaults,noatime,nodiratime 0 0
/dev/mapper/apmvg-wilydatalv /opt/wily/data gfs2 defaults,noatime,nodiratime 0 0
- Note: since traces is mounted on top of EM, mount /opt/wily/EM first, then create the traces directory (mkdir -p /opt/wily/EM/traces), and then mount /opt/wily/EM/traces, as in the sketch below.
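A minimal sketch of the first-time mount sequence, following the note above:
mount /opt/wily/EM                # mount the parent GFS2 filesystem first
mkdir -p /opt/wily/EM/traces      # create the nested mount point inside it
mount /opt/wily/EM/traces
mount /opt/wily/data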
We need to modify the /opt/tivoli/tsm/client/ba/bin/dsm.sys file and add the lines below at the end:
** GFS2 backup:
VIRTUALMOUNTPOINT /opt/wily/EM
VIRTUALMOUNTPOINT /opt/wily/EM/traces
VIRTUALMOUNTPOINT /opt/wily/data
NOTE: Make sure that you create your quorum on the proper sdX disk!
mkqdisk -c /dev/mapper/mpathc -l apm-quorum2
mkqdisk v3.0.12.1
Writing new quorum disk label 'intrqa-quorum' to /dev/sdd.
WARNING: About to destroy all data on /dev/mapper/mpathc; proceed [N/y] ? y
Initializing status block for node 1...
Initializing status block for node 2...
Initializing status block for node 3...
Initializing status block for node 4...
Initializing status block for node 5...
Initializing status block for node 6...
Initializing status block for node 7...
Initializing status block for node 8...
Initializing status block for node 9...
Initializing status block for node 10...
Initializing status block for node 11...
Initializing status block for node 12...
Initializing status block for node 13...
Initializing status block for node 14...
Initializing status block for node 15...
Initializing status block for node 16...
We might have to scan for the new PVs/VGs/LVs on the second cluster node; if scanning does not work, we might have to reboot it.
To scan the SAN disks on the second server, run the commands below:
pvscan
PV /dev/sdb1 VG hobvg lvm2 [100.00 GiB / 0 free]
PV /dev/sdc1 VG hubvg lvm2 [100.00 GiB / 0 free]
PV /dev/sda2 VG rootvg lvm2 [24.50 GiB / 8.50 GiB free]
Total: 2 [124.50 GiB] / in use: 2 [124.50 GiB] / in no VG: 0 [0 ]
vgscan
Reading all physical volumes. This may take a while...
Found volume group "hobvg" using metadata type lvm2
Found volume group "hubvg" using metadata type lvm2
Found volume group "rootvg" using metadata type lvm2
lvscan
ACTIVE '/dev/hobvg/hobvglv' [100.00 GiB] inherit
ACTIVE '/dev/hubvg/hubvglv' [100.00 GiB] inherit
ACTIVE '/dev/rootvg/rootlv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/tmplv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/usrlv' [2.50 GiB] inherit
ACTIVE '/dev/rootvg/homelv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/usrlocallv' [512.00 MiB] inherit
ACTIVE '/dev/rootvg/optlv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/varlv' [2.00 GiB] inherit
ACTIVE '/dev/rootvg/bmcappslv' [2.00 GiB] inherit
ACTIVE '/dev/rootvg/crashlv' [2.00 GiB] inherit
ACTIVE '/dev/rootvg/swaplv' [1.00 GiB] inherit
ACTIVE '/dev/rootvg/installlv' [2.00 GiB] inherit
To verify all PVs/VGs/LVs, we can run the commands below on both nodes:
pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 rootvg lvm2 a-- 24.50g 8.50g
/dev/sdb1 hobvg lvm2 a-- 100.00g 0
/dev/sdc1 hubvg lvm2 a-- 100.00g 0
vgs
VG #PV #LV #SN Attr VSize VFree
hobvg 1 1 0 wz--n- 100.00g 0
hubvg 1 1 0 wz--n- 100.00g 0
rootvg 1 11 0 wz--n- 24.50g 8.50g
lvs
LV VG Attr LSize Pool Origin Data%
hobvglv hobvg -wi-a--- 100.00g
hubvglv hubvg -wi-a--- 100.00g
bmcappslv rootvg -wi-ao-- 2.00g
crashlv rootvg -wi-ao-- 2.00g
homelv rootvg -wi-ao-- 1.00g
installlv rootvg -wi-a--- 2.00g
optlv rootvg -wi-ao-- 1.00g
rootlv rootvg -wi-ao-- 1.00g
swaplv rootvg -wi-ao-- 1.00g
tmplv rootvg -wi-ao-- 1.00g
usrlocallv rootvg -wi-ao-- 512.00m
usrlv rootvg -wi-ao-- 2.50g
varlv rootvg -wi-ao-- 2.00g
We can run the following command on both nodes to check the quorum disk.
mkqdisk -L
mkqdisk v3.0.12.1
/dev/block/8:48:
/dev/disk/by-id/scsi-36000c291f36429c578807c0273a2c759:
/dev/disk/by-id/wwn-0x6000c291f36429c578807c0273a2c759:
/dev/disk/by-path/pci-0000:0b:00.0-scsi-0:0:2:0:
/dev/sdd:
Magic: eb7a62c2
Label: intrqa-quorum2
Created: Tue Mar 19 16:56:14 2013
Host: intrbcclqamom03
Kernel Sector Size: 512
Recorded Sector Size: 512
8. Configure the Cluster
Use the following URL to access the LUCI Web Interface on the first server with the root credentials:
Click the “Configure → QDisk” tab in the cluster LUCI web interface, and enter the following quorum disk configuration:
Use a Quorum Disk
By Device Label: apm-quorum2
Click the “Apply” button; you will see the screen below:
From the server, we can verify that the quorum device was added:
[root@intrbcclqamom03 ~]# clustat
Cluster Status for intr-qa-clus1 @ Thu Mar 21 12:40:19 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
intrbcclqamom03 1 Online, Local
intrbcclqamom04 2 Online
/dev/block/8:48 0 Online, Quorum Disk
[root@intrbcclqamom03 ~]#
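In cluster.conf this step adds a quorumd element and adjusts the expected votes, roughly as sketched below (luci writes this for you; the vote counts shown assume the usual 1 vote per node plus 1 vote for the quorum disk and may differ in your configuration):
<quorumd label="apm-quorum2"/>
<cman expected_votes="3"/>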
We will need to create one Fence Device on each node.
We will use the “vmware_soap” fence agent for this VM cluster; since RHEL 5.7, Red Hat has full production support for fencing VMware guests in a cluster with the vmware_soap fence agent.
8.2.1 Create fence account from VMware vSphere (VC)
We need to send a request to the VMware ESX admin to create a fencing account in VMware vSphere (VC); this account needs the permissions to shut down/reboot the Linux VMs at the VMware vSphere level.
We are using the “bmilsrvc” account for this cluster, as it already exists on the Windows AD side.
8.2.2 Add fence account to cluster VM nodes
VC42 → right click “intrbcclqamom03” → Add Permission →
Click “Add…” → Domain: SYSDEV → Show Users First → BMILSrvc → Add.
Click on “OK”
Click the “Assigned Role” drop-down list and select “RHEL Soap Fencing”
Click on “OK”
Repeat the same procedure to add “intrbcclqamom04”.
8.2.3 Get the UUID for each VM cluster node
In order to configure vmware_soap properly, we have to get the UUID (Universally Unique Identifier) for each Linux VM.
We can get the UUIDs from the vCenter GUI (a command-line alternative is sketched after the list below):
Click the ESX host in the left panel, then click “Virtual Machines” in the right panel, right click → View Column → check “UUID”.
For these two nodes, we got the following UUIDs:
intrbcclqamom03 421d685d-8fe6-b7fd-bd05-6e4ba48a2b1b
intrbcclqamom04 421dad22-76ba-8fba-f02f-6ba29b274be9
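The UUIDs can also be pulled from the command line with the fence agent itself (a sketch; exact options can vary between fence-agents versions, and the password is a placeholder):
# List the VM names and UUIDs known to the vCenter (run from either node)
fence_vmware_soap -z -a 10.193.61.68 -l bmilsrvc -p '<bmilsrvc password>' -o list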
8.2.4 Create Fence Device for the cluster
Click the “Fence Devices” tab in the cluster LUCI web interface, then click the “Add” button. In the new “Add Fence Device (Instance)” window → Select a Fence Device → VMware Fencing (SOAP Interface), and enter the following information:
Name: v_fence
IP Address or Hostname: 10.193.61.68 (tem-vc42.sysdev.adroot.bmogc.net)
Login: bmilsrvc
Password: ******* (real password for “bmilsrvc”)
Click the “Submit” button; you will see the screen below:
8.2.5 Create Fence Devices for first node
Click the cluster name “apm-qa-clus1” → “Nodes” tab, then click the first node “apm-bcclq1mom01”; on the screen that appears, click “Add Fence Method”:
Method Name: v_fence_node1
Then click the “Submit” button; we can verify it in the screen below:
Click the “Add Fence Instance” button:
Then click the “Submit” button and verify that it has been added.
8.2.6 Create Fence Devices for second node
Click the cluster name “apm-qa-clus1” → “Nodes” tab, then click the second node “apm-bcclq1mom02”; on the screen that appears, click “Add Fence Method”:
Method Name: IPMI_Fence2
Then click the “Submit” button; we can verify it in the screen below:
Click the “Add Fence Instance” button:
Then click the “Submit” button and verify that it has been added.
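After these steps the fencing section of /etc/cluster/cluster.conf should look roughly like the sketch below (attribute names may differ slightly by release; the UUID placeholder stands for the value recorded in section 8.2.3, and the node 2 entry is analogous):
<clusternode name="apm-bcclq1mom01" nodeid="1">
  <fence>
    <method name="v_fence_node1">
      <device name="v_fence" uuid="<node 1 UUID>"/>
    </method>
  </fence>
</clusternode>
<fencedevices>
  <fencedevice agent="fence_vmware_soap" name="v_fence" ipaddr="10.193.61.68" login="bmilsrvc" passwd="*******"/>
</fencedevices>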
You can refer to section 6.3 to reboot both servers from the LUCI GUI.
Use the following URL to access the LUCI Web Interface on the second server:
Type the root credentials for the second server, and click the “Manage Clusters → Add” button:
Then enter the following information to add the existing cluster:
Node Hostname: intrbcclqamom03
Password: ricci’s password
Click the “Connect” button:
Check “Use the Same Password for All Nodes”, then click the “Add Cluster” button. The following screen confirms that the cluster has been added on the second server via LUCI successfully.
Click “Manage Clusters → intr-qa-clus1 → Nodes”; both cluster nodes will be shown as below:
9. Test the Cluster Failover
We need to perform all the following steps on both servers, but not at the same time.
Perform a step on one node and verify the cluster; once the cluster has been verified healthy, perform the same step on the other node. Then move on to the next test.
We can always use “clustat” or the LUCI web interface to verify the cluster state on the active node.
Before starting each step, all cluster “Members” should be Online and the cluster should be quorate.
On the first node (intrbcclqamom03), run clustat:
Cluster Status for intr-qa-clus1 @ Thu Mar 21 14:38:20 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
intrbcclqamom03 1 Online, Local
intrbcclqamom04 2 Online
/dev/block/8:48 0 Online, Quorum Disk
To monitor the cluster as the failover occurs, log on to both nodes in separate windows and tail the messages log to see the real-time activity:
tail -f /var/log/messages
To fence node2 (the currently active node), run the command below on node1:
fence_node intrbcclqamom04
fence intrbcclqamom04 success
The second node will be fenced (rebooted).
We can verify that both GFS2 volumes are still working fine from another node.
Repeat the “fence_node” test on the remaining node when both nodes are online.
Log into the vSphere VM console for the first node and power it off.
From the vSphere VM console, select the first cluster node “intrbcclqamom03” → right click → Power → Power Off
The server is powered off, but it will then be powered on automatically by the fencing account “bmilsrvc”; this is expected.
The vSphere logs are as below:
Power On virtual machine
intrbcclqamom03
Completed
SYSDEV\BMILSrvc
TEM-VC42.sysdev.adroot.bmogc.net
21/03/2013 4:03:49 PM
21/03/2013 4:03:49 PM
21/03/2013 4:03:52 PM
We can verify that both GFS2 volumes are still working fine on another node.
Repeat the “Power Off” test on the remaining node when both nodes are online again.
Log into the vSphere VM console for the first node and shut it down (do not power it off).
From the vSphere VM console, select the first cluster node “intrbcclqamom03” → right click → Power → Shut Down Guest
You can refer to section 9.1 and use “clustat” and “tail -f /var/log/messages” to watch and verify the real-time activity.
The first server goes down.
We can verify that both GFS2 volumes are still working fine on another node.
To continue with the next test, both nodes must be cluster members (Online).
Repeat the “Shut Down Guest” test on the remaining node.
On the first node (intrbcclqamom03), run:
reboot
or
shutdown -r now
This will initiate an orderly shutdown of the current node.
You can refer to section 9.1 and use “clustat” and “tail -f /var/log/messages” to watch and verify the real-time activity from the other node.
We can verify that both GFS2 volumes are still working fine on another node.
To continue with the next test, both nodes must be cluster members (Online).
Repeat the “Reboot” test on the remaining node.
We will do two kinds of “Network Down” tests: one for the public network and one for the heartbeat network.
Disable the public NIC on the first node (intrbcclqamom03):
From the vSphere VM console, select the first cluster node “intrbcclqamom03” → right click → Edit Settings → select the public NIC (here “Network adapter 1”) → uncheck “Connected” and “Connect at power on” → OK
Node1 (mom03) is fenced, and node2 (mom04) remains functional for all GFS2 filesystems.
You can refer to section 9.1 and use “clustat” and “tail -f /var/log/messages” to watch and verify the real-time activity from the other node.
To continue with the next test, both nodes must be cluster members (Online).
Repeat the “Public NIC Down” test on the remaining node.
We repeated the above steps on mom04, and it worked fine.