Red Hat 6.x GFS2 Cluster Build Guide

 

1. Introduction

 

This document covers the installation and configuration of the Red Hat 6.x Cluster software on Linux VMs running in a VMware virtual environment.

 

2. Assumptions

1.      The base OS has been installed.

2.      Networking on the Data/Management virtual NIC(s) has already been configured.

3.      The servers are registered with the satellite server and subscribed to the “RHEL Server High Availability” and “RHEL Server Resilient Storage” channels.  (This is required to use yum to install the cluster software.)

4.      OS is patched to the current level.

5.      There are only 2 nodes in the cluster.

NOTE:    A quorum disk is required because only 2 nodes are used.  (The quorum disk provides the third vote needed during fencing activities.)  If building a cluster with more than 2 nodes, the quorum disk can be omitted.

6.      A fencing account has been created on the vSphere (ESX) side for the cluster to use.

7.      All commands for the configuration will be run as root. 

8.      Root login must be enabled but can be disabled after build.

 

3. System Requirements

 

This document focuses on building a two-node Red Hat Cluster on 2 Red Hat 6.x Linux VMs.

 

The server names that will be used are;

 

Server Name                  Version
apm-bcclprmom01              Red Hat 6.5
apm-bcclprmom02              Red Hat 6.5

 

Note: You can build the Red Hat 6.5 servers directly, or you can build Red Hat 6.0 or 6.1 and then upgrade them to 6.5 from Satellite.


 

 

As an example, we will assign 2x100 GB of shared SAN storage to both servers, and then create LVM volumes to hold the Introscope application content shared between the two nodes.

 

Also, we will need 30 MB of shared storage for the quorum disk.

           

There are 2 network interfaces on each server: one for public data and one for the private heartbeat.  These are the interfaces used in our ESX VM configuration.

 

eth0                        Public Data

eth1                        Private heartbeat

 

NOTE: As of the current configuration (April 1, 2013), we have not configured these two servers to use the dedicated heartbeat NICs (eth1).  However, because the heartbeat NICs (eth1) are kept on both servers, it is easy to reconfigure them for use at any time if needed.

 

 

As an example, we will need the following IPs for this cluster configuration.  Add/configure them in the /etc/hosts file on both servers:

 

# Data IPs:

10.194.18.81           intrbcclqamom03  intrbcclqamom03.bmogc.net

10.194.18.84           intrbcclqamom04  intrbcclqamom04.bmogc.net

 

# Heartbeat IPs:

10.10.10.11             mom03-node1

10.10.10.12             mom04-node2

 

# VIPs:

10.194.18.88           intrbcclqahobvp   intrbcclqahobvp.bmogc.net

10.194.18.89           intrbcclqahubvp   intrbcclqahubvp.bmogc.net

 

NOTE: Since we are using GFS2, we may not need the VIPs, but this needs to be confirmed with the application team.

 

 

We will create 2x100 GB GFS2 filesystems, shared on both servers, as follows:

 

intrbcclqamom03/04:

/opt/wily/EMHOB

/opt/wily/EMHUB



4. Shared SAN Storage Configuration

 

N/A

 

 

5. Install Clustering and Cluster File systems Packages

 

To install the full Red Hat Cluster software suite, we need to register both Linux VMs with the Satellite server and subscribe them to the following three channels:

-         RHEL Base

-         RHEL Cluster

-         RHEL Cluster-Storage

On all servers;

               

            yum groupinstall "High Availability"

            yum groupinstall "High Availability Management"

            yum groupinstall "Resilient Storage"

 

 

Below is a sample of the transaction summary showing the packages and dependencies that will be installed.  Accept the list and install.  (If you receive a message that some packages are not available, the servers need to be added to the RHEL Cluster and RHEL Cluster-Storage channels on the Satellite server.)

 

Transaction Summary

===========================================================================

Install      XX Package(s)

 

Total download size: XX.X MB

Is this ok [y/N]: y

 

On all servers;

 

            getenforce

 

        Disabled   <-- The output should be "Disabled" from the base image.  If it is not disabled, do the following;

 

 

          grep SELINUX= /etc/selinux/config | grep -v "#"

 

Ensure that the following line is in the file;

               

          SELINUX=disabled
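
If the line is missing or not set to disabled, one quick way to change it (a suggested one-liner, not from the original build notes; a reboot is still required for the change to take effect) is:

          sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config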

 

The cluster services need to run at level 3 (Multi-User Mode with Networking)

 

Note: level 5 enables the service in Xwindows if it is installed.  In most cases it won’t be.

 

chkconfig --level 01246 cman off

chkconfig --level 01246 clvmd off

chkconfig --level 01246 rgmanager off

chkconfig --level 01246 luci off

chkconfig --level 01246 ricci off

chkconfig --level 01246 gfs2 off

chkconfig --level 35 cman on

chkconfig --level 35 clvmd on

chkconfig --level 35 rgmanager on

chkconfig --level 35 luci on

chkconfig --level 35 ricci on

chkconfig --level 35 gfs2 on

 

 

The following services will cause the cluster to malfunction and must be disabled.

Note: iptables is the Linux firewall and acpid is power management.

 

            chkconfig --level 0123456 iptables off

            chkconfig --level 0123456 ip6tables off

            chkconfig --level 0123456 acpid off

 

 

Reboot both servers;

 

                reboot

            or

            shutdown -r now

 


 

Once the servers have rebooted, ensure that the required services have started up (and that others have not);

 

service cman status  (This service may not run until the cluster is built.)

service rgmanager status

service ricci status

service luci status

service clvmd status (This service may not run until the cluster is built.)

service gfs2 status (This service will return GFS2: no entries found in /etc/fstab)
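
As a convenience, the same status checks can be run in one pass with a small shell loop (just a shortcut; the individual commands above are equivalent):

            for svc in cman rgmanager ricci luci clvmd gfs2; do service $svc status; done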

 

These services should be ‘stopped’;

 

service iptables status

service ip6tables status

service acpid status

 


6. Create the Cluster

 

We will create the cluster using LUCI (Web Interface), and all cluster configurations will be saved in “/etc/cluster/cluster.conf” on both nodes.


 

On both servers, reset the password for "ricci":

 

            passwd ricci

 

Changing password for user ricci.

New password:

Retype new password:

passwd: all authentication tokens updated successfully.

 

Remember this password, so we can use it when we create the cluster from LUCI.

 

 

 

Use the following URL to access the LUCI web interface on the first server; we can log in using the root credentials.

 

https://10.194.18.81:8084

 

 

Type the root credentials for the first server, and then we will be able to get in:

 

 

 

 

Click on “OK”

 

 

Click "Manage Clusters → Create", and input the following info:

Cluster Name:              apm-qa-clus1

Node Name:                apm-bcclq1mom01

Ricci Hostname:            apm-bcclq1mom01

Password:                    the password for "ricci" that was reset at the beginning of this section

 

Select “Use Locally Installed Packages”

Check "Enable Shared Storage Support"

 

 

Click "Create Cluster"; the following screen will appear:

 

 

 

Click "Nodes → Add", and add the second node:

Node Name:                intrbcclqamom04

Ricci Hostname:            intrbcclqamom04

Password:                    the password for "ricci" that was reset at the beginning of this section

 

Select “Use Locally Installed Packages”

Check "Enable Shared Storage Support"

 

 

Click the "Add Nodes" button; the following screen will appear:

 

 

We can also verify the cluster status  from command line on any node:

 

[root@intrbcclqamom03 ~]# clustat

Cluster Status for intr-qa-clus1 @ Thu Mar 21 11:48:35 2013

Member Status: Quorate

 

 Member Name                             ID   Status

 ------ ----                             ---- ------

 intrbcclqamom03                             1 Online, Local

 intrbcclqamom04                             2 Online

 

[root@intrbcclqamom03 ~]#
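
Another optional check, using the cman tools installed earlier, shows the node and quorum details:

            cman_tool status
            cman_tool nodes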

 

6.3    Reboot both cluster nodes

 

Reboot both nodes to make sure everything is ready for the cluster.

           

 

From the above screen, select "Nodes" → select both nodes → click the "Reboot" button:

 

 

Click “Proceed”

 

On the server consoles, we will see:

 

[root@intrbcclqamom03 ~]#

Broadcast message from root@intrbcclqamom03

        (unknown) at 11:51 ...

 

The system is going down for reboot NOW!

 

[root@intrbcclqamom04 ~]#

Broadcast message from root@intrbcclqamom04

        (unknown) at 11:51 ...

 

The system is going down for reboot NOW!

 

After the servers are back up, we can log in to LUCI again and will see the following screen:

 

 


 

7.   Configuring Shared Storage on All Servers

 

NOTE: Perform the following steps on ALL SERVERS to verify that the shared disks are available and to identify them on each server; the shared device names may differ between servers.  Check the size of each disk so that you know which to use for the application VGs and which to use for the quorum disk.
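
Because /dev/sdX names can differ between nodes, an optional way to confirm that both servers see the same LUNs (assuming device-mapper multipath is in use, as it is for the /dev/mapper/mpathX devices later in this guide) is to compare the WWIDs reported on each node:

          multipath -ll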

 

         fdisk -l /dev/sdb    (will be used for the EMHOB disk)

 

Disk /dev/sdb: 107.4 GB, 107374182400 bytes

255 heads, 63 sectors/track, 13054 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000

 

 

fdisk -l /dev/sdc (will be used for EMHUB disk)

 

Disk /dev/sdc: 107.4 GB, 107374182400 bytes

255 heads, 63 sectors/track, 13054 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000

        

         fdisk -l /dev/sdd (will be used for quorum )

 

Disk /dev/sdd: 31 MB, 31457280 bytes

64 heads, 32 sectors/track, 30 cylinders

Units = cylinders of 2048 * 512 = 1048576 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000

        

 

We will create the following mount points on both servers:

 

mkdir -p /opt/wily/EM    /opt/wily/data

 

Cluster locking prevents one system from updating a file that is open on another system, which prevents file corruption.

 

To enable cluster locking, run the following command on both nodes;

 

lvmconf --enable-cluster

 

To verify it:

grep locking_type /etc/lvm/lvm.conf | grep -v "#"

 

            locking_type = 3     

 

 

NOTE:  Only perform the following steps on ONE server.

 

First, we need to partition the first shared disk and set its partition type to Linux LVM; this disk will hold the clustered volume group created below.  The fdisk command and the sequence of options are as follows:

 

         fdisk /dev/mapper/mpathb → n → p → 1 → enter → enter → t → 8e → w

 

The output will be as below; the warning messages can be ignored:

 

WARNING: Re-reading the partition table failed with error 22: Invalid argument.

The kernel still uses the old table. The new table will be used at

the next reboot or after you run partprobe(8) or kpartx(8)

Syncing disks.

 

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel

Building a new DOS disklabel with disk identifier 0xdada2e1d.

Changes will remain in memory only, until you decide to write them.

After that, of course, the previous content won't be recoverable.

 

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

 

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to

         switch off the mode (command 'c') and change display units to

         sectors (command 'u').

 

Command (m for help): p

 

Disk /dev/sdb: 107.4 GB, 107374182400 bytes

255 heads, 63 sectors/track, 13054 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xdada2e1d

 

   Device Boot      Start         End      Blocks   Id  System

 

Command (m for help): n

Command action

   e   extended

   p   primary partition (1-4)

p

Partition number (1-4): 1

First cylinder (1-13054, default 1):

Using default value 1

Last cylinder, +cylinders or +size{K,M,G} (1-13054, default 13054):

Using default value 13054

 

Command (m for help): t

Selected partition 1

Hex code (type L to list codes): 8e

Changed system type of partition 1 to 8e (Linux LVM)

 

Command (m for help): w

The partition table has been altered!

 

Calling ioctl() to re-read partition table.

Syncing disks.
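
Since the warning above notes that the kernel may keep using the old partition table, one way to pick up the new partition without a reboot (a suggested step, not in the original procedure) is to re-read it with one of the tools the warning mentions:

          partprobe /dev/mapper/mpathb
or
          kpartx -a /dev/mapper/mpathb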

 

 

Verify the new formatted disk:

 

         fdisk -l | grep mapper

 

[root@apm-bcclq1mom02 ~]# fdisk -l |grep mapper

Disk /dev/mapper/rootvg-rootlv: 1073 MB, 1073741824 bytes

Disk /dev/mapper/rootvg-swaplv: 1073 MB, 1073741824 bytes

Disk /dev/mapper/mpathb: 214.7 GB, 214748364800 bytes

/dev/mapper/mpathbp1               1       26108   209712478+  8e  Linux LVM

Disk /dev/mapper/systemvg-systemlv: 32.2 GB, 32178700288 bytes

Disk /dev/mapper/rootvg-tmplv: 1073 MB, 1073741824 bytes

Disk /dev/mapper/rootvg-usrlv: 2684 MB, 2684354560 bytes

Disk /dev/mapper/rootvg-homelv: 1073 MB, 1073741824 bytes

Disk /dev/mapper/rootvg-usrlocallv: 536 MB, 536870912 bytes

Disk /dev/mapper/rootvg-optlv: 1073 MB, 1073741824 bytes

Disk /dev/mapper/rootvg-varlv: 2147 MB, 2147483648 bytes

Disk /dev/mapper/rootvg-bmcappslv: 2147 MB, 2147483648 bytes

Disk /dev/mapper/rootvg-crashlv: 2147 MB, 2147483648 bytes

Disk /dev/mapper/mpathbp1: 214.7 GB, 214745577984 bytes

 

 

NOTE:  Only perform the following steps on ONE server.

 

Next, we need to initialize the new SAN disk partition(s) as LVM physical volumes:

 

         pvcreate /dev/mapper/mpathbp1

 

     Physical volume "/dev/mapper/mpathbp1" successfully created   

 

Create the apmvg Volume Group

 

         vgcreate -c y apmvg /dev/mapper/mpathbp1

 

          Clustered volume group "apmvg" successfully created

 

Note: if you are not able to create the logical volumes, you may need to temporarily clear the clustered flag on apmvg first with "vgchange -cn apmvg" (on the same server), then try to create the volumes again and re-enable the flag with "vgchange -cy apmvg" afterwards, as sketched below.
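
A minimal sketch of that workaround (an illustration only, not a required step):

          vgchange -cn apmvg       (temporarily clear the clustered flag)
          lvcreate -L 130G -n wilyEMlv apmvg       (retry the failed lvcreate)
          vgchange -cy apmvg       (re-enable the clustered flag)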

 

 

lvcreate -L 130G -n wilyEMlv apmvg

          Logical volume "wilyEMlv" created

 

lvcreate -L 17G -n wilytraceslv apmvg

          Logical volume "wilytraceslv" created

 

lvcreate -L 33G -n wilydatalv apmvg

          Logical volume "wilydatalv" created

 

 

 

 

So finally we will have the following VGs/LVs:

 

pvs

 

 

vgs | grep apmvg

 

 [root@apm-bcclq1mom02 ~]# lvs | grep apmvg

  wilyEMlv     apmvg    -wi-a----- 130.00g

  wilydatalv   apmvg    -wi-a-----  33.00g

  wilytraceslv apmvg    -wi-a-----  17.00g

 

Note: -j 2 sets the number of journals.  In this case 2 are required because it is a 2-node cluster.  If your cluster has more than 2 nodes, change this number to suit.

 

      mkfs.gfs2 -t apm-qa-clus1:wilyEMlv -j 2 -p lock_dlm /dev/apmvg/wilyEMlv

 

The output for the first filesystem is below; the others are similar:

 

[root@apm-bcclq1mom02 ~]# mkfs.gfs2 -t apm-qa-clus1:wilyEMlv -j 2 -p lock_dlm /dev/apmvg/wilyEMlv

This will destroy any data on /dev/apmvg/wilyEMlv.

It appears to contain: symbolic link to `../dm-13'

 

Are you sure you want to proceed? [y/n] y

 

Device:                    /dev/apmvg/wilyEMlv

Blocksize:                 4096

Device Size                130.00 GB (34078720 blocks)

Filesystem Size:           130.00 GB (34078718 blocks)

Journals:                  2

Resource Groups:           520

Locking Protocol:          "lock_dlm"

Lock Table:                "apm-qa-clus1:wilyEMlv"

UUID:                      d1d9607e-071e-a9de-7834-d9632335c99b

 

And then run the other 2 commands for other LVs

 

mkfs.gfs2 -t apm-qa-clus1:wilytraceslv -j 2 -p lock_dlm /dev/apmvg/wilytraceslv

 

mkfs.gfs2 -t apm-qa-clus1:wilydatalv -j 2 -p lock_dlm /dev/apmvg/wilydatalv
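
Note: as a general GFS2 point (not a step in this build), if a node is ever added to the cluster later, extra journals can be added to an existing, mounted GFS2 filesystem without recreating it, for example:

          gfs2_jadd -j 1 /opt/wily/EM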

 

 


We need to modify the /etc/fstab file and add the following lines at the end:

 

### GFS2 filesystem:

/dev/mapper/apmvg-wilyEMlv /opt/wily/EM      gfs2 defaults,noatime,nodiratime 0 0

/dev/mapper/apmvg-wilytraceslv /opt/wily/EM/traces        gfs2 defaults,noatime,nodiratime 0 0

/dev/mapper/apmvg-wilydatalv /opt/wily/data        gfs2 defaults,noatime,nodiratime 0 0

 

 

  • Note: since traces will be mounted on top of EM, mount /opt/wily/EM first, then create the traces directory (mkdir -p /opt/wily/EM/traces), and then mount /opt/wily/EM/traces, as sketched below.
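
A minimal sketch of that first-time mount sequence on each node, using the mount points defined above:

          mount /opt/wily/EM
          mkdir -p /opt/wily/EM/traces
          mount /opt/wily/EM/traces
          mount /opt/wily/data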


We need to modify the /opt/tivoli/tsm/client/ba/bin/dsm.sys file, and add below lines at the end:

 

** GFS2 backup:

VIRTUALMOUNTPOINT              /opt/wily/EM     

VIRTUALMOUNTPOINT              /opt/wily/EM/traces

VIRTUALMOUNTPOINT              /opt/wily/data

 

 

NOTE: Make sure that you create your quorum on the proper sdX disk!

 

mkqdisk -c /dev/mapper/mpathc -l apm-quorum2

 

mkqdisk v3.0.12.1

 

Writing new quorum disk label 'apm-quorum2' to /dev/mapper/mpathc.

WARNING: About to destroy all data on /dev/mapper/mpathc; proceed [N/y] ? y

Initializing status block for node 1...

Initializing status block for node 2...

Initializing status block for node 3...

Initializing status block for node 4...

Initializing status block for node 5...

Initializing status block for node 6...

Initializing status block for node 7...

Initializing status block for node 8...

Initializing status block for node 9...

Initializing status block for node 10...

Initializing status block for node 11...

Initializing status block for node 12...

Initializing status block for node 13...

Initializing status block for node 14...

Initializing status block for node 15...

Initializing status block for node 16...

 

We may need to scan for the new PVs/VGs/LVs on the second cluster node; if scanning does not pick them up, a reboot may be required (though the rescan approach below may avoid it).
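
If the LVM scan commands below do not detect the new devices, one approach that may avoid a reboot (a suggestion on our part, not part of the original procedure) is to rescan the SCSI bus and reload the multipath maps first:

          echo "- - -" > /sys/class/scsi_host/host0/scan       (repeat for each hostN under /sys/class/scsi_host)
          multipath -r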

 

To scan the SAN disks on the second server, we can run the following commands:

 

         pvscan

 

  PV /dev/sdb1   VG hobvg    lvm2 [100.00 GiB / 0    free]

  PV /dev/sdc1   VG hubvg    lvm2 [100.00 GiB / 0    free]

  PV /dev/sda2   VG rootvg   lvm2 [24.50 GiB / 8.50 GiB free]

  Total: 2 [124.50 GiB] / in use: 2 [124.50 GiB] / in no VG: 0 [0   ]

 

         vgscan

 

  Reading all physical volumes.  This may take a while...

  Found volume group "hobvg" using metadata type lvm2

  Found volume group "hubvg" using metadata type lvm2

  Found volume group "rootvg" using metadata type lvm2

 

         lvscan

  ACTIVE            '/dev/hobvg/hobvglv' [100.00 GiB] inherit

  ACTIVE            '/dev/hubvg/hubvglv' [100.00 GiB] inherit

  ACTIVE            '/dev/rootvg/rootlv' [1.00 GiB] inherit

  ACTIVE            '/dev/rootvg/tmplv' [1.00 GiB] inherit

  ACTIVE            '/dev/rootvg/usrlv' [2.50 GiB] inherit

  ACTIVE            '/dev/rootvg/homelv' [1.00 GiB] inherit

  ACTIVE            '/dev/rootvg/usrlocallv' [512.00 MiB] inherit

  ACTIVE            '/dev/rootvg/optlv' [1.00 GiB] inherit

  ACTIVE            '/dev/rootvg/varlv' [2.00 GiB] inherit

  ACTIVE            '/dev/rootvg/bmcappslv' [2.00 GiB] inherit

  ACTIVE            '/dev/rootvg/crashlv' [2.00 GiB] inherit

  ACTIVE            '/dev/rootvg/swaplv' [1.00 GiB] inherit

  ACTIVE            '/dev/rootvg/installlv' [2.00 GiB] inherit

 

To verify all PVs/VGs/LVs, we can run the following commands on both nodes:

 

         pvs

 

  PV         VG     Fmt  Attr PSize   PFree

  /dev/sda2  rootvg lvm2 a--   24.50g 8.50g

  /dev/sdb1  hobvg  lvm2 a--  100.00g    0

  /dev/sdc1  hubvg  lvm2 a--  100.00g    0

 

         vgs

 

  VG     #PV #LV #SN Attr   VSize  VFree

  hobvg    1   1   0 wz--n- 100.00g    0

  hubvg    1   1   0 wz--n- 100.00g    0

  rootvg   1  11   0 wz--n-  24.50g 8.50g

 

         lvs

                   

  LV         VG     Attr     LSize   Pool Origin Data%

  hobvglv    hobvg  -wi-a--- 100.00g 

  hubvglv    hubvg  -wi-a--- 100.00g                                          

  bmcappslv  rootvg -wi-ao--   2.00g                                          

  crashlv    rootvg -wi-ao--   2.00g                                          

  homelv     rootvg -wi-ao--   1.00g                                          

  installlv  rootvg -wi-a---   2.00g                                          

  optlv      rootvg -wi-ao--   1.00g                                          

  rootlv     rootvg -wi-ao--   1.00g                                          

  swaplv     rootvg -wi-ao--   1.00g                                          

  tmplv      rootvg -wi-ao--   1.00g                                          

  usrlocallv rootvg -wi-ao-- 512.00m                                          

  usrlv      rootvg -wi-ao--   2.50g                                          

  varlv      rootvg -wi-ao--   2.00g      

 

We can run the following command on both nodes to check the quorum disk.

 

mkqdisk -L

mkqdisk v3.0.12.1

 

/dev/block/8:48:

/dev/disk/by-id/scsi-36000c291f36429c578807c0273a2c759:

/dev/disk/by-id/wwn-0x6000c291f36429c578807c0273a2c759:

/dev/disk/by-path/pci-0000:0b:00.0-scsi-0:0:2:0:

/dev/sdd:

        Magic:                eb7a62c2

        Label:                intrqa-quorum2

        Created:              Tue Mar 19 16:56:14 2013

        Host:                 intrbcclqamom03

        Kernel Sector Size:   512

        Recorded Sector Size: 512


 

8. Configure the Cluster

We can use the following URL to access the LUCI web interface on the first server with the root credentials:

 

https://10.194.18.81:8084

 

 

 

 

Click the "Configure → QDisk" tab in the cluster LUCI web interface, and input the following information for the quorum disk configuration:

 

Use a Quorum Disk

By Device Label: apm-quorum2

 

Click the "Apply" button; the following screen will appear:

 

 

 

 

 

From the server, we can verify that the quorum device has been added:

 

[root@intrbcclqamom03 ~]# clustat

Cluster Status for intr-qa-clus1 @ Thu Mar 21 12:40:19 2013

Member Status: Quorate

 

 Member Name                             ID   Status

 ------ ----                             ---- ------

 intrbcclqamom03                             1 Online, Local

 intrbcclqamom04                             2 Online

 /dev/block/8:48                             0 Online, Quorum Disk

 

[root@intrbcclqamom03 ~]#

 

 

 

We will need to create one Fence Device on each node.

 

We will use the "vmware_soap" fence agent for this VM cluster; as of RHEL 5.7 and later, Red Hat has full production support for fencing VMware guests from a cluster with the vmware_soap fence agent.

 

8.2.1    Create fence account from VMware vSphere (VC)

 

We need to send a request to the VMware ESX admin to create a fence account in VMware vSphere (VC); this account must have the permissions needed to shut down and reboot the Linux VMs at the VMware vSphere level.

 

We are using the "bmilsrvc" account for this cluster, as it has already been created on the Windows AD side.

 

8.2.2    Add fence account to cluster VM nodes

 

In vSphere, grant the "bmilsrvc" fence account permission on each cluster VM node:

 

VC42 → right click "intrbcclqamom03" → Add Permission →

 

 

Click "Add…" → Domain: SYSDEV → Show Users First → BMILSrvc → Add.

 

 

Click on “OK”

 

 

Click the "Assigned Role" drop-down list and select "RHEL Soap Fencing".

 

 

Click on “OK”

 

Repeat the same procedure to add “intrbcclqamom04”.

 

8.2.3    Get the UUID for each VM cluster node

 

In order to configure vmware_soap properly, we have to get the UUID (Universally Unique Identifier) for each Linux VM.

 

We can get the UUID from the vCenter GUI:

 

Click the ESX host in the left panel, then click "Virtual Machines" in the right panel, right click → View Column → check "UUID".

 

For these two nodes, we got the below UUIDs:

 

intrbcclqamom03          421d685d-8fe6-b7fd-bd05-6e4ba48a2b1b

intrbcclqamom04          421dad22-76ba-8fba-f02f-6ba29b274be9
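
Alternatively, and as a way to confirm that the fence account works before configuring LUCI, the fence agent itself can list the VM names and UUIDs from the command line.  A sketch, assuming the vCenter address and account used in section 8.2.4 (replace the password placeholder with the real one):

          fence_vmware_soap -z -a 10.193.61.68 -l bmilsrvc -p '<bmilsrvc password>' -o list | grep intrbcclqamom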

 

8.2.4    Create Fence Device for the cluster

 

Click the "Fence Devices" tab in the cluster LUCI web interface, then click the "Add" button.  In the new "Add Fence Device (Instance)" window, Select a Fence Device → VMware Fencing (SOAP Interface), and input the following information:

 

Name:                                      v_fence

IP Address or Hostname:          10.193.61.68 (tem-vc42.sysdev.adroot.bmogc.net)

Login:                                       bmilsrvc

Password:                                ******* (real password for “bmilsrvc”)

 

 

 

 

Click the "Submit" button; the following screen will appear:

 

 

 

8.2.5    Create Fence Devices for first node

 

Click the cluster name "apm-qa-clus1" → "Nodes" tab, then click the first node "apm-bcclq1mom01".  On the screen below, click "Add Fence Method":

 

 

Method Name: v_fence_node1

Then click the "Submit" button; we can verify the result on the screen below:

 

 

Click “Add Fence Instance” button:

 

 

 

Then click the "Submit" button; we can verify that the fence instance has been added.

 

 

 

 

8.2.6    Create Fence Devices for second node

 

Click the cluster name "apm-qa-clus1" → "Nodes" tab, then click the second node "apm-bcclq1mom02".  On the screen below, click "Add Fence Method":

 

 

Method Name: IPMI_Fence2

Then click the "Submit" button; we can verify the result on the screen below:

 

 

Click “Add Fence Instance” button:

 

 

 

Then click the "Submit" button; we can verify that the fence instance has been added.

           

 

 

 

You can refer to section 6.3 to reboot both servers from the LUCI GUI.

 

 

Use the following URL to access the LUCI web interface on the second server:

 

https://10.194.18.84:8084

 

Type the root credentials for the second server, and click the "Manage Clusters → Add" button:

 

 

Then input the following information to add the existing cluster:

 

Node Hostname:          intrbcclqamom03

Password:                    ricci’s password

 

Click “Connect” button:

 

 

Check "Use the Same Password for All Nodes", then click the "Add Cluster" button.  The following screen shows that the cluster has been added on the second server via LUCI successfully.

 

 

Click "Manage Clusters → intr-qa-clus1 → Nodes"; both cluster nodes will show as below:

 

 

 

 


9.   To test the Cluster failover

 

We need to perform all the following steps on both servers, but not at the same time.

 

We can perform a step on one node and verify the cluster; once the cluster has been verified healthy, we can perform the same step on the other node, and then move on to the next test.

 

We can always use "clustat" or the LUCI web interface to verify the cluster state on the active node.

 

Before starting each step, all cluster “Members” should be Online and cluster state active.

9.1    “fence_node” test

 

On first node - intrbcclqamom03, run:

 

              clustat

 

Cluster Status for intr-qa-clus1 @ Thu Mar 21 14:38:20 2013

Member Status: Quorate

 

 Member Name                             ID   Status

 ------ ----                             ---- ------

 intrbcclqamom03                             1 Online, Local

 intrbcclqamom04                             2 Online

 /dev/block/8:48                             0 Online, Quorum Disk   

 

To monitor the cluster as the failover occurs, log on to both nodes in separate windows and tail the messages log to see the real-time activity:

 

tail -f /var/log/messages

 

To fence node2 (the currently active node), run the following command on node1:

 

fence_node intrbcclqamom04

 

fence intrbcclqamom04 success

 

We will see that the second node was fenced (rebooted).

 

We can verify that the GFS2 filesystems are still working fine on the other node.

 

Repeat the “fence_node” test on the remaining node when both nodes are online.

 

 

Log into the vSphere VM console for the first node and power it off.

 

From the vSphere VM console, select the first cluster node "intrbcclqamom03" → right click → Power → Power Off

 

The server is powered off, but it will then be powered on automatically by the fence account "bmilsrvc"; this is expected.

 

The vSphere logs are as below:

 

Power On virtual machine

intrbcclqamom03

Completed

SYSDEV\BMILSrvc

TEM-VC42.sysdev.adroot.bmogc.net

21/03/2013 4:03:49 PM

21/03/2013 4:03:49 PM

21/03/2013 4:03:52 PM

 

We can verify that the GFS2 filesystems are still working fine on the other node.

 

Repeat the “Power Off” test on the remaining node when both nodes are online again.

 

 

Log into the vSphere VM console for the first node and shut it down (not power it off).

 

From the vSphere VM console, select the first cluster node "intrbcclqamom03" → right click → Power → Shut Down Guest

 

You can refer to section 9.1 and use "clustat" and "tail -f /var/log/messages" to watch and verify the real-time activity.

 

The first server was down.

 

We can verify that the GFS2 filesystems are still working fine on the other node.

 

To continue with the next test, both nodes must be cluster members (Online). 

 

Repeat the “Shut Down Guest” test on the remaining node.

 

 

On first node - intrbcclqamom03, run:

 

            reboot

            or

            shutdown -r now

           

 

This will initiate an orderly shutdown of the current node.

 

You can refer to section 9.1 and use "clustat" and "tail -f /var/log/messages" to watch and verify the real-time activity from the other node.

 

We can verify that the GFS2 filesystems are still working fine on the other node.

 

To continue with the next test, both nodes must be cluster members (Online). 

 

Repeat the “Reboot” test on the remaining node.

 

 

We will do two kinds of "Network Down" tests: one for the public network and one for the heartbeat network.

 

Disable the public NIC on first node - intrbcclqamom03:

 

From the vSphere VM console, select the first cluster node "intrbcclqamom03" → right click → Edit Settings → select the public NIC (here "Network adapter 1") → uncheck "Connected" and "Connect at power on" → OK

 

Node1 (mom03) was fenced, and node2 (mom04) remained fully functional for all GFS2 filesystems.

 

You can refer to section 9.1 and use "clustat" and "tail -f /var/log/messages" to watch and verify the real-time activity from the other node.

 

To continue with the next test, both nodes must be cluster members (Online). 

 

Repeat the “Public NIC Down” test on the remaining node.

 

We repeated the above steps on mom04, and it worked fine.

 
