Restoring OpenShift Container Platform components





OpenShift Backup/Restore


The backup process is a series of cron jobs that back up the etcd databases and definition files to a remote system.
It is distributed across the nodes for HA.
On the etcd hosts, the following cron entry was made:
    0 22 * * * etcdctl backup --data-dir /var/lib/etcd/ --backup-dir /admin/etcd.bak; cp /var/lib/etcd/member/snap/db /admin/etcd.bak/member/snap/db; tar -zcf /admin/e001.tgz /admin/etcd.bak; rm -rf /admin/etcd.bak
      
On the masters, collect all of the files from the nodes with the following cron entry for each node:
    30 22 * * * scp node@node1:/etc/origin/node/node-config.yaml /admin/backup/node1/node-config.yaml; scp node@node1:/var/spool/cron/root /admin/backup/node1/root_cron
      
On the masters, collect all of the files from the etcd hosts with the following cron entry for each etcd host:
    29 22 * * * scp etcd@etcd:/admin/e001.tgz /admin/backup
      
On the masters, collect all of the local config files for that master:
    40 22 * * * cp /etc/sysconfig/atomic-openshift-master-api /admin/backup/master/atomic-openshift-master-api
    40 22 * * * cp /etc/sysconfig/atomic-openshift-master-controllers /admin/backup/master/atomic-openshift-master-controllers
    40 22 * * * cp /etc/origin/master/master-config.yaml /admin/backup/master/master-config.yaml
    40 22 * * * cp /etc/origin/node/node-config.yaml /admin/backup/master/node-config.yaml
    40 22 * * * cp /etc/ansible/hosts /admin/backup/master/hosts
    40 22 * * * cp /var/spool/cron/root /admin/backup/master/root_cron
      
On each master, collect the other masters' backups with the following cron entry for each other master:
    45 23 * * * scp -r root@othermaster:/admin/backup/othermaster/* /root/backup/othermaster/
      
Create a tarball and send it to the backup server
    50 23 * * * tar -czf /root/m001_`/usr/bin/date "+%Y.%m.%d"`.tgz /root/backup
    55 23 * * * scp /root/m001_* cloud-user@128.31.22.168:/home/cloud-user/; mv /root/m001_* /root/sent
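For reference, the master-side steps above can also be combined into a single nightly script instead of separate cron entries. This is only a sketch built from the same paths and hosts used above:
    #!/bin/bash
    # Sketch: consolidated nightly backup on a master (same steps as the cron entries above)
    cp /etc/sysconfig/atomic-openshift-master-api         /admin/backup/master/atomic-openshift-master-api
    cp /etc/sysconfig/atomic-openshift-master-controllers /admin/backup/master/atomic-openshift-master-controllers
    cp /etc/origin/master/master-config.yaml              /admin/backup/master/master-config.yaml
    cp /etc/origin/node/node-config.yaml                  /admin/backup/master/node-config.yaml
    cp /etc/ansible/hosts                                 /admin/backup/master/hosts
    cp /var/spool/cron/root                               /admin/backup/master/root_cron
    # Create a dated tarball, ship it to the backup server, then move it aside
    tar -czf /root/m001_$(/usr/bin/date "+%Y.%m.%d").tgz /root/backup
    scp /root/m001_* cloud-user@128.31.22.168:/home/cloud-user/ && mv /root/m001_* /root/sent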
      
Restore process
  1. Create the VMs for the OpenShift cluster.
  2. Run the Ansible script to install OpenShift.
  3. On each master, turn off the master processes.
  4. Update the etcd hosts.
  5. Update the nodes.
  6. Update the masters.
  7. Start the masters.
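A minimal sketch of steps 3 and 7. Which commands apply depends on how the masters run the control plane: as systemd services (the atomic-openshift-master-* units whose sysconfig files are backed up above) or as static pods (OpenShift 3.10 and later), as described later in this document.
    # Step 3: stop the master processes
    systemctl stop atomic-openshift-master-api atomic-openshift-master-controllers   # systemd-based masters
    # or, on static-pod masters:
    mkdir -p /etc/origin/node/pods-stopped
    mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/

    # Step 7: start the masters again
    systemctl start atomic-openshift-master-api atomic-openshift-master-controllers  # systemd-based masters
    # or, on static-pod masters:
    mv /etc/origin/node/pods-stopped/* /etc/origin/node/pods/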

Overview

In OpenShift Container Platform, you can restore your cluster and its components by recreating cluster elements, including nodes and applications, from separate storage.
To restore a cluster, you must first back it up.

The following process describes a generic way of restoring applications and the OpenShift Container Platform cluster. It cannot take into account custom requirements. You might need to take additional actions to restore your cluster.

Restoring a cluster

To restore a cluster, first reinstall OpenShift Container Platform.

Procedure

  1. Reinstall OpenShift Container Platform in the same way that you originally installed OpenShift Container Platform.
  2. Run all of your custom post-installation steps, such as changing services outside of the control of OpenShift Container Platform or installing extra services like monitoring agents.

Restoring a master host backup

After creating a backup of important master host files, if they become corrupted or accidentally removed, you can restore the files by copying them back to the master, ensuring they contain the proper content, and restarting the affected services.

Procedure

  1. Restore the /etc/origin/master/master-config.yaml file:
    # MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
    # cp /etc/origin/master/master-config.yaml /etc/origin/master/master-config.yaml.old
    # cp /backup/$(hostname)/$(date +%Y%m%d)/origin/master/master-config.yaml /etc/origin/master/master-config.yaml
    # master-restart api
    # master-restart controllers
    Restarting the master services can lead to downtime. However, you can remove the master host from the highly available load balancer pool, then perform the restore operation. Once the service has been properly restored, you can add the master host back to the load balancer pool.
    Perform a full reboot of the affected instance to restore the iptables configuration.
  2. If you cannot restart OpenShift Container Platform because packages are missing, reinstall the packages.
    1. Get the list of the current installed packages:
      $ rpm -qa | sort > /tmp/current_packages.txt
    2. View the differences between the package lists:
      $ diff /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt
      
      > ansible-2.4.0.0-5.el7.noarch
    3. Reinstall the missing packages:
      # yum reinstall -y <packages> 
      Replace <packages> with the packages that are different between the package lists.
  3. Restore a system certificate by copying the certificate to the /etc/pki/ca-trust/source/anchors/ directory and executing update-ca-trust:
    $ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
    $ sudo cp ${MYBACKUPDIR}/external_certificates/my_company.crt /etc/pki/ca-trust/source/anchors/
    $ sudo update-ca-trust
    Always ensure the user ID and group ID are restored when the files are copied back, as well as the SELinux context.
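One way to verify this after copying the files back (a sketch; the paths are the ones used in the steps above):
    # Compare owner, group, and SELinux context between the backup and the restored file
    $ ls -lZ ${MYBACKUPDIR}/origin/master/master-config.yaml /etc/origin/master/master-config.yaml
    # Fix the ownership and SELinux context if they differ
    $ sudo chown root:root /etc/origin/master/master-config.yaml
    $ sudo restorecon -RvF /etc/origin/master/master-config.yaml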

Restoring a node host backup

After creating a backup of important node host files, if they become corrupted or accidentally removed, you can restore the file by copying it back, ensuring it contains the proper content, and restarting the affected services.

Procedure

  1. Restore the /etc/origin/node/node-config.yaml file:
    # MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
    # cp /etc/origin/node/node-config.yaml /etc/origin/node/node-config.yaml.old
    # cp /backup/$(hostname)/$(date +%Y%m%d)/etc/origin/node/node-config.yaml /etc/origin/node/node-config.yaml
    # reboot
Restarting the services can lead to downtime. See Node maintenance for tips on how to ease the process.
Perform a full reboot of the affected instance to restore the iptables configuration.
  2. If you cannot restart OpenShift Container Platform because packages are missing, reinstall the packages.
    1. Get the list of the current installed packages:
      $ rpm -qa | sort > /tmp/current_packages.txt
    2. View the differences between the package lists:
      $ diff /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt
      
      > ansible-2.4.0.0-5.el7.noarch
    3. Reinstall the missing packages:
      # yum reinstall -y <packages> 
      Replace <packages> with the packages that are different between the package lists.
  3. Restore a system certificate by copying the certificate to the /etc/pki/ca-trust/source/anchors/ directory and executing update-ca-trust:
    $ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
    $ sudo cp ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/my_company.crt /etc/pki/ca-trust/source/anchors/
    $ sudo update-ca-trust
    Always ensure proper user ID and group ID are restored when the files are copied back, as well as the SELinux context.
The web console runs as a pod on the master. The static assets required to run the web console are served by the pod. Administrators can also customize the web console using extensions, which let you run scripts and load custom stylesheets when the web console loads.
When you access the web console from a browser, it first loads all required static assets. It then makes requests to the OpenShift Container Platform APIs using the values defined from the openshift start option --public-master, or from the related parameter masterPublicURL in the webconsole-config config map defined in the openshift-web-console namespace. The web console uses WebSockets to maintain a persistent connection with the API server and receive updated information as soon as it is available.
Figure 1. Web Console Request Architecture
The configured host names and IP addresses for the web console are whitelisted to access the API server safely even when the browser would consider the requests to be cross-origin. To access the API server from a web application using a different host name, you must whitelist that host name by specifying the --cors-allowed-origins option on openshift start or from the related master configuration file parameter corsAllowedOrigins.
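For example, to see which origins are currently whitelisted on a master (a sketch; the configuration file path is the one used elsewhere in this document):
    # List the whitelisted origins in the master configuration (the corsAllowedOrigins parameter named above)
    $ grep -A5 corsAllowedOrigins /etc/origin/master/master-config.yaml
After adding a host name to that list, restart the API with master-restart api for the change to take effect.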

Description

Restoring an etcd v3 cluster that does not start and fails with the "open wal error: wal: file not found" error.

Limits and bounds

OpenShift 3.11 with a single-member etcd cluster.

If your cluster has been upgraded from OpenShift 3.7 or earlier, which used etcd v2, it will have etcd v2 data combined with etcd v3 data, and the recovery process will include more steps.
Official documentation for reference:

Pre-checks

Check etcd status and logs.
We need the etcdctl3 tool. If it does not exist, follow the steps below:
To create a backup (a snapshot) of the current status of your cluster, first download the new version of etcdctl from the website:
wget https://github.com/coreos/etcd/releases/download/v3.2.0/etcd-v3.2.0-linux-amd64.tar.gz
tar xvf etcd-v3.2.0-linux-amd64.tar.gz
Once untarred, the folder will contain the new version of the etcdctl executable. To create a snapshot, run the following command:
ETCDCTL_API=3 ./etcdctl snapshot save snapshot.db --cacert /etc/ssl/etcd/ca.crt --cert /etc/ssl/etcd/client.crt --key /etc/ssl/etcd/client.key
This will create a snapshot.db file in the current directory. Do not omit the ETCDCTL_API environment variable; it defines the version of the API that etcdctl uses to connect to the etcd server.
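Optionally, verify the snapshot before relying on it. A sketch using the snapshot status subcommand of the same v3 etcdctl downloaded above:
ETCDCTL_API=3 ./etcdctl snapshot status snapshot.db --write-out=table
# Prints the hash, latest revision, total key count, and size of the snapshot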

Check status and logs
etcdctl3 endpoint status
docker ps -a | grep etcd #Check status and get CONTAINER ID
docker logs <CONTAINER ID>

Actions

These steps apply when etcd is located on the master host.
Stop the running containers:
mkdir /etc/origin/node/pods-stopped<CURRENT DATE>
mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped<CURRENT DATE>/
docker stop <api and etcd containers if any>
docker rm <api and etcd containers if any>
systemctl stop origin-node
Back up the etcd configuration and database:
mkdir /root/etcd-conf<CURRENT DATE> /root/etcd-backup
cp -R /etc/etcd /root/etcd-conf<CURRENT DATE>/
cp /var/lib/etcd/member/snap/db /root/etcd-backup/db

Be sure that all mentioned files have been copied correctly. Double check sizes in the source and destination folders.
Remove current data-dir:
rm -rf /var/lib/etcd
Restore the snapshot. Take the option values from the /etc/etcd/etcd.conf file:
--skip-hash-check=true                                    # required because we restore a database copied from the data dir, not an etcdctl snapshot
--name master-1                                           # from etcd.conf
--data-dir /var/lib/etcd                                  # from etcd.conf
--initial-cluster "master-1=https://192.168.0.13:2380"    # from etcd.conf
--initial-cluster-token "etcd-cluster-1"                  # from etcd.conf
--initial-advertise-peer-urls https://192.168.0.13:2380   # from etcd.conf
The full command:

etcdctl3 snapshot restore /root/etcd-backup/db --skip-hash-check=true --name master-1 --data-dir /var/lib/etcd --initial-cluster "master-1=https://192.168.0.13:2380" --initial-cluster-token "etcd-cluster-1" --initial-advertise-peer-urls https://192.168.0.13:2380
mv /etc/origin/node/pods-stopped<CURRENT DATE>/etcd.yaml /etc/origin/node/pods/
systemctl start origin-node
# To verify that etcd has started
docker ps
docker logs <CONTAINER ID>
# If etcd is running correctly and we need only a one-node etcd cluster, then:
mv /etc/origin/node/pods-stopped<CURRENT DATE>/* /etc/origin/node/pods/
# Otherwise, continue with adding etcd nodes (see "Adding an etcd node" below).
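In either case, a quick way to confirm that the restored database actually contains the cluster data (a sketch; OpenShift stores its keys under the /openshift.io and /kubernetes.io prefixes):
etcdctl3 get / --prefix --keys-only | head -20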

Note

Make sure these files are present in /etc/origin/node/pods when you start origin-node:
  • etcd.yaml
  • controller.yaml
  • apiserver.yaml

Restoring etcd

The restore procedure for etcd configuration files replaces the appropriate files, then restarts the service or static pod.
If an etcd host has become corrupted and the /etc/etcd/etcd.conf file is lost, restore it using:
$ ssh master-0
# cp /backup/yesterday/master-0-files/etcd.conf /etc/etcd/etcd.conf
# restorecon -RvF /etc/etcd/etcd.conf
In this example, the backup file is stored at /backup/yesterday/master-0-files/etcd.conf; this location can be backed by an external NFS share, an S3 bucket, or another storage solution.
If you run etcd as a static pod, follow only the steps in that section. If you run etcd as a separate service on either master or standalone nodes, follow the steps to restore data as required.
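If you are not sure how etcd runs on a given host, a quick check (a sketch, using the paths and service names referenced in this document):
$ ls /etc/origin/node/pods/etcd.yaml    # present if etcd runs as a static pod
$ systemctl is-active etcd              # active if etcd runs as a separate service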

Restoring etcd data

The following process restores healthy data files and starts the etcd cluster as a single node, then adds the rest of the nodes if an etcd cluster is required.

PROCEDURE

  1. Stop all etcd services by removing the etcd pod definition and rebooting the host:
    # mkdir -p /etc/origin/node/pods-stopped
    # mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/
    # reboot
  2. To ensure the proper backup is restored, delete the etcd directories:
    • To back up the current etcd data before you delete the directory, run the following command:
      # mv /var/lib/etcd /var/lib/etcd.old
      # mkdir /var/lib/etcd
      # restorecon -RvF /var/lib/etcd/
    • Or, to delete the directory and the etcd data, run the following command:
      # rm -Rf /var/lib/etcd/*
      In an all-in-one cluster, the etcd data directory is located in the /var/lib/origin/openshift.local.etcd directory.
  3. Restore a healthy backup data file to each of the etcd nodes. Perform this step on all etcd hosts, including master hosts collocated with etcd.
    # cp -R /backup/etcd-xxx/* /var/lib/etcd/
    # mv /var/lib/etcd/db /var/lib/etcd/member/snap/db
    # chcon -R --reference /backup/etcd-xxx/* /var/lib/etcd/
  4. Run the etcd service on one of your etcd hosts, forcing a new cluster.
    This creates a custom file for the etcd service, which overwrites the execution command adding the --force-new-cluster option:
    # mkdir -p /etc/systemd/system/etcd.service.d/
    # echo "[Service]" > /etc/systemd/system/etcd.service.d/temp.conf
    # echo "ExecStart=" >> /etc/systemd/system/etcd.service.d/temp.conf
    # sed -n '/ExecStart/s/"$/ --force-new-cluster"/p' \
        /usr/lib/systemd/system/etcd.service \
        >> /etc/systemd/system/etcd.service.d/temp.conf
    
    # systemctl daemon-reload
    # master-restart etcd
  5. Check for error messages:
    # master-logs etcd etcd
  6. Check for health status:
    # etcdctl2 cluster-health
    member 5ee217d17301 is healthy: got healthy result from https://192.168.55.8:2379
    cluster is healthy
  7. Restart the etcd service in cluster mode:
    # rm -f /etc/systemd/system/etcd.service.d/temp.conf
    # systemctl daemon-reload
    # master-restart etcd
  8. Check for health status and member list:
    # etcdctl2 cluster-health
    member 5ee217d17301 is healthy: got healthy result from https://192.168.55.8:2379
    cluster is healthy
    
    # etcdctl2 member list
    5ee217d17301: name=master-0.example.com peerURLs=http://localhost:2380 clientURLs=https://192.168.55.8:2379 isLeader=true
  9. After the first instance is running, you can add the remaining peers back into the cluster.

FIX THE PEERURLS PARAMETER

After restoring the data and creating a new cluster, the peerURLs parameter shows localhost instead of the IP where etcd is listening for peer communication:
# etcdctl2 member list
5ee217d17301: name=master-0.example.com peerURLs=http://localhost:2380 clientURLs=https://192.168.55.8:2379 isLeader=true
Procedure
  1. Get the member ID using etcdctl member list:
    # etcdctl2 member list
  2. Get the IP where etcd listens for peer communication:
    $ ss -l4n | grep 2380
  3. Update the member information with that IP:
    # etcdctl2 member update 5ee217d17301 https://192.168.55.8:2380
    Updated member with ID 5ee217d17301 in cluster
  4. To verify, check that the IP is in the member list:
    $ etcdctl2 member list
    5ee217d17301: name=master-0.example.com peerURLs=https://192.168.55.8:2380 clientURLs=https://192.168.55.8:2379 isLeader=true

Restoring etcd snapshot

Snapshot integrity may be optionally verified at restore time. If the snapshot is taken with etcdctl snapshot save, it will have an integrity hash that is checked by etcdctl snapshot restore. If the snapshot is copied from the data directory, there is no integrity hash and it will only restore by using --skip-hash-check.
The procedure to restore the data must be performed on a single etcd host. You can then add the rest of the nodes to the cluster.

PROCEDURE

  1. Stop all etcd services by removing the etcd pod definition and rebooting the host:
    # mkdir -p /etc/origin/node/pods-stopped
    # mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/
    # reboot
  2. Clear all old data, because etcdctl recreates it in the node where the restore procedure is going to be performed:
    # rm -Rf /var/lib/etcd
  3. Run the snapshot restore command, substituting the values from the /etc/etcd/etcd.conf file:
    # etcdctl3 snapshot restore /backup/etcd-xxxxxx/backup.db \
      --data-dir /var/lib/etcd \
      --name master-0.example.com \
      --initial-cluster "master-0.example.com=https://192.168.55.8:2380" \
      --initial-cluster-token "etcd-cluster-1" \
      --initial-advertise-peer-urls https://192.168.55.8:2380 \
      --skip-hash-check=true
    
    2017-10-03 08:55:32.440779 I | mvcc: restore compact to 1041269
    2017-10-03 08:55:32.468244 I | etcdserver/membership: added member 40bef1f6c79b3163 [https://192.168.55.8:2380] to cluster 26841ebcf610583c
  4. Restore permissions and selinux context to the restored files:
    # restorecon -RvF /var/lib/etcd
  5. Start the etcd service:
    # systemctl start etcd
  6. Check for any error messages:
    # master-logs etcd etcd

Restoring etcd on a static pod

Before restoring etcd on a static pod:
  • etcdctl binaries must be available or, in containerized installations, the rhel7/etcd container must be available.
    You can install the etcdctl binary with the etcd package by running the following commands:
    # yum install etcd
    The package also installs the systemd service. Disable and mask the service so that it does not run as a systemd service when etcd runs as a static pod. By disabling and masking the service, you ensure that you do not accidentally start it and prevent it from automatically restarting when you reboot the system.
    # systemctl disable etcd.service
    # systemctl mask etcd.service
To restore etcd on a static pod:
  1. If the pod is running, stop the etcd pod by moving the pod manifest YAML file to another directory:
    # mkdir -p /etc/origin/node/pods-stopped
    # mv /etc/origin/node/pods/etcd.yaml /etc/origin/node/pods-stopped
  2. Clear all old data:
    # rm -rf /var/lib/etcd
    You use etcdctl to recreate the data on the node where you restore the pod.
  3. Restore the etcd snapshot to the mount path for the etcd pod:
    # export ETCDCTL_API=3
    # etcdctl snapshot restore /etc/etcd/backup/etcd/snapshot.db \
      --data-dir /var/lib/etcd/ \
      --name ip-172-18-3-48.ec2.internal \
      --initial-cluster "ip-172-18-3-48.ec2.internal=https://172.18.3.48:2380" \
      --initial-cluster-token "etcd-cluster-1" \
      --initial-advertise-peer-urls https://172.18.3.48:2380 \
      --skip-hash-check=true
    Obtain the values for your cluster from the /backup_files/etcd.conf file (see the sketch after this procedure for one way to extract them).
  4. Set required permissions and selinux context on the data directory:
    # restorecon -RvF /var/lib/etcd/
  5. Restart the etcd pod by moving the pod manifest YAML file to the required directory:
    # mv /etc/origin/node/pods-stopped/etcd.yaml /etc/origin/node/pods/
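One way to pull the values needed for the snapshot restore out of the backed-up configuration (a sketch; the keys are standard etcd.conf variables, most of which are also referenced in the manual scale-up section below):
    # grep -E '^(ETCD_NAME|ETCD_INITIAL_CLUSTER|ETCD_INITIAL_CLUSTER_TOKEN|ETCD_INITIAL_ADVERTISE_PEER_URLS)=' /backup_files/etcd.conf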

Adding an etcd node

After you restore etcd, you can add more etcd nodes to the cluster. You can either add an etcd host by using an Ansible playbook or by manual steps.

Adding a new etcd host using Ansible

PROCEDURE

  1. In the Ansible inventory file, create a new group named [new_etcd] and add the new host. Then, add the new_etcd group as a child of the [OSEv3] group:
    [OSEv3:children]
    masters
    nodes
    etcd
    new_etcd 
    
    ... [OUTPUT ABBREVIATED] ...
    
    [etcd]
    master-0.example.com
    master-1.example.com
    master-2.example.com
    
    [new_etcd] 
    etcd0.example.com 
    Add these lines.
  2. From the host that installed OpenShift Container Platform and hosts the Ansible inventory file, change to the playbook directory and run the etcd scaleup playbook:
    $ cd /usr/share/ansible/openshift-ansible
    $ ansible-playbook  playbooks/openshift-etcd/scaleup.yml
  3. After the playbook runs, modify the inventory file to reflect the current status by moving the new etcd host from the [new_etcd] group to the [etcd] group:
    [OSEv3:children]
    masters
    nodes
    etcd
    new_etcd
    
    ... [OUTPUT ABBREVIATED] ...
    
    [etcd]
    master-0.example.com
    master-1.example.com
    master-2.example.com
    etcd0.example.com
  4. If you use Flannel, modify the flanneld service configuration on every OpenShift Container Platform host, located at /etc/sysconfig/flanneld, to include the new etcd host:
    FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
  5. Restart the flanneld service:
    # systemctl restart flanneld.service

Manually adding a new etcd host

If you do not run etcd as static pods on master nodes, you might need to add another etcd host.

PROCEDURE

Modify the current etcd cluster
To create the etcd certificates, run the openssl command, replacing the values with those from your environment.
  1. Create some environment variables:
    export NEW_ETCD_HOSTNAME="etcd0.example.com"
    export NEW_ETCD_IP="192.168.55.21"
    
    export CN=$NEW_ETCD_HOSTNAME
    export SAN="IP:${NEW_ETCD_IP}, DNS:${NEW_ETCD_HOSTNAME}"
    export PREFIX="/etc/etcd/generated_certs/etcd-$CN/"
    export OPENSSLCFG="/etc/etcd/ca/openssl.cnf"
    The custom openssl extensions used as etcd_v3_ca_* include the $SAN environment variable as subjectAltName. See /etc/etcd/ca/openssl.cnf for more information.
  2. Create the directory to store the configuration and certificates:
    # mkdir -p ${PREFIX}
  3. Create the server certificate request and sign it: (server.csr and server.crt)
    # openssl req -new -config ${OPENSSLCFG} \
        -keyout ${PREFIX}server.key  \
        -out ${PREFIX}server.csr \
        -reqexts etcd_v3_req -batch -nodes \
        -subj /CN=$CN
    
    # openssl ca -name etcd_ca -config ${OPENSSLCFG} \
        -out ${PREFIX}server.crt \
        -in ${PREFIX}server.csr \
        -extensions etcd_v3_ca_server -batch
  4. Create the peer certificate request and sign it: (peer.csr and peer.crt)
    # openssl req -new -config ${OPENSSLCFG} \
        -keyout ${PREFIX}peer.key \
        -out ${PREFIX}peer.csr \
        -reqexts etcd_v3_req -batch -nodes \
        -subj /CN=$CN
    
    # openssl ca -name etcd_ca -config ${OPENSSLCFG} \
      -out ${PREFIX}peer.crt \
      -in ${PREFIX}peer.csr \
      -extensions etcd_v3_ca_peer -batch
  5. Copy the current etcd configuration and ca.crt files from the current node as examples to modify later:
    # cp /etc/etcd/etcd.conf ${PREFIX}
    # cp /etc/etcd/ca.crt ${PREFIX}
  6. While still on the surviving etcd host, add the new host to the cluster. To add additional etcd members to the cluster, you must first adjust the default localhost peer in the peerURLs value for the first member:
    1. Get the member ID for the first member using the member list command:
      # etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \ 
          member list
      Ensure that you specify the URLs of only active etcd members in the --peers parameter value.
    2. Obtain the IP address where etcd listens for cluster peers:
      $ ss -l4n | grep 2380
    3. Update the value of peerURLs using the etcdctl member update command by passing the member ID and IP address obtained from the previous steps:
      # etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
          member update 511b7fb6cc0001 https://172.18.1.18:2380
    4. Re-run the member list command and ensure the peer URLs no longer include localhost.
  7. Add the new host to the etcd cluster. Note that the new host is not yet configured, so the status stays as unstarted until you configure the new host.
    You must add each member and bring it online one at a time. When you add each additional member to the cluster, you must adjust the peerURLs list for the current peers. The peerURLs list grows by one for each member added. The etcdctl member add command outputs the values that you must set in the etcd.conf file as you add each member, as described in the following instructions.
    # etcdctl -C https://${CURRENT_ETCD_HOST}:2379 \
      --ca-file=/etc/etcd/ca.crt     \
      --cert-file=/etc/etcd/peer.crt     \
      --key-file=/etc/etcd/peer.key member add ${NEW_ETCD_HOSTNAME} https://${NEW_ETCD_IP}:2380 
    
    Added member named 10.3.9.222 with ID 4e1db163a21d7651 to cluster
    
    ETCD_NAME="<NEW_ETCD_HOSTNAME>"
    ETCD_INITIAL_CLUSTER="<NEW_ETCD_HOSTNAME>=https://<NEW_HOST_IP>:2380,<CLUSTERMEMBER1_NAME>=https://<CLUSTERMEMBER1_IP>:2380,<CLUSTERMEMBER2_NAME>=https://<CLUSTERMEMBER2_IP>:2380,<CLUSTERMEMBER3_NAME>=https://<CLUSTERMEMBER3_IP>:2380"
    ETCD_INITIAL_CLUSTER_STATE="existing"
    In this line, 10.3.9.222 is a label for the etcd member. You can specify the host name, IP address, or a simple name.
  8. Update the sample ${PREFIX}/etcd.conf file.
    1. Replace the following values with the values generated in the previous step:
      • ETCD_NAME
      • ETCD_INITIAL_CLUSTER
      • ETCD_INITIAL_CLUSTER_STATE
    2. Modify the following variables with the new host IP from the output of the previous step. You can use ${NEW_ETCD_IP} as the value.
      ETCD_LISTEN_PEER_URLS
      ETCD_LISTEN_CLIENT_URLS
      ETCD_INITIAL_ADVERTISE_PEER_URLS
      ETCD_ADVERTISE_CLIENT_URLS
    3. If you previously used the member system as an etcd node, you must overwrite the current values in the /etc/etcd/etcd.conf file.
    4. Check the file for syntax errors or missing IP addresses, otherwise the etcd service might fail:
      # vi ${PREFIX}/etcd.conf
  9. On the node that hosts the installation files, update the [etcd] hosts group in the /etc/ansible/hosts inventory file. Remove the old etcd hosts and add the new ones.
  10. Create a tgz file that contains the certificates, the sample configuration file, and the ca and copy it to the new host:
    # tar -czvf /etc/etcd/generated_certs/${CN}.tgz -C ${PREFIX} .
    # scp /etc/etcd/generated_certs/${CN}.tgz ${CN}:/tmp/
Modify the new etcd host
  1. Install iptables-services to provide iptables utilities to open the required ports for etcd:
    # yum install -y iptables-services
  2. Create the OS_FIREWALL_ALLOW firewall rules to allow etcd to communicate:
    • Port 2379/tcp for clients
    • Port 2380/tcp for peer communication
      # systemctl enable iptables.service --now
      # iptables -N OS_FIREWALL_ALLOW
      # iptables -t filter -I INPUT -j OS_FIREWALL_ALLOW
      # iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2379 -j ACCEPT
      # iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2380 -j ACCEPT
      # iptables-save | tee /etc/sysconfig/iptables
      In this example, a new chain OS_FIREWALL_ALLOW is created, which is the standard naming the OpenShift Container Platform installer uses for firewall rules.
      If the environment is hosted in an IaaS environment, modify the security groups for the instance to allow incoming traffic to those ports as well.
  3. Install etcd:
    # yum install -y etcd
    Ensure version etcd-2.3.7-4.el7.x86_64 or greater is installed.
  4. Ensure the etcd service is not running by removing the etcd pod definition:
    # mkdir -p /etc/origin/node/pods-stopped
    # mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/
  5. Remove any etcd configuration and data:
    # rm -Rf /etc/etcd/*
    # rm -Rf /var/lib/etcd/*
  6. Extract the certificates and configuration files:
    # tar xzvf /tmp/etcd0.example.com.tgz -C /etc/etcd/
  7. Start etcd on the new host:
    # systemctl enable etcd --now
  8. Verify that the host is part of the cluster and the current cluster health:
    • If you use the v2 etcd api, run the following command:
      # etcdctl --cert-file=/etc/etcd/peer.crt \
                --key-file=/etc/etcd/peer.key \
                --ca-file=/etc/etcd/ca.crt \
                --peers="https://*master-0.example.com*:2379,\
                https://*master-1.example.com*:2379,\
                https://*master-2.example.com*:2379,\
                https://*etcd0.example.com*:2379"\
                cluster-health
      member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
      member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
      member 8b8904727bf526a5 is healthy: got healthy result from https://192.168.55.21:2379
      member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
      cluster is healthy
    • If you use the v3 etcd api, run the following command:
      # ETCDCTL_API=3 etcdctl --cert="/etc/etcd/peer.crt" \
                --key=/etc/etcd/peer.key \
                --cacert="/etc/etcd/ca.crt" \
                --endpoints="https://master-0.example.com:2379,\
                  https://master-1.example.com:2379,\
                  https://master-2.example.com:2379,\
                  https://etcd0.example.com:2379"\
                  endpoint health
      https://master-0.example.com:2379 is healthy: successfully committed proposal: took = 5.011358ms
      https://master-1.example.com:2379 is healthy: successfully committed proposal: took = 1.305173ms
      https://master-2.example.com:2379 is healthy: successfully committed proposal: took = 1.388772ms
      https://etcd0.example.com:2379 is healthy: successfully committed proposal: took = 1.498829ms
Modify each OpenShift Container Platform master
  1. Modify the master configuration in the etcdClientInfo section of the /etc/origin/master/master-config.yaml file on every master. Add the new etcd host to the list of the etcd servers OpenShift Container Platform uses to store the data, and remove any failed etcd hosts:
    etcdClientInfo:
      ca: master.etcd-ca.crt
      certFile: master.etcd-client.crt
      keyFile: master.etcd-client.key
      urls:
        - https://master-0.example.com:2379
        - https://master-1.example.com:2379
        - https://master-2.example.com:2379
        - https://etcd0.example.com:2379
  2. Restart the master API service:
    • On every master:
      # master-restart api
      # master-restart controllers
      The number of etcd nodes must be odd, so you must add at least two hosts.
  3. If you use Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every OpenShift Container Platform host to include the new etcd host:
    FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
  4. Restart the flanneld service:
    # systemctl restart flanneld.service

Bringing OpenShift Container Platform services back online

After you finish your changes, bring OpenShift Container Platform back online.

Procedure

  1. On each OpenShift Container Platform master, restore your master and node configuration from backup and enable and restart all relevant services:
    # cp ${MYBACKUPDIR}/etc/origin/node/pods/* /etc/origin/node/pods/
    # cp ${MYBACKUPDIR}/etc/origin/master/master.env /etc/origin/master/master.env
    # cp ${MYBACKUPDIR}/etc/origin/master/master-config.yaml.<timestamp> /etc/origin/master/master-config.yaml
    # cp ${MYBACKUPDIR}/etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
    # cp ${MYBACKUPDIR}/etc/origin/master/scheduler.json.<timestamp> /etc/origin/master/scheduler.json
    # master-restart api
    # master-restart controllers
  2. On each OpenShift Container Platform node, update the node configuration maps as needed, and enable and restart the atomic-openshift-node service:
    # cp /etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
    # systemctl enable atomic-openshift-node
    # systemctl start atomic-openshift-node
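A quick sanity check after the services come back online (a sketch; run it from a host with a cluster-admin kubeconfig). All nodes should report Ready and the control-plane pods should be Running:
    # oc get nodes
    # oc get pods -n kube-system
    # master-logs api api
    # master-logs controllers controllers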

Restoring a project

To restore a project, create the new project, then restore any exported files by running oc create -f pods.json. However, restoring a project from scratch requires a specific order because some objects depend on others. For example, you must create the configmaps before you create any pods.

Procedure

  1. If the project was exported as a single file, import it by running the following commands:
    $ oc new-project <projectname>
    $ oc create -f project.yaml
    $ oc create -f secret.yaml
    $ oc create -f serviceaccount.yaml
    $ oc create -f pvc.yaml
    $ oc create -f rolebindings.yaml
    Some resources, such as pods and default service accounts, can fail to be created.
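The commands above do not show config maps explicitly. A sketch of the dependency-ordered variant described in the introduction, for a project exported as one file per object type (the file names are illustrative):
    $ oc new-project <projectname>
    $ oc create -f configmap.yaml        # config maps before the pods that mount them
    $ oc create -f secret.yaml
    $ oc create -f serviceaccount.yaml
    $ oc create -f pvc.yaml
    $ oc create -f rolebindings.yaml
    $ oc create -f deploymentconfig.yaml # workloads last, once their dependencies exist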

Restoring application data

You can restore application data by using the oc rsync command, assuming rsync is installed within the container image. The Red Hat rhel7 base image contains rsync.
This is a generic restoration of application data and does not take into account application-specific backup procedures, for example, special export and import procedures for database systems.
Other means of restoration might exist depending on the type of the persistent volume you use, for example, Cinder, NFS, or Gluster.

Procedure

Example of restoring a Jenkins deployment’s application data
  1. Verify the backup:
    $ ls -la /tmp/jenkins-backup/
    total 8
    drwxrwxr-x.  3 user     user   20 Sep  6 11:14 .
    drwxrwxrwt. 17 root     root 4096 Sep  6 11:16 ..
    drwxrwsrwx. 12 user     user 4096 Sep  6 11:14 jenkins
  2. Use the oc rsync tool to copy the data into the running pod:
    $ oc rsync /tmp/jenkins-backup/jenkins jenkins-1-37nux:/var/lib
    Depending on the application, you may be required to restart the application.
  3. Optionally, restart the application with new data:
    $ oc delete pod jenkins-1-37nux
    Alternatively, you can scale down the deployment to 0, and then up again:
    $ oc scale --replicas=0 dc/jenkins
    $ oc scale --replicas=1 dc/jenkins

Restoring Persistent Volume Claims

This topic describes two methods for restoring data. The first involves deleting the file, then placing the file back in the expected location. The second example shows migrating persistent volume claims. The migration would occur in the event that the storage needs to be moved or in a disaster scenario when the backend storage no longer exists.
Check with the restore procedures for the specific application on any steps required to restore data to the application.

Restoring files to an existing PVC

PROCEDURE

  1. Delete the file:
    $ oc rsh demo-2-fxx6d
    sh-4.2$ ls /opt/app-root/src/uploaded/
    lost+found  ocp_sop.txt
    sh-4.2$ rm -rf /opt/app-root/src/uploaded/ocp_sop.txt
    sh-4.2$ ls /opt/app-root/src/uploaded/
    lost+found
  2. Replace the file from the server that contains the rsync backup of the files that were in the pvc:
    $ oc rsync uploaded demo-2-fxx6d:/opt/app-root/src/
  3. Validate that the file is back on the pod by using oc rsh to connect to the pod and view the contents of the directory:
    $ oc rsh demo-2-fxx6d
    sh-4.2$ ls /opt/app-root/src/uploaded/
    lost+found  ocp_sop.txt

Restoring data to a new PVC

The following steps assume that a new pvc has been created.
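If the new claim does not exist yet, oc set volume can create it at the same time as it attaches it by passing a claim size. A sketch that mirrors the command in step 1 below (the 1Gi size is an assumption; adjust it for your storage):
    $ oc set volume dc/demo --add --name=persistent-volume \
      --type=persistentVolumeClaim --claim-name=filestore --claim-size=1Gi \
      --mount-path=/opt/app-root/src/uploaded --overwrite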

PROCEDURE

  1. Overwrite the currently defined claim-name:
    $ oc set volume dc/demo --add --name=persistent-volume \
      --type=persistentVolumeClaim --claim-name=filestore \
      --mount-path=/opt/app-root/src/uploaded --overwrite
  2. Validate that the pod is using the new PVC:
    $ oc describe dc/demo
    Name:  demo
    Namespace: test
    Created: 3 hours ago
    Labels:  app=demo
    Annotations: openshift.io/generated-by=OpenShiftNewApp
    Latest Version: 3
    Selector: app=demo,deploymentconfig=demo
    Replicas: 1
    Triggers: Config, Image(demo@latest, auto=true)
    Strategy: Rolling
    Template:
      Labels: app=demo
      deploymentconfig=demo
      Annotations: openshift.io/container.demo.image.entrypoint=["container-entrypoint","/bin/sh","-c","$STI_SCRIPTS_PATH/usage"]
      openshift.io/generated-by=OpenShiftNewApp
      Containers:
       demo:
        Image: docker-registry.default.svc:5000/test/demo@sha256:0a9f2487a0d95d51511e49d20dc9ff6f350436f935968b0c83fcb98a7a8c381a
        Port: 8080/TCP
        Volume Mounts:
          /opt/app-root/src/uploaded from persistent-volume (rw)
        Environment Variables: <none>
      Volumes:
       persistent-volume:
        Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
        ClaimName: filestore
        ReadOnly: false
    ...omitted...
  3. Now that the deployment configuration uses the new PVC, run oc rsync to place the files onto the new PVC:
    $ oc rsync uploaded demo-3-2b8gs:/opt/app-root/src/
    sending incremental file list
    uploaded/
    uploaded/ocp_sop.txt
    uploaded/lost+found/
    
    sent 181 bytes  received 39 bytes  146.67 bytes/sec
    total size is 32  speedup is 0.15
  4. Validate that the file is back on the pod by using oc rsh to connect to the pod and view the contents of the directory:
    $ oc rsh demo-3-2b8gs
    sh-4.2$ ls /opt/app-root/src/uploaded/
    lost+found  ocp_sop.txt
Additional documentation for reference:
Restoring OpenShift Container Platform Components
