How to migrate a glusterfs brick to a new machine
- Step 1 - Prepare the new machine
- Step 2 - Check the current state of the volume
- Step 3 - Migrate the physical disk
- Step 4 - Add the peer and brick
- Step 5 - Remove old machine
For our Raspberry Pi cluster one of the maintenance tasks we want to be able to do whenever required is add a new machine to the cluster as a replacement for an existing one. There are plenty of reasons to do this, such as a hardware failure or wanting to rebuild an existing machine from scratch.
On our Raspberry Pi cluster we run GlusterFS for our scalable network attached storage, with each machine in the cluster having a physical storage device attached (/dev/sda1). Currently we are using 4TB external mechanical drives as they are a cost effective option for the bulk, low performance storage required. Each machine formats its drive as xfs and mounts it to /data/brick1.
If we want to swap a machine out of the cluster and replace it with a new one, we need to be able to safely move the disk to the new machine and ensure the brick follows it to the new server in glusterfs. Below is a step-by-step guide on how to do this.
Note: This process has only been tested on a "quiet" cluster, i.e. one where no (or very limited) writes were occurring on the glusterfs volume at the time. If that can't be guaranteed it may be best to stop the volume first.
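If stopping the volume is needed, a minimal sketch of doing so from any peer looks like the following; the volume name jammaraid is taken from the volume info later in this guide, and stopping the volume makes its data unavailable to clients until it is started again.
# Stop the volume before the migration; gluster will ask for confirmation
sudo gluster volume stop jammaraid
# ...perform the migration steps below, then bring the volume back
sudo gluster volume start jammaraid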
Step 1 - Prepare the new machine
Follow the automated build process documented in the readme to build the new machine and ensure it has the required mount points created to be ready for glusterfs.
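The automated build does the real work here, but roughly speaking the new machine needs glusterd running and the brick mount point in place. A rough sketch of what that amounts to is below; the package name, service name, device path and fstab entry are assumptions about that build, and note there is deliberately no mkfs because the migrated disk already carries the brick data.
# Install and enable glusterfs (assumed package and service names)
sudo apt install -y glusterfs-server
sudo systemctl enable --now glusterd
# Create the brick mount point and ensure the disk will mount there once
# attached; do NOT format the disk, the brick data is already on it
sudo mkdir -p /data/brick1
echo '/dev/sda1 /data/brick1 xfs defaults 0 0' | sudo tee -a /etc/fstab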
Step 2 - Check the current state of the volume
Let's note down and confirm which machine we are removing! Additionally we should check for any running volume tasks, e.g. rebalances or other operations that could be impacted by the removal of a brick.
The code blocks below can be run individually from within humacs by pressing ,, in normal mode with the cursor inside a block, or all at once in sequence by pressing , b s in normal mode with the cursor anywhere within this subtree.
Define connection variables
Firstly we need to connect to a peer in the glusterfs cluster that is NOT the machine we are going to remove in order to run the commands required to migrate a brick.
To save some time editing in future, let's set some variables for the connection parameters we'll use; these values get passed into the shell blocks below as $ip, $port, $user and $sequence via the blocks' org-babel header arguments.
(setq machine_ip "192.168.1.124")
(setq machine_port "2124")
(setq machine_user "james")
(setq knock_sequence "[SEQUENCE HERE]")
echo "Machine IP: $ip"
echo "Machine Port: $port"
echo "Machine User: $user"
echo "Knock sequence: $sequence"
Machine IP: "192.168.1.124"
Machine Port: "2124"
Machine User: "james"
Knock sequence: "[SEQUENCE HERE]"
Confirm the volume information
As a precaution let's confirm which volume we are working with. We can also check the list of bricks returned to verify which brick we will be migrating to a new machine.
# Port knock to open ssh on the machine
eval "knock $ip $(echo $sequence | tr -d '"') && sleep 2"
# Run the command via ssh
eval "ssh $user@$ip -p $port sudo gluster volume info"
Volume Name: jammaraid
Type: Distribute
Volume ID: e073c9ac-6d0a-4163-87c1-e8df26c95669
Status: Started
Snapshot Count: 0
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: 192.168.1.122:/data/brick1/jammaraid
Brick2: 192.168.1.126:/data/brick1/jammaraid
Brick3: 192.168.1.128:/data/brick1/jammaraid
Brick4: 192.168.1.124:/data/brick1/jammaraid
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.client-io-threads: on
Confirm the volume status
We should also check to confirm that there are no glusterfs volume tasks running currently.
# Port knock to open ssh on the machine
eval "knock $ip $(echo $sequence | tr -d '"') && sleep 2"
# Run the command via ssh
eval "ssh $user@$ip -p $port sudo gluster volume status"
Status of volume: jammaraid
Gluster process                               TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.122:/data/brick1/jammaraid   49152     0          Y       722
Brick 192.168.1.128:/data/brick1/jammaraid   49152     0          Y       723
Brick 192.168.1.124:/data/brick1/jammaraid   49152     0          Y       886
Brick 192.168.1.126:/data/brick1/jammaraid   49152     0          Y       836
Task Status of Volume jammaraid
------------------------------------------------------------------------------
There are no active volume tasks
Step 3 - Migrate the physical disk
For this next step we are going to power down both the machine we are migrating from and the machine we are migrating to, so that we can unplug the disk from the old machine and plug it into the new one.
We could probably hot swap the disk, but this is not something I have tested. I have left this stage as a manual step, as it involves physical hardware.
Once the disk has been attached to the new machine, power the new machine on.
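Before moving on it's worth a quick check that the disk has mounted where glusterfs expects it; a minimal check, assuming the mount point from Step 1 is configured on the new machine.
# Confirm the migrated brick filesystem is mounted at the expected path
df -h /data/brick1
# The brick directory for the volume should already contain data
ls /data/brick1/jammaraid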
Step 4 - Add the peer and brick
With the new machine now attached to the disk and powered on we can get it added as a peer in the glusterfs cluster and use the add-brick command to bring the disk back into service.
Add the new machine as a peer
First up we need to probe the new machine from an existing peer in the cluster to ensure it can be reached and is ready.
Note: Ensure firewall rules allow connectivity to the new peer; I needed to run sudo iptables -I INPUT -p all -s 192.168.1.130 -j ACCEPT on all of the other peers in the cluster.
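For reference, applying that rule (and checking it took effect) looks like this on each existing peer; note that a rule inserted this way is not persistent across reboots, so persist it however your build normally manages firewall rules.
# Run on each existing peer: allow all traffic from the new machine
sudo iptables -I INPUT -p all -s 192.168.1.130 -j ACCEPT
# Confirm the rule is in place
sudo iptables -L INPUT -n | grep 192.168.1.130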
# Port knock to open ssh on the machine
eval "knock $ip $(echo $sequence | tr -d '"') && sleep 2"
# Run the commands to probe the new peer and check status via ssh
eval "ssh $user@$ip -p $port sudo gluster peer probe 192.168.1.130"
eval "ssh $user@$ip -p $port sudo gluster peer status"
With those commands run we should see our old machine showing in the peer status list as Disconnected and our new machine showing as Connected.
peer probe: success. Host 192.168.1.130 port 24007 added to peer list
Number of Peers: 4
Hostname: 192.168.1.122
Uuid: d8680054-d376-40c7-9944-465f5fdb7c65
State: Peer in Cluster (Connected)
Hostname: 192.168.1.128
Uuid: 9fb11fc6-7a6e-4353-a12e-1ba68755f128
State: Peer in Cluster (Connected)
Hostname: 192.168.1.126
Uuid: 71d5d33d-df54-49f1-a4f1-21e20390342c
State: Peer in Cluster (Disconnected)
Hostname: 192.168.1.130
Uuid: bd7d6d14-9654-41b0-84c3-0723763760cb
State: Peer in Cluster (Connected)
Add the 'new' brick back into the cluster
With our machine now in the cluster we need to add the 'new' brick to the cluster. I put 'new' in air quotes because it's the same brick we detached from our old machine.
We add the force flag to the add-brick command as otherwise glusterfs will complain that the brick is already part of a volume.
# Port knock to open ssh on the machine
eval "knock $ip $(echo $sequence | tr -d '"') && sleep 2"
# Run the command to add the brick via ssh
eval "ssh $user@$ip -p $port sudo gluster volume add-brick jammaraid 192.168.1.130:/data/brick1/jammaraid force"
With the brick added we should see that reflected in the volume status.
# Port knock to open ssh on the machine
eval "knock $ip $(echo $sequence | tr -d '"') && sleep 2"
# Run the command via ssh
eval "ssh $user@$ip -p $port sudo gluster volume status"
Status of volume: jammaraid
Gluster process                               TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.122:/data/brick1/jammaraid   49152     0          Y       722
Brick 192.168.1.128:/data/brick1/jammaraid   49152     0          Y       723
Brick 192.168.1.124:/data/brick1/jammaraid   49152     0          Y       886
Brick 192.168.1.130:/data/brick1/jammaraid   49152     0          Y       836
Task Status of Volume jammaraid
------------------------------------------------------------------------------
There are no active volume tasks
Step 5 - Remove old machine
With the glusterfs brick now migrated successfully, we are ready to remove the old machine from the glusterfs cluster, which involves removing both its brick and its peer entry.
Remove the old brick
First up we need to remove the brick. We need to add the force flag to the remove-brick command and also confirm "y" at a prompt, which we can automate with echo.
# Port knock to open ssh on the machine
eval "knock $ip $(echo $sequence | tr -d '"') && sleep 2"
# Run the command via ssh
eval "ssh $user@$ip -p $port 'echo "y" | sudo gluster volume remove-brick jammaraid 192.168.1.126:/data/brick1/jammaraid force'"
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume.
Do you want to continue? (y/n) volume remove-brick commit force: success
Remove the old peer
With the brick removed we can safely remove the peer from the glusterfs cluster.
# Port knock to open ssh on the machine
eval "knock $ip $(echo $sequence | tr -d '"') && sleep 2"
# Run the command via ssh
eval "ssh $user@$ip -p $port sudo gluster peer detach 192.168.1.126"
peer detach: success
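As an optional final check, the old machine (192.168.1.126) should no longer appear in either the peer list or the brick list.
# Port knock to open ssh on the machine
eval "knock $ip $(echo $sequence | tr -d '"') && sleep 2"
# Confirm the old machine is gone from the peers and bricks
eval "ssh $user@$ip -p $port sudo gluster peer status"
eval "ssh $user@$ip -p $port sudo gluster volume info jammaraid"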
Job done! We have migrated a disk with its glusterfs brick from one machine to another :)