Jul 182021
 

This is a procedure to replace one working drive in a fully functional mirror vdev; if you are replacing a failed disk there is no advantages in following this procedure. Although if you have a somewhat functional disk it may be worth trying.

So why not simply yank out the working disk you want to replace? Well, you can of course and that would work but there is nothing Murphy likes more than a mirrored vdev temporarily down to a single disk – resilvering onto a new disk guarantees a higher chance of failure of the previously working disk (I have actually seen this happening).

So I’m going to describe how to make a three-way mirror with three disks and then detach the disk you wanted to replace.

To do this there are some prerequisites :-

  1. You will need space to install an additional disk into your system; perhaps temporarily in an “unsuitable” location.
  2. You will need a spare SATA controller port to plug the new disk into. If necessary with an additional PCIe SATA controller (which sounds expensive but safety is worth the cost).
  3. You will need a SATA data cable and a SATA power cable.

The first step is to make very careful note of what devices you are going to “swap over” – ideally using their WWNs. If you don’t use WWNs, sorting out which disk is which is going to be a bit trickier.

The second step is to practice the steps involved using a ‘fake’ storage pool backed up by tiny disk files :-

# cd /pool1/temp
# for w in one two three
do
  dd if=/dev/zero of=test-disk-${w}.img bs=1M count=1000
done
# zpool create test mirror /pool1/temp/test-disk-one.img /pool1/temp/test-disk-two.img
# zpool attach test /pool1/temp/test-disk-one.img /pool/temp/test-disk-three.img
# zpool detach test /pool1/temp/test-disk-one.img

That’s pretty much it in a nutshell.

The real process is a bit more disturbing of course and most of the work is physical. The first difference from practice is that when you attach the new disk to one or other of the existing devices within the mirror, you will have to wait until the resilvering process is complete.

Whilst you will receive an estimate for that if you run zpool status, the estimate that you get :-

  scan: resilver in progress since Sun Jul 18 08:20:54 2021
	8.25T scanned at 1.09G/s, 7.28T issued at 981M/s, 8.25T total
	995G resilvered, 88.23% done, 0 days 00:17:16 to go

(Only showing the relevant part as the full output from my system is confusing and deceptive)

Is wildly inaccurate – partially because the resilvering process takes second place to any ordinary file system activity. My own estimate (1 hour per Tbyte) is probably also wildly inaccurate; basically it is done when it is done.

Detaching the old device is fast – you won’t need to sit down to wait for it.