Jul 18 2021
 

This is a procedure to replace one working drive in a fully functional mirror vdev; if you are replacing a failed disk there is no advantage in following this procedure, although if you have a somewhat functional disk it may be worth trying.

So why not simply yank out the working disk you want to replace? Well, you can of course, and that would work, but there is nothing Murphy likes more than a mirrored vdev temporarily down to a single disk – resilvering onto a new disk puts extra load on the one remaining disk, which raises the chance of it failing (I have actually seen this happen).

So I’m going to describe how to temporarily turn the mirror into a three-way mirror by attaching the new disk, and then detach the disk you wanted to replace.

To do this there are some prerequisites :-

  1. You will need space to install an additional disk into your system; perhaps temporarily in an “unsuitable” location.
  2. You will need a spare SATA controller port to plug the new disk into, if necessary on an additional PCIe SATA controller (which sounds expensive but safety is worth the cost).
  3. You will need a SATA data cable and a SATA power cable.

The first step is to make very careful note of what devices you are going to “swap over” – ideally using their WWNs. If you don’t use WWNs, sorting out which disk is which is going to be a bit trickier.
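
On most Linux systems udev creates WWN-based names under /dev/disk/by-id, so one way to make that careful note is to list them alongside the kernel device each one currently points to. The sketch below assumes that directory layout; the function name list_wwn_disks is made up for illustration.

```shell
# List whole-disk WWN links and the kernel device each one points to.
# The optional directory argument exists only so the function can be tried
# on a scratch directory; on a real system the default is what you want.
list_wwn_disks() {
  dir="${1:-/dev/disk/by-id}"
  for link in "$dir"/wwn-*; do
    [ -L "$link" ] || continue                       # nothing matched, or not a symlink
    case "$link" in *-part[0-9]*) continue ;; esac   # ignore partition links
    printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
  done
}
```

Running list_wwn_disks before and after plugging in the new disk should show which WWN belongs to the newcomer.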

The second step is to practice the steps involved using a ‘fake’ storage pool backed by tiny disk files :-

# cd /pool1/temp
# for w in one two three
do
  dd if=/dev/zero of=test-disk-${w}.img bs=1M count=1000
done
# zpool create test mirror /pool1/temp/test-disk-one.img /pool1/temp/test-disk-two.img
# zpool attach test /pool1/temp/test-disk-one.img /pool1/temp/test-disk-three.img
# zpool detach test /pool1/temp/test-disk-one.img

That’s pretty much it in a nutshell.

The real process is a bit more disturbing of course and most of the work is physical. The first difference from practice is that when you attach the new disk to one or other of the existing devices within the mirror, you will have to wait until the resilvering process is complete.

Whilst you will receive an estimate for that if you run zpool status, the estimate that you get :-

  scan: resilver in progress since Sun Jul 18 08:20:54 2021
	8.25T scanned at 1.09G/s, 7.28T issued at 981M/s, 8.25T total
	995G resilvered, 88.23% done, 0 days 00:17:16 to go

(Only showing the relevant part as the full output from my system is confusing and deceptive)

Is wildly inaccurate – partially because the resilvering process takes second place to any ordinary file system activity. My own estimate (1 hour per Tbyte) is probably also wildly inaccurate; basically it is done when it is done.
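
If you would rather not read through the full output each time, the percentage can be pulled out of the scan section with a little sed. This is only a sketch and it assumes the wording shown above ("resilvered, NN.NN% done"); the function name resilver_percent is invented for illustration.

```shell
# Read `zpool status` output on stdin and print the resilver percentage,
# or nothing at all if no resilver progress line is present.
resilver_percent() {
  sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Typical use (pool1 is a placeholder pool name):
#   zpool status pool1 | resilver_percent
```

Fed the three lines quoted above, it prints 88.23. It deliberately ignores the time-to-go figure, for the reasons already given.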

Detaching the old device is fast – you won’t need to sit down to wait for it.
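
Put together, the real swap might look something like the following sketch. The helper names run and swap_mirror_disk are invented here, and any pool or device names you pass in are your own. By default it only prints the commands; with RUN=1 in the environment it executes them and polls zpool status until a resilver is no longer reported as in progress before detaching the old disk.

```shell
# Print a command, and execute it only when RUN=1 is set.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# swap_mirror_disk POOL OLD NEW
# Attach NEW alongside OLD (making a three-way mirror), wait for the
# resilver to finish, then detach OLD.
swap_mirror_disk() {
  pool=$1 old=$2 new=$3
  run zpool attach "$pool" "$old" "$new" || return 1
  if [ "${RUN:-0}" = 1 ]; then
    # Assumes the "resilver in progress" wording of the scan line.
    while zpool status "$pool" | grep -q 'resilver in progress'; do
      sleep 60
    done
  fi
  run zpool status "$pool" || return 1
  run zpool detach "$pool" "$old"
}

# Dry run with placeholder names:
#   swap_mirror_disk pool1 wwn-0xOLD wwn-0xNEW
```

Checking the status output by eye before the detach is still worthwhile; the script cannot tell a healthy resilver from one that finished with errors.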

Jun 21 2020
 

If you are just running Ubuntu with ZFS without poking into the details, you may not be aware of the scrubber running. For background information, and for the benefit of those who prefer to go their own way, this is all about that little scrubber.

A pool scrub operation is where the kernel runs through checking all of the data in a pool and makes any necessary repairs. Whilst ZFS does check the integrity of the data (using checksums) when performing reads, a regular scrub finds and repairs any problems before you need to read the data.

It need only be run weekly for larger systems or monthly for normal systems (it’s a pretty arbitrary borderline), and it can be started manually with :-

# zpool scrub pool0

(“pool0” being the name of the pool to scrub)

Whilst a scrub is going on in the background, the only effect on the system is that disk accesses to that pool will be slightly slower than normal. Usually not enough to notice unless you are benchmarking!

When in progress the output of zpool status pool0 will show the current state and how long it is expected to take to complete the scrub. Once finished the status will look like :-

# zpool status | grep scan:
  scan: scrub repaired 0B in 0 days 09:19:27 with 0 errors on Sun Jun 21 10:36:28 2020
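
If you want something to nag you when a scrub did not come back clean, the scan line is easy enough to test. A minimal sketch, assuming the wording shown above; the function name scrub_was_clean is invented for illustration.

```shell
# Read `zpool status` output on stdin; succeed only if it reports a
# finished scrub "with 0 errors".
scrub_was_clean() {
  grep 'scan: scrub repaired' | grep -q 'with 0 errors'
}

# Typical use (pool0 as in the examples above):
#   zpool status pool0 | scrub_was_clean || echo "pool0 needs attention"
```

Note that it also fails while a scrub is still in progress or when none has ever been run, which is arguably what you want from a check like this.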

May 17 2020
 

There are two aspects to ZFS that I will be covering here – checksums and error-correcting memory. The first is a feature of ZFS itself; the second is a feature of the hardware that you are running and some claim that it is required for ZFS.

Checksums

By default ZFS keeps checksums of the blocks of data that it writes, so that it can later verify that a data block hasn’t been subject to silent corruption. If it detects corruption, it can use redundancy (if any) to correct the corruption or it can indicate there’s a problem.
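
The idea is easy to demonstrate outside ZFS. The sketch below is an illustration only: it uses sha256sum on an ordinary temporary file as a stand-in for the per-block checksums that ZFS keeps internally, records a checksum at "write" time, changes one byte behind the file's back, and compares again at "read" time.

```shell
# Illustration only: a temporary file plays the part of a data block and
# sha256sum stands in for ZFS's own block checksums.
block=$(mktemp)
printf 'important data' > "$block"
stored=$(sha256sum < "$block")      # checksum recorded when the block is written

# Simulate silent corruption: overwrite one byte without changing the length.
printf 'X' | dd of="$block" bs=1 seek=3 conv=notrunc 2>/dev/null

current=$(sha256sum < "$block")     # checksum recomputed when the block is read
if [ "$stored" = "$current" ]; then
  verdict="block ok"
else
  verdict="corruption detected"
fi
echo "$verdict"
rm -f "$block"
```

With a mirror or multiple copies, ZFS can go one step further than this and fetch a good copy of the block; with neither, detecting the problem is all it can do.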

If you have only one disk and don’t ask to keep multiple copies of each block, then checksums will do little more than protect the most important metadata and tell you when things go wrong.

All that checksum calculation does make file operations slightly slower but frankly without benchmarks you are unlikely to notice. And it gives extra protection to your data.

For those who do not believe that silent data corruption exists, take a look at the relevant Wikipedia page. Everyone who has old enough files has come across occasional weird corruption in them, and whilst there are many possible causes, silent data corruption is certainly one of them.

Personally I feel like a probably unnoticeable loss of performance is more than balanced by greater data resilience.

Error-Correcting Memory

(Henceforth “ECC”)

I’m an enthusiast for ECC memory – my main workstation has a ton of it, and I’ve insisted on ECC memory for years. I’ve seen errors being corrected (although that was back when I was running an SGI Indigo2). Reliability is everything.

However there are those who will claim you cannot run ZFS without ECC memory. Or that ZFS without ECC is more dangerous than any other file system format without ECC.

Not really.

Part of the problem is that those with the most experience of ZFS are salty old Unix veterans who are justifiably contemptuous of server hardware that lacks ECC memory (that includes me). We would no sooner consider running a serious file server on hardware that lacks ECC memory than rely on disk ‘reliability’ and not mirror or RAID those fallible pieces of spinning rust.

ZFS will run fine without ECC memory.

But will it make it worse?

It’s exceptionally unlikely – there are arguable examples of exceptionally esoteric failure conditions that may make things worse (the “scrub of death”) but I side with those who feel that such situations are not likely to occur in the real world.

And as always, why isn’t your data backed up anyway?

Apr 26 2020
 

Experimenting with Ubuntu’s “new” (relatively so) ZFS installation option is all very well, but encryption is not optional for a laptop that is taken around the place.

It is possible that I should have spent more time poking around the installer to find the option, but enabling encryption post-install isn’t so difficult.

The first step is to create an encrypted filesystem – encryption only works on newly created filesystems and cannot be turned on later :-

zfs create -o encryption=on \
  -o keyformat=passphrase \
  rpool/USERDATA/ehome

You will be asked for the passphrase as it is created. Forgetting this is extremely inadvisable!

Once created, reboot to check that :-

  1. You get prompted for the passphrase (as of Ubuntu 20.04 you do).
  2. That the encrypted filesystem gets mounted automatically (likewise).
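
Both checks can also be made from a shell after logging in, using the keystatus and mounted properties. A minimal sketch; the function name check_encrypted is invented, and the ZFS variable is there only so the function can be tried against a stub instead of a real pool.

```shell
# Report whether an encrypted dataset has its key loaded and is mounted.
# Succeeds only if both are true. Set ZFS to a stub command to try it out
# without a pool; by default the real zfs command is used.
check_encrypted() {
  dataset=$1
  keystatus=$(${ZFS:-zfs} get -H -o value keystatus "$dataset") || return 2
  mounted=$(${ZFS:-zfs} get -H -o value mounted "$dataset") || return 2
  echo "$dataset: keystatus=$keystatus mounted=$mounted"
  [ "$keystatus" = available ] && [ "$mounted" = yes ]
}

# Typical use, with the dataset created above:
#   check_encrypted rpool/USERDATA/ehome
```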

At this point you should be able to create the filesystems for the relevant home directories :-

zfs create rpool/USERDATA/ehome/root
cd /root
rsync -arv . /ehome/root
cd /
zfs set mountpoint=/root rpool/USERDATA/ehome/root
(An error will result as there is something already there but it does the important bit)
zfs set mountpoint=none rpool/USERDATA/root_xyzzy
(A similar error)

Repeat this for each user on the system, and reboot. See if you can login and your files are present.

This leaves the old unencrypted home directories around (they can be removed with zfs destroy -r rpool/USERDATA/root_xyzzy). It is possible that this re-arrangement of how home directories work will break some of Ubuntu’s features – such as scheduled snapshots of home directories (existing snapshots of the old filesystems are also why the destroy command needs the “-r” flag).

But it’s getting there.

Apr 26 2020
 

A number of those who have experimented with Ubuntu’s ZFS install option (which as of 20.04 is marked as “experimental”) have expressed bewilderment over the number of filesystems created.

The short answer as to why is that there are two schools of thought amongst grizzled old Unix veterans as to whether one big filesystem should be the way to go or lots of little ones. There are pros and cons to both approaches, and whilst I have a preference for lots of filesystems (especially on servers), I don’t care enough to change it on a laptop install.

Even though those who insist on one big filesystem are wrong.

As to the longer explanation …

Some History

A long time ago – the 1970s or the 1980s – Unix systems lacked sophisticated disk management software, and the disks were very much smaller (I started off with 80Mbyte disks and no that isn’t a typo, and many started with much smaller disks). On larger Unix servers, you couldn’t fit everything onto one disk, so we got used to splitting up the filesystem into many separate filesystems – / on one disk partition (or slice), /usr on another, /var on a third, /home on yet another, etc.

These very frequently got further subdivided – /var/mail, /var/tmp, /var/spool, etc. as Unix servers got larger and busier.

Those days are long past and nobody is keen to go back to them, so why do some still like to split things up?

The Fringe Benefits of Splitting

It turns out that there was a fringe benefit to splitting up the filesystems – disk space exhaustion on one wouldn’t cause a problem elsewhere. For example, if a mail server had a separate /var/spool/mail filesystem to operate within, it would continue to operate if /var filled up; similarly a DNS server wouldn’t crash and burn if it had a /var/named filesystem and /var filled up.

Both of those examples are known to me personally – and there are many other examples.

Of course there is also a downside – if you create a separate /var/spool/mail filesystem you need to make sure it is large enough to operate not just normally but in reasonable exceptional circumstances. Or your mail server crashes and burns.

On the other hand, if you don’t separate things out then when something goes berserk and fills up all the disk space then you will have a good deal of trouble actually logging in to fix things.

In a sense, the “everything in one” camp and “lots of little filesystems” camp are determined by what troubles we’ve seen over the years (and in some cases decades).

With something like ZFS you can set quotas to limit the size of any filesystem so managing the sizes of these separate filesystems is a great deal easier than it ever was in the past! Ubuntu does not set quotas by default on a desktop installation; for a server it may well be worth checking quotas and setting them appropriately.
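
Setting a quota is a one-liner with the quota property. A minimal sketch; the function name set_quota is invented and any dataset name you pass in is your own. Setting ZFS=echo beforehand previews the commands instead of running them.

```shell
# set_quota DATASET SIZE
# Cap a dataset at SIZE (for example 2G) and then show the resulting value.
# Set ZFS=echo to preview the commands rather than run them.
set_quota() {
  dataset=$1 size=$2
  ${ZFS:-zfs} set "quota=$size" "$dataset" &&
  ${ZFS:-zfs} get -H -o property,value quota "$dataset"
}

# Example with a hypothetical dataset name:
#   set_quota tank/var/log 2G
```

Unlike the fixed-size partitions of old, the limit can be raised later with the same command, which takes much of the sting out of guessing wrong.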

And Snapshots …

One of the other things that Ubuntu does with ZFS and filesystem snapshots (we’ll worry about what those are another time) is to offer to roll back a broken update. People worry that upgrading their system will break things and the ability to quickly revert to the previous state is very comforting.

But the Unix file layout “standard” and the later Linux file layout standard were not designed with snapshots in mind, and simply rolling back the whole of “/” would have negative effects – not least you would lose any file changes you had made in /home and any mail stashed away in /var/mail.

So implementing the ability to roll back updates requires numerous separate filesystems to avoid losing important data.

It is also likely that it would be beneficial to tune separate filesystems for different requirements.

Finally

In short, don’t worry about it. It’ll have very little effect on your operation of a normal Ubuntu machine unless you choose to take advantage of it. And it makes possible certain features that you will probably like – such as the ability to revert updates.