Dec 28 2025
 

If you happen to have tried upgrading Ubuntu 24.10 (probably – I didn’t check before getting this done) to Ubuntu 25.04 with ZFS, you will realise that the upgrade is blocked because of known issues. Specifically (without having seen the issue personally), the upgrade blocks at a certain point where the userland ZFS tools have been upgraded and the old kernel is still running.

Fair enough, but why hasn’t it been fixed? Or even a suggested work-around?

One suggestion I came across was to remove the ZFS storage pool(s), upgrade, and add them back in. For those not familiar with ZFS, this is done by simply importing the previously exported (or not) pool without loss of data.

Although backups are as always a good idea!

But there’s more to the suggestion than that, so here are my working notes … the ones written down long-hand with a pen on paper (something I rarely do these days) :-

  1. Shut down the virtual machines.
  2. Shut down the gooey – as in shut down the applications, and return to the login screen.
  3. Switch to a text console (most of the work is done here).
  4. Shut down the containers.
  5. Unmount the ZFS filesystems
    • zfs unmount -a
    • Which failed, so killed off various running processes with pkill -u ${USER}
    • A second zfs unmount -a also failed and had to kill off various other processes until it worked.
  6. Export the pool which failed – including a second attempt when forced.
  7. Removed the ZFS packages :-
    • dpkg --remove zfs-zed zfs-utils-linux zfs-dkms
  8. Rebooted as Linux still thinks ZFS is enabled.
  9. Upgrade started in text mode.
    • Skips past the ZFS block and completes normally.
  10. Added the ZFS packages back (zfs-zed zfs-utils-linux zfs-dkms) and imported the pool. This did issue a dire warning about potential data loss with ZFS and this version of the Linux kernel. With any luck this is an outdated warning and perhaps more to do with ZFS root.

But that dire warning is probably reason enough to avoid the upgrade.
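
The notes above can be collected into a dry-run sketch. It only prints the commands unless DRY_RUN=0 is set. The pool name pool0, the use of apt install to put the packages back, and the run helper are assumptions for illustration rather than anything from the original notes. The reboot and the release upgrade itself are left as manual steps.

```shell
#!/bin/sh
# Dry-run sketch of the steps above; prints commands unless DRY_RUN=0.
# POOL is a hypothetical pool name; substitute your own.
POOL="${POOL:-pool0}"
ZFS_PKGS="zfs-zed zfs-utils-linux zfs-dkms"

# Echo a command, and only execute it when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Steps 5 to 7: unmount, export, remove the packages (then reboot by hand).
before_upgrade() {
    run zfs unmount -a
    run zpool export -f "$POOL"   # the notes needed a forced export
    run dpkg --remove $ZFS_PKGS   # unquoted on purpose: one word per package
}

# Step 10: put the packages back and import the pool.
after_upgrade() {
    run apt install $ZFS_PKGS     # assumption: the notes do not say how
    run zpool import "$POOL"
}

case "${1:-}" in
    before) before_upgrade ;;
    after)  after_upgrade ;;
esac
```

Saved as a script, calling it with "before" or "after" prints the relevant commands, which makes it easy to read them through before running anything for real.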

Model lighthouse in a lake.
The Lighthouse
Oct 30 2021
 

Ever since adding a couple of additional network interfaces to my workstation I have had a problem with reboots – the systemd-networkd-wait-online.service service “lingers” as it waits for all of the NICs to come online (and fails). Not especially problematic as everything works fine after the boot process has finished, but it slows down reboots (which are slow enough on this rather complicated desktop) and gives me an amber ✗ in my window manager’s status bar.

After spending some time re-jigging my storage (which consisted of far too many reboots), I finally decided to fix it.

Which basically consisted of making the relevant NICs “optional” in netplan. :-

    enp9s0f1:
      dhcp4: false
      accept-ra: false
      addresses: [172.16.76.0/24]
      optional: true

This isn’t one of the NICs that I actually use – I added the NIC configuration in an earlier attempt at making things work … unsuccessfully. The key part is the “optional: true” bit.
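
To find which NICs are candidates for “optional: true”, something like the following may help. It is a rough sketch that assumes the usual column layout of networkctl list (index, link, type, operational state, setup state); the pending_links name is made up for this example. Links with no carrier are the likeliest to hold up the wait-online service.

```shell
# List links whose operational state suggests they will never come online.
# Reads the output of `networkctl list --no-legend` on standard input.
# Assumes the fourth column is the operational state.
pending_links() {
    awk '$4 == "no-carrier" || $4 == "off" || $4 == "dormant" { print $2 }'
}

# Usage on a live system:
#   networkctl list --no-legend | pending_links
```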

And whilst you’re in there, replacing the gatewayv4 and gatewayv6 specifications with the “new style” is worth doing too :-

      routes:
        - to: default
          via: 192.0.2.1
        - to: default
          via: 2001:db8:9c2:dead::1

(No those aren’t the real addresses)

This can be activated in the usual way – with a netplan apply (in my case a netplan try isn’t effective because of the use of bridges), although in this particular case a full reboot is called for.

The Round Table
Jun 21 2020
 

If you are just running Ubuntu with ZFS without poking into the details, you may not be aware of the scrubber running. For background information, and for the benefit of those who prefer to go their own way, this is all about that little scrubber.

A pool scrub operation is where the kernel runs through checking all of the data in a pool and makes any necessary repairs. Whilst ZFS does check the integrity of the data (using checksums) when performing reads, a regular scrub finds and repairs any problems in advance.

It need only be run weekly for larger systems or monthly for normal systems (it’s a pretty arbitrary border line). And can be started manually with :-

# zpool scrub pool0

(With “pool0” being the name of the pool to scrub)
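
For those going their own way with more than one pool, a minimal sketch of scrubbing them all might look like this. The scrub_pools function and the DRY_RUN switch are inventions for this example; by default it only prints what it would do.

```shell
# Read pool names on standard input, one per line, and scrub each.
# Prints the command only, unless DRY_RUN=0 is set explicitly.
scrub_pools() {
    while read -r pool; do
        [ -n "$pool" ] || continue          # skip blank lines
        echo "+ zpool scrub $pool"
        if [ "${DRY_RUN:-1}" = "0" ]; then
            zpool scrub "$pool"
        fi
    done
}

# Usage on a live system (zpool list -H -o name prints bare pool names):
#   zpool list -H -o name | scrub_pools
```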

Whilst a scrub is going on in the background, the only effect on the system is that disk accesses to that pool will be slightly slower than normal. Usually not enough to notice unless you are benchmarking!

When in progress the output of zpool status pool0 will show the current state and how long it is expected to take to complete the scrub. Once finished the status will look like :-

# zpool status | grep scan:
  scan: scrub repaired 0B in 0 days 09:19:27 with 0 errors on Sun Jun 21 10:36:28 2020
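
If you want to check that result from a script, the error count can be pulled out of the scan line. This is a sketch that assumes the wording shown above; the scrub_errors name is made up, and other ZFS versions may phrase the line differently.

```shell
# Print the error count from a finished scrub's "scan:" line.
# Reads `zpool status` output on standard input; prints nothing if no
# finished scrub is reported.
scrub_errors() {
    sed -n 's/.*scan: scrub repaired .* with \([0-9][0-9]*\) errors.*/\1/p'
}

# Usage on a live system:
#   zpool status pool0 | scrub_errors
```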

May 17 2020
 

There are two aspects to ZFS that I will be covering here – checksums and error-correcting memory. The first is a feature of ZFS itself; the second is a feature of the hardware that you are running and some claim that it is required for ZFS.

Checksums

By default ZFS keeps checksums of the blocks of data that it writes to later verify that the data block hasn’t been subject to silent corruption. If it detects corruption, it can use resilience (if any) to correct the corruption or it can indicate there’s a problem.

If you have only one disk and don’t ask to keep multiple copies of each block, then checksums will do little more than protect the most important metadata and tell you when things go wrong.
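
The idea can be illustrated outside ZFS with an ordinary file and sha256sum. This is only an analogy, not how ZFS implements it: ZFS checksums each block inside the filesystem, whereas this sketch records one checksum for a temporary file, corrupts the file, and notices the mismatch.

```shell
# Illustration only: detect "silent" corruption by comparing checksums.
tmp=$(mktemp)
printf 'important data\n' > "$tmp"
stored=$(sha256sum < "$tmp")          # checksum recorded at write time

printf 'important dat4\n' > "$tmp"    # simulate silent corruption
current=$(sha256sum < "$tmp")         # checksum recomputed at read time

if [ "$stored" = "$current" ]; then
    verdict="block ok"
else
    verdict="checksum mismatch: corruption detected"
fi
echo "$verdict"
rm -f "$tmp"
```

With only one copy of the data, detection is all you get; with a mirror, RAID, or the copies property set above 1, ZFS has a good copy to repair from.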

All that checksum calculation does make file operations slightly slower but frankly without benchmarks you are unlikely to notice. And it gives extra protection to your data.

For those who do not believe that silent data corruption exists, take a look at the relevant Wikipedia page. Everyone who has old enough files has come across occasional weird corruption in them, and whilst there are many possible causes, silent data corruption is certainly one of them.

Personally I feel like a probably unnoticeable loss of performance is more than balanced by greater data resilience.

Error-Correcting Memory

(Henceforth “ECC”)

I’m an enthusiast for ECC memory – my main workstation has a ton of it, and I’ve insisted on ECC memory for years. I’ve seen errors being corrected (although that was back when I was running an SGI Indigo2). Reliability is everything.

However there are those who will claim you cannot run ZFS without ECC memory. Or that ZFS without ECC is more dangerous than any other file system format without ECC.

Not really.

Part of the problem is that those with the most experience of ZFS are salty old Unix veterans who are justifiably contemptuous of server hardware that lacks ECC memory (that includes me). We would no sooner consider running a serious file server on hardware that lacks ECC memory than rely on disk ‘reliability’ and not mirror or RAID those fallible pieces of spinning rust.

ZFS will run fine without ECC memory.

But will it make it worse?

It’s exceptionally unlikely – there are arguable examples of exceptionally esoteric failure conditions that may make things worse (the “scrub of death”) but I side with those who feel that such situations are not likely to occur in the real world.

And as always, why isn’t your data backed up anyway?

Apr 26 2020
 

Experimenting with Ubuntu’s “new” (relatively so) ZFS installation option is all very well, but encryption is not optional for a laptop that is taken around the place.

Perhaps I should have spent more time poking around the installer to find the option, but enabling encryption post-install isn’t so difficult.

The first step is to create an encrypted filesystem – encryption only works on newly created filesystems and cannot be turned on later :-

zfs create -o encryption=on \
  -o keyformat=passphrase \
  rpool/USERDATA/ehome

You will be asked for the passphrase as it is created. Forgetting this is extremely inadvisable!

Once created, reboot to check that :-

  1. You get prompted for the passphrase (as of Ubuntu 20.04 you do).
  2. That the encrypted filesystem gets mounted automatically (likewise).
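
After the reboot, the keystatus property gives a quick way to confirm that the key was loaded. The sketch below assumes the dataset name used above; the key_state helper is made up for this example and simply turns the property value into a readable message.

```shell
# Interpret the value of the ZFS "keystatus" property.
key_state() {
    case "$1" in
        available)   echo "key loaded" ;;
        unavailable) echo "key not loaded: try zfs load-key" ;;
        *)           echo "not encrypted or unknown: $1" ;;
    esac
}

# Usage on a live system:
#   key_state "$(zfs get -H -o value keystatus rpool/USERDATA/ehome)"
```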

At this point you should be able to create the filesystems for the relevant home directories :-

zfs create rpool/USERDATA/ehome/root
cd /root
rsync -arv . /ehome/root
cd /
zfs set mountpoint=/root rpool/USERDATA/ehome/root
(An error will result as there is something already there but it does the important bit)
zfs set mountpoint=none rpool/USERDATA/root_xyzzy
(A similar error)

Repeat this for each user on the system, and reboot. See if you can login and your files are present.
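
To save some typing when repeating this, the commands for one user can be generated and read through before being run. This is a sketch: the migrate_cmds function, the user alice, and the old dataset name alice_abc123 are hypothetical, and it assumes (as the rsync above does) that the encrypted filesystem is mounted at /ehome. It prints the commands and executes nothing.

```shell
# Print the migration commands for one user; nothing is executed.
# $1 = user name, $2 = home directory, $3 = old (unencrypted) dataset name
migrate_cmds() {
    user=$1 home=$2 old=$3
    new="rpool/USERDATA/ehome/$user"
    echo "zfs create $new"
    echo "rsync -arv $home/. /ehome/$user"
    echo "zfs set mountpoint=$home $new"
    echo "zfs set mountpoint=none rpool/USERDATA/$old"
}

# Hypothetical example user and dataset name:
migrate_cmds alice /home/alice alice_abc123
```

The real names of the old datasets can be found with zfs list; they are whatever the installer generated.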

This leaves the old unencrypted home directories around (which can be removed with zfs destroy -r rpool/USERDATA/root_xyzzy). It is possible that this re-arrangement of how home directories work will break some of Ubuntu’s features – such as scheduled snapshots of home directories (existing snapshots are why the destroy command needs the “-r” flag).

But it’s getting there.