Oct 30 2021

Ever since adding a couple of additional network interfaces to my workstation I have had a problem with reboots – systemd-networkd-wait-online.service “lingers” as it waits for all of the NICs to come online (and eventually fails). It isn’t especially problematic, as everything works fine once the boot process has finished, but it slows down reboots (which are slow enough on this rather complicated desktop) and gives me an amber ✗ in my window manager’s status bar.

After spending some time re-jigging my storage (which involved far too many reboots), I finally decided to fix it.

The fix basically consisted of making the relevant NICs “optional” in netplan :-

    enp9s0f1:
      dhcp4: false
      accept-ra: false
      addresses: [172.16.76.0/24]
      optional: true

This isn’t one of the NICs that I actually use – I added its configuration during an earlier, unsuccessful attempt at making things work. The key part is the “optional: true” bit, which tells networkd not to wait for that device to be configured during boot.
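To find out which interfaces are likely to be holding up systemd-networkd-wait-online.service (and so are candidates for “optional: true”), something like the following sketch can help. It assumes the five-column layout (IDX LINK TYPE OPERATIONAL SETUP) that networkctl list --no-legend prints on the systemd versions I have seen; pending_links is a made-up name for this example.

```shell
#!/bin/sh
# Sketch: list the links systemd-networkd has not finished configuring.
# Assumes "networkctl list --no-legend" prints five columns:
#   IDX LINK TYPE OPERATIONAL SETUP
# pending_links is a hypothetical helper name for this example.

# Read networkctl-style lines on stdin and print the LINK name of any
# link whose SETUP state is neither "configured" nor "unmanaged"
# (for example "configuring" or "failed").
pending_links() {
  awk '$5 != "configured" && $5 != "unmanaged" { print $2 }'
}

# Only query the live system if networkctl is actually installed.
if command -v networkctl >/dev/null 2>&1
then
  networkctl list --no-legend 2>/dev/null | pending_links
fi
```

Anything it prints is a link that wait-online may still be waiting for; if that link genuinely isn’t needed at boot, marking it optional should stop the wait.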

And whilst you’re in there, replacing the deprecated gateway4 and gateway6 settings with the “new style” default routes is worth doing too :-

      routes:
        - to: default
          via: 192.0.2.1
        - to: default
          via: 2001:db8:9c2:dead::1

(No, those aren’t the real addresses; they come from the ranges reserved for documentation.)
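Finding any remaining old-style keys is a one-liner with grep; this sketch wraps it in a small function (find_old_gateways is a made-up name) that prints each gateway4 or gateway6 line together with its file name and line number. It assumes the configuration lives in the usual /etc/netplan/*.yaml files.

```shell
#!/bin/sh
# Sketch: report deprecated netplan "gateway4:" / "gateway6:" keys.
# find_old_gateways is a hypothetical helper name for this example.

# Print file:line:text for each gateway4/gateway6 key in the given files.
# grep prints nothing (and exits 1) when there are no matches.
find_old_gateways() {
  grep -Hn -E '^[[:space:]]*gateway[46][[:space:]]*:' "$@"
}
```

Run it as find_old_gateways /etc/netplan/*.yaml; no output means there is nothing left to convert.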

This can be activated in the usual way with netplan apply (in my case netplan try isn’t effective because I use bridges), although in this particular case a full reboot is called for, as the delay only shows up at boot time.

Oct 16 2021

The one you’re running.

A bit of a simplistic answer, but there’s a great deal of truth to it. It is too easy to get distracted by the new shiny and keep changing distributions, when the time could be far better spent just learning Linux – to a great extent all Linux distributions are the same. You can get Firefox (or whatever browser you prefer) with any of them; similarly, LibreOffice is nearly always available. It’s the software you use on a daily basis that is important, not which distribution you’re using.

Similarly, the desktop environment you use is a matter of choice – this laptop has a distribution-specific flavour of GNOME, plus Awesome, Xmonad, and i3 (although I spend most of my time in Awesome). You might be able to tell something about my preferences for “desktop environments” from that list! A whole new desktop environment and a whole new look is just a quick software install away.

And that’s a whole lot quicker and less disruptive than installing a different distribution.

Different distributions offer different feature sets and different system administration commands (dpkg vs yum), but it isn’t that difficult to adjust to these differences, especially when most of the time you are just using the computer to do real stuff rather than managing it.

Aug 28 2021

Dealing with a potentially problematic SATA controller, I came across a little issue – which disks were connected to which controller? Not a problem most people would have to deal with, but I do have rather a lot of disks. What I wanted was a tool that would list the controllers (as lspci does) with the disks (block devices, as lsblk shows them) listed under each controller.

I couldn’t find one, so I knocked up a quick and nasty shell script to do the job.

This isn’t a proper product and probably has many bugs (in particular it doesn’t like disks that are members of a volume group), but it works well enough for my use case :-

» ./print-block-tree 
01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset USB 3.1 xHCI Controller (rev 02)
  sr0:  PIONEER BD-RW_BDR-UD04 41443030303030303030303030303030 1024M
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset SATA Controller (rev 02)
  sdi: 0x5000c50050ada74d ATA ST4000VN000-1H41 Z300H9GD 3.6T
  sdl: 0x500003992be00c53 ATA TOSHIBA_MG04ACA4 39DFK8S4FJKA 3.6T
  sdm: 0x500003992bf8077f ATA TOSHIBA_MG04ACA4 39CIK7DNFJKA 3.6T
  sdn: 0x500a075102fce9c7 ATA C300-CTFDDAC128M 00000000103402FCE9C7 119.2G
  sdo: 0x500003992bb80ede ATA TOSHIBA_MG04ACA4 39CAKCKDFJKA 3.6T
09:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
  sdp: 0x5002538f71100d76 ATA Samsung_SSD_870 S5STNG0R101271L 3.6T
  sdq: 0x50000399ec700c31 ATA TOSHIBA_MG04ACA4 30BXKC00FJKA 3.6T
41:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
  nvme0n1:  96G
42:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
  sdb: 0x500080dc00b4e2e9 ATA TOSHIBA-TR200 28RB76F7K46S 223.6G
  sdc: 0x500080dc00b4e3f6 ATA TOSHIBA-TR200 28RB76MOK46S 223.6G
  sdd: 0x500080dc009263fa ATA TOSHIBA-TR200 976B607GK46S 223.6G
  sde: 0x500080dc00926416 ATA TOSHIBA-TR200 976B6088K46S 223.6G
  sdf: 0x50025388a09508a9 ATA Samsung_SSD_850 S1SMNSAG216528K 119.2G
  sdg: 0x50025385a01c8379 ATA Samsung_SSD_840 S1ANNSAF214088T 119.2G
  sdh: 0x500080dc009263f4 ATA TOSHIBA-TR200 976B607AK46S 223.6G
  sda: 0x50025388a09508b4 ATA Samsung_SSD_850 S1SMNSAG216534V 119.2G
44:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller
  sdj:  Generic- USB3.0_CRW_-SD 201404081410 59.5G

The script itself :-

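#!/bin/sh
#
# Attempt at printing a "tree" of block devices, grouped by the
# controller (PCI device) each one is attached to.

# Controller prefixes such as "pci-0000:01:00.1": the first two
# "-"-separated fields of each entry in /dev/disk/by-path.
controllers=$(ls /dev/disk/by-path | awk -F- '{printf "%s-%s\n", $1, $2}' | uniq)
for c in $controllers
do
  # The PCI address part (e.g. "0000:01:00.1"), as accepted by lspci -s
  rhs=$(echo "${c}" | awk -F- '{print $2}')
  lspci -s "${rhs}"
  # Kernel names (sda, nvme0n1, ...) of the whole disks behind this
  # controller, taken from the by-path symlink targets; partitions skipped
  blockdevices=$(ls -l /dev/disk/by-path/"${c}"* | grep -v part | awk '{print $NF}' | awk -F/ '{print $NF}' | uniq)
  for b in $blockdevices
  do
    # First line only: lsblk also lists partitions and holders (e.g. LVM)
    exp=$(lsblk -no WWN,VENDOR,MODEL,SERIAL,SIZE "/dev/${b}" | head -1 | tr -s " ")
    if [ -n "${exp}" ]
    then
      echo "  ${b}: ${exp}"
    fi
  done
done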
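
An alternative sketch, not the script above: each entry in /sys/block is a symlink into the sysfs device tree, and the last PCI address (DDDD:BB:DD.F) in the resolved path is normally the controller the disk hangs off. Grouping on that avoids parsing /dev/disk/by-path names. It assumes a Linux sysfs layout; pci_of_path is a made-up helper name, and some devices (for example NVMe namespaces under native NVMe multipath, which can resolve under /sys/devices/virtual) may show no PCI address at all.

```shell
#!/bin/sh
# Sketch: print "PCI-address device" pairs for each block device,
# using the sysfs path of the device rather than /dev/disk/by-path.
# pci_of_path is a hypothetical helper name for this example.

# Print the last PCI address (e.g. 0000:42:00.0) found in a sysfs path,
# or nothing if the path contains none.
pci_of_path() {
  echo "$1" | tr '/' '\n' \
    | grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$' \
    | tail -n 1
}

for dev in /sys/block/*
do
  name=${dev##*/}
  # Skip virtual devices that have no controller behind them
  case "$name" in
    loop*|ram*|zram*|dm-*|md*) continue ;;
  esac
  pci=$(pci_of_path "$(readlink -f "$dev")")
  if [ -n "$pci" ]
  then
    echo "$pci $name"
  fi
done | sort
```

Each address can then be handed to lspci -s, just as the script above does.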

Aug 28 2021

For a while now, my workstation has been spewing out this error in rather large volumes :-

Aug 27 00:00:07 pica multipathd[1686]: pktcdvd0: unusable path (wild) - checker failed

(about 18,000 per day)
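To see which devices are responsible for the noise (and so what needs blacklisting), a tally of the messages per device helps. A sketch, assuming the log lines look like the one above (host name, then multipathd[PID]:, then the device name followed by a colon); tally_unusable_paths is a made-up name.

```shell
#!/bin/sh
# Sketch: count multipathd "unusable path" messages per device.
# Expects syslog-style lines on stdin, e.g.
#   Aug 27 00:00:07 pica multipathd[1686]: pktcdvd0: unusable path (wild) - checker failed
# tally_unusable_paths is a hypothetical helper name for this example.
tally_unusable_paths() {
  awk '/unusable path/ {
         # The device name is the field after "multipathd[PID]:"
         for (i = 1; i < NF; i++) {
           if ($i ~ /^multipathd\[[0-9]+\]:$/) {
             dev = $(i + 1)
             sub(/:$/, "", dev)
             count[dev]++
             break
           }
         }
       }
       END { for (d in count) print count[d], d }'
}
```

For example: journalctl -u multipathd.service --since yesterday --until today | tally_unusable_paths (journalctl’s default output uses the same syslog-style layout).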

The multipath daemon handles block devices (disks) that have multiple connections (paths), monitoring those paths and failing over between them when errors occur. That’s not the sort of thing you usually find in a workstation (or indeed most servers), and it appears that I only have it installed because I started with the server install of Ubuntu.

It wasn’t causing any harm, but it was annoying that it was spamming the syslog files, so I took a look at fixing it. It turns out to be rather easy: just edit /etc/multipath.conf and add a “blacklist” section :-

blacklist {
       devnode "^pktcdvd0"
}

The parameter to “devnode” is a regular expression, but in this case we can get away with a “^” (meaning the beginning of the string) followed by the name of the device.
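Before restarting anything, it is worth checking that the pattern catches only what you intend. A rough sketch, assuming multipath’s devnode expressions behave like grep -E for a simple anchored pattern such as this one (I believe multipath uses POSIX extended regular expressions, but treat that as an assumption); matching_devnodes is a made-up helper.

```shell
#!/bin/sh
# Sketch: show which device names a devnode pattern would match.
# Assumes multipath's devnode regex behaves like grep -E for simple patterns.
# matching_devnodes is a hypothetical helper name for this example.

# Print the names read on stdin that match the extended regex in $1.
matching_devnodes() {
  grep -E -- "$1"
}

# Demonstration with a fixed list of names: only pktcdvd0 should match,
# whereas a looser pattern such as '^sd' would catch every SCSI/SATA disk.
printf '%s\n' pktcdvd0 sda sdb sr0 | matching_devnodes '^pktcdvd0'
```

Against the live system, ls /sys/block | matching_devnodes '^pktcdvd0' shows what the blacklist entry would cover.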

At this point, you could restart the daemon :-

systemctl restart multipathd.service

This shouldn’t cause any problems on most machines without multiple paths, and it probably won’t be a problem for servers which do have multiple paths. But in the latter case, I’d test it first or just go for a full reboot.

Jul 18 2021

This is a procedure for replacing one working drive in a fully functional mirror vdev; if you are replacing a failed disk there is no advantage in following it, although if the disk is still partly functional it may be worth trying.

So why not simply yank out the working disk you want to replace? Well, you can, of course, and that would work, but there is nothing Murphy likes more than a mirrored vdev temporarily down to a single disk – resilvering onto a new disk keeps the remaining disk under heavy load for a long time, which makes it more likely to fail at exactly the wrong moment (I have actually seen this happen).

So I’m going to describe how to temporarily turn the two-way mirror into a three-way mirror by attaching the new disk, and then detach the disk you want to replace.

To do this there are some prerequisites :-

  1. You will need space to install an additional disk into your system, perhaps temporarily in an “unsuitable” location.
  2. You will need a spare SATA controller port to plug the new disk into, if necessary by adding a PCIe SATA controller (which sounds expensive, but safety is worth the cost).
  3. You will need a SATA data cable and a SATA power cable.

The first step is to make a very careful note of which devices you are going to “swap over” – ideally using their WWNs. If you don’t use WWNs, sorting out which disk is which is going to be a bit trickier.
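One way of noting which kernel device each WWN currently points to is to walk the wwn-* symlinks in /dev/disk/by-id. A sketch (wwn_map is a made-up name; the directory argument exists only so it can be tried on a scratch directory). On the ZFS side, zpool status -L shows vdevs with their symbolic links resolved, which makes cross-checking easier.

```shell
#!/bin/sh
# Sketch: print "wwn-name kernel-name" for each whole disk in /dev/disk/by-id.
# wwn_map is a hypothetical helper name; the optional directory argument
# defaults to /dev/disk/by-id.
wwn_map() {
  dir=${1:-/dev/disk/by-id}
  for link in "$dir"/wwn-*
  do
    # Skip the literal pattern when nothing matched, and any partitions
    [ -L "$link" ] || continue
    case "$link" in
      *-part*) continue ;;
    esac
    printf '%s %s\n' "${link##*/}" "$(basename "$(readlink -f "$link")")"
  done
}
```

Running wwn_map and keeping the output somewhere safe (not on the pool being worked on!) gives a record to check against once the cables start moving.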

The second step is to practise the steps involved using a ‘fake’ storage pool backed by small disk-image files :-

# cd /pool1/temp
# for w in one two three
do
  dd if=/dev/zero of=test-disk-${w}.img bs=1M count=1000
done
# zpool create test mirror /pool1/temp/test-disk-one.img /pool1/temp/test-disk-two.img
# zpool attach test /pool1/temp/test-disk-one.img /pool1/temp/test-disk-three.img
# zpool status test
# zpool detach test /pool1/temp/test-disk-one.img

That’s pretty much it in a nutshell.

The real process is a bit more nerve-wracking, of course, and most of the work is physical. The first difference from practice is that when you attach the new disk to one or other of the existing devices within the mirror, you will have to wait until the resilvering process is complete before detaching the old disk.
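On OpenZFS 2.0 and later, zpool wait -t resilver followed by the pool name will block until resilvering finishes. For older versions, a polling loop does the same job; this sketch assumes the scan line of zpool status reads “resilver in progress” while resilvering, and wait_for_resilver is a made-up name.

```shell
#!/bin/sh
# Sketch: block until a pool has finished resilvering by polling zpool status.
# On OpenZFS 2.0 and later, "zpool wait -t resilver POOL" does the same job.
# wait_for_resilver is a hypothetical helper name for this example.

# Usage: wait_for_resilver POOL [INTERVAL_SECONDS]
wait_for_resilver() {
  pool=$1
  interval=${2:-60}
  # Fail early if the pool does not exist, rather than "finishing" at once
  zpool status "$pool" >/dev/null || return 1
  # The scan line reads "resilver in progress" until the resilver ends
  while zpool status "$pool" | grep -q 'resilver in progress'
  do
    sleep "$interval"
  done
}
```

Something like wait_for_resilver pool1 300 && zpool status pool1 then shows the final state once it is safe to detach.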

Whilst you will get an estimate of the time remaining if you run zpool status, the estimate that you get :-

  scan: resilver in progress since Sun Jul 18 08:20:54 2021
	8.25T scanned at 1.09G/s, 7.28T issued at 981M/s, 8.25T total
	995G resilvered, 88.23% done, 0 days 00:17:16 to go

(Only the relevant part is shown, as the full output from my system is confusing and misleading.)

is wildly inaccurate – partly because the resilvering process takes second place to any ordinary file system activity. My own estimate (about 1 hour per Tbyte) is probably also wildly inaccurate; basically it is done when it is done.
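For what it’s worth, the quoted figure looks consistent with a straight-line extrapolation: the data still to be issued divided by the current issue rate. That is my reading of the numbers rather than anything documented, and it assumes zpool’s T and M suffixes are binary units:

```latex
t_{\text{remaining}} \approx \frac{\text{total} - \text{issued}}{\text{issue rate}}
  = \frac{(8.25 - 7.28)\,\text{TiB}}{981\,\text{MiB/s}}
  = \frac{0.97 \times 1024^{2}\,\text{MiB}}{981\,\text{MiB/s}}
  \approx 1037\,\text{s} \approx 17\,\text{min}
```

That only holds if the issue rate stays at 981M/s for the rest of the resilver which, with ordinary file system activity taking priority, it won’t.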

Detaching the old device is fast – you won’t need to sit down to wait for it.