Aug 28 2021

Dealing with a potentially problematic SATA controller, I came across a little issue – which disks were connected to which controller? Not a problem most people would have to deal with but I do have rather a lot of disks. What I wanted was a tool that would list the controllers (lspci) with disks (block devices) shown per controller (lsblk).

I couldn’t find one, so I knocked up a quick and nasty shell script to do the job.

This isn’t a proper product and probably has many bugs (in particular it doesn’t like disks that are members of a volume group), but it works well enough for my use case :-

» ./print-block-tree 
01:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset USB 3.1 xHCI Controller (rev 02)
  sr0:  PIONEER BD-RW_BDR-UD04 41443030303030303030303030303030 1024M
01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset SATA Controller (rev 02)
  sdi: 0x5000c50050ada74d ATA ST4000VN000-1H41 Z300H9GD 3.6T
  sdl: 0x500003992be00c53 ATA TOSHIBA_MG04ACA4 39DFK8S4FJKA 3.6T
  sdm: 0x500003992bf8077f ATA TOSHIBA_MG04ACA4 39CIK7DNFJKA 3.6T
  sdn: 0x500a075102fce9c7 ATA C300-CTFDDAC128M 00000000103402FCE9C7 119.2G
  sdo: 0x500003992bb80ede ATA TOSHIBA_MG04ACA4 39CAKCKDFJKA 3.6T
09:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
  sdp: 0x5002538f71100d76 ATA Samsung_SSD_870 S5STNG0R101271L 3.6T
  sdq: 0x50000399ec700c31 ATA TOSHIBA_MG04ACA4 30BXKC00FJKA 3.6T
41:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
  nvme0n1:  96G
42:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
  sdb: 0x500080dc00b4e2e9 ATA TOSHIBA-TR200 28RB76F7K46S 223.6G
  sdc: 0x500080dc00b4e3f6 ATA TOSHIBA-TR200 28RB76MOK46S 223.6G
  sdd: 0x500080dc009263fa ATA TOSHIBA-TR200 976B607GK46S 223.6G
  sde: 0x500080dc00926416 ATA TOSHIBA-TR200 976B6088K46S 223.6G
  sdf: 0x50025388a09508a9 ATA Samsung_SSD_850 S1SMNSAG216528K 119.2G
  sdg: 0x50025385a01c8379 ATA Samsung_SSD_840 S1ANNSAF214088T 119.2G
  sdh: 0x500080dc009263f4 ATA TOSHIBA-TR200 976B607AK46S 223.6G
  sda: 0x50025388a09508b4 ATA Samsung_SSD_850 S1SMNSAG216534V 119.2G
44:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller
  sdj:  Generic- USB3.0_CRW_-SD 201404081410 59.5G

The script itself :-

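#!/bin/sh
#
# Attempt at printing a "tree" of block devices, grouped by the
# PCI controller they are attached to.

# by-path names look like pci-0000:01:00.1-ata-1; keep the "pci-<address>"
# prefix and drop duplicates (ls output is already sorted, so uniq works)
controllers=$(ls /dev/disk/by-path | awk -F- '{printf "%s-%s\n", $1, $2}' | uniq)
for c in $controllers
do
  # Describe the controller using its PCI address
  rhs=$(echo "${c}" | awk -F- '{print $2}')
  lspci -s "${rhs}"
  # Resolve each non-partition link to its kernel name (sda, nvme0n1, ...)
  blockdevices=$(ls -l /dev/disk/by-path/"${c}"* | grep -v part | awk '{print $NF}' | awk -F/ '{print $NF}' | uniq)
  for b in $blockdevices
  do
    # One summary line per disk; head -1 drops any child devices
    exp=$(lsblk -no WWN,VENDOR,MODEL,SERIAL,SIZE "/dev/${b}" | head -1 | tr -s " ")
    if [ -n "${exp}" ]
    then
      echo "  ${b}: ${exp}"
    fi
  done
done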
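A different way of doing the same grouping, which doesn't depend on /dev/disk/by-path at all, is to follow each /sys/block entry to its real sysfs path and take the last PCI address in it. This is a rough, untested sketch of that idea rather than a replacement for the script above; the pci_of and main names are made up, and treating the last PCI-looking path component as "the controller" is an assumption that holds for the usual SATA, SAS, NVMe and USB layouts.

```shell
#!/bin/sh
# Sketch: group whole disks by PCI controller using sysfs paths
# rather than /dev/disk/by-path. Assumes Linux sysfs, lspci and lsblk.

# Print the last PCI address (domain:bus:slot.function) in a sysfs path;
# for a disk this is normally the controller it is attached to.
pci_of() {
  printf '%s\n' "$1" |
    grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' |
    tail -n 1
}

main() {
  for dev in /sys/block/*; do
    [ -e "$dev" ] || continue
    pci=$(pci_of "$(readlink -f "$dev")")
    # Virtual devices (loop, dm-*, md*, zram*) have no PCI parent
    [ -n "$pci" ] || continue
    printf '%s %s\n' "$pci" "${dev##*/}"
  done | sort | while read -r pci name; do
    # Print the controller line once, before its first disk
    if [ "$pci" != "${last:-}" ]; then
      lspci -s "$pci"
      last=$pci
    fi
    # -d (--nodeps) shows just the disk, not its partitions or holders
    info=$(lsblk -dno WWN,VENDOR,MODEL,SERIAL,SIZE "/dev/$name" | tr -s ' ')
    echo "  $name: $info"
  done
}

main
```

Sorting on the PCI address first keeps each controller's disks together, and sorting on the name second lists them as sda, sdb, ... rather than in port order.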
The Rower
Aug 28 2021

For a while now, my workstation has been spewing out this error in rather large volumes :-

Aug 27 00:00:07 pica multipathd[1686]: pktcdvd0: unusable path (wild) - checker failed

(about 18,000 per day)

The multipath daemon handles block devices (disks) that are reachable over more than one connection (path), monitoring those paths and dynamically reconfiguring the multipath devices when a path fails. Not the sort of thing that you usually find in a workstation (or indeed most servers), and it appears that I only have it installed because I started with the server install of Ubuntu.

It wasn’t causing any harm, but it was annoying that it was spamming the syslog files, so I took a look at fixing it. It turns out to be rather easy: just edit /etc/multipath.conf and add a “blacklist” section :-

blacklist {
       devnode "^pktcdvd0"
}

The parameter to “devnode” is a regular expression but in this case we can get away with a “^” (meaning beginning of string) followed by the name of the device.
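Before restarting anything, it can be reassuring to check which devices a devnode pattern would actually catch. Here is a small sketch that tests a pattern against the device names in /sys/block using grep -E; it assumes multipath interprets the pattern as a POSIX extended regular expression, and matches_devnode is a made-up helper name.

```shell
#!/bin/sh
# Sketch: show which block devices a multipath "devnode" pattern would match.
# Assumption: the pattern is a POSIX extended regex, as grep -E uses.

# Succeeds if device name $1 matches pattern $2.
matches_devnode() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

pattern=${1:-^pktcdvd0}
for dev in /sys/block/*; do
  [ -e "$dev" ] || continue
  name=${dev##*/}
  if matches_devnode "$name" "$pattern"; then
    echo "would be blacklisted: $name"
  fi
done
```

Running it with a broader pattern such as "^sd" is a quick way to see why you would not want to blacklist real disks by accident.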

At this point, you could restart the daemon :-

systemctl restart multipathd.service

This shouldn’t cause any problems on most machines without multiple paths, and it probably won’t be a problem for servers which do have multiple paths. But in the latter case, I’d test it first or just go for a full reboot.
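To confirm that the flood has actually stopped, one option is to count today's messages in the journal. This sketch assumes a systemd journal (count_unusable is a hypothetical helper name); on a machine that only logs to syslog, the same grep could be pointed at /var/log/syslog instead.

```shell
#!/bin/sh
# Sketch: count today's multipathd "unusable path" messages.
# Assumes systemd's journal; adjust the source if you only have syslog.

# Print how many lines on stdin mention an unusable path.
count_unusable() {
  # grep -c prints 0 but exits 1 when nothing matches; that is fine here
  grep -c 'unusable path' || true
}

journalctl -u multipathd.service --since today --no-pager 2>/dev/null | count_unusable
```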

Morning Lighthouse
Jul 18 2021

This is a procedure to replace one working drive in a fully functional mirror vdev; if you are replacing a failed disk, there is no advantage in following this procedure, although if the disk is still partially functional it may be worth trying.

So why not simply yank out the working disk you want to replace? Well, you can of course, and that would work, but there is nothing Murphy likes more than a mirrored vdev temporarily down to a single disk: resilvering onto a new disk reads everything from the remaining disk, which raises the chance that it fails too (I have actually seen this happen).

So I’m going to describe how to attach the new disk to turn the two-way mirror temporarily into a three-way mirror, and then detach the disk you want to replace.

To do this there are some prerequisites :-

  1. You will need space to install an additional disk into your system; perhaps temporarily in an “unsuitable” location.
  2. You will need a spare SATA controller port to plug the new disk into, if necessary by adding a PCIe SATA controller (which sounds expensive, but safety is worth the cost).
  3. You will need a SATA data cable and a SATA power cable.

The first step is to make very careful note of what devices you are going to “swap over” – ideally using their WWNs. If you don’t use WWNs, sorting out which disk is which is going to be a bit trickier.
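One way of making that note is to list the wwn-* links that udev creates under /dev/disk/by-id, together with the kernel name each one currently points to. A minimal sketch, assuming udev's usual link layout; list_wwns is a hypothetical helper, and the directory argument exists only so it can be tried on something other than /dev/disk/by-id. (lsblk -o NAME,WWN gives much the same information.)

```shell
#!/bin/sh
# Sketch: list each disk's WWN link and the kernel name it points to.
# Assumes udev's /dev/disk/by-id/wwn-* links; partition links are skipped.

list_wwns() {
  dir=${1:-/dev/disk/by-id}
  for link in "$dir"/wwn-*; do
    # Skip the literal pattern left behind when nothing matched
    [ -L "$link" ] || continue
    name=${link##*/}
    case $name in
      *-part*) continue ;;
    esac
    target=$(readlink "$link")
    printf '%s -> %s\n' "$name" "${target##*/}"
  done
}

list_wwns "$@"
```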

The second step is to practice the steps involved using a ‘fake’ storage pool backed by tiny disk-image files :-

# cd /pool1/temp
# for w in one two three
do
  dd if=/dev/zero of=test-disk-${w}.img bs=1M count=1000
done
# zpool create test mirror /pool1/temp/test-disk-one.img /pool1/temp/test-disk-two.img
# zpool attach test /pool1/temp/test-disk-one.img /pool1/temp/test-disk-three.img
# zpool detach test /pool1/temp/test-disk-one.img

That’s pretty much it in a nutshell.

The real process is a bit more disturbing of course and most of the work is physical. The first difference from practice is that when you attach the new disk to one or other of the existing devices within the mirror, you will have to wait until the resilvering process is complete.
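If you would rather not keep checking by hand, a small loop can poll zpool status until the resilver has finished. This is a sketch that relies on the "resilver in progress" wording seen in the status output; the pool name is whatever you pass as the first argument, and the helper names are hypothetical. Recent OpenZFS releases also have zpool wait -t resilver, which may be simpler if yours supports it.

```shell
#!/bin/sh
# Sketch: wait for a resilver to finish before detaching the old disk.
# Assumes zpool status shows "resilver in progress" while one is running.

# Succeeds while the status text on stdin reports a running resilver.
resilver_running() {
  grep -q 'resilver in progress'
}

# Poll the named pool every ten minutes until the resilver is done.
wait_for_resilver() {
  # Bail out rather than report "finished" for a pool that doesn't exist
  zpool status "$1" >/dev/null || return 1
  while zpool status "$1" | resilver_running; do
    sleep 600
  done
  echo "resilver on $1 appears to have finished"
}

# Usage: wait-for-resilver POOL
if [ $# -gt 0 ]; then
  wait_for_resilver "$1"
fi
```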

Whilst you will receive an estimate for that if you run zpool status, the estimate that you get :-

  scan: resilver in progress since Sun Jul 18 08:20:54 2021
	8.25T scanned at 1.09G/s, 7.28T issued at 981M/s, 8.25T total
	995G resilvered, 88.23% done, 0 days 00:17:16 to go

(Only showing the relevant part as the full output from my system is confusing and deceptive)

is wildly inaccurate, partly because the resilvering process takes second place to any ordinary file system activity. My own estimate (1 hour per terabyte) is probably also wildly inaccurate; basically, it is done when it is done.
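For what it is worth, the 1 hour per terabyte rule is easy to apply to the "total" figure on the scan line; with the 8.25T shown here it comes to a little over eight hours. A throwaway sketch (rough_hours is a made-up name) that only understands T and G suffixes, treating 1T as 1024G:

```shell
#!/bin/sh
# Sketch: rule-of-thumb resilver time (1 hour per terabyte) from the
# "<size> total" figure in zpool status. Only T and G suffixes are
# understood; G is converted at 1024G per T.

rough_hours() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($(i + 1) == "total") {
        size = $i
        unit = substr(size, length(size))
        n = substr(size, 1, length(size) - 1) + 0
        if (unit == "G") n = n / 1024
        else if (unit != "T") next
        printf "%.2f\n", n   # hours, at 1 hour per terabyte
      }
    }
  }'
}

# Usage: rough-resilver-hours POOL
if [ $# -gt 0 ]; then
  zpool status "$1" | rough_hours
fi
```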

Detaching the old device is fast – you won’t need to sit down to wait for it.

Oct 10 2020

One of the big names in the open-source world, Eric Raymond, has declared that Windows will soon effectively be a Linux distribution. That seems like a ridiculous notion, except that technically it might make a lot of sense.

How?

It seems impossible for Microsoft to replace Windows with Linux, but actually it could be done. Windows itself consists of a bunch of software applications which call Windows “APIs” which in turn make calls to the legacy NT kernel. If all that software is written cleanly (it won’t be, but bear with me), it should be possible to modify the Linux kernel, the Windows APIs, or both to allow Windows software to run natively.

Impossible? Nope – it has already been done to a certain extent – Wine and Proton allow a considerable amount of Windows software (and games!) to run under Linux.

Why?

So it’s not impossible, but surely it is a lot of work. So why?

Microsoft has a bit of a problem – they don’t make a huge amount of money selling the Windows operating system, and maintaining it is hugely expensive. All those security fixes, all those bug fixes, and all those new features they want to introduce.

Now most of this is done to the “userland” rather than the kernel itself, but the kernel does still need to be maintained. But what if you could use the Linux kernel and get some level of maintenance supplied by those not employed by Microsoft?

Would that save Microsoft money? It seems quite possible, and you can bet someone in Microsoft has estimated whether it would or not.

Will It Happen?

There are those who point to certain actions by Microsoft (the Windows Subsystem for Linux, the Edge browser for Linux, the rumour of an Office build for Linux, etc.) as indicators that Microsoft is planning this.

I think they’re wrong, in that those actions don’t tell us whether Microsoft is planning to make Windows a Linux distribution or not. There are plenty of reasons why Microsoft is releasing Linux software, not least because they will almost certainly have developers who believe that porting software is a good way of finding bugs.

The real answer is that the only people who know are inside Microsoft.

The Join
Jul 11 2020

So I am currently messing around with a tiling window manager on my laptop; I prefer tiling window managers in general (I use Awesome on my main desktops). These are usually not “desktop environments” but just manage windows (and sometimes a “status bar”).

As it happens the window manager I’m messing with doesn’t come as part of a distribution package with a pre-prepared file for GDM3 to use. So I created a ~/.xsession file – something that has worked since display managers first arrived.

Didn’t work.

Turns out that I need to “hack” GDM3 to make a long-standing bit of functionality work again. As an aside (and especially to the GNOME people), all you had to do to keep this working was detect whether someone had a ~/.xsession file and offer that up as a menu option. Not that difficult to do, and even if it isn’t your preferred way of doing things, it’s a nice thing to do for us old-timers.

Anyway, to restore this functionality all it took was to create a file called xsession.desktop in /usr/share/xsessions/ with the following contents :-

[Desktop Entry]
Name=XSession
Comment=This session uses the custom xsession file
Exec=/etc/X11/Xsession
Type=Application
DesktopNames=GNOME-Flashback;GNOME;
X-Ubuntu-Gettext-Domain=gnome-flashback

Dead simple.

And yes, I stole this and adapted it; I’m putting it up here so that I know where to look when I need it again.
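If you set this up on more than one machine, the file can be written by a small shell function. A sketch only: install_xsession_entry is a hypothetical name, the default directory needs root, and the contents are exactly the desktop entry shown in this post.

```shell
#!/bin/sh
# Sketch: write the XSession desktop entry into a session directory.
# DIR defaults to /usr/share/xsessions (needs root); pass a scratch
# directory to see what would be written.

install_xsession_entry() {
  dir=${1:-/usr/share/xsessions}
  mkdir -p "$dir" || return 1
  cat > "$dir/xsession.desktop" <<'EOF'
[Desktop Entry]
Name=XSession
Comment=This session uses the custom xsession file
Exec=/etc/X11/Xsession
Type=Application
DesktopNames=GNOME-Flashback;GNOME;
X-Ubuntu-Gettext-Domain=gnome-flashback
EOF
  chmod 644 "$dir/xsession.desktop"
}
```

Source the file and run install_xsession_entry as root; the script deliberately does nothing when simply executed, so trying it out can't touch /usr/share/xsessions by accident.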