Aug 092018
 

Well that was a weird error; I recently discovered that ntpd had mysteriously stopped working; specifically it was not able to resolve NTP “pool” names :-

ntpd: error resolving pool europe.pool.ntp.org: Name or service not known (-2)

After some time spent blundering around down dead ends with the help of an appropriate search engine, I ended up resorting to strace. This is a tool most commonly used by developers but can be surprisingly useful for diagnosing system problems too.

As long as you can look past all the inscrutable output!

The strace tool runs a command and records every system call that the command calls together with the results. And of course most commands make zillions of system calls so you’re likely to end up with a huge output file.

To generate the output file, I ran the modern equivalent of ntpdate (ntpd -d) which tries to do the same thing using the actual NTP daemon. Usefully in this case because the command starts, configures itself (which is where the error occurs), and then exits (unlike the normal dæmon). It is important to redirect the output to have a file to trawl through later :-

strace ntpd -d > /var/tmp/ntpd.strace 2>&1

Once the output was generated, it was necessary to trawl through it to look for clues. The first thing was to search for “europe” (as I use europe.pool.ntp.org as one of my NTP servers). The first occurrence was the error claiming that the name didn’t exist :-

write(2, "error resolving pool europe.pool"..., 73error resolving pool europe.pool.ntp.org: Name or service not known (-2)

Which was somewhat odd because you would expect the string “europe” to occur within an instructable attempt resolve the name. Yet it appears as though the error occurs without any attempt to resolve the name!

As a bit of a guess I searched for “resolv.conf” which revealed :-

stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=362, ...}) = 0
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)

Apparently ntpd is unable to open the file due to a permissions problem!

Looking at my /etc/resolv.conf revealed an oddity dating back to when I tried configuring /etc/resolv.conf as a symbolic link to a file on a separate file system. The file itself was a symbolic link to /etc/resolv.conf.file.

For some reason ntpd didn’t like the symbolic link, which is a bit odd but changing it to an ordinary file fixed the problem.

Jul 302018
 

Alternatively, why does Windows use drive letters? Because if you are coming from an old unix background, drive letters are just as weird as the lack of them if you are coming from a Windows background.

I mean, why is Windows installed on drive C? What ever happened to drives A and B?

Technically Linux does have the equivalent of drive letters but they are rarely used directly (unless you’re weird like I am). For example I currently have an SD card plugged into my desktop system, and it has the path /dev/disk/by-label/EOS_DIGITAL (or /dev/sdo1).

Historically, Unix (which is loosely the predecessor of Linux) ran on large minicomputers where system administrators would decide what disks were “mounted” where.  The Linux equivalent of drive C is effectively “/” (root), and you can attach (or “mount”) disks at any point underneath that – for example /home.

This allowed people to use an old Unix machine without worrying where this disks were; and allowed system administrators to add and remove disks as and where they were needed. These days we are all system administrators as well as users – that little voice you hear from time to time saying things like “When would be a good time to update the operating system?” and “I must clean up those temporary files all over the place” are your inner system administrator speaking up.

And if you don’t hear that inner voice, cultivate it!

With device paths, Linux has the opportunity to create sensible friendly names for disks, but a historical accident has resulted in almost every kind of disk being identified as a SCSI disk – SATA disks (a normal hard disk), SAS disks (server hard disks), Fiber Channel disks (SAN hard disks), and even USB storage devices all use SCSI commands.

So nearly all Linux disks are identified as /dev/sd followed by a letter (a “drive letter” – we can’t get away from them) and a number indicating the partition. Fortunately there is also the relatively new /dev/disks directory that has slightly friendlier names for disk devices. If you are getting into low-level disk management, learn these directories; in particular if you are looking into enterprise disk management look at WWNs (each disk has a unique “world-wide-number”).

Now back to Windows. Windows is the descendent of DOS, which goes back to the time when PCs may not have had hard disks and by default would have booted off a floppy disk in drive A with a data disk in drive B. Later PCs came with hard disks which used drive C on the assumption that you would have one or two floppy drives.

Windows has been updated over the years and there is a great deal of sophistication under the surface, but it does act a bit conservatively when it comes to drive letters – A and B are by default reserved for floppy drives even though I haven’t seen one of those on an ordinary system for years. You can use A and B for other purposes such as mapping network drives – A makes a good drive for a NAS drive.

If we get away from the terminology of “drive letters” and “device paths” and instead refer to them as “storage device names”, both Linux and Windows have “storage device names” but Linux prefers to hide that level of detail.

Personally I prefer the Linux way, but whatever floats your boat.

Apr 012018
 

This is a continuation of an earlier post regarding ECC memory under Linux, and is how I added a little widget to display the current ECC memory status. Because I don’t really know lua, most of the work is carried out with a shell script that is run via cron on a frequent basis.

The shell script simply runs edac-util to obtain the number of correctable errors and uncorrectable errors, and formats the numbers in a way suitable for setting the text of a widget :-

#!/bin/zsh
#
# Use edac-util to report some numbers to display ...

correctables=$(edac-util --report=ce | awk '{print $NF}')
uncorrectables=$(edac-util --report=ue | awk '{print $NF}')

c="chartreuse"
if [[ "$correctables" != "0" ]]
then 
  c="orange"
fi
if [[ "$uncorrectables" != "0" ]]
then
  c="red"
fi

echo "ECC: $correctables/$uncorrectables "

This is run with a crontab entry :-

*/7 * * * * /site/scripts/gen-ecc-wtext > /home/mike/lib/awesome/widget-texts/ecc-status

Once the file is being generated, the Awesome configuration can take effect :-

-- The following function does what it says and is used in a number of dumb widgets
-- to gather strings from shell scripts
function readfiletostring (filename)
  file = io.open(filename, "r")
  io.input(file)
  s = io.read()
  io.close(file)
  return s
end

eccstatus = wibox.widget.textbox()
eccstatus:set_markup(readfiletostring(homedir .. "/lib/awesome/widget-texts/ecc-status"))
eccstatustimer = timer({ timeout = 60 })
eccstatustimer:connect_signal("timeout",
  function()
      eccstatus:set_markup(readfiletostring(homedir .. "/lib/awesome/widget-texts/ecc-status"))
  end
)
eccstatustimer:start()
...
layout = wibox.layout.fixed.horizontal, ... eccstatus, ...

There plenty of ways this could be improved – there’s nothing really that requires a separate shell script, but this works which is good enough for now.

Mar 092018
 

One of the things that annoys me about pagers such as lessmore, most, etc. is that they are dumb in the sense that they cannot detect the format of the text file they are displaying. For example, all of a sudden I find myself reading lots of markdown-formatted files, and I find myself using most to display it – never remembering that it is mdv I want.

As it happens, when I invoke a pager at the shell prompt, I typically use an alias (page or pg) to invoke a preferred pager, and by extending this functionality into a function I can start to approach what I want :-

function extension {
  printf "%s\n" ${${argv/*\./}:l}
}

function page {
 if [[ -z $argv ]]
 then
   $PAGER
 else
   case $(extension $argv) in
     "md")
       mdv -A $argv | $PAGER
       ;;
     "man")
       groff -m mandoc -Tutf8 $argv | $PAGER
       ;;
     *)
       $PAGER $argv
       ;;
     esac
   fi
}

(the function “extension” has had a quick bug fix applied to smash the filename extension to lowercase)

Of course there are undoubtedly umpteen errors in that, and probably better ways to do it too. And it won’t work properly on its own ($PAGER hasn’t been set).

But it’s the start of something I can use to display all sorts of text files in a terminal window without having to remember all those commands. But as for ‘intelligent’, nope it’s not that – just a bit smarter than the average pager.

Feb 022018
 

On occasions, I have run into issues where mounting a filesystem from /etc/fstab fails on a reboot because it depends on something else happening first. The easiest example to recall is when mounting a conventional filesystem constructed from a ZPool block device – the block device isn’t ready until ZFS has finished starting which often occurs after the filesystem mounts are attempted.

The fix is dead simple; just add the option “_netdev” to the options field in /etc/fstab and the problem is sorted :-

/dev/zvol/pool1/vol-splunk      /opt/splunk     ext2    noatime,_netdev         0 2

Yes the reason I am using a block device is that Splunk doesn’t support being installed on a ZFS filesystem.