Aug 092018
 

Well that was a weird error; I recently discovered that ntpd had mysteriously stopped working; specifically it was not able to resolve NTP “pool” names :-

ntpd: error resolving pool europe.pool.ntp.org: Name or service not known (-2)

After some time spent blundering around down dead ends with the help of an appropriate search engine, I ended up resorting to strace. This is a tool most commonly used by developers but can be surprisingly useful for diagnosing system problems too.

As long as you can look past all the inscrutable output!

The strace tool runs a command and records every system call that the command calls together with the results. And of course most commands make zillions of system calls so you’re likely to end up with a huge output file.

To generate the output file, I ran the modern equivalent of ntpdate (ntpd -d) which tries to do the same thing using the actual NTP daemon. Usefully in this case because the command starts, configures itself (which is where the error occurs), and then exits (unlike the normal dæmon). It is important to redirect the output to have a file to trawl through later :-

strace ntpd -d > /var/tmp/ntpd.strace 2>&1

Once the output was generated, it was necessary to trawl through it to look for clues. The first thing was to search for “europe” (as I use europe.pool.ntp.org as one of my NTP servers). The first occurrence was the error claiming that the name didn’t exist :-

write(2, "error resolving pool europe.pool"..., 73error resolving pool europe.pool.ntp.org: Name or service not known (-2)

Which was somewhat odd because you would expect the string “europe” to occur within an instructable attempt resolve the name. Yet it appears as though the error occurs without any attempt to resolve the name!

As a bit of a guess I searched for “resolv.conf” which revealed :-

stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=362, ...}) = 0
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)

Apparently ntpd is unable to open the file due to a permissions problem!

Looking at my /etc/resolv.conf revealed an oddity dating back to when I tried configuring /etc/resolv.conf as a symbolic link to a file on a separate file system. The file itself was a symbolic link to /etc/resolv.conf.file.

For some reason ntpd didn’t like the symbolic link, which is a bit odd but changing it to an ordinary file fixed the problem.