Oct 232015
 

Have you ever wondered if you can tinker with the ps command to change how and what is displayed? No? Well give up reading this post then.

I've known about ps for ages and also the way that the output can be tinkered with, but have not had an excuse to dig into it properly until I was looking for a way for ps to show the Linux container name for each process (don't get excited: ps -o machine is documented but not implemented at the time of writing). 

If you read the manual page for ps you will be quickly distracted by all the different options available. These can be grossly simplified into three different groups of options: which processes to list, what to output, and how to sort the output.

Which Processes?

This can be simplified down to almost nothing; ps on it's own lists just the processes running from the current terminal (window) :-

% ps
  PID TTY          TIME CMD
 2591 pts/17   00:00:00 zsh
13325 pts/17   00:00:00 ps

If you want to display all processes, add the "-e" option :-

% ps -e
  PID TTY          TIME CMD
    1 ?        00:00:03 systemd
    2 ?        00:00:00 kthreadd
    3 ?        00:00:00 ksoftirqd/0
    5 ?        00:00:00 kworker/0:0H
    7 ?        00:00:54 rcu_sched
    8 ?        00:00:00 rcu_bh
    9 ?        00:00:35 rcuos/0
   10 ?        00:00:00 rcuob/0
   11 ?        00:00:00 migration/0
   12 ?        00:00:00 watchdog/0
(cut)

And lastly (not literally – there are other options), add the "-p" option to list processes by process ID :-

% ps -p 1
  PID TTY          TIME CMD
    1 ?        00:00:03 systemd

Tuning The Output

By default the fields that ps outputs is somewhat peculiar until you realise that the output fields have been frozen in time. The default choice is somewhat minimal; and I'm not in favour of minimalism. And what use is the TTY and the TIME fields?

The TTY field shows you what terminal the process is running on – this was handy on a multi-user system where you could find out who was on what terminal and then write a message directly to their screen. A great way of winding people up, but not so much use these days. And TIME? We're no longer billed for the cpu time we consume, so the time spent running on the cpu is a rather pointless thing to list.

The "-f" option displays more information :-

% ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
mike     26486 31092  0 21:24 pts/24   00:00:00 ps -f
mike     31092 31091  0 20:11 pts/24   00:00:00 -zsh

But the output is still somewhat peculiar, and there are other more interesting fields to display.

There are various options for choosing the output format amongst a set of predefined choices, but the best bet is to ignore these and jump straight into selecting the individual fields that you want. These can be found in the manual page in the "STANDARD FORMAT SPECIFIERS" section. Simply list the fields you want after the "-o" option :-

% ps -o pid,comm,pcpu,pmem,nlwp,user,stat,sgi_p,wchan,class,pri,nice,flags
  PID COMMAND         %CPU %MEM NLWP USER     STAT P WCHAN  CLS PRI  NI F
28061 ps               0.0  0.0    1 mike     R+   3 -      TS   19   0 0
31092 zsh              0.0  0.0    1 mike     Ss   * -      TS   19   0 0

Obviously typing this in every time is somewhat less than ideal, but fortunately the authors of ps have already thought of this. By listing the fields within the PS_FORMAT environment variable, there is no need to specify -o :-

% export PS_FORMAT="pid,comm,pcpu,pmem,nlwp,user,stat,sgi_p,wchan,class,pri,nice,flags"
% ps
  PID COMMAND         %CPU %MEM NLWP USER     STAT P WCHAN  CLS PRI  NI F
29440 ps               0.0  0.0    1 mike     R+   5 -      TS   19   0 0
31092 zsh              0.0  0.0    1 mike     Ss   * -      TS   19   0 0

To make this pernament, add this to your shell startup rc file; whilst editing you may as well set PS_PERSONALITY to "linux".

Sorting The Output

According to the ps documentation, by default the output is not sorted. In that case either my kernel's process table is remarkably well organised, or the distributions I use "cheat" and sort the output in process ID order. In the distant past where computers were shared amongst too many people, and the machines themselves were quite slow, it made sense for the output of ps to be unsorted. But it certainly doesn't make sense now.

And the ps command allows processes to be sorted by any field that you can specify in the "STANDARD FORMAT SPECIFIERS" section which conveniently enough you are now intimately acquianted. Simply add the relevant field to the –sort option :-

% ps --sort pcpu
  PID COMMAND         %CPU %MEM NLWP USER     STAT P WCHAN  CLS PRI  NI F
31092 zsh              0.0  0.0    1 mike     Ss   * -      TS   19   0 0
31743 ps               0.0  0.0    1 mike     R+   5 -      TS   19   0 0

With just a short list (and such a low percentage of the cpu in use) it doesn't make sense, but added to -e, it does.

Rather than change the default sort order, I personally prefer to configure aliases to do the job for me :-

% alias pscpu='ps --sort pcpu'
% alias psmem='ps --sort pmem'

Preferring to use an alias here is rather convenient as there doesn't seem to be a way to configure the default sort order – officially there isn't one!

Reading through the ps manual page (during which you will notice many different options referring to old Unix varients) is a reminder of just how long and bitter the fight over which ps varient was the best. And now for a completely irrelevant picture :-

damascus-unix-prompt

Oct 032015
 

More up to date information can be found here.

One thing that has always puzzled me about Linux Containers was why it is necessary to configure the network address in two places – the container configuration, and the operating system configuration. The short answer is that it isn’t.

If you configure network addresses statically within the container configuration :-

» grep net /var/lib/lxc/mango/config 
# networking
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.ipv4 = 10.0.0.35/16
lxc.network.ipv4.gateway = 10.0.0.1
lxc.network.ipv6 =         2001:0db8:ca2c:dead:0000:0000:0000:000a/64
lxc.network.ipv6.gateway = 2001:0db8:ca2c:dead:0000:0000:0000:0001

Then the configuration within the container’s operating system can simply be :-

» cat /var/lib/lxc/mango/rootfs/etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
iface eth0 inet6 manual

And that works fine.

Oct 032015
 

A newer post has more information (and more reliable information).

One of the things that has been mildly irritating me about my little collection of Linux containers has been that in addition to the statically defined IPv6 addresses, there is also an automatically defined IPv6 address :-

» lxc-ls --fancy
NAME      STATE    IPV4       IPV6                                                              AUTOSTART  
---------------------------------------------------------------------------------------------------------
apricot   RUNNING  10.0.0.34  2001:db8:ca2c:dead:21e:a0ff:feb6:6a, 2001:db8:ca2c:dead::3eb      YES        
chagers   RUNNING  10.0.0.32  2001:db8:ca2c:dead:804a:bfff:fe83:f98d, 2001:db8:ca2c:dead::5e11  YES        
glanders  RUNNING  10.0.0.31  2001:db8:ca2c:dead:21e:a0ff:feb6:66, 2001:db8:ca2c:dead::ba11     YES        
lyme      RUNNING  10.0.0.30  2001:db8:ca2c:dead:21e:a0ff:feb6:65, 2001:db8:ca2c:dead::cafe     YES        
mango     RUNNING  10.0.0.35  2001:db8:ca2c:dead:6c42:24ff:fe7d:4e9, 2001:db8:ca2c:dead::a      YES        
peach     RUNNING  10.0.0.33  2001:db8:ca2c:dead:21e:a0ff:feb6:68, 2001:db8:ca2c:dead::3a11     YES        
rhubarb   RUNNING  10.0.0.40  2001:db8:ca2c:dead:21e:a0ff:feb6:69, 2001:db8:ca2c:dead::dead     YES

Now this is hardly the end of the world, but it is not tidy and it is the sort of thing that may lead to problems down the road if servers are communicating on an address that is not reverse DNS registered. Or indeed when someone contacts a server on an address such as 2001:db8:ca2c:dead::3eb and the reply comes from 2001:db8:ca2c:dead:21e:a0ff:feb6:6a.

After any number of false starts, the answer is quite simple – use sysctl to turn off autoconfigured address from within the container; which doesn’t make much sense logically – containers don’t have a kernel of their own, so the global kernel should be the one that is tuned. However :-

for container in $(lxc-ls)
do
  echo net.ipv6.conf.eth0.autoconf = 0 >> /var/lib/lxc/$container/rootfs/etc/sysctl.conf
done

Does the trick (after a reboot)  :-

» lxc-ls --fancy
NAME      STATE    IPV4       IPV6                                                              AUTOSTART  
---------------------------------------------------------------------------------------------------------
apricot   RUNNING  10.0.0.34  2001:db8:ca2c:dead:21e:a0ff:feb6:6a, 2001:db8:ca2c:dead::3eb      YES        
chagers   RUNNING  10.0.0.32  2001:db8:ca2c:dead:18d9:99ff:fe28:3591, 2001:db8:ca2c:dead::5e11  YES        
glanders  RUNNING  10.0.0.31  2001:db8:ca2c:dead:21e:a0ff:feb6:66, 2001:db8:ca2c:dead::ba11     YES        
lyme      RUNNING  10.0.0.30  2001:db8:ca2c:dead::cafe                                          YES        
mango     RUNNING  10.0.0.35  2001:db8:ca2c:dead:2411:80ff:feb9:6600, 2001:db8:ca2c:dead::a     YES        
peach     RUNNING  10.0.0.33  2001:db8:ca2c:dead::3a11                                          YES        
rhubarb   RUNNING  10.0.0.40  2001:db8:ca2c:dead::dead                                          YES        

Except for the older containers 🙁

I’ve obviously missed something, but fixing nearly half of the containers is a good start.

After attending to pending upgrades (some of my old containers were still running wheezy), and setting the network configuration to manual, one of the recalictrant containers (glanders) lost it’s autoconfigured address.

Two more containers lost their unwanted extra addresses after “fixing” their configuration. I’m not sure what was wrong with the old configuration, but after copying and modifying a recently created container configuration, they rebooted with just one IPv6 address. The last one was mango, but after an extra reboot, it also was fixed :-

» lxc-ls --fancy
NAME      STATE    IPV4       IPV6                      AUTOSTART  
-----------------------------------------------------------------
apricot   RUNNING  10.0.0.34  2001:db8:ca2c:dead::3eb   YES        
chagers   RUNNING  10.0.0.32  2001:db8:ca2c:dead::5e11  YES        
glanders  RUNNING  10.0.0.31  2001:db8:ca2c:dead::ba11  YES        
lyme      RUNNING  10.0.0.30  2001:db8:ca2c:dead::cafe  YES        
mango     RUNNING  10.0.0.35  2001:db8:ca2c:dead::a     YES        
peach     RUNNING  10.0.0.33  2001:db8:ca2c:dead::3a11  YES        
rhubarb   RUNNING  10.0.0.40  2001:db8:ca2c:dead::dead  YES        
May 022015
 

I have recently been upgrading my Linux containers from Debian wheezy to jessie, and each time have encountered a problem preventing the container from booting. Or rather as it turns out, preventing the equivalent of init from starting any daemons. Which is systemd of course.

Now this is not some addition to the Great Systemd Debate (although my contribution to that debate may well arrive someday), but a simple fix, or at this stage a workaround (to use the dreaded ITIL phrase).

The fix is to re-install the traditional SystemV init package replacing the new systemd package. This can be done during the upgrade by running the following at the end of the usual process :-

apt-get install sysvinit-core

Of course you will probably be reading this after you have encountered the problem. There are probably many ways of dealing with the situation after you have tried rebooting and encountered this issue, but my choice is to run the following commands from what I tend to call the "global container" :-

chroot ${container root filesystem}
apt-get install sysvinit-core

As mentioned before, this is not a fix. And indeed the problem may be my own fault – perhaps it doesn't help having the "global container" still running wheezy. Perhaps there are some instructions in the Debian upgrade manual that details some extra step you should run. And of course by switching back to System V init, we are missing out on all of the systemd fun.

Apr 252015
 

So for ages I've been having these mysterious slow downs in connecting to some of my internal servers. A few seconds, but once connected things are working normally.

And of course I kept putting off having a look into the problem, because firstly I'm lazy, secondly there are other more interesting things to look at, and thirdly I'd already discounted the obvious (actually I'd "fixed" it but made certain assumptions). But it's finally time to have a look.

Now I said I'd earlier discounted the obvious but decided to have a look any way. The thing to remember is that when you connect to a server it almost always performs a DNS lookup on your network address, so a mysterious slow down could well indicate that DNS resolution is to blame. You could perform diagnostics to determine what the problem is, but in all the decades I've been solving issues with computers whenever a mysterious slow down has occurred when connecting over the network, then the problem has almost always been the DNS resolver.

Taking a look at /etc/resolv.conf on the relevant server (a Linux container), and I find the file has a nameserver within it that was retired several weeks ago! Fixing that solved the issue.

Lessons learnt :-

  1. Just because you have a centrally distributed /etc/resolv.conf that is automatically installed on all your home network doesn't mean to say that it is always automatically installed. My Linux containers don't get that centrally distributed file (which had been corrected!).
  2. Don't assume that it's not the obvious even if you have reasons for thinking it couldn't possibly be the obvious (see #1).