Jan 21 2012

One of the things I miss from Solaris is Solaris Containers – zones – which are extremely useful for isolating lightweight services in their own “virtual machines”. This blog entry is about LXC – Linux Containers. There are of course other methods of accomplishing much the same thing with Linux, but LXC has the advantage that the necessary kernel extensions are included by default.

Or in other words it isn’t necessary to compile a custom kernel. Which has advantages in certain environments.

There are of course some disadvantages too – LXC isn’t quite as mature as some of the alternatives, and there are a few missing features. But it works well enough.

What?

Operating system level virtualisation, or what I prefer to call lightweight virtualisation, is a method by which you can run multiple virtual servers on a single physical (or virtual!) machine. Like the full virtualisation supplied by products such as ESX, Hyper-V, etc., lightweight virtualisation allows you to run multiple servers on a single instance of server hardware.

Operating system level virtualisation is not quite the same as full virtualisation, where you get a complete virtual machine with a VGA display, a keyboard, mouse, etc. Instead you trick the first program that starts on a normal Unix (or Linux) system – /sbin/init – into believing that it is running on a machine by itself when it is in fact running inside a specially created environment. That is, if you want a full virtual operating system; it is also possible to set up an environment so that a container simply starts a single application.

In some ways this is similar to the ancient chroot mechanism which was often recommended for securely installing applications such as BIND which were prone to attack. But it has been improved with greater isolation from the operating system running on the hardware itself.

Note that I said /sbin/init – these containers do not run their own kernel. Merely their own user processes.

Why?

So why are these containers useful ? Especially in a world where virtualisation is ubiquitous and can even be free (VirtualBox, and various KVM solutions). Well of course they are, or they wouldn’t exist – the equivalent of BSD Jails has been introduced for every single remaining Unix-based system as well as Linux.

First of all, containers provide a perfectly viable virtualisation mechanism if all you require is a number of Linux machines. Indeed it is possible to use this kind of virtualisation on already virtualised machines – for example on a Cloud-based virtual server (as you might get from Amazon) which could potentially save you money.

Secondly, containers provide lightweight virtualisation in that there is little to no overhead in using them. There is no virtualised CPU, no I/O virtualisation, etc. In most “heavyweight” virtualisation solutions the overhead of virtualisation is negligible except for I/O where there is often a considerable performance hit for disk-based applications.

Next, if carefully set up, using containers rather than full virtual machines can reduce the incremental cost of each server installation. Every additional server you run has a cost associated with maintaining it in terms of money and time; a variety of different mechanisms can reduce this incremental cost, but there is still a cost there. Containers can be another means of reducing this incremental cost by making it easier to manage the individual servers.

As an example, it is possible to update which DNS servers each container uses by simply copying the /etc/resolv.conf file to each container :-

for container in $(lxc-ls | sort | uniq)
do
  cp /etc/resolv.conf "/var/lib/lxc/${container}/rootfs/etc/resolv.conf"
done
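A slightly more defensive version of that loop, as a minimal sketch: push_file is a hypothetical helper (not part of LXC). It copies a host file to the same absolute path inside each named container's rootfs, assuming the containers live under a base directory such as /var/lib/lxc. If a container lacks the target directory, it reports that container and carries on rather than failing halfway.

```shell
#!/bin/sh
# push_file SRC BASE NAME...  (hypothetical helper, not an LXC tool)
# Copy the host file SRC to BASE/NAME/rootfs/SRC for each container NAME.
# Returns non-zero if any container was skipped or a copy failed.
push_file() {
    src=$1
    base=$2
    shift 2
    status=0
    for container in "$@"; do
        dest="$base/$container/rootfs$src"
        if [ -d "$(dirname "$dest")" ]; then
            cp "$src" "$dest" || status=1
        else
            # Skip containers without the target directory instead of aborting
            echo "skipping $container: $(dirname "$dest") not found" >&2
            status=1
        fi
    done
    return $status
}
```

Typical use would be something like push_file /etc/resolv.conf /var/lib/lxc $(lxc-ls | sort -u).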

It’s also very handy for testing – create a container on a test server to mess around with some component or other to find out how it should be installed, and then throw away the container. This avoids “corrupting” a test server with the results of repeated experiments over time.

Of course there are also reasons why you should not use them.

First of all, it is another virtualisation technology to learn. Not a difficult one, but there is still more to learn.

In addition, containers share the fate of their host – if you need to reboot the physical server, you will have to reboot all of the containers. So it is probably not a good technology for multiple servers that need to stay up forever – although the only real way of arranging that is to use a load balancer in front of multiple servers (even clustering sometimes requires an outage).

There is also the fact that this is not an entirely mature technology. That is not to say it isn’t stable, but rather that the tools are not quite polished as yet.

Finally there are hints that containers do not provide complete isolation – someone with root access on a container might be able to escape from the container. Thus it is probably not a good solution to provide isolation for security reasons.

How ?

The following instructions assume the use of Debian, although most modern distributions should be perfectly fine too – I’ve also done this with SLES, and seen instructions for ArchLinux. You can also mix and match distributions – SLES as the master operating system, and Debian for the distribution in the containers. That works perfectly fine.

To see if your currently running kernel supports the relevant extensions, see if the cgroups filesystem is available :-

# grep cgroup /proc/filesystems
nodev	cgroup

If the grep command doesn’t return the output as shown, you will need to upgrade and/or build your own kernel. Which is a step beyond where I’m going.
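If you want the same check in a script, here is a minimal sketch using a hypothetical has_cgroup function. It only confirms that the kernel advertises the cgroup filesystem. It does not check that every controller LXC might want is enabled.

```shell
#!/bin/sh
# has_cgroup [FILE]: succeed if FILE (default /proc/filesystems) lists
# the cgroup filesystem. -w stops "cgroup2" alone counting as a match;
# -s silences errors for a missing file (which then counts as "no").
has_cgroup() {
    grep -qsw cgroup "${1:-/proc/filesystems}"
}

if has_cgroup; then
    echo "cgroup filesystem supported"
else
    echo "cgroup filesystem NOT supported: a different kernel is needed" >&2
fi
```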

Initial Setup

Before installing your first container, you need to set up your server to support LXC. This is all pretty simple – the most complicated part is setting up your server to use a bridged adapter as its network interface. Which we will tackle first.

To start with, install the bridge-utils package :-

# apt-get install bridge-utils

Next edit your /etc/network/interfaces file. Leave the section dealing with the loopback interface (lo) alone and comment out anything relating to eth0. Once that’s done, add something to set up the bridge interface (br0) with :-

auto br0
iface br0 inet dhcp
   bridge_ports eth0
   bridge_fd 0

That sets up a bridged interface configured with an address supplied by a DHCP server – not perhaps the best idea for a production server, but perfectly fine for a test server. At least if you have a DHCP server on the network! If you need to configure a static address, copy the relevant parts from your commented out section – the address, netmask, and gateway keywords can be added below bridge_fd.
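For a static address, the stanza might look something like the sketch below. The 192.0.2.x values are placeholders from the documentation address range, not real settings. Substitute the values from your commented-out eth0 section. As described above, the address lines go below bridge_fd.

```
auto br0
iface br0 inet static
   bridge_ports eth0
   bridge_fd 0
   address 192.0.2.10
   netmask 255.255.255.0
   gateway 192.0.2.1
```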

At this point it is worth rebooting the server to check that the new network interface is working fine. You could perform all of the steps in this initial setup section and do a reboot just at the end, but I prefer to reboot after each step when I’m trying something new. It’s easier to find the horrible mistake when you do it step by step.

Assuming the reboot went fine, the next step is to automatically mount the cgroup filesystem. Add the following to the end of the /etc/fstab file :-

cgroup        /cgroup        cgroup        defaults    0    0

Make the /cgroup directory, and try to mount the filesystem :-

mkdir /cgroup
mount /cgroup

At this point you should be able to see a whole bunch of files in the /cgroup directory, but it won’t show up if you try a df. You should also reboot at this point to make sure that the filesystem automatically mounts when the system boots.
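Since df won’t help here, one way to confirm after the reboot that /cgroup really was mounted is to look in /proc/mounts. The second field of each line there is the mount point. The following is a minimal sketch built around a hypothetical is_mounted function:

```shell
#!/bin/sh
# is_mounted DIR [MOUNTS]: succeed if DIR appears as a mount point in
# MOUNTS (defaults to /proc/mounts, where field 2 is the mount point).
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}" 2>/dev/null
}

if is_mounted /cgroup; then
    echo "/cgroup is mounted"
else
    echo "/cgroup is not mounted" >&2
fi
```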

The final stage is to install the LXC runtime package :-

apt-get install lxc debootstrap

Note that the LXC package is available for many distributions, and the debootstrap package is a shell script runnable under most distributions given the presence of the right set of tools.

Now we are ready to start creating containers. Well, almost.

Creating Containers

When creating containers, it is usual to use a helper script to do the donkey work of setting this up. The LXC runtime package includes a number of such helper scripts. These scripts are very useful, but it is also worth pointing out that they may require some local customisation. For instance, the template script I used sets the root password to root. It does suggest that this should be changed once the container is built, but it is also very sensible to change this initial password in the template to something at least half-reasonable.

And the longer and more extensively you use containers, the more local customisations you are likely to want. For instance, I tend to use a bind mount to ensure that the /site filesystem is mounted under each container, so I can be sure that my local scripts and configuration files are easily available in each and every container. So :-

  cd /usr/lib/lxc/templates
  cp lxc-debian lxc-local

When not using Debian, it is possible that the template scripts are installed in the main $PATH. In which case you may choose to remove them, or move them somewhere else to avoid their use in preference to your own versions.
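Here is a minimal sketch of the /site bind mount mentioned above. It assumes the older LXC configuration syntax used on this page and a container called first under /var/lib/lxc. You add a mount entry, in fstab format, to the container’s configuration file (typically /var/lib/lxc/first/config) and create the empty mount point /var/lib/lxc/first/rootfs/site beforehand. A customised lxc-local template could append such a line for every new container. Newer LXC releases also accept a target path relative to the rootfs, so check the lxc.container.conf man page for your version.

```
lxc.mount.entry = /site /var/lib/lxc/first/rootfs/site none bind 0 0
```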

It is also worth creating a template configuration file for the lxc-create script :-

cat > /etc/lxc/lxc.conf
lxc.network.type=veth
lxc.network.link=br0
lxc.network.flags=up
^D

To create your first container :-

  mkdir -p /var/lib/lxc/first
  lvcreate --name root-first --size=768M /dev/debian
  mkfs -t ext3 /dev/debian/root-first
  echo "/dev/debian/root-first /var/lib/lxc/first ext3 defaults 0 2" >> /etc/fstab
  mount /var/lib/lxc/first
  lxc-create -n first -t local -f /etc/lxc/lxc.conf

This goes through the process of “bootstrapping” Debian into a directory on your current system and setting up a configuration for your LXC container. Once it completes, you are ready to start the container. Note that I first created a separate filesystem for the container’s root filesystem. This should be self-evidently necessary if you want to prevent a badly behaved container from bringing down other containers (for instance by filling up a shared filesystem).
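If you create containers regularly, the manual steps above can be wrapped in a script. The following is a minimal sketch only. make_container is a hypothetical name. The script assumes the volume group /dev/debian, the local template and /etc/lxc/lxc.conf used above, and it does no error recovery beyond stopping at the first failed step. Setting DRY_RUN prints the commands instead of running them, which is a sensible first step.

```shell
#!/bin/sh
# make_container NAME [SIZE]  (hypothetical wrapper, not an LXC tool)
# Repeats the manual container-creation steps for one container.
# With DRY_RUN set, commands are printed instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_container() {
    name=$1
    size=${2:-768M}
    dir=/var/lib/lxc/$name
    dev=/dev/debian/root-$name

    run mkdir -p "$dir" || return 1
    run lvcreate --name "root-$name" --size "$size" /dev/debian || return 1
    run mkfs -t ext3 "$dev" || return 1
    # Record the filesystem in /etc/fstab so it is mounted at boot
    echo "$dev $dir ext3 defaults 0 2" | run tee -a /etc/fstab >/dev/null || return 1
    run mount "$dir" || return 1
    run lxc-create -n "$name" -t local -f /etc/lxc/lxc.conf
}
```

For example, DRY_RUN=1 make_container web 1G shows what would be done for a container called web without touching the system.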

There are of course other things you can do at this stage before starting the container for the first time – for instance, editing /var/lib/lxc/first/rootfs/etc/network/interfaces may be necessary if you want the container to use a static address.

Starting The Container

Once you have created a container, you will probably want to start it :-

lxc-start --daemon --name=first

You can start a container without the “daemon” option, but then you are immediately connected to the console, which can be difficult to escape from. To connect to the system’s console, try lxc-console, which should result in something like :-

# lxc-console --name=first

Type <Ctrl+a q> to exit the console
--hit return here--
Debian GNU/Linux 5.0 first tty1

first login:

At this point you can login as root, and “fix” your container as you would do with an ordinary server. Install the software you need, make any changes you want, etc.

But there is one very noticeable oddity – rebooting a container does not seem to work properly. It seems to get stuck at the point where a normal server would reset the machine to boot again. Undoubtedly this will be fixed at some point; the fix may well be relatively simple anyway.

But for now the sequence :-

(in the container) shutdown -h now
(in the server) lxc-stop --name <container-name>
(in the server) lxc-start --daemon --name <container-name>

Will have to do.
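The two host-side steps can be wrapped up as a minimal sketch. restart_container is a hypothetical helper. It is meant to be run after the shutdown -h now inside the container, as in the sequence above. It adds an lxc-wait so the start only happens once LXC reports the container as STOPPED. The option names are taken from the LXC man pages, so check them against your version. As before, DRY_RUN prints the commands rather than running them.

```shell
#!/bin/sh
# restart_container NAME  (hypothetical helper, not an LXC tool)
# Host-side replacement for a reboot inside the container: stop it,
# wait until LXC reports it STOPPED, then start it in the background.
# With DRY_RUN set, commands are printed instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_container() {
    name=$1
    run lxc-stop --name "$name" || return 1
    run lxc-wait --name "$name" --state STOPPED || return 1
    run lxc-start --daemon --name "$name"
}
```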

Oct 20 2011

So there I was, installing a Linux distribution on my new laptop. Got to the end of the installation when it refused to install grub in the master boot record. Opted to try another partition, and rebooted. At which point the infamous error “Error: the symbol ‘grub_xputs’ not found” was shown with a “grub rescue” prompt.

At which point I had a laptop that wouldn’t boot of course.

To cut a long story short, because it’s only the fix I’m interested in recording for posterity, I sorted this out by booting off an emergency USB stick (unetbootin is a good tool for writing one … if you have a working system). Once booted, I set up an environment in which chroot would work properly. chroot starts a shell whose root directory is a directory under the normal root directory, which allows commands to be run almost as if the non-bootable system had been booted.

mount /dev/sda5 /mnt # Mount the root filesystem of the unbootable system under /mnt
mount /dev/sda1 /mnt/boot # And the /boot filesystem
mount -o bind /proc /mnt/proc
mount -o bind /dev /mnt/dev
mount -o bind /sys /mnt/sys
chroot /mnt

Once that is done, there are quite a few things that can be done to repair a broken system, but I just needed to re-install grub to the MBR of /dev/sda :-

grub-install /dev/sda

Once that was done, everything booted fine.
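Before rebooting, it is tidy to type exit to leave the chroot and then unmount everything in the reverse order of the mount commands above. This is a minimal sketch that assumes the same /mnt layout. chroot_cleanup is a hypothetical name, and DRY_RUN prints the commands instead of running them.

```shell
#!/bin/sh
# chroot_cleanup [ROOT]  (hypothetical helper)
# Unmount the bind mounts, /boot and the root filesystem that were set
# up for the chroot, in reverse order. ROOT defaults to /mnt.
# With DRY_RUN set, commands are printed instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

chroot_cleanup() {
    root=${1:-/mnt}
    for fs in sys dev proc boot; do
        run umount "$root/$fs" || return 1
    done
    run umount "$root"
}
```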

Of course all that comes from spending a lot of time with Linux. Those who have not been using it since the 1990s will not be as lucky, but there are a few key points there :-

  1. Don’t panic. Just because it won’t boot doesn’t mean everything is lost.
  2. Write down the error message exactly as it appears on screen. A small mistake here can make searching for the error almost impossible.
  3. Get a rescue USB stick. Ideally before you break a system, but afterwards is usually possible even if you don’t have another working system – you have friends, or there are ways to write a USB stick at work.
  4. Search the Internet for the problem. You may have to spend quite a while reading other people’s problems that may or may not relate to your problem. You may have to improve your search methodology. Putting the error message in quotes is usually a good method.
  5. And if you find a solution to your problem online, check the date of the solution. Something that worked 5 years ago may not be the best solution today. And that applies to this page just as much as any other.
Oh! And to those who would jump up and down screaming that this wouldn’t happen with Windows or OSX, please grow up. Such problems occur with any operating system – and I’ve seen them.
May 05 2011

For my own future reference …

Today I encountered an interesting little issue: I could not send an ABORT signal (SIGABRT) to a running process to kill it with a core dump, because the process had a core dump size limit of 0. Try as I might, I could not find a way to change that process’s core dump limit.

Turns out there is another way of tackling the problem, which is to use gdb to generate a core image :-

gdb
(gdb) attach PID
(gdb) gcore /var/tmp/core.PID
(gdb) detach

There is of course the gcore shell script wrapper for this, but that may not work if the working directory of the process no longer exists.
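The same session can also be run non-interactively using gdb’s batch mode. The following is a minimal sketch: dump_core is a hypothetical name, and attaching to another user’s process normally requires root. DRY_RUN prints the gdb command line instead of running it.

```shell
#!/bin/sh
# dump_core PID [DIR]  (hypothetical helper)
# Attach gdb to PID, write DIR/core.PID with gcore, then detach so the
# process keeps running. DIR defaults to /var/tmp.
# With DRY_RUN set, the gdb command line is printed instead of executed.
dump_core() {
    pid=$1
    dir=${2:-/var/tmp}
    if [ -n "${DRY_RUN:-}" ]; then
        echo gdb -batch -p "$pid" -ex "gcore $dir/core.$pid" -ex detach
    else
        gdb -batch -p "$pid" -ex "gcore $dir/core.$pid" -ex detach
    fi
}
```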

Jan 29 2011

In the dim and distant past, when keyboards were enclosed in metal cases and you certainly didn’t tuck one under your arm and walk around with it (actually I don’t do that now either), the placement of many keys was continually up for debate. But apart from the main QWERTY section, one of the key placements you could rely on was the Control key next to the “A” key. These days it’s been turned into one of those silly CapsLock keys.

Back when I previously did some keymapping, I neglected to mention how I mapped CapsLock into a Control key. As appropriate punishment, changing window managers has somehow meant that my previous mapping had been lost. So I had to figure out how to do it again.

First thing to do is to switch to a text console – I’ll be mapping this at a very low level.

Next thing to do is to find out the scancode of the key I want to map :-

# showkey -s

Once it has started, I have to press the key I am interested in within 10 seconds or the program will exit. I press CapsLock and get two numbers displayed – 3a and ba (they’re in hexadecimal for the base-16 challenged). The first is the key press, and the second is the key release. We can discard the second as Linux is clever enough to figure out one from the other.
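Linux can work out one from the other because, for these single-byte codes, the release (break) code is just the press (make) code with the top bit set, i.e. make + 0x80. That holds for 3a/ba here and for pairs such as 5b/db in the Unicomp table on this site. A tiny sketch of the arithmetic, using a hypothetical break_code function:

```shell
#!/bin/sh
# break_code MAKE: print the release scancode for a one-byte make code
# given in hex; the break code is the make code with bit 7 (0x80) set.
break_code() {
    printf '%02x\n' $(( 0x$1 | 0x80 ))
}

break_code 3a    # CapsLock: prints ba
```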

The next thing we want to do is to obtain the keycode of the key that we want to map to – in this case the left control key. It probably doesn’t matter here, but it is worth noting that the left and right control keys have different scancodes and keycodes, so you could map them to different things. Anyway, to obtain the keycode of the key we want, run :-

# showkey -k

And press the key to map to.

Lastly we want to construct the command to actually do the mapping :-

setkeycodes 3a 29

This of course has to be added to a script being run when the system boots – you want this mapped as early as possible.
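Once more than one key is involved, it can be easier to keep the mappings as a list of "scancode keycode" pairs and apply them from a boot script. This is a minimal sketch with a hypothetical apply_keymap function. Blank lines and lines starting with # are skipped, and anything after the keycode on a line is ignored. DRY_RUN prints the setkeycodes commands instead of running them.

```shell
#!/bin/sh
# apply_keymap  (hypothetical boot-time helper)
# Read "scancode keycode" pairs from standard input, one per line, and
# apply each with setkeycodes. Blank lines and # comments are skipped.
# With DRY_RUN set, commands are printed instead of executed.
apply_keymap() {
    while read -r scancode keycode _rest; do
        case $scancode in
            ''|'#'*) continue ;;
        esac
        if [ -n "${DRY_RUN:-}" ]; then
            echo setkeycodes "$scancode" "$keycode"
        else
            setkeycodes "$scancode" "$keycode"
        fi
    done
}
```

The boot script would then feed it a here-document containing, for this case, the single line 3a 29.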

Aug 11 2010

Ten years ago? Bloomin’ heck! Still, as this seems to get frequent hits, I suppose a quick update is in order – I no longer use the Unicomp “naked” but go through a Soarer converter, which is an easier way of mapping the keys and allows macros to be added.

Thanks to these guys (the thread eventually gets to the meaty details), I have changed my Unicomp 122-key keyboard (UB40T56) from a funky “be friendly to Windows IBM Terminal emulators” mode into something a little more interesting. Specifically each key should be sending a unique keycode – which if you select the right model from Unicomp, you’ll get.

This can be done by opening up the keyboard and removing the jumper from JP3 (just below a small chip and close to the scroll LED). In addition, it is strongly suggested that you set the kernel parameter “atkbd.softraw=0” which can be done with Ubuntu 10.04 with the following :-

  1. Edit /etc/default/grub and change the variable GRUB_CMDLINE_LINUX_DEFAULT to include “atkbd.softraw=0” at the end of what is already there.
  2. Finish editing, run update-grub, and finally reboot.
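As an illustration, assuming the stock Ubuntu 10.04 value of "quiet splash" (keep whatever options your own file already has), the edited line in /etc/default/grub would look something like this:

```
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash atkbd.softraw=0"
```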

Without this, showkey -s only shows the scancodes of keys that the kernel knows about – not what scancodes are sent by the keyboard! It is possible to show the unknown scancodes by enabling the i8042 module debug mode, but atkbd.softraw does the same thing more effectively.

This is currently a work in progress, and will need further checking before it can be taken as gospel, but …

Key | Make | Break | Keycode

Function Keys
F13 | 5b | db |
F14 | 5c | dc | 95
F15 | 5d | dd | 183
F16 | 63 | e3 |
F17 | 64 | e4 |
F18 | 65 | e5 |
F19 | 66 | e6 |
F20 | 67 | e7 |
F21 | 68 | e8 |
F22 | 69 | e9 |
F23 | 6a | ea |
F24 | 6b | eb |
(next row) F1 | 3b | bb | 59
F2 | 3c | bc | 60
F3 | 3d | bd | 61
F4 | 3e | be | 62
F5 | 3f | bf | 63
F6 | 40 | c0 | 64
F7 | 41 | c1 | 65
F8 | 42 | c2 | 66
F9 | 43 | c3 | 67
F10 | 44 | c4 | 68
F11 | 57 | d7 | 87
F12 | 58 | d8 | 88

Left Keypad (top left is “Esc”)
Esc | 7e | fe | 121 (wrong)
Cent | 76 | f6 | 85
Print Screen | 72 | f2 |
Pause | e1 1d 45 e1 9d c5 | (none) | 119
Print | 74 | f4 |
Help | 6d | ed |
Record | e0 2a e0 37 | e0 b7 e0 aa | 99 (wrong)
Play | 6f | ef |
GUI (Windows) | 75 | f5 |
Menu | 6c | ec |

Editing Pad (between QWERTY and Number Pad)
Backtab | 5a | da |
Insert | e0 49 | e0 c9 | 104 (wrong)
PageUp | e0 51 | e0 d1 | 109 (wrong)
(next row) Blue Return | e0 4f | e0 cf | 107 (wrong)
Delete | e0 52 | e0 d2 | 110 (wrong)
Page Down | e0 53 | e0 d3 | 111 (wrong)
(next row) Up Arrow | e0 48 | e0 c8 | 103
(next row) Left Arrow | e0 4b | e0 cb | 105
Home | e0 47 | e0 c7 | 102
Right Arrow | e0 4d | e0 cd | 106
(next row) Down Arrow | e0 50 | e0 d0 | 108

Number Pad
(top row) End | 01 | 81 | 1 (wrong)
Scroll Lock | 46 | c6 | 70
(shifted Scroll Lock) Number Lock | 45 | c5 | 69
/ | 37 | b7 | 55 (wrong)
* | e0 35 | e0 b5 | 98 (wrong)
(next row) KP-7 | 47 | c7 | 71
KP-8 | 48 | c8 | 72
KP-9 | 49 | c9 | 73
- | 4e | ce | 78 (wrong)
(next row) KP-4 | 4b | cb | 75
KP-5 | 4c | cc | 76
KP-6 | 4d | cd | 77
+ | 4a | ca | 74 (wrong)
(next row) KP-1 | 4f | cf | 79
KP-2 | 50 | d0 | 80
KP-3 | 51 | d1 | 81
Enter | e0 1c | e0 9c | 96
(next row) KP-0 | 52 | d2 | 82
KP-. | 53 | d3 | 83

Keycodes marked “(wrong)” do not match the legend on the key; an empty Keycode cell means the key produced no keycode.
  • Group. To break things up a little, I’ve grouped the keys into the 5 separate parts of the keyboard – the function keys, the keypad to the left, the qwerty pad, the editing pad, and the number pad (“keypad”). The details of the qwerty pad will be the last as the other groups are more interesting (‘qwerty’ keys just work).
  • Key. This is the label on the key on my keyboard. This may be different on different variants, so in all cases I have started with the top left and worked right and down (the “qwerty” row before the “asdf” row).
  • Make. This is the scancode produced when the key is pressed.
  • Break. This is the scancode produced when the key is released.
  • Keycode. The configured keycode produced on the Linux console. Some of these values are wrong, and many are missing because the key produces no keycode when pressed. I say wrong because the keycode does not match the key legend – in some cases dangerously so, such as Page Down generating Delete. One thing to be aware of is that you must use “showkey -k” at the console to get the same numbers I have; X adds 8 to each keycode.

Three interesting oddities here … F14 and F15 have somehow been graced with keycodes by default; their scan codes must coincide with keys defined on more popular keyboards. And of course Num Lock and Scroll Lock sharing the same key is a little … odd. And lastly the Record key is effectively sending two keystrokes in one.

Fixing The Wrong Keys

The first place to start is to map the keys that return a keycode representing a key other than the one written on the keycap – such as the key marked “End” which thinks it is an “Esc” key.

The wrong keys can be fixed with the following commands :-

setkeycodes 7e         1    # Esc
setkeycodes e049     110    # Insert
setkeycodes e051     104    # PageUp
setkeycodes e052     111    # Delete
setkeycodes e053     109    # PageDown
setkeycodes 01       107    # End
setkeycodes 37        98    # KP-/
setkeycodes e035      55    # KP-*
setkeycodes 4e        74    # KP--
setkeycodes 4a        78    # KP-+

I have left out a couple of the wrong keys from this section as they do not return dangerously incorrect values, and they fit more logically into the next section (being Record and Blue Return).

Dealing With The Extra Keys

Now onto dealing with the extra keys. The tricky bit here was coming up with new keycodes for these keys that did not conflict with existing keycodes, and were reasonable. This is effectively impossible, as xmodmap -pk appears to show no significant range of unused keycodes although some of the used keycodes are for things like “Shop” buttons!

So I picked a range with a larger number of useless key symbols and some unused ones :-

setkeycodes 5b       222    # F13
setkeycodes 5c       223    # F14
setkeycodes 5d       224    # F15
setkeycodes 63       225    # F16
setkeycodes 64       237    # F17
setkeycodes 65       238    # F18
setkeycodes 66       228    # F19
setkeycodes 67       229    # F20
setkeycodes 68       230    # F21
setkeycodes 69       231    # F22
setkeycodes 6a       232    # F23
setkeycodes 6b       233    # F24
setkeycodes 72        99    # Record (after keyswap)
setkeycodes 74       209    # Print
setkeycodes 6d       138    # Help
setkeycodes 6f       239    # Play
setkeycodes 75       234    # Windows (GUI)
setkeycodes 6c       240    # Menu
setkeycodes 5a       235    # Backtab
setkeycodes e04f     236    # BlueReturn

Once this has run, we can look at fixing the X mappings … which is why F17 and F18 are out of sequence in the above! One key has to be (at least until someone comes up with a better solution!) sorted out with a keycap swap. Take the keycap from the Record key and swap it for the one marked “Print Screen”. This is because the scancode for Record is effectively two scancodes in one and attempting to remap it will result in strange things happening.

Sorting Out X11

Once you have a set of keycodes that don’t do funny things under X (for instance F17 and F18 when in sequence produce not a keystroke under X11 but some other event), you can move onto configuring the X keyboard. The following attempts to map as close to the keycaps as possible without going to extremes :-

xmodmap -e "keycode 230 = F13"
xmodmap -e "keycode 231 = F14"
xmodmap -e "keycode 232 = F15"
xmodmap -e "keycode 233 = F16"
xmodmap -e "keycode 245 = F17"
xmodmap -e "keycode 246 = F18"
xmodmap -e "keycode 236 = F19"
xmodmap -e "keycode 237 = F20"
xmodmap -e "keycode 238 = F21"
xmodmap -e "keycode 239 = F22"
xmodmap -e "keycode 240 = F23"
xmodmap -e "keycode 241 = F24"
xmodmap -e "keycode 217 = Print"
xmodmap -e "keycode 9 = Escape 3270_Attn"
xmodmap -e "keycode  93 = cent bar"
xmodmap -e "keycode 175 = 3270_Record"
xmodmap -e "keycode 175 ="
xmodmap -e "keycode 247 = 3270_Play"
xmodmap -e "keycode 242 = Super_L"
xmodmap -e "keycode 248 = Multi_key"
xmodmap -e "keycode 243 = 3270_BackTab"
xmodmap -e "keycode 118 = Insert 3270_Duplicate"
xmodmap -e "keycode 112 = Prior 3270_Jump"
xmodmap -e "keycode 117 = Next 3270_Rule"

This results in a keyboard that more or less matches the key caps. For some of the blue symbols, you press the key in combination with shift.

The number pad could do with a little more attention in the realm of X-mapping, and there are a few blue symbols on the main qwerty pad that might be usefully mapped, but this is sufficient for my purposes.
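Because X keycodes are the console keycodes plus 8, the xmodmap lines above can be generated from the console keycodes chosen earlier instead of being worked out by hand. For example, console keycode 222 (F13) becomes X keycode 230. The following is a minimal sketch with a hypothetical xmodmap_cmds function. Review its output before piping it to sh.

```shell
#!/bin/sh
# xmodmap_cmds: read "console-keycode keysym..." lines on standard input
# and print the matching xmodmap commands, adding the offset of 8 that X
# applies to the keycodes reported by showkey -k.
# Blank lines and lines starting with # are skipped.
xmodmap_cmds() {
    while read -r code keysyms; do
        case $code in
            ''|'#'*) continue ;;
        esac
        printf 'xmodmap -e "keycode %d = %s"\n' "$((code + 8))" "$keysyms"
    done
}
```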