Nov 24 2012

NTP is one of those strange services that are so vital to the operation of an organisation’s network; if the servers around the network get their time in a muddle, all sorts of strange things can start happening. Besides which most people expect their computers to be able to tell the right time.

But often it is one of the unloved services. After all no user is going to ask about the health of the NTP service. And if you are a senior manager involved in IT, do you know who manages your NTP infrastructure ? If so, have you ever asked them to explain the design of the NTP infrastructure ? If not, you may find a nasty surprise – your network’s NTP infrastructure may rely on whatever servers can be scavenged and with the minimum investment of time.

Of course, NTP is pretty reliable and in most circumstances extremely resilient. NTP has built-in safeguards against confused time servers sending wildly inappropriate time adjustments, and even in the event of a total NTP failure, servers should be able to keep reasonable time for at least a while. Even with a minimum of investment, an NTP infrastructure can often run merrily in the background for years without an issue.

Not that it is a good idea to ignore NTP for years. It is better by far to spend a little time and money on a yearly basis to keep things fresh – perhaps a little server, and a day’s time each year.

That was quite a long rambling introduction to the NTP “glitch” that I learned about this week, but perhaps goes some way to explaining why such a glitch occurred.

A number of organisations reported that their network had started reporting a time way back in the year 2000. It turns out that :-

  • The USN(aval)O(observatory) had a server that for 51 minutes reported the year as 2000 rather than 2012.
  • A number of organisations with an insufficient number of clock sources (i.e. just the erroneous USNO one) attempted to synchronise to the year 2000 causing the NTP daemon to stop.
  • Some “clever” servers noticed that NTP had stopped, and restarted it. Because most default NTP startup scripts set the clock on startup, these servers were suddenly sent back in time to the year 2000.

And so a cascade of relatively minor issues becomes a major issue.

Reading around, the recommendations to prevent this sort of thing happening :-

  1. Use an appropriate number of time sources for your main NTP servers; various suggestions have been made ranging from 5 (probably too few) to 8 (perhaps about right) to 20 (possibly overkill).
  2. Have an appropriate number of main NTP servers for your servers (and other equipment) to synchronise their time with. Anything less than 3 is inadequate; more than 4 is recommended.
  3. Prevent your main NTP servers from setting their time when NTP is restarted and monitor the time on each server regularly.
  4. And a personal recommendation: Restart all your NTP daemons regularly – perhaps daily – to get them to check with the DNS for any updated NTP server names.
  5. And as suggested above, regularly review your NTP infrastructure.
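
Recommendations 1 and 3 lend themselves to a simple automated check. What follows is a minimal sketch rather than anything taken from the incident reports: the function name ntp_peer_summary is made up for illustration, and it assumes the usual peer listing printed by ntpq -pn (two header lines, then ten columns per peer with the offset in milliseconds in the ninth column).

```shell
# Hypothetical helper: summarise the peer table printed by "ntpq -pn".
# Assumes the standard layout: two header lines, then one line per peer
# with the offset (in milliseconds) in the ninth column.
ntp_peer_summary() {
  awk 'NR > 2 && NF >= 10 {
         off = $9 + 0                 # force a numeric value
         if (off < 0) off = -off      # absolute value
         if (off > max) max = off
         peers++
       }
       END { printf "peers=%d max_offset_ms=%.3f\n", peers, max }'
}
```

Something like "ntpq -pn | ntp_peer_summary" could then be run from cron or a monitoring system, raising an alert if the peer count drops below whatever number you settled on, or if the offset looks unreasonable.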

Oct 17 2012

I have recently become interested in the amount of entropy available in Linux and decided to spend some time poking around on my Debian workstation. Specifically looking to increase the amount of entropy available to improve the speed of random number generation. There are a variety of different ways of accomplishing this including hardware devices (some of which cost rather too much for a simple experiment).

Eh?

Linux has a device (/dev/random) which makes available random numbers to software packages that really need access to a high quality source of random numbers. Any decently written cryptographic software will use /dev/random (and not /dev/urandom which does not generate “proper” random numbers of quality) to implement encryption.

Using poor quality random numbers can potentially result in encryption not being secure. Or perhaps more realistically, because Linux waits until there is sufficient entropy available before releasing numbers through /dev/random, software reading from that device may be subject to random stalling. Not necessarily long enough to cause a major problem, but perhaps enough to have an effect on performance.

Especially for a server in a virtualised environment!

Adding Entropy The Software Way (haveged)

HAVEGED is a way of using processor flutter to add entropy to the Linux /dev/random device. It can be installed relatively easily with :-

apt-get install haveged
/etc/init.d/haveged start

As soon as this was running the amount of entropy available (cat /proc/sys/kernel/random/entropy_avail) jumped from several hundred to close to 4,000.
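
If you want to keep an eye on that figure over time, a small check along the following lines might do. It is only a sketch: the function name entropy_status and the default threshold of 1000 are my own assumptions for illustration, not values that come from the kernel or from haveged.

```shell
# Hypothetical check: report whether the kernel entropy pool looks healthy.
# $1 = file to read (defaults to the real kernel counter)
# $2 = warning threshold; 1000 is an arbitrary assumption, not a kernel value
entropy_status() {
  file=${1:-/proc/sys/kernel/random/entropy_avail}
  threshold=${2:-1000}
  avail=$(cat "$file") || return 2
  if [ "$avail" -lt "$threshold" ]; then
    echo "low $avail"
    return 1
  fi
  echo "ok $avail"
}
```

Called with no arguments it reads the live counter; the non-zero exit status on a low reading makes it easy to hook into a monitoring script.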

Now does this increased entropy have an effect on performance? Copying a CD-sized ISO image file using ssh :-

Default entropy 29.496
With HAVEGED 28.636

An improvement of just under 3% is hardly dramatic, but every little bit helps and it may well have a more noticeable effect on a server which regularly exhausts entropy.

Checking The Randomness

But hang on … more important than performance is the randomness of the numbers generated. And you cannot mess with the generation of random numbers without checking the results. The first part of checking the randomness is making sure you have the right tools installed :-

apt-get install rng-tools

Once installed you can test the current set of random numbers :-

dd if=/dev/random bs=1k count=32768 iflag=fullblock | rngtest

This produces a whole bunch of output, but the key bits of output are the “FIPS 140-2 failures” and “FIPS 140-2 successes”; if you have too many failures something is wrong. For the record my failure rate is 0.05% with haveged running (without: tests ongoing).
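
Working out the failure rate by hand gets tedious, so here is a rough sketch of how it could be calculated. The function name fips_failure_rate is hypothetical, and it assumes the summary contains lines of the form "rngtest: FIPS 140-2 successes: N" and "rngtest: FIPS 140-2 failures: N".

```shell
# Hypothetical helper: turn the rngtest summary into a failure percentage.
# Reads the report on stdin and assumes lines of the form
#   rngtest: FIPS 140-2 successes: N
#   rngtest: FIPS 140-2 failures: N
fips_failure_rate() {
  awk -F': ' '/FIPS 140-2 successes/ { ok  = $NF }
              /FIPS 140-2 failures/  { bad = $NF }
              END {
                total = ok + bad
                if (total > 0) printf "%.2f%%\n", 100 * bad / total
              }'
}
```

Assuming rngtest writes its report to standard error, the test above could be extended to "dd if=/dev/random bs=1k count=32768 iflag=fullblock | rngtest 2>&1 | fips_failure_rate".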

Links

… to more information.

Jun 30 2012

Warning: This page details a shell script that I’ve produced for my own amusement; it isn’t a product. It hasn’t been tested in lots of environments, and it will take some hacking to get it to work for you. If you’re looking for something to use, move along; if you’re looking for ideas to improve a real wallpaper setting program, you might want to read on.

So elsewhere I’ve admitted to driving a stake through the heart of GNOME’s wallpaper plugin to allow my own wallpaper script to work. Well, I could hardly do that and not announce it could I? So here goes :-

  1. It doesn’t actually set the wallpaper; it lets hsetroot do that.
  2. It requires a parameter to determine which directory to choose – i.e. ~/lib/backgrounds/one, ~/lib/backgrounds/two, etc.
  3. It uses xrandr to pick out the “regions” of the default screen.
  4. It puts portrait images on my portrait monitor, and landscape images on my landscape monitor by overlaying them onto an overall image the size of both monitors added together.
  5. It waits a set duration, and then repeats.
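
To give a flavour of step 3, the following is a simplified sketch of picking monitor regions out of xrandr output. It is not lifted from the script itself; the function name monitor_regions is made up, and it assumes each connected output line carries a geometry of the form WIDTHxHEIGHT+X+Y.

```shell
# Hypothetical helper: list connected outputs from xrandr output on stdin.
# Prints "name geometry orientation" for each connected output, where the
# orientation is guessed from whether the region is wider than it is tall.
monitor_regions() {
  awk '/ connected/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9]+x[0-9]+\+[0-9]+\+[0-9]+$/) {
             split($i, g, /[x+]/)     # g[1] = width, g[2] = height
             kind = (g[1] + 0 >= g[2] + 0) ? "landscape" : "portrait"
             print $1, $i, kind
           }
       }'
}
```

Running "xrandr | monitor_regions" should give one line per active monitor, which is enough to decide which kind of image belongs where.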

If you’re still interested in getting a copy it’s available at http://zonky.org/src/set-random-background.

Jun 28 2012

If for some peculiar reason (I’ll come to those later) you want to prevent GNOME from setting the desktop wallpaper, you used to have a relatively easy option. If you search for how to disable the wallpaper setting in GNOME, you will find frequent mentions of the method. Unfortunately it no longer seems to work.

It seems that the GNOME developers in their infinite wisdom have seen fit to ignore any previous setting that allowed you to override GNOME and say “I’ll set the background myself”, and quite possibly no longer have that option available. Well, where there’s a will there’s a way :-

$ sudo zsh
# cd /usr/lib/gnome-settings-daemon-3.0
# mv background.gnome-settings-plugin _background.gnome-settings-plugin
# mv libbackground.so _libbackground.so
# pkill gnome-settings-daemon
# gnome-settings-daemon

At this point your terminal will be taken over by the gnome-settings-daemon and it will scroll tons of messages past your nose. If you scroll up, you will see close to the top a mention of being unable to load the background setting plugin. At which point you can use your favourite background setting tool (a word on that later) to set the background.

This is a rather brutal method of disabling this, and is prone to failure when the relevant software packages are upgraded – your favourite package manager is likely to replace the “missing” files for you. So if you’re listening, GNOME developers, please resurrect a sensible method for turning this plugin off!
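
Since a package upgrade can quietly undo the change, it may be worth checking now and again whether the files have reappeared. A minimal sketch, with a made-up function name (background_plugin_disabled) and assuming the same directory and file names as above:

```shell
# Hypothetical check: succeed only while both plugin files are still absent.
# $1 = plugin directory (defaults to the path used above)
background_plugin_disabled() {
  dir=${1:-/usr/lib/gnome-settings-daemon-3.0}
  [ ! -e "$dir/background.gnome-settings-plugin" ] &&
    [ ! -e "$dir/libbackground.so" ]
}
```

Something like "background_plugin_disabled || echo 'GNOME background plugin is back'" in a login script would at least explain why the wallpaper has suddenly changed.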

BTW: You may want to check your favourite background setting tool actually works properly in your environment; I’ve found that in my environment both ImageMagick and xloadimage silently failed, but feh and hsetroot worked fine. This had me puzzled for a moment when I tried the first two!

As to why I want to disable the GNOME wallpaper plugin, there are several reasons :-

  1. I’m difficult and want to do it my own way.
  2. The GNOME background setting plugin has some limitations that are irritating to me.
  3. And I have some rather specialist requirements … stay tuned for more information.

Jan 21 2012

One of the things I miss from Solaris is Solaris Containers – zones – which are extremely useful for isolating lightweight services in their own “virtual machines”. This blog entry is about LXC – Linux Containers. There are of course other methods of accomplishing much the same thing with Linux, but LXC has the advantage that the necessary kernel extensions are included by default.

Or in other words it isn’t necessary to compile a custom kernel. Which has advantages in certain environments.

There are of course some disadvantages too – LXC isn’t quite as mature as some of the alternatives, and there are a few missing features. But it works well enough.

What?

Operating system level virtualisation, or what I prefer to call lightweight virtualisation, is a method by which you can run multiple virtual servers on a single physical (or virtual!) machine. Like normal virtualisation supplied by products such as ESX, Hyper-V, etc., lightweight virtualisation allows you to run multiple servers on a single instance of server hardware.

Operating system level virtualisation is not quite the same as full virtualisation where you get a complete virtual machine with a VGA display, a keyboard, mouse, etc. Instead you trick the first program that starts on a normal Unix (or Linux) system – /sbin/init – into believing that it is running on a machine by itself when it is in fact running inside a specially created environment. That is if you want a full virtual operating system; it is also possible to set up an environment so that a container simply starts a single application.

In some ways this is similar to the ancient chroot mechanism which was often recommended for securely installing applications such as BIND which were prone to attack. But it has been improved with greater isolation from the operating system running on the hardware itself.

Note that I said /sbin/init – these containers do not run their own kernel. Merely their own user processes.

Why?

So why are these containers useful ? Especially in a world where virtualisation is ubiquitous and can even be free (VirtualBox, and various KVM solutions). Well of course they are, or they wouldn’t exist – the equivalent of BSD Jails has been introduced for every single remaining Unix-based system as well as Linux.

First of all, containers provide a perfectly viable virtualisation mechanism if all you require are numbers of Linux machines. Indeed it is possible to use this kind of virtualisation on already virtualised machines – for example on a Cloud-based virtual server (as you might get from Amazon) which could potentially save you money.

Secondly, containers provide lightweight virtualisation in that there is little to no overhead in using them. There is no virtualised CPU, no I/O virtualisation, etc. In most “heavyweight” virtualisation solutions the overhead of virtualisation is negligible except for I/O where there is often a considerable performance hit for disk-based applications.

Next, if carefully set up, it is possible to reduce the incremental cost of each server installation if you use containers rather than full virtual machines. Every additional server you run has a cost associated with maintaining it in terms of money and time; a variety of different mechanisms can reduce this incremental cost, but there is still a cost there. Containers can be another means of reducing this incremental cost by making it easier to manage the individual servers.

As an example, it is possible to update which DNS servers each container uses by simply copying the /etc/resolv.conf file to each container :-

for container in $(lxc-ls | sort | uniq)
do
  cp /etc/resolv.conf /srv/lxc/${container}/rootfs/etc/resolv.conf
done

It’s also very handy for testing – create a container on a test server to mess around with some component or other to find out how it should be installed, and then throw away the container. This avoids “corrupting” a test server with the results of repeated experiments over time.

Of course there are also reasons why you should not use them.

First of all, it is another virtualisation technology to learn. Not a difficult one, but there is still more to learn.

In addition, it does not provide complete isolation – if you need to reboot the physical server, you will have to reboot all of the containers. So it is probably not a good technology for multiple servers that need to stay up forever – although the only real way of arranging that is to use a load balancer in front of multiple servers (even clustering sometimes requires an outage).

There is also the fact that this is not entirely a mature technology. That is not to say it isn’t stable, but more that the tools are not quite polished as yet.

Finally there are hints that containers do not provide complete isolation – someone with root access on a container might be able to escape from the container. Thus it is probably not a good solution to provide isolation for security reasons.

How ?

The following instructions assume the use of Debian, although most modern distributions should be perfectly fine too – I’ve also done this with SLES, and seen instructions for ArchLinux. You can also mix and match distributions – SLES as the master operating system, and Debian for the distribution in the containers. That works perfectly fine.

To see if your currently running kernel supports the relevant extensions, see if the cgroups filesystem is available :-

# grep cgroup /proc/filesystems
nodev	cgroup

If the grep command doesn’t return the output as shown, you will need to upgrade and/or build your own kernel. Which is a step beyond where I’m going.
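
If you want to script that check (for instance across a handful of candidate servers), a sketch along these lines would do. The function name has_cgroup_support is my own invention; the optional argument exists only so that the check can be pointed at a copy of the file.

```shell
# Hypothetical check: succeed if the kernel lists the cgroup filesystem.
# $1 = file to inspect (defaults to the live list of filesystems)
has_cgroup_support() {
  grep -qw cgroup "${1:-/proc/filesystems}"
}
```

For example: "has_cgroup_support && echo 'kernel looks ready for LXC'".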

Initial Setup

Before installing your first container, you need to set up your server to support LXC. This is all pretty simple – the most complicated part is to set up your server to use a bridged adapter as its network interface. Which we will tackle first.

To start with, install the bridge-utils package :-

# apt-get install bridge-utils

Next edit your /etc/network/interfaces file. Leave the section dealing with the loopback interface (lo) alone and comment out anything relating to eth0. Once that’s done, add something to set up the bridge interface (br0) with :-

auto br0
iface br0 inet dhcp
   bridge_ports eth0
   bridge_fd 0

That sets up a bridged interface configured with an address supplied by a DHCP server – not perhaps the best idea for a production server, but perfectly fine for a test server. At least if you have a DHCP server on the network! If you need to configure a static address, copy the relevant parts from your commented out section – the address, netmask, and gateway keywords can be added below bridge_fd.
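
For the static case, the stanza ends up looking like the output of the following sketch. The function name bridge_stanza is hypothetical, and the address, netmask and gateway are whatever you pass in; nothing here is specific to my network.

```shell
# Hypothetical helper: print a static br0 stanza for review before pasting
# it into /etc/network/interfaces.
# $1 = address, $2 = netmask, $3 = gateway (all supplied by the caller)
bridge_stanza() {
  cat <<EOF
auto br0
iface br0 inet static
   bridge_ports eth0
   bridge_fd 0
   address $1
   netmask $2
   gateway $3
EOF
}
```

For example, "bridge_stanza 192.0.2.10 255.255.255.0 192.0.2.1" prints a stanza using addresses from the documentation range; substitute the values from your commented out eth0 section.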

At this point it is worth rebooting the server to check that the new network interface is working fine. You could perform all of the steps in this initial setup section and do a reboot just at the end, but I prefer to reboot after each step when I’m trying something new. It’s easier to find the horrible mistake when you do it step by step.

Assuming the reboot went fine, the next step is to automatically mount the cgroup filesystem. Add the following to the end of the /etc/fstab file :-

cgroup        /cgroup        cgroup        defaults    0    0

Make the /cgroup directory, and try to mount the filesystem :-

mkdir /cgroup
mount /cgroup

At this point you should be able to see a whole bunch of files in the /cgroup directory, but it won’t show up if you try a df. You should also reboot at this point to make sure that the filesystem automatically mounts when the system boots.
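
Because df will not help, a quick way to confirm the mount after the reboot is to look in /proc/mounts. A minimal sketch, with a made-up function name (cgroup_mounted) and the mount point defaulting to the /cgroup directory used above:

```shell
# Hypothetical check: succeed if a cgroup filesystem is mounted.
# $1 = expected mount point (defaults to /cgroup as used above)
# $2 = mount table to read (defaults to the live /proc/mounts)
cgroup_mounted() {
  awk -v mp="${1:-/cgroup}" '
    $2 == mp && $3 == "cgroup" { found = 1 }
    END { exit !found }
  ' "${2:-/proc/mounts}"
}
```

For example: "cgroup_mounted || echo 'cgroup filesystem is missing'".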

The final stage is to install the LXC runtime package :-

apt-get install lxc debootstrap

Note that the LXC package is available for many distributions, and the debootstrap package is a shell script runnable under most distributions given the presence of the right set of tools.

Now we are ready to start creating containers. Well, almost.

Creating Containers

When creating containers, it is usual to use a helper script to do the donkey work of setting this up. The LXC runtime package includes a number of such helper scripts. These scripts are very useful, but it is also worth indicating that they may require some local customisation – for instance the template script I used sets the root password to root; whilst it does suggest that this should be changed when the container is built, it is also very sensible to change this initial password to something at least half-reasonable.

And the longer and more extensively you use containers, the more local customisations you are likely to want. For instance, I tend to use a bind to ensure that the /site filesystem is mounted under each container, so I can be sure that my local scripts and configuration files are available on each and every container easily. So :-

  cd /usr/lib/lxc/templates
  cp lxc-debian lxc-local

When not using Debian, it is possible that the template scripts are installed in the main $PATH. In which case you may choose to remove them, or move them somewhere else to avoid their use in preference to your own versions.

It is also worth creating a template configuration file for the lxc-create script :-

cat > /etc/lxc/lxc.conf
lxc.network.type=veth
lxc.network.link=br0
lxc.network.flags=up
^D

To create your first container :-

  mkdir -p /var/lib/lxc/first
  lvcreate --name root-first --size=768M /dev/debian
  mkfs -t ext3 /dev/debian/root-first
  {Edit /etc/fstab to mount the new filesystem at /var/lib/lxc/first}
  mount /var/lib/lxc/first
  lxc-create -n first -t local -f /etc/lxc/lxc.conf

This goes through the process of “bootstrapping” Debian into a directory on your current system and setting up a configuration for your LXC container. Once complete, you are ready to start. But first, you will notice that I created a separate filesystem for the container’s root filesystem; this is necessary if you want to prevent a badly behaved container from bringing down other containers (for example by filling up a shared filesystem).
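
The /etc/fstab edit in the middle of that sequence is the only manual step, so here is a sketch of generating the line instead. The function name container_fstab_line is hypothetical; the mount options (defaults) and the fsck pass number (2) are my assumptions rather than anything LXC requires.

```shell
# Hypothetical helper: print an fstab line for a container root filesystem.
# $1 = container name, $2 = LVM volume group name
# Assumes the root-NAME logical volume naming and ext3 used above; the
# "defaults" options and fsck pass 2 are assumptions.
container_fstab_line() {
  printf '/dev/%s/root-%s\t/var/lib/lxc/%s\text3\tdefaults\t0\t2\n' \
    "$2" "$1" "$1"
}
```

For example, "container_fstab_line first debian >> /etc/fstab" would add the entry for the container created above.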

There are of course other things you can do at this stage before starting the container for the first time – for instance editing /srv/lxc/first/rootfs/etc/network/interfaces may be necessary to enter a static address which is particularly useful

Starting The Container

Once you have created a container, you will probably want to start it :-

lxc-start --daemon --name=first

You can start a container without the “daemon” option, but this means you are immediately connected to the console and it can be difficult to escape from. To connect to the system’s console try lxc-console, which should result in something like :-

# lxc-console --name=first

Type <Ctrl+a q> to exit the console
--hit return here--
Debian GNU/Linux 5.0 first tty1

first login:

At this point you can login as root, and “fix” your container as you would do with an ordinary server. Install the software you need, make any changes you want, etc.

But there is one most noticeable oddity – rebooting a container does not seem to work properly. It seems to get stuck at the point where a normal server would reset the machine to boot again. Undoubtedly this will be fixed at some point … it’s possible that the fix is relatively simple anyway.

But for now the sequence :-

(in the container) shutdown -h now
(in the server) lxc-stop --name ${name-of-container}
(in the server) lxc-start --name ${name-of-container}

Will have to do.
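
The two server-side steps can be wrapped up so they are harder to get wrong. This is only a sketch: the function name restart_container is made up, and it assumes your LXC version ships lxc-wait with the --state option so that the start does not race the stop. It also starts the container with the “daemon” option, as recommended earlier.

```shell
# Hypothetical wrapper for the server-side half of a container "reboot".
# Run "shutdown -h now" inside the container first, then call this.
# $1 = name of the container
restart_container() {
  name=$1
  [ -n "$name" ] || return 2
  lxc-stop --name "$name" &&
    lxc-wait --name "$name" --state STOPPED &&
    lxc-start --daemon --name "$name"
}
```

For example: "restart_container first".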