Sep 25 2013
 

Tonight I caught a bit of a TV programme about the fashion choices of a celebrity (Kate Winslet); not my normal kind of TV which is why I very quickly turned over to something more interesting (to me) like the test card!

But before I did, I was treated to some self-important fashion gurus flaming some of the fashion choices of a younger version of the celebrity in question; in particular, the celebrity at 20 years old.

My initial reaction was: Of course a 20-year-old celebrity makes some fashion mistakes. At that age we all make stupid choices; in fact without those stupid choices we don’t learn what is sensible and what is not.

But then I thought: Actually they weren’t mistakes at all. Young people should be experimenting, and sometimes experiments don’t work out. But they are not mistakes.

If we discourage young people from experimenting – especially with something as harmless as fashion experiments – we risk ruining what makes young people young. Not their age, but their sense of adventure and willingness to experiment.

Sep 25 2013
 

If you suspect a networking problem, how do you go about diagnosing that problem?

As in all problem solving, the process involves gathering information and performing tests. To adequately perform some of the tests, you need to prepare in advance – by obtaining copies of tools, creating a USB stick with the tools on, finding out how to use the tools, etc. You cannot expect to be able to perform anything useful without investing in that preparation time.

As an alternative to preparing a USB stick full of tools, it may be preferable to prepare a netbook with the tools on – at the very least swapping a network connection to a known working netbook will tell you whether the problem is in the computer or in the network!

Get The MAC Man!

The MAC address of the network connection is probably the single most important bit of information to get your hands on, because it is the key for obtaining lots of other information – whether DHCP requests are being seen, whether the Ethernet switch can see that MAC address on any of its ports … or the expected port, etc. If you report a network issue without the MAC address of the machine in question, someone will bang their head on the desk. If you are locked out of the machine because the network “isn’t working”, and so are unable to run the usual tools to get at the MAC address, report that as a fault.

Obtaining the MAC address varies according to the operating system you want to get it from, and the method you choose to use to get it. I have chosen to document a command-line method; if this makes you unhappy, please feel free to document the graphical way, and I’ll add a link to it. In some cases, you will need to choose which MAC address belongs to the active network card. If in any doubt, get all the MAC addresses, and suggest which one you think is the active network card; if it turns out you have guessed wrong, at least the right one will be in the list somewhere!

Windows

Start a command line, and run ipconfig /all :-

C:\Users\msm>ipconfig/all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : w7
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Peer-Peer
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : single-names.port.ac.uk
                                       iso.port.ac.uk
                                       eps.is.port.ac.uk

Ethernet adapter Local Area Connection:

   Connection-specific DNS Suffix  . : inside.zonky.org
   Description . . . . . . . . . . . : Intel(R) PRO/1000 MT Desktop Adapter
   Physical Address. . . . . . . . . : 08-00-27-84-0A-B4
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 10.0.2.15(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Lease Obtained. . . . . . . . . . : 15 September 2013 12:02:40
   Lease Expires . . . . . . . . . . : 16 September 2013 12:02:40
   Default Gateway . . . . . . . . . : 10.0.2.2
   DHCP Server . . . . . . . . . . . : 10.0.2.2
   DNS Servers . . . . . . . . . . . : 10.0.0.26
   NetBIOS over Tcpip. . . . . . . . : Enabled

Tunnel adapter Local Area Connection* 9:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Teredo Tunneling Pseudo-Interface
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   IPv6 Address. . . . . . . . . . . : 2001:0:9d38:953c:2c67:1675:f5ff:fdf0(Preferred)
   Link-local IPv6 Address . . . . . : fe80::2c67:1675:f5ff:fdf0%11(Preferred)
   Default Gateway . . . . . . . . . : ::
   NetBIOS over Tcpip. . . . . . . . : Disabled

Tunnel adapter isatap.inside.zonky.org:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . : inside.zonky.org
   Description . . . . . . . . . . . : Microsoft ISATAP Adapter #2
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

Windows is being “helpful” here and listing all of the network adapters it knows of, including the ones that are not plugged in. To find the address we want, we look for the “Ethernet adapter Local Area Connection” section, and within that look for the “Physical Address”, which is given here as 08-00-27-84-0A-B4.
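If you have to collect this from more than a handful of machines, picking the address out by eye gets tedious. The following is a minimal sketch (not a polished tool) of how the same search could be scripted in Python; the function name physical_addresses is made up for illustration, and it assumes the English-language output layout shown above.

```python
import re

# Exactly six hex pairs separated by dashes, e.g. 08-00-27-84-0A-B4.
# The 8-byte addresses on the tunnel pseudo-interfaces deliberately do not match.
MAC_RE = re.compile(r"^(?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2}$")


def physical_addresses(ipconfig_output):
    """Map each adapter heading to its 6-byte Physical Address (hypothetical helper)."""
    found = {}
    adapter = None
    for line in ipconfig_output.splitlines():
        # Adapter headings start in column 0 and end with a colon.
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            adapter = line.rstrip().rstrip(":")
        elif "Physical Address" in line and adapter is not None:
            value = line.split(":", 1)[1].strip()
            if MAC_RE.match(value):
                found[adapter] = value
    return found


# A cut-down copy of the output shown above.
sample = """\
Ethernet adapter Local Area Connection:

   Description . . . . . . . . . . . : Intel(R) PRO/1000 MT Desktop Adapter
   Physical Address. . . . . . . . . : 08-00-27-84-0A-B4

Tunnel adapter isatap.inside.zonky.org:

   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
"""

for name, mac in physical_addresses(sample).items():
    print(name, "->", mac)
```

On a real Windows machine the text could come from running ipconfig /all via subprocess.run rather than from a pasted sample.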

Linux and OSX

Linux and OSX are pretty similar at this level – apart from Linux calling Ethernet devices ethN (usually) and OSX calling ’em enN, the command and output are pretty much the same.

Again, start a command-line interface and run the command ifconfig :-

% ifconfig
eth0      Link encap:Ethernet  HWaddr 60:a4:4c:62:84:71  
          inet addr:10.0.0.28  Bcast:10.0.255.255  Mask:255.255.0.0
          inet6 addr: fe80::62a4:4cff:fe62:8471/64 Scope:Link
          inet6 addr: 2001:8b0:ca2c:dead::babe/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1492  Metric:1
          RX packets:170663945 errors:0 dropped:0 overruns:0 frame:0
          TX packets:183200664 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:122771869945 (114.3 GiB)  TX bytes:170314898179 (158.6 GiB)
          Interrupt:73 Base address:0x2000 

ib0       Link encap:UNSPEC  HWaddr 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.255.0.1  Bcast:10.255.0.255  Mask:255.255.255.0
          inet6 addr: fe80::21a:4bff:ff0c:e1c5/64 Scope:Link
          inet6 addr: 2001:8b0:ca2c:d00d::1/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:4096  Metric:1
          RX packets:6037892 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12155324 errors:0 dropped:3079 overruns:0 carrier:0
          collisions:0 txqueuelen:256 
          RX bytes:314594872 (300.0 MiB)  TX bytes:21890697854 (20.3 GiB)

ib1       Link encap:UNSPEC  HWaddr 80-00-00-49-FE-80-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.255.1.1  Bcast:10.255.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21a:4bff:ff0c:e1c6/64 Scope:Link
          inet6 addr: 2001:8b0:ca2c:d00f::1/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:4466937 errors:0 dropped:0 overruns:0 frame:0
          TX packets:429108 errors:0 dropped:47 overruns:0 carrier:0
          collisions:0 txqueuelen:256 
          RX bytes:232871358 (222.0 MiB)  TX bytes:17179018366 (15.9 GiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:65309 errors:0 dropped:0 overruns:0 frame:0
          TX packets:65309 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:4625183 (4.4 MiB)  TX bytes:4625183 (4.4 MiB)

This is an unusually complex configuration, but the MAC address can be picked out relatively easily. Just look for the ethN (here it’s “eth0”) and pick out the “HWaddr” which is 60:a4:4c:62:84:71 in this example.
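The same “look for ethN, pick out HWaddr” rule can be written down as a short Python sketch. It assumes the older ifconfig layout shown above (with “Link encap:Ethernet” and “HWaddr”); other versions of ifconfig label the address differently, so treat the pattern as an assumption to adjust. The function name ethernet_hwaddrs is invented for this example.

```python
import re

# Matches lines such as:
#   eth0      Link encap:Ethernet  HWaddr 60:a4:4c:62:84:71
# InfiniBand lines ("Link encap:UNSPEC" with a 20-byte address) are skipped.
HWADDR_RE = re.compile(
    r"^(\S+)\s+Link encap:Ethernet\s+HWaddr\s+"
    r"((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})\s*$"
)


def ethernet_hwaddrs(ifconfig_output):
    """Return {interface name: MAC address} for Ethernet interfaces only."""
    result = {}
    for line in ifconfig_output.splitlines():
        match = HWADDR_RE.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result


# Two header lines copied from the output above.
sample = (
    "eth0      Link encap:Ethernet  HWaddr 60:a4:4c:62:84:71  \n"
    "ib0       Link encap:UNSPEC  HWaddr 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00  \n"
)

print(ethernet_hwaddrs(sample))
```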

Network Sniffing with Wireshark

Wireshark is a premium grade packet sniffer and packet analysis tool; it’s a tool that really justifies a complete book. But you can get quite a bit done with far less knowledge.

The absolute basic is how to capture packets. This should be fairly easy to accomplish from the graphical interface – it’s pretty much a case of picking a network interface to capture from, and clicking “Start”. Once you have captured 30 seconds or so of traffic, click the red cross, and save the result. All done.


Warning: Contains an enthusiastic American! 

Detailing exactly what you might see in a packet capture is definitely beyond the scope of this blog entry, but there are basically three different kinds of packets you should see :-

  1. Packets sent by your machine. Until you get to more advanced levels, these contain very little in the way of useful information.
  2. Packets sent to your machine. That is they are addressed specifically with your machine in mind. These are also to be ignored at this level.
  3. Finally packets sent out in broadcast mode – to every machine on the network.

The final category can tell you on which network you are … if you are connected to some kind of “special” private network, it is to be expected that an ordinary PC (or Mac) won’t work properly. If you look at enough examples of packet captures, it should eventually become evident what packets are broadcast ones, and what the contents of those packets mean :-

# tshark -i eth0.24 arp 
tshark: Lua: Error during loading:
 [string "/usr/share/wireshark/init.lua"]:45: dofile has been disabled
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0.24
  0.000000 84:78:ac:19:64:41 -> Broadcast    ARP 60 Who has 148.197.24.2?  Tell 148.197.24.252

The packet in question is on the last line. It’s an ARP packet where a machine is asking if anyone knows the Ethernet address of 148.197.24.2 … which is a pretty good indication you are connected to that network.
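To make the three kinds of packet described earlier a little more concrete, here is a rough Python sketch that sorts a packet into one of them using nothing but the source and destination Ethernet addresses. The function name classify is made up; the only outside fact relied on is that the Ethernet broadcast address is ff:ff:ff:ff:ff:ff (which tshark displays as “Broadcast”). A broadcast sent by your own machine is counted as a broadcast here, which is a choice made for simplicity.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"  # the Ethernet broadcast address


def classify(src_mac, dst_mac, my_mac):
    """Sort a packet into one of the three kinds described above (illustrative only)."""
    # Normalise case and separators so Windows-style 08-00-27-... also compares equal.
    src, dst, me = (m.lower().replace("-", ":") for m in (src_mac, dst_mac, my_mac))
    if dst == BROADCAST:
        return "broadcast"
    if src == me:
        return "sent by this machine"
    if dst == me:
        return "sent to this machine"
    return "other"


# The ARP request from the capture above, seen from the machine with eth0's address.
print(classify("84:78:ac:19:64:41", "ff:ff:ff:ff:ff:ff", "60:a4:4c:62:84:71"))
```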

A Better Ping

The standard ping tool is very useful, but it has a couple of big disadvantages :-

  1. Machines with an aggressive firewall may not permit ICMP (i.e. ping) packets through. In which case they do not respond to standard pings.
  2. Because ping uses ICMP packets, it is subject to the lower priority that ICMP packets have … in the event of an overloaded network, routers and switches will prefer to drop ICMP to keep TCP and UDP packets flowing. This can result in a false impression of the network reliability.

Because of this, there have been a variety of different tools that accomplish the same sort of thing as ping by using either TCP or UDP (or even ICMP) packets. The latest and greatest of these tools is nping which is part of the nmap series of tools, and is available for just about every platform (including Windows). The default for nping is to send just 5 packets :-

# nping --tcp -p 22 10.0.0.28

Starting Nping 0.6.25 ( http://nmap.org/nping ) at 2013-09-23 20:56 BST
SENT (0.0058s) TCP 10.0.0.26:18384 > 10.0.0.28:22 S ttl=64 id=46091 iplen=40  seq=3907311816 win=1480 
RCVD (0.0062s) TCP 10.0.0.28:22 > 10.0.0.26:18384 SA ttl=64 id=0 iplen=44  seq=4059830626 win=14520 
SENT (1.0060s) TCP 10.0.0.26:18384 > 10.0.0.28:22 S ttl=64 id=46091 iplen=40  seq=3907311816 win=1480 
RCVD (1.0066s) TCP 10.0.0.28:22 > 10.0.0.26:18384 SA ttl=64 id=0 iplen=44  seq=4075461628 win=14520 
SENT (2.0070s) TCP 10.0.0.26:18384 > 10.0.0.28:22 S ttl=64 id=46091 iplen=40  seq=3907311816 win=1480 
RCVD (2.0075s) TCP 10.0.0.28:22 > 10.0.0.26:18384 SA ttl=64 id=0 iplen=44  seq=4091100198 win=14520 
SENT (3.0080s) TCP 10.0.0.26:18384 > 10.0.0.28:22 S ttl=64 id=46091 iplen=40  seq=3907311816 win=1480 
RCVD (3.0084s) TCP 10.0.0.28:22 > 10.0.0.26:18384 SA ttl=64 id=0 iplen=44  seq=4106740813 win=14520 
SENT (4.0090s) TCP 10.0.0.26:18384 > 10.0.0.28:22 S ttl=64 id=46091 iplen=40  seq=3907311816 win=1480 
RCVD (4.0094s) TCP 10.0.0.28:22 > 10.0.0.26:18384 SA ttl=64 id=0 iplen=44  seq=4122381613 win=14520 

Max rtt: 0.451ms | Min rtt: 0.259ms | Avg rtt: 0.342ms
Raw packets sent: 5 (200B) | Rcvd: 5 (230B) | Lost: 0 (0.00%)
Tx time: 4.00449s | Tx bytes/s: 49.94 | Tx pkts/s: 1.25
Rx time: 4.00476s | Rx bytes/s: 57.43 | Rx pkts/s: 1.25
Nping done: 1 IP address pinged in 4.01 seconds

The key information is displayed at the end … specifically the Max rtt (round trip time) which tells you how long it took for the slowest “conversation” to take place, and the “Lost” count of the number of packets lost. There are zillions of options to nping, but some of the most important include :-

Option        Description
--tcp         Use TCP probe mode, which is probably the preferred mode for testing.
-p N          Specify the destination port to probe. This can be either open (i.e. a service is running) or closed, but not firewalled.
--count N     Tell nping how many packets to send. Increasing this can make the test much longer.
--delay Nms   How long to wait between each packet. Always specify “ms” as a suffix to give you milliseconds. A delay of about 50ms is reasonable.

There’s a great deal more to nping than just this, but it’s a start.
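If you end up running nping repeatedly (say, once an hour to spot intermittent loss), it may help to pull the key figures out of the summary automatically. This is only a sketch that assumes the summary lines look exactly like the ones in the output above; parse_nping_summary is a name invented for this example, and it simply returns None if the lines do not match.

```python
import re


def parse_nping_summary(output):
    """Extract round trip times and packet counts from nping's closing summary."""
    rtt = re.search(
        r"Max rtt: ([\d.]+)ms \| Min rtt: ([\d.]+)ms \| Avg rtt: ([\d.]+)ms", output
    )
    counts = re.search(
        r"Raw packets sent: (\d+) .*?Rcvd: (\d+) .*?Lost: (\d+)", output
    )
    if not rtt or not counts:
        return None  # summary missing or in a format this sketch does not know
    return {
        "max_rtt_ms": float(rtt.group(1)),
        "min_rtt_ms": float(rtt.group(2)),
        "avg_rtt_ms": float(rtt.group(3)),
        "sent": int(counts.group(1)),
        "received": int(counts.group(2)),
        "lost": int(counts.group(3)),
    }


# The two summary lines from the run shown above.
sample = (
    "Max rtt: 0.451ms | Min rtt: 0.259ms | Avg rtt: 0.342ms\n"
    "Raw packets sent: 5 (200B) | Rcvd: 5 (230B) | Lost: 0 (0.00%)\n"
)

print(parse_nping_summary(sample))
```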

How Fast? How Slow?

Does the network connection feel slow? Just how slow does it feel? Measure it.

It is not uncommon to find that a network performance issue is actually a performance issue of some other kind. Measuring the network performance can tell you whether it really is the network, or something else. To do so, you need the right tool; measuring with the wrong tool can result in very inaccurate measurements.

Often people resort to using ftp to transfer large files back and forth, which works well enough in normal circumstances, but at higher network speeds you can find yourself measuring the speed of a slow hard disk rather than the network performance. So use the right tool – such as iperf which is available for all major platforms including Windows.

There is an additional tool available for Windows which offers a graphical interface, but I am describing the command-line interface. Partially because that’s the way I am, but partially because it is dead simple.

To run iperf you need to have the software installed on a client machine and a server machine. To run on a server, simply :-

$ iperf -p 32765 -s

Specifying the port number isn’t normally necessary, but I suggest choosing a random port around 32,000 to avoid conflicts. Just remember the port number you use! And on the client :-

% iperf -p 32765 -c polio
------------------------------------------------------------
Client connecting to polio, TCP port 5001
TCP window size: 23.4 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.28 port 36114 connected with 10.0.0.26 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   113 MBytes  95.0 Mbits/sec
% iperf -p 32765 -c 10.255.0.2
------------------------------------------------------------
Client connecting to 10.255.0.2, TCP port 5001
TCP window size: 28.8 KByte (default)
------------------------------------------------------------
[  3] local 10.255.0.1 port 43978 connected with 10.255.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.73 GBytes  2.35 Gbits/sec
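As a sanity check on the figures in that output, the “Bandwidth” column can be recomputed from the “Transfer” column. The sketch below assumes that iperf counts an MByte as 1024 × 1024 bytes and an Mbit as 1,000,000 bits (that assumption is mine, but it is the one that makes the two columns agree); mbits_per_second is a made-up helper name.

```python
def mbits_per_second(mbytes, seconds):
    """Convert a transfer in MBytes over a number of seconds into Mbits/sec.

    Assumption: 1 MByte = 1024 * 1024 bytes, 1 Mbit = 1,000,000 bits.
    """
    bits = mbytes * 1024 * 1024 * 8
    return bits / seconds / 1_000_000


# First test: 113 MBytes in 10 seconds, close to the 95.0 Mbits/sec reported
# (the small difference is rounding in the displayed transfer figure).
print(round(mbits_per_second(113, 10.0), 1))

# Second test: 2.73 GBytes (taken as 2.73 * 1024 MBytes) in 10 seconds,
# printed in Gbits/sec for comparison with the 2.35 Gbits/sec reported.
print(round(mbits_per_second(2.73 * 1024, 10.0) / 1000, 2))
```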

I have admittedly cheated here by running two tests … to show what a normal 100Mbps Ethernet speed should look like, and what something a bit quicker would look like. In the latter case, I have used a slow InfiniBand connection that is only about 2.5 times quicker than 1000Mbps Ethernet. Bear in mind that :-

  1. Ethernet signals work at 100Mbps or 1000Mbps (or faster for more esoteric Ethernet types), but you won’t get to that speed.

  2. You need to baseline a performance test to find out what normal speeds look like!

Aug 20 2013
 

Every so often I come across an old Linux box that doesn’t take kindly to being rebooted. Without console access, it is hard to see what is going on, but the Linux kernel gets stuck trying to mount the root file system. There are many possible fixes for this, but they all have one thing in common … a work-around has to be performed to get the box up and running.

The console gets stuck in a “mini-root” environment loaded when the initrd image is loaded and before the real root file system is mounted which means a lot of commands are not available, but lvm should be available. First of all, run lvm lvscan to get a list of the logical volumes that need activating :-

(initramfs) lvm lvscan
  inactive          '/dev/sys/root' [332.00 MiB] inherit
  inactive          '/dev/sys/usr' [8.38 GiB] inherit
  inactive          '/dev/sys/var' [2.79 GiB] inherit
  inactive          '/dev/sys/swap_1' [7.05 GiB] inherit
  inactive          '/dev/sys/tmp' [380.00 MiB] inherit
  inactive          '/dev/sys/home' [16.00 GiB] inherit
  inactive          '/dev/sys/opt' [24.00 GiB] inherit

For each volume group (the second column, middle word), run: lvm vgchange -ay ${volume-group-name}. In the case of my example :-

(initramfs) lvm vgchange -ay /dev/sys
  7 logical volume(s) in volume group "sys" now active

At which point you should be able to press ^D (or enter exit) to continue the boot process.
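The “middle word” rule for spotting the volume group name is easier to see written out. The initramfs shell will not have Python available, so this sketch is purely to illustrate how the name is derived from each lvscan line (for example when working from a saved copy of the console output); volume_groups is a name made up for the example.

```python
import re


def volume_groups(lvscan_output):
    """Return the distinct volume group names found in lvscan output, in order."""
    groups = []
    for line in lvscan_output.splitlines():
        # Paths look like '/dev/<volume group>/<logical volume>'.
        match = re.search(r"'/dev/([^/']+)/[^']+'", line)
        if match and match.group(1) not in groups:
            groups.append(match.group(1))
    return groups


# Three lines copied from the lvscan output above.
sample = """\
  inactive          '/dev/sys/root' [332.00 MiB] inherit
  inactive          '/dev/sys/usr' [8.38 GiB] inherit
  inactive          '/dev/sys/swap_1' [7.05 GiB] inherit
"""

for vg in volume_groups(sample):
    # Print the command to type at the (initramfs) prompt for each group.
    print(f"lvm vgchange -ay {vg}")
```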

A slightly better work-around involves changing the Grub configuration to add a delay to the kernel parameters. This section assumes that you are not using Grub Legacy!

Start by editing /etc/default/grub and changing the variable GRUB_CMDLINE_LINUX to include “rootdelay=20” :-

GRUB_CMDLINE_LINUX='console=tty0 console=ttyS0,19200n8 rootdelay=20'

Finalise by running update-grub. This adds a 20s delay to the boot process so is hardly an ideal solution.
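If the same change has to be rolled out to several boxes, the edit can be scripted. This is a hedged sketch of just the string handling (it does not touch any files itself); add_kernel_param is an invented helper, and it assumes the variable is written on a single line with matching single or double quotes, as in the example above.

```python
import re


def add_kernel_param(line, param):
    """Append param inside the quotes of a GRUB_CMDLINE_LINUX=... line if absent."""
    match = re.match(r"""^(GRUB_CMDLINE_LINUX=)(['"])(.*)\2\s*$""", line)
    if not match:
        return line  # not the line we are looking for; leave it untouched
    prefix, quote, value = match.groups()
    words = value.split()
    key = param.split("=", 1)[0]
    # Leave the line alone if the parameter (with any value) is already present.
    if any(word == key or word.startswith(key + "=") for word in words):
        return line
    words.append(param)
    return f"{prefix}{quote}{' '.join(words)}{quote}"


before = "GRUB_CMDLINE_LINUX='console=tty0 console=ttyS0,19200n8'"
print(add_kernel_param(before, "rootdelay=20"))
```

Running update-grub afterwards is still needed, exactly as described above.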

Aug 19 2013
 

No.

Anyone who thinks so needs to read a bit of history on what life was like in real police states.

But on a day when news breaks of an incident in which a journalist was detained for 9 hours and his electronic media confiscated, we do have to ask ourselves whether we are headed in that direction. And whether we really want to go in that direction.

David Miranda was held under anti-terrorist legislation – specifically schedule 7 – in what was clearly an attempt at harassment for publishing stories embarrassing the UK and US governments. Now the victim here is clearly a journalist, and whilst it is possible for a journalist to be involved in terrorism, I really rather doubt this one has time to be particularly active at this time. This is a high profile case, but how many of the 61,145 other suspects detained under schedule 7 last year were detained for non-terrorism purposes?

Anti-terrorism legislation is very powerful, and whilst it may be justified to tackle terrorism, it certainly must not be used for other purposes. And in this case it was.

And undoubtedly we will have some sort of review of the case, a lot of noise, and very little action. It’s almost certain that the police who detained David Miranda will escape scot-free, or with a notional slap on the wrist, and not with the prison sentence that they deserve.

Jul 29 2013
 

… which is of course massive overkill. But fun. It should increase the raw bandwidth available between the two machines from 1Gbps to 20Gbps (with one link) and 40Gbps with both links bonded.

It was a bit of a surprise to me when I looked around at prices of second-hand kit to realise that InfiniBand was so much cheaper to acquire than Fibre Channel; the kit I acquired cost less than £100 all in whereas FC kit would be in the region of £1,000, and InfiniBand is generally quicker. There is of course 16Gb FC and 10Gb InfiniBand, but that is hardly comparing like with like.

So what is this overkill for? Networking of course. I’ve acquired two HP InfiniBand dual link cards which means I can connect my workstation to my server :-

[Image: InfiniBand Network]

Using dual links is of course overkill on top of overkill, but given that these cards have dual links, why not use them? And it does give a couple of experiments to try later.

To prepare in advance, the following network addresses will be used :-

Server  Link Number  IPv4 Address   IPv6 Address
A       1            10.255.0.1     AAISP:d00d::1
A       2            10.255.1.1     AAISP:d00f::1
B       1            10.255.0.254   AAISP:d00d::2
B       2            10.255.1.254   AAISP:d00f::2

Yes I have cheated for the IPv6 addresses! The first step is to configure each “server” … one is running Debian Linux, and the other is running FreeBSD.

Configuring Linux

This was subject to much delay whilst I believed that I had a problem with the InfiniBand card, but putting the card into a new desktop machine caused it to spring back to life. The cause was either some sort of incompatibility with my old desktop (which was quite old), or some sort of problem with the BIOS settings.

Inserting the card should load the core module (mlx4_core) automatically, and spit out messages similar to the following :-

[    3.678189] mlx4_core 0000:07:00.0: irq 108 for MSI/MSI-X
[    3.678195] mlx4_core 0000:07:00.0: irq 109 for MSI/MSI-X
[    3.678199] mlx4_core 0000:07:00.0: irq 110 for MSI/MSI-X
[    3.678204] mlx4_core 0000:07:00.0: irq 111 for MSI/MSI-X
[    3.678208] mlx4_core 0000:07:00.0: irq 112 for MSI/MSI-X
[    3.678212] mlx4_core 0000:07:00.0: irq 113 for MSI/MSI-X
[    3.678216] mlx4_core 0000:07:00.0: irq 114 for MSI/MSI-X
[    3.678220] mlx4_core 0000:07:00.0: irq 115 for MSI/MSI-X
[    3.678223] mlx4_core 0000:07:00.0: irq 116 for MSI/MSI-X
[    3.678228] mlx4_core 0000:07:00.0: irq 117 for MSI/MSI-X
[    3.678232] mlx4_core 0000:07:00.0: irq 118 for MSI/MSI-X
[    3.678236] mlx4_core 0000:07:00.0: irq 119 for MSI/MSI-X
[    3.678239] mlx4_core 0000:07:00.0: irq 120 for MSI/MSI-X
[    3.678243] mlx4_core 0000:07:00.0: irq 121 for MSI/MSI-X
[    3.678247] mlx4_core 0000:07:00.0: irq 122 for MSI/MSI-X
[    3.678250] mlx4_core 0000:07:00.0: irq 123 for MSI/MSI-X
[    3.678254] mlx4_core 0000:07:00.0: irq 124 for MSI/MSI-X
[    3.678259] mlx4_core 0000:07:00.0: irq 125 for MSI/MSI-X
[    3.678263] mlx4_core 0000:07:00.0: irq 126 for MSI/MSI-X
[    3.678267] mlx4_core 0000:07:00.0: irq 127 for MSI/MSI-X
[    3.678271] mlx4_core 0000:07:00.0: irq 128 for MSI/MSI-X
[    3.678275] mlx4_core 0000:07:00.0: irq 129 for MSI/MSI-X

This is just the core driver; at this point additional modules are needed to do anything useful. You can manually load the modules with modprobe but sooner or later it is better to make sure they’re loaded automatically by adding their names to /etc/modules. The modules you want to load are :-

  1. mlx4_ib
  2. ib_umad
  3. ib_uverbs
  4. ib_ipoib

This is a minimal set necessary for networking (“IP”) rather than additional features such as SCSI. It’s generally better to start with a minimal set of features initially. At this point, it is generally a good idea to reboot to verify that things are getting closer. After a reboot, you should have one or more new network interfaces listed by ifconfig :-

ib0       Link encap:UNSPEC  HWaddr 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00  
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ib1       Link encap:UNSPEC  HWaddr 80-00-00-49-FE-80-00-00-00-00-00-00-00-00-00-00  
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Despite the appearance, we still have quite a way to go yet. The next step is to install some additional packages: ibutils, infiniband-diags, and opensm. The last package is for a subnet manager which is unnecessary if you have an InfiniBand switch (but I don’t). The first step is to get opensm up and running. Edit /etc/default/opensm and change the PORTS variable to “ALL” (unless you want to restrict the managed ports, and make things more complicated). And start opensm: /etc/init.d/opensm start; update-rc.d opensm defaults.

At this point, you can configure the network addresses by editing /etc/network/interfaces. If you need help doing this, then you’re in the tech pool beyond your depth! Without something at the other end, these interfaces won’t work (obviously), so it’s time to start work on the other end …
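For anyone who wants a reminder of what a static stanza in /etc/network/interfaces looks like, here is a small Python sketch that generates one. The /24 prefix is an assumption on my part (it matches the 255.255.255.0 mask used on the InfiniBand interfaces), and static_stanza is a made-up helper name; adjust both to taste.

```python
import ipaddress


def static_stanza(ifname, cidr):
    """Build a static IPv4 stanza for /etc/network/interfaces (illustrative helper)."""
    iface = ipaddress.ip_interface(cidr)
    return "\n".join(
        [
            f"auto {ifname}",
            f"iface {ifname} inet static",
            f"    address {iface.ip}",
            f"    netmask {iface.netmask}",
        ]
    )


# Server A, link 1 from the address table; the /24 is assumed.
print(static_stanza("ib0", "10.255.0.1/24"))
```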

Configuring FreeBSD

See: https://wiki.freebsd.org/InfiniBand

I hadn’t had cause to build a custom kernel before, so the very first task was to use subversion to checkout a copy of the FreeBSD source code :-

svn co svn://svn0.us-east.FreeBSD.org/base/stable/9 /usr/src

Updating will of course require just: cd /usr/src && svn update. Once installed, create a symlink from /sys to /usr/src/sys if the link does not already exist: ln -s /usr/src/sys /sys

Go to the kernel configuration directory (/usr/src/sys/amd64/conf), copy the GENERIC configuration file to a new file, and edit the new file to add in certain options :-

# Infiniband stuff (locally added)
options         OFED
options         IPOIB_CM
device          ipoib
device          mlx4ib

Again, this is a minimal set that will not offer full functionality … but should be enough to get IP networking up and running. The next step is to build and install the kernel :-

make buildkernel KERNCONF=${NAME-OF-YOUR-CONFIG} && make installkernel KERNCONF=${NAME-OF-YOUR-CONFIG}

The next step is to build the “world”  :-

  1. Edit /etc/src.conf and add “WITH_OFED='yes'” to that file.
  2. Change to /usr/src and run: make buildworld
  3. Finalise with make installworld

As it happens I had to build the user-land first, as the kernel compilation needed a new user-land feature.

After a reboot, the new network interface(s) should show up as ib0 upwards. And these can be configured with an address in exactly the same way as any other network interface.

Testing The Network

A tip for making sure the interfaces you think are connected together really are: configure one of the machines, send a broadcast ping to the relevant network address of each interface in turn, and run tcpdump on the other machine to verify that the packets coming down the wire match what you expect.

Below the level of IP, it is possible to run an InfiniBand ping to verify connectivity. First you need a GUID on “the server”, which can be obtained by running ibstat and looking for the “Port GUID”, which will be something like “0x0002c90200273985”. Next run ibping -S on the server.

Now on the other machine (“the client”), run ibping :-

# ibping -G 0x0002c90200273985
Pong from polio.inside.zonky.org (Lid 3): time 0.242 ms
Pong from polio.inside.zonky.org (Lid 3): time 0.153 ms
Pong from polio.inside.zonky.org (Lid 3): time 0.160 ms

The next step is to run an IP ping to one of the hosts. If that works, it is time to start looking at something that will do a reasonable attempt at a speed test.

This can be done in a variety of different ways, but I chose to use nttcp which is widely available. On one of the hosts, run nttcp -i to act as the “partner” (or server). On the sending server, run nttcp -T ${ip-address-to-test} which will give output something like :-

# nttcp -T 10.0.0.26
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
l  8388608    0.70    0.01     95.7975   5592.4053    2048   2923.51  170666.7
1  8388608    0.71    0.04     94.0667   1878.6950    5444   7630.87  152403.3

According to the documentation, the second line should begin with ‘r’, but for a simple speed test we can simply average the numbers in the “Real-MBit/s” to get an approximate speed. Oddly my gigabit ethernet seems to have mysteriously degraded to 100Mbps! At least it makes the InfiniBand speed slightly more impressive :-

# nttcp -T 10.255.0.2
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
l  8388608    0.03    0.00   2521.9415  16777.2160    2048  76963.55  512000.0
1  8388608    0.03    0.03   2206.6574   2568.6620    4032 132579.25  154329.0
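The “average the Real-MBit/s column” step can be sketched in a few lines of Python. It assumes the output has exactly the nine-column layout shown in the two runs above, and average_real_mbits is a name invented for this example.

```python
def average_real_mbits(nttcp_output):
    """Average the Real-MBit/s column over the data lines of nttcp output."""
    rates = []
    for line in nttcp_output.splitlines():
        fields = line.split()
        # Data lines have 9 fields: a one-character tag plus 8 numbers.
        # The header and the command line have a different field count.
        if len(fields) != 9:
            continue
        try:
            rates.append(float(fields[4]))  # fifth field is Real-MBit/s
        except ValueError:
            continue
    return sum(rates) / len(rates) if rates else None


ethernet = """\
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
l  8388608    0.70    0.01     95.7975   5592.4053    2048   2923.51  170666.7
1  8388608    0.71    0.04     94.0667   1878.6950    5444   7630.87  152403.3
"""

infiniband = """\
     Bytes  Real s   CPU s Real-MBit/s  CPU-MBit/s   Calls  Real-C/s   CPU-C/s
l  8388608    0.03    0.00   2521.9415  16777.2160    2048  76963.55  512000.0
1  8388608    0.03    0.03   2206.6574   2568.6620    4032 132579.25  154329.0
"""

print(round(average_real_mbits(ethernet), 1))
print(round(average_real_mbits(infiniband), 1))
```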

Before getting into a panic over what appears to be a pretty poor result, it is worth bearing in mind that IP over InfiniBand isn’t especially efficient, and InfiniBand seems to suffer from marketing exaggeration. From what I understand, DDR’s 20Gbps signalling rate becomes 16Gbps, which in turn becomes 8.5Gbps when looking at the output of ibstatus (not ibstat) – why the halving here is a bit of a mystery, but that may become apparent later.
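The first step in that chain (20Gbps becoming 16Gbps) is just arithmetic: a DDR link signals at 5Gbps per lane over 4 lanes, and the 8b/10b encoding it uses spends 10 bits on the wire for every 8 bits of data. A quick Python sketch, assuming these cards are running as ordinary 4x DDR links (data_rate_gbps is a made-up name):

```python
def data_rate_gbps(lanes, lane_signalling_gbps):
    """Usable data rate of a link using 8b/10b encoding (8 data bits per 10 on the wire)."""
    return lanes * lane_signalling_gbps * 8 / 10


# Assumption: a 4x DDR link, i.e. 4 lanes at 5Gbps signalling each (20Gbps in total).
ddr_4x = data_rate_gbps(4, 5.0)
print(ddr_4x)

# How much of that did the better of the two nttcp figures above achieve?
measured_gbps = 2521.9415 / 1000
print(round(measured_gbps / ddr_4x * 100, 1), "percent of the data rate")
```

This does not explain the further drop to the 8.5Gbps shown by ibstatus; that part remains a mystery.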

There has also been a hint that FreeBSD is due for a significant improvement in InfiniBand performance sometime after the release of 9.2.

As a late addition, it would appear that running OpenSM (the subnet manager) on both hosts means that when one or other is rebooting, the other can take over the duties of the subnet manager. To enable on FreeBSD, simply add opensm_enable="YES" to the file /etc/rc.conf and reboot.