Feb 09 2023
 

Sometimes, computers seem to suck at maths :-

>>> 1.2 - 1.0
0.19999999999999996

To be fair, that’s a raw Python interface; an application intended for use as a calculator works a bit better :-

» qalc 
> 1.2 - 1.0

  1.2 − 1 = 0.2

The problem is related to low-level numeric types. Computers store numbers in a variety of different formats (called types by developers). Whole numbers (integers) are easy – just allocate a certain number of 8-bit bytes (more means you can store bigger numbers; but it takes more memory for each number) and you would have something that would store whole numbers with perfect accuracy.

Floating point numbers (i.e. numbers with a ‘decimal’ point) on the other hand are much more of a compromise between size and accuracy. Floating point in effect uses scientific notation for numbers – 1.23E23. So a number is split into two – the mantissa (effectively the bit before the “E”) and the exponent. Storing both parts in 32 bits (single precision) or 64 bits (double precision, which is what Python’s float uses) limits the precision with which numbers are stored, but is usually sufficient and allows a far larger range of numbers. Rounding the output to a sensible number of decimal places hides the tiny error :-

>>> print("{:10.4f}".format(1.2 - 1.0)) 
    0.2000

In other words if you are using a low-level interface as a calculator, you can produce sensible output merely by writing your code properly. Or use a proper calculator program like qalc(ulator).

This is of course an oversimplification and the Wikipedia article on single precision floating point goes into far more detail than I want to understand. Amongst the things I’ve glossed over is the problem of performing calculations in base 2 (binary) rather than base 10 (decimal): a value such as 1.2 has no exact binary representation, so the nearest representable value is stored instead.
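The base 2 problem can be poked at from Python itself. What follows is only a rough sketch using the standard library (the variable names are mine): it shows the value that is really stored when you type 1.2, the mantissa and exponent split, and the usual way of comparing floats with a tolerance.

```python
from decimal import Decimal
import math

# Decimal(float) reveals the exact value the binary float really holds.
stored = Decimal(1.2)
print(stored)                      # close to, but not exactly, 1.2
print(stored == Decimal("1.2"))    # False

# frexp() splits a float into a mantissa and a base-2 exponent.
mantissa, exponent = math.frexp(1.2)
print(mantissa, exponent)

# Compare with a tolerance rather than expecting exact equality.
difference = 1.2 - 1.0
print(difference == 0.2)               # False
print(math.isclose(difference, 0.2))   # True
```

The stored value is very slightly off from 1.2, which is where the stray digits in the subtraction come from.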

Plus there are a whole bunch of other numeric types such as larger floating point types, decimal floating point, bignums (which use whatever memory is necessary to store a number), fixed point, etc.
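Python happens to ship with a few of these alternatives, so here is a small sketch of the same subtraction done three other ways. The fixed point part is simply my own illustration using whole numbers of tenths; it is not a library feature.

```python
from decimal import Decimal

# Decimal floating point: works in base 10, so 1.2 is stored exactly.
decimal_result = Decimal("1.2") - Decimal("1.0")
print(decimal_result)              # 0.2

# Fixed point (illustrative): store everything as a whole number of tenths.
tenths_result = 12 - 10            # 1.2 and 1.0 expressed in tenths
print(tenths_result / 10)          # 0.2

# Bignums: Python integers grow to whatever size is needed.
big = 2 ** 200
print(big.bit_length())            # 201
```

Note that the Decimal values are built from strings; building them from the float 1.2 would carry the binary rounding error along with it.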

Computers aren’t bad at maths; it is just you can trick them into making themselves look bad.

Untitled Seascape
Nov 15 2022
 

This is intended as a guide to how IP networking works at a basic level; traditionally such guides are for “dummies”, but a lack of knowledge doesn’t make someone dumb, merely ignorant, and being ignorant of one small esoteric part of computing is no crime. On the other hand, fixing that ignorance can help solve certain networking issues or at the very least make those domestic router settings make some sort of sense.

What is IP?

Whether we are talking about IPv4 (192.0.2.98/24) or IPv6 (2001:db8:1:1::1/64), both are at a superficial level “Internet Protocols”. IP requires some kind of physical networking layer underneath it – most commonly Ethernet for wired connections and WiFi for wireless connections, but it can run on many other networks – FDDI (historical), Fibre Channel, or InfiniBand. IP works above that level.

What made IP “special” for the time was the word internet; we think of this today as a world-wide network (“The Internet”) but back in the 1970s, the internet part of IP was about connecting multiple networks together with a gateway. Whilst network gateways existed before IP (and indeed after), they translated one network protocol into another and not infrequently were application specific (i.e. perhaps only allowed email through) whereas IP was built from the ground up to allow network traffic to traverse multiple network gateways.

There’s a whole lot of detail we could get into about the “protocol” bit of IP, but at an early stage all we need to know is that the packet of data contains an IP header which amongst other things specifies the source address, destination address, and a hint of what’s inside.

IP Addresses

An IP address (whether a source address or a destination address) is either a 32-bit integer (for IPv4) or a 128-bit integer (for IPv6). Which is the technical way of saying they’re just numbers; although we’re used to seeing (and using) a standardised representation of that number. Some tools will convert the representation of the address or will use the actual number :-

» ping 3221226082   
PING 3221226082 (192.0.2.98) 56(84) bytes of data.

So the usual representation of an IPv4 address is a “dotted quad” – four numbers, each between 0 and 255, which when converted to binary and concatenated make up the real network address.
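If you want to check the arithmetic, Python’s standard ipaddress module will do the conversion in both directions. This is just a sketch; the by-hand loop is my own illustration of the “convert to binary and concatenate” step.

```python
import ipaddress

# Let the standard library convert the dotted quad to its integer form.
address = ipaddress.IPv4Address("192.0.2.98")
as_integer = int(address)
print(as_integer)                  # 3221226082

# The same thing by hand: shift left 8 bits, then add the next octet.
by_hand = 0
for octet in (192, 0, 2, 98):
    by_hand = (by_hand << 8) | octet
print(by_hand == as_integer)       # True

# And back again from the raw number to the familiar representation.
back = str(ipaddress.IPv4Address(3221226082))
print(back)                        # 192.0.2.98
```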

On the other hand, IPv6 is somewhat longer and uses groups of four hexadecimal digits separated with colons (“:”) as in:

2001:0db8:0001:0001:0000:0000:0000:0001

Although it is usual to compress that with two simple rules – firstly, a single run of consecutive all-zero groups can be replaced (once only) with “::” :-

2001:0db8:0001:0001::0001

Secondly, leading zeros within each ‘group’ can be dropped :-

2001:db8:1:1::1
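You don’t have to apply those rules by hand; as a quick sketch, the standard ipaddress module in Python will happily compress and expand addresses for you (the variable names are mine) :-

```python
import ipaddress

full = "2001:0db8:0001:0001:0000:0000:0000:0001"
address = ipaddress.IPv6Address(full)

# Apply both shortening rules in one go.
print(address.compressed)          # 2001:db8:1:1::1

# Go back to the long form with every zero written out.
print(address.exploded)            # 2001:0db8:0001:0001:0000:0000:0000:0001

# The half-shortened form is still the same 128-bit number.
half = ipaddress.IPv6Address("2001:0db8:0001:0001::0001")
print(half == address)             # True
```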

Even shortened, IPv6 addresses are somewhat more complex than IPv4 addresses, but that’s why we have the DNS; in fact those used to using IPv4 addresses without the DNS should bear in mind that IPv4 addresses are also too complex for normal people to get right (and are subject to typos).

Netmasks

Whether you are assigned a public IPv6 prefix (2001:db8:cafe::/48) or pick a private address range from RFC 1918 (10.0.0.0/8) you have a “choice” – you can either use the entire network range for one huge network, or you can sub-divide it into smaller subnets. Most home users will go with the first option but for most organisations of a reasonable size, the latter is not just preferred but essential.

Whether you further divide a network into subnets or not, your computer still needs to know whether to send packets directly to the destination or whether to route those packets via a gateway or a router. This is done with a netmask that defines the network part of the address (and the host part of the address). If the network address of the destination matches the network address of the source, then the packets can be sent directly. Otherwise they’re sent via the gateway.

                Address          Local?
Netmask         255.255.255.0
Source address  192.0.2.98       Yes
Destination 1   192.0.2.73       Yes
Destination 2   172.16.1.13      No

IPv6 addresses work in the same way except the addresses are longer.
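The “is it local?” decision is nothing more than a bitwise AND and a comparison. A minimal sketch in Python, using the addresses from the example above (the function name is my own invention and is not how any particular operating system spells it) :-

```python
import ipaddress

def is_local(source, netmask, destination):
    """Return True if destination is on the same network as source."""
    mask = int(ipaddress.IPv4Address(netmask))
    src = int(ipaddress.IPv4Address(source))
    dst = int(ipaddress.IPv4Address(destination))
    # ANDing with the netmask keeps the network part and zeroes the host part.
    return (src & mask) == (dst & mask)

print(is_local("192.0.2.98", "255.255.255.0", "192.0.2.73"))    # True
print(is_local("192.0.2.98", "255.255.255.0", "172.16.1.13"))   # False
```

A “True” means the packet can be sent directly; a “False” means it is handed to the gateway.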

Although netmasks are historically given as dotted quads making them look a bit like IPv4 addresses, it is becoming increasingly common to use a more compact method which is less error prone. The netmask is instead specified as the number of bits that the netmask covers – 192.0.2.98/24 rather than 192.0.2.98/255.255.255.0. As for IPv6, the same applies although “/64” is very often assumed – the default size for an IPv6 network is very much more strongly encouraged than for an IPv4 network (although it isn’t compulsory).
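To show that the two notations really are the same thing, here is a short sketch in Python. The prefix_to_netmask function is my own illustration; the rest uses the standard ipaddress module.

```python
import ipaddress

def prefix_to_netmask(prefix_length):
    """Turn a bit count such as 24 into a dotted quad netmask."""
    mask = (0xFFFFFFFF << (32 - prefix_length)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(mask))

print(prefix_to_netmask(24))       # 255.255.255.0

# Both ways of writing the netmask describe the same network.
by_prefix = ipaddress.ip_interface("192.0.2.98/24")
by_netmask = ipaddress.ip_interface("192.0.2.98/255.255.255.0")
print(by_prefix.network)           # 192.0.2.0/24
print(by_netmask.network)          # 192.0.2.0/24

# IPv6 only uses the bit count form.
v6 = ipaddress.ip_interface("2001:db8:1:1::1/64")
print(v6.network)                  # 2001:db8:1:1::/64
```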

The Gateway Or The Router

Gateway or router? Well both – from the perspective of an ordinary host it’s a gateway to other networks; from the perspective of the gateway itself, it is a router connected to multiple networks (domestically often just two) and forwards packets on behalf of other computers.

In essence there is very little difference between a router and an ordinary machine except that the ordinary machine isn’t configured to forward packets, and it is usually configured with just a default gateway (sometimes called a gateway of last resort), plus the route for the network it is connected to.

Both contain a routing table (or more than one) in the operating system kernel which basically consists of a set of network addresses and destinations (where to forward the packets to). In the case of your usual domestic router that usually consists of a route to your home network, a route to the ISP’s network, and a default route pointing at the ISP’s router.

When a machine wants to send (or forward) a packet to a destination, it picks the closest match in the routing table (the matching route with the longest netmask), and uses that as an intermediate destination to forward the packet to. Your machine operates this way; as do the core Internet routers (although they have slightly larger routing tables).
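Picking the closest match can be sketched in a few lines of Python. The routing table below is entirely made up to resemble a domestic router (the addresses come from private and documentation ranges), and a real kernel uses far cleverer data structures than a list; treat this as an illustration only.

```python
import ipaddress

# Hypothetical routing table: (network, where to send the packet).
ROUTES = [
    ("192.168.1.0/24", "direct (home network)"),
    ("198.51.100.0/24", "direct (ISP network)"),
    ("0.0.0.0/0", "via 198.51.100.1 (default route)"),
]

def lookup(destination, routes=ROUTES):
    """Return the next hop of the most specific route that matches."""
    address = ipaddress.ip_address(destination)
    matches = []
    for network_text, next_hop in routes:
        network = ipaddress.ip_network(network_text)
        if address in network:
            matches.append((network.prefixlen, next_hop))
    if not matches:
        return None                # no route to host
    # The longest prefix is the closest match.
    return max(matches)[1]

print(lookup("192.168.1.20"))      # direct (home network)
print(lookup("203.0.113.9"))       # via 198.51.100.1 (default route)
```

The default route (0.0.0.0/0) matches everything, but because its prefix is the shortest possible it only wins when nothing more specific does.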

Some routers (probably a minority) are rather more complex of course. If you have heard of routing protocols such as BGP, OSPF, or IS-IS, then you have heard of protocols that distribute routing information. The larger Internet uses BGP to distribute routing information to add to routing tables around the world.

The description of routing so far has been rather hierarchical – your computer forwards to a default gateway, and it in turn forwards to your ISP’s default gateway. Which is a bit unfortunate as Internet routing doesn’t really work this way – there are alternate routes so if one router goes “bang” traffic can still reach the destination.

Dover Castle Gateway
Jul 23 2022
 

I was following one of those Twitter threads posting their favourite command-line tools (specifically for infosec), and added my own entry – the incomparable tshark. Later it occurred to me that the best command-line tool isn’t really a tool at all as it is built into the shell – the pipe. Many of the command-line tools just wouldn’t be quite the same without it.

For those who aren’t familiar with the command-line, the pipe (“|”) takes the output of one command and feeds it as input to another command. And you can string such pipelines together to add to each other (which can lead to inefficiencies).

For example :-

» ls | wc -l
84

This takes the usual command for listing files and sends the output into the “word count” command to produce a count of the number of files in the current directory. To be more precise, it produces a count of the number of files that ls thinks are in the directory. You can get different results with different variations :-

» echo * | wc -w
89
» ls -a | wc -l
463
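For the curious, what the shell arranges for “ls | wc -l” can be imitated by hand. This is only a rough sketch in Python using the standard subprocess module; it assumes a Unix-like system with ls and wc installed, and the function name is my own.

```python
import subprocess

def count_entries(path="."):
    """Roughly what the shell sets up for: ls PATH | wc -l"""
    ls = subprocess.Popen(["ls", path], stdout=subprocess.PIPE)
    # The output of ls becomes the input of wc; that connection is the pipe.
    wc = subprocess.Popen(["wc", "-l"], stdin=ls.stdout,
                          stdout=subprocess.PIPE)
    ls.stdout.close()              # wc now holds the only reading end
    output, _ = wc.communicate()
    ls.wait()
    return int(output.decode().strip())

print(count_entries("."))
```

The shell makes this an awful lot less typing, which is rather the point.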

If you had a log file containing DHCP requests you could :-

» grep DHCPDISCOVER 2022.07.local0.info.log | head
2022-06-30T23:59:05+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from 4D:6D:4F:55:59:B4 (esp32-D04CCC) via 10.72.0.1
2022-07-01T01:30:04+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from 4D:6D:4F:55:59:B4 (esp32-D04CCC) via 10.72.0.1
2022-07-01T02:53:33+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from DF:69:AF:DC:79:3E via eth0
2022-07-01T02:53:33+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from DF:69:AF:DC:79:3E via 10.0.0.1
2022-07-01T02:53:39+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from a8:a6:48:92:9d:36 via eth0
2022-07-01T03:01:03+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from 4D:6D:4F:55:59:B4 (esp32-D04CCC) via 10.72.0.1
2022-07-01T04:32:02+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from 4D:6D:4F:55:59:B4 (esp32-D04CCC) via 10.72.0.1
2022-07-01T04:56:53+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from 91:06:27:15:EF:DC via 10.72.0.1
2022-07-01T06:03:01+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from 4D:6D:4F:55:59:B4 (esp32-D04CCC) via 10.72.0.1
2022-07-01T07:34:00+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from 4D:6D:4F:55:59:B4 (esp32-D04CCC) via 10.72.0.1

List out the first few DHCP DISCOVER requests (the astute may notice that I’ve done some obfuscating). We can then pick out a field using awk to list just the MAC addresses :-

» grep DHCPDISCOVER 2022.07.local0.info.log | awk '{print $7}' | head
4D:6D:4F:55:59:B4
4D:6D:4F:55:59:B4
DF:69:AF:DC:79:3E
DF:69:AF:DC:79:3E
a8:a6:48:92:9d:36
4D:6D:4F:55:59:B4
4D:6D:4F:55:59:B4
91:06:27:15:EF:DC
4D:6D:4F:55:59:B4
4D:6D:4F:55:59:B4 

We can then remove the “head” command and add a sort and uniq command to produce a full list of all MAC addresses that have performed a DHCP DISCOVER :-

» grep DHCPDISCOVER 2022.07.local0.info.log | awk '{print $7}' | sort | uniq -c
      4 DF:69:AF:DC:79:3E
      3 89:C1:67:B8:9D:6F
      6 F3:55:1E:06:D4:49
      4 F3:55:1E:06:D4:48
     12 4D:6D:4F:55:59:B3
     92 91:06:27:15:EF:DC
     46 85:2C:B4:B3:70:7E
    333 4D:6D:4F:55:59:B4
      2 40:5B:D8:FF:FA:29
     72 FD:D4:00:41:29:BE
      5 36:1E:07:2D:AD:76
     41 44:FD:6E:05:82:21
     81 CC:78:14:BB:E4:3D

We can sort the result into reverse numerical order :-

» grep DHCPDISCOVER 2022.07.local0.info.log | awk '{print $7}' | sort | uniq -c | sort -r -n
    333 4D:6D:4F:55:59:B4
     92 91:06:27:15:EF:DC
     81 CC:78:14:BB:E4:3D
     72 FD:D4:00:41:29:BE
     46 85:2C:B4:B3:70:7E
     41 44:FD:6E:05:82:21
     12 4D:6D:4F:55:59:B3
      6 F3:55:1E:06:D4:49
      5 36:1E:07:2D:AD:76
      4 F3:55:1E:06:D4:48
      4 DF:69:AF:DC:79:3E
      3 89:C1:67:B8:9D:6F
      2 40:5B:D8:FF:FA:29 
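For comparison, the whole grep, awk, sort, uniq, and sort pipeline can be written as a short Python function. This is a sketch rather than a replacement; the function name is mine, and the sample lines are copied from the log extract earlier on.

```python
from collections import Counter

def count_discover_macs(lines):
    """Count DHCPDISCOVER requests per MAC address, busiest first."""
    counts = Counter()
    for line in lines:
        if "DHCPDISCOVER" not in line:     # the grep stage
            continue
        fields = line.split()
        if len(fields) >= 7:
            counts[fields[6]] += 1         # awk's $7 is index 6 here
    return counts.most_common()            # sort | uniq -c | sort -r -n

sample = [
    "2022-06-30T23:59:05+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from 4D:6D:4F:55:59:B4 (esp32-D04CCC) via 10.72.0.1",
    "2022-07-01T02:53:33+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from DF:69:AF:DC:79:3E via eth0",
    "2022-07-01T03:01:03+00:00 <local0.info> 2001:db8:bad:cafe::b/d-FCB dhcpd: DHCPDISCOVER from 4D:6D:4F:55:59:B4 (esp32-D04CCC) via 10.72.0.1",
]

for mac, count in count_discover_macs(sample):
    print(count, mac)
```

It works, but it takes rather longer to write than the pipeline does, and you can’t build it up a stage at a time whilst looking at the output.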

And if you have access to the relevant script, you can produce terminal graphics (just to keep innumerate managers happy) :-

» grep DHCPDISCOVER 2022.07.local0.info.log | awk '{print $7}' | sort | uniq -c | sort -r -n | awk '{print $2, $1}' | tbar --replace 1 --max 350
4D:6D:4F:55:59:B4 ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
91:06:27:15:EF:DC ■■■■■■■■■■■■■■■
CC:78:14:BB:E4:3D ■■■■■■■■■■■■■■
FD:D4:00:41:29:BE ■■■■■■■■■■■■
85:2C:B4:B3:70:7E ■■■■■■■
44:FD:6E:05:82:21 ■■■■■■■
4D:6D:4F:55:59:B3 ■■
F3:55:1E:06:D4:49 ■
36:1E:07:2D:AD:76 
F3:55:1E:06:D4:48 
DF:69:AF:DC:79:3E 
89:C1:67:B8:9D:6F 
40:5B:D8:FF:FA:29 

The pipe isn’t so much a tool itself as a mechanism to combine tools into producing interesting results.

It’s Round
Jul 13 2022
 

Not all shell aliases of course, but some. I’ve just seen a YouTube video that suggested creating a shell alias to run rmtrash when rm is invoked :-

alias rm='rmtrash'

Seems sensible enough doesn’t it? This is in fact the classic example of how dangerous shell aliases can be, although the classic example was to turn on “-i” :-

alias rm='rm -i'

The problem is that you get used to “rm” being safe – either it asks before it removes files (“-i”) or it safely preserves what is deleted in the Trash folder. But what happens when the alias doesn’t get created? Perhaps you have a broken .zshrc and Zsh stops interpreting before the alias is declared. Or you’ve logged on to a remote server that doesn’t have your .zshrc installed as yet?

All of a sudden you are running the unadulterated rm command – deleting files without asking first, and without preserving them in the Trash folder. See the danger now?

It is better not to replace standard commands but create a new ‘command’ :-

alias del="rmtrash"

Perhaps you regard this as being excessively risk averse – fair enough. But just don’t say you weren’t warned – and I’ve encountered missing aliases every year over the last 30-odd years I’ve been using Linux and Unix.

The Bare Family
Jun 02 2022
 

It sometimes seems that every time I dive into a YouTube video promising “${N} Awesome CLI Applications” (or equivalent), most of the suggested applications are not command-line applications. They’re TUI applications – text user interface as opposed to graphical user interface – or to align with my bad habit of referring to GUI applications as gooey applications, perhaps tooey applications.

Now there’s nothing wrong with tooey applications; I use them every day. Especially nmon (just because I got used to it before I discovered htop). Or btop :-

Screenshot of btop running

But none of these are really command-line applications; by which I mean they aren’t used at the command-line even if they are (optionally) invoked there. A command-line application allows you to use the shell including pipes to produce an aggregate result. For example :-

» grep mike /etc/passwd | awk -F: '{print $5}'
Mike Meredith

That uses two command-line “applications” to turn a username (“mike”) into a full name (“Mike Meredith”). Yes it can be optimised into a single command :-

» awk -F: '/^mike:/ {print $5}' /etc/passwd
Mike Meredith

… which even improves the search, but makes the point less well. And we can do slightly fancier things too :-

Screenshot of a random URL being picked and turned into a QR code.

(don’t assume that QR code takes you somewhere nice)

I’m not suggesting YouTubers should stop making videos about terminal-based applications; I’m not even suggesting they should concentrate on “proper” command-line applications. Just don’t call terminal-based applications “command-line” because they really are not.