Apr 23 2016

During the week, I got acquainted with OS X Time Machine’s “Local Snapshots”, which get created when your normal Time Machine volume is not available. When digging around for more information on them, I came across the trainee backup-nazi’s standard line that a backup on the same hard disk as the original data is no backup at all, and is completely useless. Well, they were half-right.

To oversimplify, backups perform two basic functions – they provide a copy of your data that you can use in the event of a disaster (your laptop gets stolen, your house burns down, etc.), and they allow you to recover from those “Oh! I didn’t mean to delete that file” moments. And the latter use case is by far the most common – particularly in an organisation where you can ask someone else to recover a file for you.

But of course local snapshots that get created when the backup media is not available are not true backups. Any disaster that occurs is very unlikely to destroy the original data but leave the local snapshot unharmed; if it does leave it unharmed then fine. But backups are for the worst possible scenario – I did not mention your house burning down by accident.

But local snapshots are useful by themselves; whilst they are certainly not backups, they can be very useful for the most common variety of restoration job. And because they are so available, it is possible to use them for purposes we would not have thought of before.

Such as looking at what that document you are working on looked like yesterday. That paragraph you re-worked; does it really read better today than the original version yesterday? In a more technical sense, I have been using file system snapshots for years – to look back in time.

Mar 25 2016

Recently I have been seeing quite a lot of usage of random.org (to pick out winners of various kinds of competitions; and no I’m not a winner). The documentation on that site is reasonable with regard to pseudo-random number generators but is not quite correct with regard to the source of random numbers under Linux. And for non-cryptographic uses, the following is fine.

The use of random.org momentarily made me wonder how I would do the equivalent at the Unix (or Linux) command-line, and having used it before, the shuf command came to mind. To be honest shuffling is not what I think of as randomisation given how bad I am at shuffling cards, but despite the name, shuf does pretty well at randomising things :-

» seq 1 10 | shuf
4
5
8
7
2
1
10
9
6
3

The seq command generates the sequence from 1 to 10. It turns out that shuf can do that itself with the -i option :-

» shuf -i 1-10
7
3
5
6
9
10
8
1
4
2
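Combining -i with -n draws a single number from a range, which is roughly what a dice-rolling page does. A minimal sketch, assuming GNU coreutils shuf is installed; the roll_die name is made up for illustration :-

```shell
# Hypothetical helper: roll one die with the given number of sides (default 6).
# Assumes GNU coreutils shuf is on the PATH.
roll_die() {
    sides=${1:-6}
    shuf -i "1-${sides}" -n 1
}

roll_die        # one number between 1 and 6
roll_die 20     # one number between 1 and 20
```

As with everything else here, this is fine for games and raffles rather than anything cryptographic.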

The most common (relatively) use I have for shuf is to pick out a random line or two from a file. By using the -n option, shuf can do this. The following example makes use of an example file which contains a small number of first names :-

» shuf -n 1 first-names 
Julian
» shuf -n 1 first-names
Ian
» shuf -n 1 first-names
Craig

If you have just a small selection to make, you can provide the list on the command line with the -e option :-

» shuf -n 1 -e Male Female
Female
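Putting -n together with a file gives something close to the competition draw that started this off. The following is only a sketch: the pick_winners name, the temporary file, and the names in it are all made up for illustration, and it assumes GNU coreutils shuf :-

```shell
# Hypothetical helper: pick COUNT distinct lines from FILE (one entrant per line).
# Without -r, shuf never outputs the same input line twice, so nobody wins
# twice unless they are listed twice.
pick_winners() {
    count=$1
    file=$2
    shuf -n "$count" "$file"
}

# Demo with a throwaway file of made-up names.
entrants=$(mktemp)
printf '%s\n' Alice Bob Carol Dave Eve > "$entrants"
pick_winners 3 "$entrants"
```

Asking for more winners than there are entrants simply returns every entrant in a random order.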

And that is pretty much all there is to it – a simple tool that does just one thing well.

Mar 19 2016

As you may or may not know, Unicode is the standard for encoding text into “computer speak”. There have been many different encodings of characters (graphemes, and other symbols) into “computer speak” by different manufacturers, all of which were severely limited. Almost all of them included the normal Roman alphabet plus a variety of other symbols. Amongst other things, this had two main problems.

Firstly, it excludes large parts of the world that use other writing systems from using computers sensibly – computers are difficult enough to use (at least at first) without learning a whole new writing system. And even then storing a document by transliterating it is not ideal – you have changed the original and introduced another possible source of errors.

Secondly there is the problem of errors introduced into documents when moving them from one computer system to another – for example it was not unknown for a “#” to become a “£” (or very similarly, “£” will appear as “Â£” when UTF-8 text is read as a single-byte encoding). Very much more extreme examples exist.
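The garbled pound sign is easy to reproduce. A minimal sketch, assuming iconv is installed: it takes the two UTF-8 bytes for “£” and deliberately misreads them as ISO-8859-1, one character per byte. The misread_as_latin1 name is made up for illustration :-

```shell
# Hypothetical helper: pretend the incoming bytes are ISO-8859-1 and
# re-encode them as UTF-8, as a confused program would.
misread_as_latin1() {
    iconv -f ISO-8859-1 -t UTF-8
}

# "£" in UTF-8 is the two bytes C2 A3 (octal 302 243).
# Read one byte at a time, C2 is "Â" and A3 is "£".
printf '\302\243' | misread_as_latin1
printf '\n'
```

On a UTF-8 terminal this shows the pound sign with a stray “Â” in front of it.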

Of course because of the enormous number of symbols in Unicode, there is a great deal of fun to be had with Unicode – not infrequently poking fun at Unicode for including ridiculous symbols. And why not? There’s no harm in having a bit of fun :-

ɥʇıpǝɹǝɯ ǝʞıɯ

However it is worth pointing out that Unicode standards are a serious business and slipping in “fun” symbols is not likely to happen. Although I was not directly involved, I did help out one of the people who pushed for the inclusion of medieval Slavonic characters within the Unicode standard, and it is not a trivial process.

And now for some “fun” Unicode characters…

þ

The Thorn. The English letter that got away. Before the age of printing, English had an additional letter (I’m over-simplifying here) which was used instead of the digraph th, so words such as the would have been spelt þe. By the time that printing had arrived, the shape of the thorn letter was becoming more like a “y” and because the printers imported their equipment from countries that did not have a thorn, the printed books tended to use “y” instead of þ. Which is of course where we get “Ye olde Shoppe” from.

Of course it was confusing printing ye when we said þe (or the), so the printers settled on the.

So why is þ in Unicode? Because you cannot discuss the letter without including it, and perhaps more importantly cannot encode a historical document that used þ without an encoding for it.

☃ and probably ☕

Ah! The snowman (and the cup of coffee). What sense of fun allowed these symbols into the standard?

Well according to the Unicode standard, it is contained within a block of weather symbols so it was almost certainly contained within a TV station’s encoding standard for weather forecasts. And you cannot claim to be a universal standard for text encoding without including the symbols included in other encodings.

‽

The interrobang. The punctuation symbol used (if rarely) for signalling both a question and an exclamation: What the bleep are you doing‽

Whilst not commonly used today, it was very commonly used in the 1960s and so there are many documents that need encoding that use this symbol.

☠, ⚠, ☢

These look like fun don’t they? They certainly do to me, but in fact they are international symbols for various dangers – poison (☠), warning (⚠) of a general nature, and radioactivity (☢). All pretty serious stuff; and you really don’t want those symbols garbled in a document.

This is a Thai “letter” and I picked it out because it’s made fun of elsewhere, but it stands for all the non-European symbols used in language.

It may look kind of funny, but it probably isn’t so much to someone who knows Thai. To put it another way, if you told me that we’re not going to include the “M” in a character encoding because it looks too silly, I’d be very, very annoyed (my name contains two of ’em).

And yes I can type all of the above and the following into a text terminal 😃


Mar 05 2016

Just to amuse myself, I’ve been re-reading and re-learning the Unix shell’s ${} details, and it occurred to me that whilst these were all very well and cute, they very easily lead to impenetrable code. But they are more efficient.

Take the following two ways of getting the current date :-

✓ mike@pica» print -P "%D" 
16-03-05
✓ mike@pica» echo $(date) 
Sat 5 Mar 13:14:38 GMT 2016

It’s not exactly helpful that they return the date/time in different formats. But glossing over that for the moment, which one is clearer? That is right – the second one clearly says that it is going to “echo” the date. Even if this usage is particularly stupid (as date will echo the date all by itself), the second wins as far as clarity goes.

However it is also less efficient – rather than get the date and show it to the terminal, the shell invokes a sub-process to display the date, captures it and then uses it to show to the terminal. In the old days when terminals consisted of printing mechanisms that actually hit a template of a letter against an inked up ribbon against a roll of paper and hoped that the result was readable, this inefficiency could result in very slow code.

But today this level of inefficiency should not make that much difference, and if it does, then why are you writing code in the shell? There are far better languages out there.
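The same trade-off shows up with the ${} expansions themselves. A small sketch using a made-up path, comparing the readable external commands with the terser built-in expansions :-

```shell
path=/home/mike/notes/todo.txt   # made-up path for illustration

# Clear, but each $(...) runs an external command in a sub-process.
name_slow=$(basename "$path")
dir_slow=$(dirname "$path")

# Terser, and done entirely inside the shell.
name_fast=${path##*/}    # strip the longest prefix matching */
dir_fast=${path%/*}      # strip the shortest suffix matching /*

echo "$name_slow $name_fast"
echo "$dir_slow $dir_fast"
```

The two forms are not exact equivalents: for a path with no slash in it, dirname gives “.” whereas ${path%/*} leaves the value unchanged, which is one more reason the clearer version is often the better choice.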

In addition, there is a bit of a gotcha with the print -P "%D" command … it only works if you happen to be using zsh :-

✓ mike@pica» print -P "%D"
16-03-05
✓ mike@pica» /bin/sh
$ print -P "%D"
file: option requires an argument -- 'P'
Usage: file [-bcEhikLlNnprsvz0] [--apple] [--mime-encoding] [--mime-type]
            [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
       file -C [-m magicfiles]
       file [--help]
Warning: unknown mime-type for "-P" -- using "application/octet-stream"
Error: no such file "-P"
Error: no such file "%D"
$ 
✗ mike@pica» /bin/ksh
$ print -P "%D"
%D
$ 
✓ mike@pica» /bin/bash
mike@pica:~/.lyx$ print -P "%D"
file: option requires an argument -- 'P'
Usage: file [-bcEhikLlNnprsvz0] [--apple] [--mime-encoding] [--mime-type]
            [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
       file -C [-m magicfiles]
       file [--help]
Warning: unknown mime-type for "-P" -- using "application/octet-stream"
Error: no such file "-P"
Error: no such file "%D"
mike@pica:~/.lyx$ exit

Confusing is it not?
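If the yy-mm-dd layout is what you are after, one way around the gotcha is to ask date for it directly, at the cost of running an external command. A minimal sketch; the short_date name is made up for illustration :-

```shell
# Hypothetical helper: print today's date as yy-mm-dd, the same layout
# that zsh's print -P "%D" produces. Uses only date(1) with a format
# string, so it behaves the same in sh, ksh, bash and zsh.
short_date() {
    date +%y-%m-%d
}

short_date
```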

Of course if the shell would intercept common usages such as $(date) and optimise them, that would be perfectly reasonable.

Jan 31 2016

Thanks to the Let’s Encrypt project, my blog now has a trusted certificate and traffic to it is encrypted.

Of course there is nothing especially private about this blog, so why encrypt?

Well for one thing, with encryption those who log in can keep their account details private.

But for the overwhelming majority of visitors (who do not log in) all it adds is a bit of privacy. Snoopers still know that you are visiting a dodgy website lurking underneath my stairs, but they won’t know what lurid posts you are reading.