
Katie Hates Condoms
Recently I have been seeing quite a lot of usage of random.org (to pick out winners of various kinds of competitions; and no I’m not a winner). The documentation on that site is reasonable with regard to pseudo-random number generators but is not quite correct with regard to the source of random numbers under Linux. And for non-cryptographic uses, the following is fine.
The use of random.org momentarily made me wonder how I would do the equivalent at the Unix (or Linux) command-line, and having used the command before, the shuf command came to mind. To be honest shuffling is not what I think of as randomisation given how bad I am at shuffling cards, but despite the name, shuf does pretty well at randomising things :-
» seq 1 10 | shuf
4
5
8
7
2
1
10
9
6
3
The seq command generates a sequence from 1-10 as given. It turns out that shuf can do it itself :-
» shuf -i 1-10
7
3
5
6
9
10
8
1
4
2
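A competition draw usually needs only a handful of numbers rather than the whole shuffled range, and the -n option limits how many lines shuf prints. A minimal sketch, assuming GNU coreutils shuf; the range of 100 tickets and the count of three winners are made up for illustration:

```shell
# Draw 3 distinct ticket numbers from 1-100.
# (Range and count are hypothetical; adjust to the real draw.)
winners=$(shuf -i 1-100 -n 3)
printf '%s\n' "$winners"
```

Because shuf outputs a permutation of its input, the three numbers are always different from each other.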
The most common (relatively) use I have for shuf is to pick out a random line or two from a file. By using the -n option, shuf can do this. The following example makes use of an example file which contains a small number of first names :-
» shuf -n 1 first-names
Julian
» shuf -n 1 first-names
Ian
» shuf -n 1 first-names
Craig
If you have just a small selection to make, you can provide the list on the command line with the -e option :-
» shuf -n 1 -e Male Female
Female
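By default shuf never repeats an input item, which is no use for something like repeated coin tosses. GNU shuf has a -r (repeat) option for that case. A small sketch, assuming GNU coreutils; the Heads/Tails list and the count of five are just illustrative:

```shell
# Five independent coin tosses: -r allows the same item to be picked again,
# -e takes the candidates from the command line.
tosses=$(shuf -r -n 5 -e Heads Tails)
printf '%s\n' "$tosses"
```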
And that is pretty much all there is to it – a simple tool that does just one thing well.
As you may or may not know, Unicode is the standard for encoding text into ‘computer speak’. There have been many different encodings of characters (graphemes, and other symbols) into “computer speak” by different manufacturers; all of which were severely limited. Almost all of them included the normal Roman alphabet plus a variety of other symbols. Amongst other things, this had two main problems.
Firstly, it excludes large parts of the world that use other writing systems from using computers sensibly – computers are difficult enough to use (at least at first) without learning a whole new writing system. And even then storing a document by transliterating it is not ideal – you have changed the original and introduced another possible source of errors.
Secondly there is the problem of errors introduced into documents when moving them from one computer system to another – for example it was not unknown for a “#” to become a “£” (or very similarly, “£” will appear as “Â£”). Very much more extreme examples exist.
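The pound-sign garbling is easy to reproduce at the command line. A minimal sketch, assuming iconv is installed and assuming the mix-up is between UTF-8 and ISO-8859-1 (that pairing is my assumption; other pairs of encodings garble text in other ways):

```shell
# "£" is the two bytes C2 A3 in UTF-8 (written as octal escapes for portability).
pound_utf8=$(printf '\302\243')

# Misread those two bytes as ISO-8859-1 and convert to UTF-8 again:
# each byte is treated as a character of its own.
garbled=$(printf '%s' "$pound_utf8" | iconv -f ISO-8859-1 -t UTF-8)

# Show the resulting bytes in hex: c3 82 (Â) followed by c2 a3 (£).
printf '%s' "$garbled" | od -An -tx1
```

The original character is still in there, but it has picked up a stray “Â” in front of it, and every further round trip through the wrong encoding adds more.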
Of course because of the enormous number of symbols in Unicode, there is a great deal of fun to be had with Unicode – not infrequently poking fun at Unicode for including ridiculous symbols. And why not? There’s no harm in having a bit of fun :-
However it is worth pointing out that Unicode standards are a serious business and slipping in “fun” symbols is not likely to happen. Although I was not directly involved, I did help out one of the people who pushed for the inclusion of medieval Slavonic characters within the Unicode standard, and it is not a trivial process.
And now for some “fun” Unicode characters…
The Thorn. The English letter that got away. Before the age of printing, English had an additional letter (I’m over-simplifying here) which was used instead of the digraph th, so words such as the would have been spelt þe. By the time that printing had arrived, the shape of the thorn letter was becoming more like a “y” and because the printers imported their equipment from countries that did not have a thorn, the printed books tended to use “y” instead of þ. Which is of course where we get “Ye olde Shoppe” from.
Of course it was confusing printing ye when we said þe (or the), so the printers settled on the.
So why is þ in Unicode? Because you cannot discuss the letter without including it, and perhaps more importantly cannot encode a historical document that used þ without an encoding for it.
Ah! The snowman (and the cup of coffee). What sense of fun allowed these symbols into the standard?
Well according to the Unicode standard, it is contained within a block of weather symbols so it was almost certainly contained within a TV station’s encoding standard for weather forecasts. And you cannot claim to be a universal standard for text encoding without including the symbols included in other encodings.
The interrobang. The punctuation symbol used (if rarely) for signalling both a question and an exclamation: What the bleep are you doing‽
Whilst not commonly used today, it was very commonly used in the 1960s and so there are many documents that need encoding that use this symbol.
These look like fun don’t they? They certainly do to me, but in fact they are international symbols for various dangers – poison (☠), warning (⚠) of a general nature, and radioactivity (☢). All pretty serious stuff; and you really don’t want those symbols garbled in a document.
This is a Thai “letter” and I picked it out because it’s made fun of elsewhere, but it stands for all the non-European symbols used in language.
It may look kind of funny, but it probably isn’t so much to someone who knows Thai. To put it another way, if you told me that we’re not going to include the “M” in a character encoding because it looks too silly, I’d be very, very annoyed (my name contains two of ’em).
And yes I can type all of the above and the following into a text terminal 😃
This post was inspired by a video of someone’s testament of why they are leaving islam, but yet it has nothing to do with islam.
There is a perfectly understandable misunderstanding within that video – the extremism commonly found in islam today has nothing to do with islam itself. The same extremism can be found in other religions too – christianity, hinduism, buddhism, judaism, etc. Yes the perception is that islam today is far more extreme than those other religions, but there is still extremism in other religions :-
It seems that irrespective of what religion someone believes in, they will take the message from their religious texts that they want to. A good person is going to take the good stuff from the good book; a bad person is going to take the bad stuff from the very same book. I would not go as far as Steven Weinberg :-
Religion is an insult to human dignity. With or without it you would have good people doing good things and evil people doing evil things. But for good people to do evil things, that takes religion.
But it is certainly along the right sort of lines. Extremists use religion as an excuse to do evil things – killing homosexuals, abortionists, atheists, “immoral” women, etc. If we could somehow cause all the extremists of the world to drink the magical Kool-Aid that would turn off their extremism and turn them into the kind of religious believers who “love thy neighbour”, then there wouldn’t be a problem with religion.
But the sad fact is that extremists do so much harm with their religion that it outweighs any possible benefit we get from religion. We would be better off getting rid of religion just to stop the extremists from pretending to be good.
Just to amuse myself, I’ve been re-reading and re-learning the Unix shell’s ${} details, and it occurred to me that whilst these were all very well and cute, they very easily lead to impenetrable code. But they are more efficient.
Take the following two ways of getting the current date :-
✓ mike@pica» print -P "%D"
16-03-05
✓ mike@pica» echo $(date)
Sat 5 Mar 13:14:38 GMT 2016
It’s not exactly helpful that they return the date/time in different formats. But glossing over that for the moment, which one is clearer? That is right – the second one clearly says that it is going to “echo” the date. Even if this usage is particularly stupid (as date will echo the date all by itself), the second wins as far as clarity goes.
However it is also less efficient – rather than get the date and show it to the terminal, the shell invokes a sub-process to display the date, captures it and then uses it to show to the terminal. In the old days when terminals consisted of printing mechanisms that actually hit a template of a letter against an inked up ribbon against a roll of paper and hoped that the result was readable, this inefficiency could result in very slow code.
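The cost of capturing output can be seen without involving date at all. The following is a rough sketch rather than a benchmark: two hypothetical functions (the names direct and captured, and the count of 500, are mine) do the same assignment in a loop, but the second goes through a command substitution each time:

```shell
n=500   # iteration count, chosen arbitrarily for the sketch

# Plain assignment: no sub-process involved.
direct() {
    i=0
    while [ "$i" -lt "$n" ]; do
        v=hello
        i=$((i + 1))
    done
    echo "$v"
}

# Same result, but each iteration captures the output of echo,
# which typically means forking a subshell.
captured() {
    i=0
    while [ "$i" -lt "$n" ]; do
        v=$(echo hello)
        i=$((i + 1))
    done
    echo "$v"
}

direct
captured
```

In zsh or bash, prefixing each call with the time keyword (time direct, time captured) shows the difference; the actual numbers depend on the machine and the shell.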
But today this level of inefficiency should not make that much difference, and if it does, then why are you writing code in the shell? There are far better languages out there.
In addition, there is a bit of a gotcha with the print -P "%D" option … it only works if you happen to be using zsh :-
✓ mike@pica» print -P "%D"
16-03-05
✓ mike@pica» /bin/sh
$ print -P "%D"
file: option requires an argument -- 'P'
Usage: file [-bcEhikLlNnprsvz0] [--apple] [--mime-encoding] [--mime-type] [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
       file -C [-m magicfiles]
       file [--help]
Warning: unknown mime-type for "-P" -- using "application/octet-stream"
Error: no such file "-P"
Error: no such file "%D"
$
✗ mike@pica» /bin/ksh
$ print -P "%D"
%D
$
✓ mike@pica» /bin/bash
mike@pica:~/.lyx$ print -P "%D"
file: option requires an argument -- 'P'
Usage: file [-bcEhikLlNnprsvz0] [--apple] [--mime-encoding] [--mime-type] [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
       file -C [-m magicfiles]
       file [--help]
Warning: unknown mime-type for "-P" -- using "application/octet-stream"
Error: no such file "-P"
Error: no such file "%D"
mike@pica:~/.lyx$ exit
Confusing is it not?
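One way around the gotcha, at the cost of the sub-process, is to ask the date command itself for the format. A minimal sketch; the format string is my attempt to mirror the yy-mm-dd output of zsh's %D:

```shell
# date with a +format argument is specified by POSIX, so this behaves the
# same under sh, ksh, bash and zsh. Prints something like 16-03-05.
date +%y-%m-%d
```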
Of course if the shell were to intercept common usages such as $(date) and optimise them, that would be perfectly reasonable.