Feb 29 2020

I used to be able to remember all of the keyboard shortcuts I’d set up to insert ‘fun’ (or useful) Unicode characters … but my memory isn’t quite what it used to be, and I happened to catch a video that mentioned a menu for inserting Unicode characters.

So I set out to create my own …

#!/bin/zsh
#
# Run a menu of Unicode characters to put into the clipboard

read -r -d '' menu << END
þ       Thorn (Sym-t)
Þ	Capital thorn (Sym-T)
✓	Tick (Sym-y)
✔       Alternate tick (Sym-Y)
π	pi (Sym-p)
Π       PI (Sym-P)
★       Star
🖕      Finger
END

selectedchar=$(echo $menu |\
  rofi -dmenu -l 20 -fn misc-24 -p "Unicode inserter" |\
  awk '{print $1}')
if [ -z "$selectedchar" ]
then
  notify-send "Unicode Inserter" "Cancelled"
else
  printf "%s" "$selectedchar" | xclip -selection clipboard
  notify-send "Unicode Inserter" \
    "Character $selectedchar now in clipboard"
fi

This script :-

  1. Creates a variable with a whole pile of text inside it. The original script contains a much longer list, but including it would be a) boring and b) give too much away. As it stands, each line can be pretty much anything you want as long as it starts with the Unicode character followed by whitespace.
  2. Runs rofi with the variable as input and selects the first field of the response.
  3. Guards against the empty selection (when the menu was cancelled) for neatness mostly.
  4. Prints the selected character (without a newline) into xclip so it can be pasted in. I did try using xdotool to type it directly into the active window, but this didn’t always work so well (i.e. xdotool couldn’t “type” some of the more esoteric characters).
  5. And uses notify-send to alert the dumb user (me!) that something has happened.
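The field extraction and the empty-selection guard can be tried without rofi. This is only a sketch: the two-line menu is a cut-down stand-in for the real list, and `sed -n 2p` pretends to be the user picking the second entry.

```shell
# Cut-down stand-in for the menu: one entry separated by a tab,
# one separated by spaces, as in the script's list.
menu=$(printf '✓\tTick (Sym-y)\n★       Star\n')

# Pretend the user picked the second line (rofi would print the chosen line).
chosen=$(printf '%s\n' "$menu" | sed -n 2p)

# Same awk call as the script: keep the first whitespace-delimited field.
selectedchar=$(printf '%s\n' "$chosen" | awk '{print $1}')

# A cancelled menu produces no output at all, so awk prints nothing
# and the variable ends up empty, which is what the -z test catches.
cancelled=$(printf '' | awk '{print $1}')

printf 'selected:  [%s]\n' "$selectedchar"
printf 'cancelled: [%s]\n' "$cancelled"
```

Because awk splits on any run of blanks, it does not matter whether a menu line uses a tab or spaces after the character.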

Lastly, to make this useful, I added an entry to my sxhkd configuration :-

super + F11
  /site/scripts/m-unicode-insert
Apr 06 2017

One of the possibilities when setting a password is to use non-ASCII characters, such as “þ” (that is a thorn). Well perhaps something a little more secure than just a single character.

But just how sensible is it?

The first thing to bear in mind is that you need to be able to enter the password reliably in all circumstances. A tale from the mists of time: I once set a root password on a Unix machine that included the “@” character, which normally worked fine but failed on the system console, because on that terminal the old Unix tty behaviour was still active and “@” would erase the line, making it impossible to enter the password.

Fortunately I realised what the problem was before it became more than a little annoying.

But the point still remains – if you cannot type a password, you cannot authenticate. So for passwords such as firmware passwords, system encryption passwords, or normal computer account passwords, a password containing Unicode characters is probably a very bad idea.

But for when you have full control over your computer(s), such as for web account passwords, a password containing Unicode characters is worth considering.

So how safe is a password containing a Unicode character anyway? Well, on my usual password cracking machine, John the Ripper was unable to crack the password “þ” in approximately 24 hours. Of course that is a bit of a cheat, as John the Ripper does not check Unicode characters by default, and if it did it would be able to crack a one-character password. But it would take longer; adding Unicode characters increases the space that John the Ripper needs to search in order to find your password.

And perhaps more importantly, it makes it less likely that a password guesser (Hydra for example) will be successful.
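To put rough numbers on the search space, here is a small sketch that counts the candidate passwords of a fixed length for two alphabet sizes. The figures are illustrative assumptions, not measurements: 95 is the number of printable ASCII characters, and 191 assumes the cracker also has to try the 96 characters from U+00A0 to U+00FF. The helper name `keyspace` is made up for this example, and the arithmetic assumes a shell with 64-bit integers.

```shell
# keyspace ALPHABET_SIZE LENGTH
# Prints ALPHABET_SIZE multiplied by itself LENGTH times, i.e. the number of
# passwords of exactly LENGTH characters. The body runs in a subshell so the
# loop variables do not leak into the caller.
keyspace() (
  total=1
  i=0
  while [ "$i" -lt "$2" ]; do
    total=$((total * $1))
    i=$((i + 1))
  done
  printf '%s\n' "$total"
)

# Printable ASCII only, 8 characters.
ascii_only=$(keyspace 95 8)

# Assumed larger alphabet: printable ASCII plus U+00A0 to U+00FF.
with_extras=$(keyspace 191 8)

printf 'ASCII only:  %s\n' "$ascii_only"
printf 'With extras: %s\n' "$with_extras"
```

Under those assumptions the larger alphabet makes the 8-character search space a few hundred times bigger, and the gap widens with every extra character of length.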

So if you normally use a password such as thistlethinthorn, changing it to þistleþinþorn is worth considering. Or indeed changing the separator between words in a multiword password to a Unicode character: thistle☠thin☠thorn, or red¡whistle¡wheel.

Mar 19 2016

As you may or may not know, Unicode is the standard for encoding text into ‘computer speak’. Before Unicode, different manufacturers devised many different encodings of characters (graphemes and other symbols) into ‘computer speak’, all of which were severely limited. Almost all of them included the normal Roman alphabet plus a variety of other symbols. Amongst other things, this caused two main problems.

Firstly, it excluded large parts of the world that use other writing systems from using computers sensibly – computers are difficult enough to use (at least at first) without learning a whole new writing system. And even then storing a document by transliterating it is not ideal – you have changed the original and introduced another possible source of errors.

Secondly, there is the problem of errors introduced into documents when moving them from one computer system to another – for example it was not unknown for a “#” to become a “£” (or, very similarly, for “£” to appear as “Â£”). Much more extreme examples exist.
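The pound sign flavour of this corruption is easy to reproduce. The sketch below assumes `iconv` is installed and that the terminal displays UTF-8; it takes the UTF-8 bytes for “£” and deliberately misreads them as ISO-8859-1, which is one common way for this sort of garbling to happen.

```shell
# The pound sign in UTF-8 is the two bytes 0xC2 0xA3. Octal escapes are used
# so the example does not depend on the encoding of this file.
pound_utf8=$(printf '\302\243')

# Misread those two bytes as ISO-8859-1 and convert the result to UTF-8:
# 0xC2 is read as "Â" and 0xA3 as "£", so one character becomes two.
garbled=$(printf '%s' "$pound_utf8" | iconv -f ISO-8859-1 -t UTF-8)

printf 'original: %s\n' "$pound_utf8"
printf 'garbled:  %s\n' "$garbled"
```

Repeating the wrong conversion on already-garbled text makes it worse each time, which is how the more extreme examples tend to arise.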

Of course because of the enormous number of symbols in Unicode, there is a great deal of fun to be had with Unicode – not infrequently poking fun at Unicode for including ridiculous symbols. And why not? There’s no harm in having a bit of fun :-

ɥʇıpǝɹǝɯ ǝʞıɯ

However it is worth pointing out that Unicode standards are a serious business and slipping in “fun” symbols is not likely to happen. Although I was not directly involved, I did help out one of the people who pushed for the inclusion of medieval Slavonic characters within the Unicode standard, and it is not a trivial process.

And now for some “fun” Unicode characters…

þ

The Thorn. The English letter that got away. Before the age of printing, English had an additional letter (I’m over-simplifying here) which was used instead of the digraph th, so words such as the would have been spelt þe. By the time that printing had arrived, the shape of the thorn letter was becoming more like a “y” and because the printers imported their equipment from countries that did not have a thorn, the printed books tended to use “y” instead of þ. Which is of course where we get “Ye olde Shoppe” from.

Of course it was confusing printing ye when we said þe (or the), so the printers settled on the.

So why is þ in Unicode? Because you cannot discuss the letter without including it, and perhaps more importantly cannot encode a historical document that used þ without an encoding for it.

☃ and probably ☕

Ah! The snowman (and the cup of coffee). What sense of fun allowed these symbols into the standard?

Well, according to the Unicode standard, the snowman sits within a block of weather symbols, so it was almost certainly part of a TV station’s encoding standard for weather forecasts. And you cannot claim to be a universal standard for text encoding without including the symbols included in other encodings.

‽

The interrobang. The punctuation symbol used (if rarely) for signalling both a question and an exclamation: What the bleep are you doing‽

Whilst not commonly used today, it was very commonly used in the 1960s and so there are many documents that need encoding that use this symbol.

☠, ⚠, ☢

These look like fun don’t they? They certainly do to me, but in fact they are international symbols for various dangers – poison (☠), warning (⚠) of a general nature, and radioactivity (☢). All pretty serious stuff; and you really don’t want those symbols garbled in a document.

This is a Thai “letter” and I picked it out because it’s made fun of elsewhere, but it stands for all the non-European symbols used in language.

It may look kind of funny, but it probably isn’t so much to someone who knows Thai. To put it another way, if you told me that we’re not going to include the “M” in a character encoding because it looks too silly, I’d be very, very annoyed (my name contains two of ’em).

And yes I can type all of the above and the following into a text terminal 😃

 

[Image: 2016-03-19_1119]

Jul 17 2011

This is probably one of those videos you will only watch once … if that (in full at least), but it is one of those things that you can be glad that someone did :-

[embedded video]

That shows one character per frame over 33 minutes. It’s an impressive demonstration of just how large the collection of human writing systems is.