Apr 06 2017
 

One option when setting a password is to use non-ASCII characters, such as “þ” (that is a thorn), although preferably something a little more secure than just a single character.

But just how sensible is it?

The first thing to bear in mind is that you need to be able to enter the password reliably in all circumstances. A tale from the mists of time: I once set a root password on a Unix machine that included the “@” character. That normally worked fine, but it failed on the system console, because on that terminal the old Unix tty settings were still active and “@” erased the line typed so far, making it impossible to enter the password.

Fortunately I realised what the problem was before it became more than a little annoying.

But the point still remains – if you cannot type a password, you cannot authenticate. So for passwords such as firmware passwords, system encryption passwords, or normal computer account passwords, a password containing Unicode characters is probably a very bad idea.

But where you have full control over the computer(s) you type the password on, as with web account passwords, a password containing Unicode characters is worth considering.

So how safe is a password containing a Unicode character anyway? Well, on my usual password cracking machine, John the Ripper was unable to crack the password “þ” in approximately 24 hours. Of course that is a bit of a cheat, as John the Ripper does not check Unicode characters by default, and if it did it would be able to crack a one-character password. But it would take longer; adding Unicode characters increases the space that John the Ripper needs to search in order to find your password.

Perhaps more importantly, it also makes it less likely that a password guesser (Hydra, for example) will be successful.
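To put rough numbers on that larger search space, here is a minimal Python sketch. The alphabet sizes are assumptions for illustration only: 95 printable ASCII characters, and 191 if an attacker also tries the 96 code points from U+00A0 to U+00FF (which is where þ lives). Neither figure comes from John the Ripper's actual configuration.

```python
import math

# Assumed alphabet sizes (illustrative, not taken from any cracking tool).
ASCII_PRINTABLE = 95                # printable ASCII, space included
WITH_LATIN1 = ASCII_PRINTABLE + 96  # plus code points U+00A0..U+00FF


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


def bits(alphabet_size: int, length: int) -> float:
    """The same search space expressed in bits."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    for length in (1, 8, 12):
        ascii_only = keyspace(ASCII_PRINTABLE, length)
        extended = keyspace(WITH_LATIN1, length)
        print(
            f"length {length:2d}: "
            f"{bits(ASCII_PRINTABLE, length):5.1f} bits ASCII only, "
            f"{bits(WITH_LATIN1, length):5.1f} bits extended, "
            f"{extended / ascii_only:,.0f} times more candidates"
        )
```

The gain grows with the length of the password, because the larger alphabet is raised to the same power. It only applies if the attacker really has to try the extra characters at every position.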

So if you normally use a password such as thistlethinthorn, changing it to þistleþinþorn is worth considering. Or indeed changing the separator between words in a multiword password to a Unicode character: thistle☠thin☠thorn, or red¡whistle¡wheel.
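Both tricks are simple enough to sketch in Python. The function names `thornify` and `join_words` are made up for this illustration; they are not part of any password tool.

```python
def thornify(password: str) -> str:
    """Replace every lower-case 'th' digraph with a thorn."""
    return password.replace("th", "þ")


def join_words(words: list[str], separator: str = "☠") -> str:
    """Join the words of a multiword password with a chosen separator."""
    return separator.join(words)


if __name__ == "__main__":
    print(thornify("thistlethinthorn"))
    print(join_words(["thistle", "thin", "thorn"]))
    print(join_words(["red", "whistle", "wheel"], separator="¡"))
```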

Mar 19 2016
 

As you may or may not know, Unicode is the standard for encoding text into “computer speak”. Before it, different manufacturers came up with many different encodings of characters (graphemes and other symbols), all of which were severely limited. Almost all of them included the normal Roman alphabet plus a variety of other symbols. Amongst other things, this caused two main problems.

Firstly, it excluded large parts of the world that use other writing systems from using computers sensibly – computers are difficult enough to use (at least at first) without learning a whole new writing system. And even then storing a document by transliterating it is not ideal – you have changed the original and introduced another possible source of errors.

Secondly, there is the problem of errors introduced into documents when moving them from one computer system to another – for example it was not unknown for a “#” to become a “£” (or, very similarly, for “£” to appear as “Â£”). Much more extreme examples exist.
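The “£” garbling is easy to reproduce. A minimal Python sketch, assuming the text was saved as UTF-8 and then read back as Latin-1 (one common cause, not the only one):

```python
original = "£"

# UTF-8 stores the pound sign as two bytes.
utf8_bytes = original.encode("utf-8")

# A system that assumes Latin-1 turns each byte into its own character.
garbled = utf8_bytes.decode("latin-1")

# As long as nothing else has touched the text, reversing the two steps
# recovers the original.
repaired = garbled.encode("latin-1").decode("utf-8")

if __name__ == "__main__":
    print(utf8_bytes)
    print(garbled)
    print(repaired)
```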

Of course because of the enormous number of symbols in Unicode, there is a great deal of fun to be had with Unicode – not infrequently poking fun at Unicode for including ridiculous symbols. And why not? There’s no harm in having a bit of fun :-

ɥʇıpǝɹǝɯ ǝʞıɯ

However it is worth pointing out that Unicode standards are a serious business and slipping in “fun” symbols is not likely to happen. Although I was not directly involved, I did help out one of the people who pushed for the inclusion of medieval Slavonic characters within the Unicode standard, and it is not a trivial process.

And now for some “fun” Unicode characters…

þ

The Thorn. The English letter that got away. Before the age of printing, English had an additional letter (I’m over-simplifying here) which was used instead of the digraph th, so words such as the would have been spelt þe. By the time that printing had arrived, the shape of the thorn letter was becoming more like a “y” and because the printers imported their equipment from countries that did not have a thorn, the printed books tended to use “y” instead of þ. Which is of course where we get “Ye olde Shoppe” from.

Of course it was confusing printing ye when we said þe (or the), so the printers settled on the.

So why is þ in Unicode? Because you cannot discuss the letter without including it, and perhaps more importantly cannot encode a historical document that used þ without an encoding for it.

☃ and probably ☕

Ah! The snowman (and the cup of coffee). What sense of fun allowed these symbols into the standard?

Well, according to the Unicode standard, the snowman sits within a block of weather symbols, so it was almost certainly part of a TV station’s encoding standard for weather forecasts. And you cannot claim to be a universal standard for text encoding without including the symbols included in other encodings.

‽

The interrobang. The punctuation symbol used (if rarely) for signalling both a question and an exclamation: What the bleep are you doing‽

Whilst not commonly used today, it was very commonly used in the 1960s and so there are many documents that need encoding that use this symbol.

☠, ⚠, ☢

These look like fun, don’t they? They certainly do to me, but in fact they are international symbols for various dangers – poison (☠), warning (⚠) of a general nature, and radioactivity (☢). All pretty serious stuff, and you really don’t want those symbols garbled in a document.

This is a Thai “letter” and I picked it out because it’s made fun of elsewhere, but it stands for all the non-European symbols used in language.

It may look kind of funny, but it probably isn’t so much to someone who knows Thai. To put it another way, if you told me that we’re not going to include the “M” in a character encoding because it looks too silly, I’d be very, very annoyed (my name contains two of ’em).
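If you want to check what the standard itself calls these characters, Python's standard `unicodedata` module can look up the official names. A small sketch; the helper `describe` and the `CHARACTERS` string are just names chosen for this example:

```python
import unicodedata

# The characters discussed above, in order of appearance.
CHARACTERS = "þ☃☕‽☠⚠☢"


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {name}"


if __name__ == "__main__":
    for ch in CHARACTERS:
        print(ch, describe(ch))
```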

And yes I can type all of the above and the following into a text terminal 😃

 

[Screenshot: 2016-03-19_1119]