How Strong Are XKCD-Style Passwords? And Other Thoughts

Nov 082013

One of the odd things about telling people about password security is that you have to learn just a little bit more than what you are saying. This leads to the frustration of not being able to talk about ideas you might have – such as that perhaps xkcd-style passwords (“word1word2word3word4”) are not quite as strong as is made out.

Not that they’re weak of course, and indeed I encourage their use wherever it is inconvenient to use “line noise” style passwords such as “JyP;$u5+Q\hzrU[C”. But how was the strength of the password “correcthorsebatterystaple” calculated? And was that calculation correct?

When we want a quantitative value for the strength of a password, it is traditional to calculate the information entropy of the password for the simple reason that the number of bits of entropy is very quickly turned into the number of password guesses necessary to get the password. Simply calculate 2^(entropy bits) and you have the number of guesses necessary to exhaustively search all possible passwords; of course on average it is only necessary to search through half of the possible passwords to guess the correct password.

To calculate the information entropy of a password is a slightly tricky calculation: entropy = log2 (number of possible symbols) ^ (length of password). Or to be more general: entropy = log2 * (number of different possible passwords). Of course this only applies to truly random passwords, so is strictly speaking the maximum possible entropy. The “log2” is base 2 logarithm which doesn’t usually appear on a calculator, but can be calculated as logx(X) = log(X)/log(2).

If we calculate the information entropy of the “correcthorsebatterystaple” we get :-

$ calc
(suppressed)
; ln(26 ^ (5 + 7 + 6 + 7))/ln(2)
	~117.51099295352730400945

Which is far in excess of the 44 bits calculated in the cartoon.

But is it correct? If you were to translate the “correcthorsebatterystaple” password into Chinese you would get “正馬電池訂” (don’t blame me for the poor translation) which isn’t quite what I was hoping for, as it is five “symbols” instead of four … because we have four words. If we think of the password as being four symbols long, we have a somewhat different calculation where we raise “something” to the power of 4.

Now if we count all the words in the file /usr/share/dict/words as symbols (and add 96 for the ASCII character set), we end up with a total of 99267 symbols which is greater by far than the minimal 26 symbols that we started with. But the calculation comes out differently :-

$ calc
(suppressed)
; ln(99267 ^ 4)/ln(2)
	~66.39610628854963984846

Which is a lot less … we’ve gone from the verging on overkill on terms of strength towards the lower end of “strong”. But that isn’t all. If we remove all words longer than 8 characters from /usr/share/dict/words we get a total of 31321 which gives a somewhat different result :-

$ egrep -e "^.{1,7}$" /usr/share/dict/words | grep -v "'"  | wc -l
31225
$ calc
(suppressed)
; 31225 + 96
	31321
; ln(31321^4)/ln(2)
	~59.73937061801244336267

Which is now dropping to the “reasonable” category. But that still has way more words in it than are likely to be used in passwords. If we restrict it to the top 5000 most common words in episodes of The Simpsons (it happened to be a good list easily obtainable) then we go down yet again :-

; ln(5096^4)/ln(2)
	~49.26059824902520150092

Which is now in the mid-range of the “reasonable” category. We are still above the xkcd calculated value of 44 bits of entropy (they may have used NIST 800-63 to calculate the entropy). And way higher than the amount of entropy in the typical weak password … which is a simple word, or perhaps a word with a symbol added to the end. The amount of entropy in that sort of password is around 18 bits (when we treat a word as a symbol).

As a result we can conclude that :-

The xkcd method results in much stronger passwords than the typical password (49 > 18).
The xkcd method is much weaker than a truly random password (49 < 164). The 164 comes from calculating the entropy of a random password with a choice of 96 possible symbols.
Nobody could argue that the xkcd method is a lot easier to remember than a truly random password.
The xkcd method can become very weak if an attacker can predict what dictionary of words are used. For instance if passwords are generated from a very restricted set of words (say 25 words), then the entropy drops to just over 18 bits which is in the insanely weak category.

At present (as far as we know), there aren’t any tools out there to attack xkcd-style passwords but there soon will be.