Theoretical and Practical Password Entropy
Last Updated: 2011-08-10 20:08:07 UTC
by Johannes Ullrich (Version: 1)
We got a number of submissions pointing to today's XKCD cartoon [1]. I think the cartoon is great, and it illustrates a nice dilemma in password security. Yes, I know passwords don't work, but we all still use them, and we still have to come up with reasonable passwords.
Even if you are using a password safe tool that comes up with new random passwords for each application and website, you still need to remember the password for the password safe, and there are a few applications (e.g. logging in to your system) that can't be covered by a password safe.
The basic dilemma is that you need to come up with a password that is hard for others to guess but easy enough for you to remember. Most password policies try to enforce a hard-to-guess password by forcing you to extend the range of characters from which you pick (different case letters, numbers, special characters). However, in real life, this may actually reduce the space of "memorable" passwords, or the total number of possible passwords.
Pass phrases, as suggested by the cartoon, are one solution. But once an attacker knows that you use a pass phrase, the key space is all of a sudden limited again. There has been some research showing that a library of 3-word phrases pulled from Wikipedia makes a decent dictionary to crack these passwords.
The quality of a password is usually expressed in "bits of entropy". The "bits of entropy" are the number of bits it would take to represent all possible passwords, i.e. the base-2 logarithm of the number of equally likely passwords. Let's look at some common schemes:
a 4 digit PIN: 10,000 possible passwords, or 13.3 bits (log2(10,000) = 13.3)
12 characters using the full 95 character ASCII set: 5.4 x 10^23 possible passwords, or 78.8 bits (this is the current NIST recommendation)
Pass phrases are harder to evaluate. The result depends on the size of the user's vocabulary and, of course, on the constraints of grammar. People will likely not choose random words, but a phrase that makes some sense to them. One model that can be used to obtain a passphrase is called "Diceware", but it assumes random phrases drawn from a list of 6^5 (7,776) words. With Diceware's 7,776 words, you would need 6 words to arrive at 77.5 bits, close to the strength that NIST asks for.
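The figures above can be reproduced with a few lines of Python. This is only a sketch: the helper name entropy_bits is made up for illustration, and it assumes every symbol or word is picked independently and uniformly at random, which is exactly the assumption that real users tend to violate.

```python
import math

def entropy_bits(choices: int, length: int = 1) -> float:
    """Bits of entropy for `length` independent, uniformly random picks
    from `choices` equally likely options."""
    return length * math.log2(choices)

schemes = {
    "4 digit PIN": entropy_bits(10, 4),
    "12 printable ASCII characters": entropy_bits(95, 12),
    "6 Diceware words": entropy_bits(6 ** 5, 6),
}

for name, bits in schemes.items():
    print(f"{name}: {bits:.1f} bits")
```

For anything a human picked by hand, these numbers are best read as upper bounds.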
What it all comes down to: How are people actually selecting passwords? People make pretty bad random number generators, in particular if you ask them to remember the result. A good password cracking algorithm takes this into account and tailors the password list based on the password requirements and the target's background. For example, for web application pen testing, the simple Ruby script "CeWL" will create a custom password list from words it finds on the target's website. In past tests, I was easily able to double my password cracking success using this technique compared to normal dictionaries.
In order to solve this, we need to figure out what passwords people really use. How about asking them for their password and offering them a candy bar in return :). And then there is always another XKCD cartoon for you [2].
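To illustrate the idea of a site-specific word list, here is a minimal Python sketch. It is not how CeWL itself is implemented; the class and function names, the minimum word length of 5, and the sample page are all assumptions made for this example. It works on an HTML string you already have, and does no crawling.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def build_wordlist(html: str, min_length: int = 5) -> list[str]:
    """Return the sorted, lowercased, de-duplicated words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[A-Za-z]+", " ".join(parser.chunks))
    return sorted({w.lower() for w in words if len(w) >= min_length})

# Hypothetical page content, used only to exercise the function.
sample_page = (
    "<html><head><style>body{color:red}</style></head><body>"
    "<h1>Acme Widgets</h1><p>Acme builds widgets in Springfield.</p>"
    "<script>var tracking = 1;</script></body></html>"
)
print(build_wordlist(sample_page))
```

A list like this would typically be fed to a cracking tool together with mangling rules (appended digits, capitalization), since users rarely use the bare word.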
[1] http://www.xkcd.org/936/
[2] http://xkcd.com/538/
------
Johannes B. Ullrich, Ph.D.
SANS Technology Institute
Comments
there are perhaps ~1,000 common words
If you pick 4 random ones, there are
1000 * 1000 * 1000 * 1000 = 1000 ^ 4 = 10^12 possible choices.
That's approximately 2^40, or about 40 bits worth of 'choice'.
The problem is that this amount is only the entropy if your selection actually _IS_ truly random, which means you use a computer to make a truly random selection of 4 words; don't pick words off the top of your head.
As a human, you are likely to have some bias in the words you would choose, which reduces the entropy; if you pick them without a randomizer, there will be less than 40 bits of entropy.
Adding a punctuation mark and capitalizing a random letter somewhere in the passphrase also helps.
A combination of _both_ approaches is possible, without going overboard and making the password hard to remember.
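As a rough sketch of what "use a computer to make a truly random selection" could look like, the following Python uses the standard library's secrets module, which draws from the operating system's cryptographic random source. The function names and the placeholder word list are assumptions for illustration; in practice you would load a real list of about 1,000 distinct common words.

```python
import math
import secrets

def random_passphrase(wordlist, n_words=4):
    """Pick n_words independently and uniformly using the OS CSPRNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))

def passphrase_bits(wordlist_size, n_words=4):
    """Entropy in bits, valid only when the words are chosen uniformly at random."""
    return n_words * math.log2(wordlist_size)

# Placeholder list standing in for ~1,000 distinct common words.
wordlist = [f"word{i:03d}" for i in range(1000)]

phrase = random_passphrase(wordlist)
print(phrase)
print(f"{passphrase_bits(len(wordlist)):.1f} bits")
```

With 1,000 words and 4 picks this comes out at roughly 39.9 bits; a larger list or a fifth word raises it quickly.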
1. If you implement account lockouts, even if for ten or twenty attempts, entropy doesn't matter.
2. If they've swiped a password file, they've got all the time in the world without any lockout concerns and a graphics card password cracker can try billions of permutations per second. If it's taking too long, add in a few more graphics cards.
Passwords are passé. The only reason companies use them is because they're free, and you get what you pay for.
When using all of the character types, the max entropy is 70.3 bits at 12 characters. There are 1.48E21 possible permutations here. Assuming a random distribution, the time to brute force at 3.3 billion passwords per second (achievable with current GPUs) is an average of about 24.8 GPU-years. To get that down to a month, you would need about 27,500 cards--not impossible, but not trivial, either. This increases to an average of 17,900 GPU-years at 13 characters.
I don't yet have complete subsets (e.g., 2 of 4 character classes) as I've not worked out the spreadsheet structure, but it still won't reach the 78.8 bits recommended by NIST. A quick calculation suggests 72.2 bits, but I'm not swearing on that number.
I've thought about doing some research into passphrase strengths where spelling is correct, but the calculations get daunting. The Oxford English Dictionary discusses word count just for English[1] and comes up with 250,000 to 750,000 words, depending on context. Then you get into sentence structure, vocabulary statistics, and regional variations, and it becomes clear why, when no other methods are available, passphrases are by far the better solution.
[1] http://oxforddictionaries.com/page/93
Have you counted only one entry per word? I mean, there are variants of each word, like lowercase, uppercase, and first letter uppercase, which would at least triple the amount. Have you added whitespace? What if someone replaces the whitespace with, let's say, dots or 'x' or whatever? You see, the complexity is even higher!
The number of words in the Oxford English Dictionary is not relevant here. What is relevant here is the number of words in the average person's vocabulary. That is only a few thousand at best.
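To put a number on how much such variations add, here is a small Python sketch. The function name and the example figures (a 1,000-word vocabulary, 4 words, 3 case variants per word, 4 possible separators used consistently throughout the phrase) are assumptions for illustration, and the result only holds if the variant and the separator are themselves chosen at random.

```python
import math

def phrase_bits(vocab_size, n_words, case_variants=1, separators=1):
    """Entropy of a random phrase: each word is picked from vocab_size words
    in one of case_variants forms, and one separator style out of
    `separators` options is used for the whole phrase."""
    per_word = math.log2(vocab_size * case_variants)
    return n_words * per_word + math.log2(separators)

base = phrase_bits(1000, 4)
with_case = phrase_bits(1000, 4, case_variants=3)
with_both = phrase_bits(1000, 4, case_variants=3, separators=4)

print(f"plain words:        {base:.1f} bits")
print(f"+ 3 case variants:  {with_case:.1f} bits")
print(f"+ 4 separators:     {with_both:.1f} bits")
```

Tripling the forms of each word adds log2(3), about 1.6 bits, per word, so the gain is real but modest next to simply adding another word.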
Next you can decimate the permutations, because most people will choose a grammatically correct phrase, e.g. subject transitive-verb object, or subject intransitive-verb adjective. Also you might need to ban "I", "The" and "and".
For example,
"There are over 80,000 Chinese characters, but most of them are seldom used today. So how many Chinese characters do you need to know? For basic reading and writing of modern Chinese, you only need a few thousands. Here are the coverage rates of the most frequently used Chinese characters:
Most frequently used 1,000 characters: ~90% (Coverage rate)
Most frequently used 2,500 characters: 98.0% (Coverage rate)
Most frequently used 3,500 characters: 99.5% (Coverage rate)
For an English word, the Chinese translation (or the Chinese 'word') often consists of two or more Chinese characters. " [1]
Now, if you understand Chinese, you can use pinyin - an anglicized form of Chinese - to construct your passwords. Pinyin comes in two basic flavors, traditional (Taiwan) and simplified (China). The simplified pinyin is often used with a number at the end of each character to denote the phonetic "tone" of pronunciation. Therefore, the word for 'sunrise' could be combined with the word for 'bright' (ming2xu4) to easily express a 'bright sunrise' and easily defeat the Oxford adherents. Furthermore, if one chooses from the more arcane characters (remember, there are about 80k choices here), it would present a rather daunting challenge. By the way, there are variations on this theme if you go with the Cantonese rather than the Mandarin pronunciation.
[1] http://chineseculture.about.com/library/symbol/blccbasics.htm
"civilian flyer know fait"
"roughly channel Limited fools"
"be obviously Sensia then"
What is your source for the average person's vocabulary? A quick Google search shows that, while there is no good answer, it is much more than a few thousand words. Here is a nice little article.
http://www.worldwidewords.org/articles/howmany.htm
That said, I often have a problem when assigning passwords to people. I seem to use a lot of words my users don't know. I had a user complain about random passwords when I assigned her a phrase containing zymurgy.
The OED is commonly accepted as the ultimate source of knowledge of the English language, or at least as close as anything can get to it right now. As such, it's useful for helping to set upper bounds of complexity.
You're right that people have much smaller vocabularies; an educated person averages in the 24,000-30,000 word range[1]. Those vocabularies differ significantly from person to person, meaning that someone in a technical field and someone in a literary field may have significant differences in those 24,000+ words.
But cutting down the 250,000 base words to, say, the 20,000 most common words is the same as cutting a word list for JtR down to the most common 1000 passwords. You select an arbitrary number for your particular needs at that point so you can at least try to tackle part of the problem.
[1] http://www.independent.co.uk/news/world/americas/english-language-nears-the-one-millionword-milestone-473935.html