This whitepaper is the first of a three-part series covering a wide range of topics on passwords, next-generation security, and more. In this installment, we look at passwords: what, where, and why. In the second installment, we look at the two-fold weaknesses of modern passwords, and in the third and final installment, we look at the innovation and evolution of passwords and authentication mechanisms.
When we think of passwords, what usually comes to mind? Oftentimes, we default to those incredibly complex and difficult-to-remember combinations: A6C.Goo4sp-s, t00d1fficult2remember!, and so forth. The requirements seem to get tougher and stranger with each passing year, and in fact they do. Barely a week goes by anymore without yet another breaking news article about yet another large web service being attacked and its password database being compromised.
As offline password cracking methods become simpler and quicker -- a topic we will visit in a later paper in this series -- so too do password ideologies grow more complex. And, as with most things in the computer security industry, this is an uphill battle that security engineers must fight with more challenging methods to combat would-be intruders. But, as a general axiom of computer security, more safety means less convenience, and sometimes even less productivity. Indeed, to fight this battle, engineers must impose unruly requirements on their users. But it need not be this way.
Increasingly, security engineers are employing newer and more challenging methods against attackers, yet they try to keep these requirements relatively easy or more intuitive for their users. The methods vary widely, and are often unique to specific industries (bank authentication versus Google, for example). Even the networks that handle authentication have been hardened in ways that were neither used nor necessary in years past, mainly due to increasing threats to security and concerns over privacy. However, through all of this, one thing has become clear: this fight will never stop. So how do security engineers stay one step ahead of hackers?
Passwords have long been the dominant authentication mechanism. In fact, the concept of a password has been around since the Roman military of 200 B.C., so it is no wonder that passwords remain the most commonly chosen method to log in to a website, or anything for that matter. It should thus be common sense that, given the reliance on password authentication as a means of security, the weaker or easier to guess a password is, the more likely it is to be compromised.
So, throughout the ages, passwords became more and more complex, usually by length or obfuscation (code words, alphanumeric substitution, and so forth). This holds most true today, especially in web authentication. In fact, when we think of website password requirements today, it is typically something similar to the following:
Example password requirements from Yahoo!, requiring 8 to 32 characters, including letters and numbers.
Seems common enough, right? But there are some noticeable problems that have been well known and discussed in security circles for decades now, and yet they persist to this day. While the whole requirement group itself is a mess, there are two particularly important elements that cause the whole thing to be a miserable failure: complexity and length requirements.
As the examples at the start of this topic show, passwords typically begin with some form of simple word complexity, often via alphanumeric obfuscation -- a method of substitution that renders the password obscure or unintelligible. This commonly takes the form of substituting numbers or special characters for letters or even entire words. This yields passwords like P4ssw0rd, 2fast2furious, m0v!ngaw4y, and similar variants.
The caveat to this type of complexity is that making a password obscure or unintelligible, with the aim of making it harder on attackers, has the unintended consequence that the user is often incapable of memorizing the jumbled mess of alphanumeric and special characters. To add even more problems to the mix, password length restrictions are employed, which cause further difficulties in memorization. We will explore these problems a little later in this article.
The true reason for password complexity is the first and biggest misconception of password-based authentication security: longer password equals more secure. This is based purely on a mathematical theory that, in its time, held some weight. However, in modern password cracking realms, its value is negligible at best and useless at worst. The theory purports that if you have an N-length password, each additional character set adds more complexity, and mathematically this is indeed true. An 8-character password consisting of only lowercase letters and numbers has 2.8 trillion possible combinations (26 letters plus 10 digits, raised to the 8th power). Add uppercase and this increases again. This posits, then, that more character type requirements yield mathematically more secure passwords. Realistically, however, this is not true.
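The combination counts in this theory are simple exponentiation, and a minimal sketch makes them easy to check. The function name `keyspace` is our own label for illustration, not a term from any library.

```python
import string

def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of exactly `length` symbols."""
    return alphabet_size ** length

LOWER = len(string.ascii_lowercase)   # 26
UPPER = len(string.ascii_uppercase)   # 26
DIGITS = len(string.digits)           # 10

# 8 characters drawn from lowercase letters and digits
lower_digits_8 = keyspace(LOWER + DIGITS, 8)
# 8 characters drawn from lowercase, uppercase, and digits
mixed_digits_8 = keyspace(LOWER + UPPER + DIGITS, 8)

print(f"{lower_digits_8:,}")  # 2,821,109,907,456 (about 2.8 trillion)
print(f"{mixed_digits_8:,}")  # 218,340,105,584,896 (about 218 trillion)
```

Adding the uppercase set multiplies the count by roughly 77, which is the "mathematically more secure" part of the argument.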
Consider that modern password hash brute-force software can guess at nearly five billion passwords per second. If you assume a passphrase akin to the xkcd joke of 44 bits (in their example, "correct horse battery staple", a 28-character password), a SHA1-encoded password hash can be cracked in about an hour, at worst. (We dig more into passphrases later in this article, as well.) If on a six-character password you require upper- and lower-case letters, the maximum possible combinations would be in the region of 19.8 billion (52 to the 6th power). Add the requirement for numbers as well and the maximum rises to roughly 56.8 billion (62 to the 6th power), which at that guessing rate extends the worst-case search by only a few seconds.
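The worst-case times follow from dividing the number of candidates by the guessing rate. The sketch below assumes a flat rate of exactly five billion guesses per second, which is a simplification of the "nearly five billion" figure. Real cracking rigs vary by hash type and hardware, and the helper name is ours.

```python
GUESSES_PER_SECOND = 5_000_000_000  # assumed flat rate, for illustration only

def worst_case_seconds(candidates: int, rate: int = GUESSES_PER_SECOND) -> float:
    """Time to try every candidate once at a constant guessing rate."""
    return candidates / rate

passphrase_44_bits = 2 ** 44     # the 44-bit xkcd-style passphrase
six_letters = 52 ** 6            # six characters, upper- and lower-case letters
six_letters_digits = 62 ** 6     # six characters, letters plus digits

for label, candidates in [
    ("44-bit passphrase", passphrase_44_bits),
    ("6 chars, letters only", six_letters),
    ("6 chars, letters and digits", six_letters_digits),
]:
    seconds = worst_case_seconds(candidates)
    print(f"{label}: {candidates:,} candidates, {seconds:,.1f} s worst case")
```

Under that assumption, the extra character set on a six-character password buys seconds, while the 44-bit passphrase takes the better part of an hour.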
Mathematically, however, it is true that as you increase the exponent value (the character count), the maximum also increases exponentially. This is about the point where most security engineers stop and call it a day, narrowly focusing on part two of the password security-through-obscurity genre: increase the password length, increase the security. But is that sufficient?
In early 2000, the University of Cambridge Computer Laboratory performed a password study involving around 400 students. The study came to many conclusions, most notably that random passwords are more difficult to remember than mnemonic passwords, and passwords exceeding 6 characters become increasingly difficult to memorize. In fact, some subjects were never capable of memorizing their passwords. Though it is not directly discussed in this paper in particular, it does certainly hint at a considerable complexity problem with random or 'non-human' passwords.
Importantly, the Cambridge study highlighted the observation that participants had difficulty memorizing passwords beyond six characters. This is pretty much a well-known fact to anyone who has ever had to make a password for anything. We, as humans, are an associative-memory bunch, and in memorizing random data with no real order or correlation we find ourselves incapable of storing this information to memory in any real recallable fashion. Think of it like a database with no index, just random data floating about. This is why mnemonic devices work so well for many of us when studying for exams, as we can associate something psychologically tangible to the data. This is also why we very commonly use remarkably easy-to-guess passwords -- password1, anybody?
To compensate for the fact that our "blink182"-simple passwords exist in readily available password cracking dictionary lists and are, in general, sometimes very easy to guess, security engineers focus on increasing the cost of the password. The term 'cost,' in this sense, refers to the difficulty or iteration count of an encryption or hashing algorithm. In the SHA-256-based crypt scheme, for example, the default of 5,000 rounds of hashing sets the cost: the more rounds, the more 'expensive' the hashing scheme. Similarly, in a raw password sense, it is assumed that the more difficult a password is to guess (the 'cost' in this context being character length), the more 'expensive' and thus more difficult it is to break. Indeed, a three-character password is highly insecure and can be brute-force guessed by a mediocre computer in a matter of seconds (if even that long), so naturally the logic flows that a longer password is more secure.
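The iteration-count notion of cost can be illustrated with a deliberately simplified sketch: feed the digest back into SHA-256 a configurable number of times and time a single guess. This is not the actual crypt algorithm, only a stand-in for "iterations over the hashing formula". The function names and the salt value are hypothetical.

```python
import hashlib
import time

def stretched_hash(password: str, salt: bytes, rounds: int) -> bytes:
    """Illustrative key stretching: re-hash the digest `rounds` times.

    A simplified stand-in, NOT the real SHA-256 crypt scheme.
    """
    digest = salt + password.encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest

def seconds_per_guess(rounds: int) -> float:
    """Wall-clock time an attacker pays to test one candidate password."""
    start = time.perf_counter()
    stretched_hash("blink182", b"example-salt", rounds)
    return time.perf_counter() - start

for rounds in (1, 5_000, 500_000):
    print(f"{rounds:>7} rounds: {seconds_per_guess(rounds):.6f} s per guess")
```

The time per guess grows roughly in proportion to the round count, and that multiplier applies to every entry in an attacker's dictionary.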
That is all well and good, but this also places entirely too much focus on one end of the spectrum: minimum length. The almost never-discussed elephant in the room still exists, that of maximum length, and why in the mathematically holy name of Pythagoras does a maximum limit even exist?
It is true, one must admit, that at one point in ancient times -- Okay, so the 70's and 80's are not ancient, so to speak, but in Internet time the 70's is like the Roman era. Right? Anyway... -- systems were incapable of transmitting more than a certain length of characters, for some pragmatic reason or another.
For example, the industry-standard communication protocol for EEP4 PIN pads in common Automated Teller Machines at one point required exactly a four-digit Personal Identification Number (PIN). This was required because, due to the technology limitations of the security systems in place, the encoding of the PIN demanded an exact length of four digits. As technology progressed through time, so too did the maximum length requirements -- PINs can be six or so digits now, admittedly not a lot of progress but it is progress nonetheless.
That explains systems where the protocols required exact, specific lengths, but what about arbitrary encryption, transmission, and storage methods? Take, for example, MD5 and SHA1, two powerhouse heavyweights in password hashing that are also used to generate checksum hashes of whole file or archive downloads, often millions of bytes in length (whereas a password is typically 6-12 bytes on average). If an MD5 or SHA1 hash has no real theoretical input limit (ignoring the possibility of hash collision, whereupon multiple strings result in the same password hash), then why would any software developer or security engineer seriously consider imposing a password length limit?
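A short sketch shows why input length is a non-issue for these functions: the digest has the same size whether the input is a short password or several megabytes. The sample inputs are made up for illustration, and the example says nothing about whether MD5 or SHA1 is a wise choice for storing passwords.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Hex-encoded SHA1 digest of arbitrary-length input."""
    return hashlib.sha1(data).hexdigest()

samples = {
    "short password": b"password1",
    "long passphrase": b"The door is open, and so are a great many other doors. " * 100,
    "file-sized input": b"\x00" * 5_000_000,
}

for label, data in samples.items():
    digest = sha1_hex(data)
    # Input sizes differ by orders of magnitude; the digest is always 40 hex characters.
    print(f"{label}: {len(data):,} bytes in, {len(digest)} hex characters out")
```

Because the stored value never grows with the input, a maximum password length buys the storage layer nothing.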
For quite some time now, it has been a long-standing joke as to why maximum password length exists, especially to this day. One of the theories is that originally in the 1970's, DES-based crypt truncated a password string after the 8th byte, thus anything beyond an eight-character password was pointless and a maximum-length policy was born. However, simple math and observation would yield that the 70's were 40 years ago, and technology has evolved just a little bit since then, namely in the security field. As is part of the joke, no one really knows why anyone still enforces maximum length beyond the disappointing but default answer, "It just has always been this way."
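The truncation behaviour behind that theory can be simulated in a few lines. The function below is a hypothetical stand-in, not DES-based crypt itself: it simply discards everything after the eighth byte before hashing, which is enough to show why longer passwords were pointless on such systems.

```python
import hashlib

MAX_SIGNIFICANT_BYTES = 8  # the cut-off attributed to DES-based crypt

def truncating_hash(password: str) -> str:
    """Hypothetical stand-in (NOT DES crypt): only the first 8 bytes matter."""
    significant = password.encode("utf-8")[:MAX_SIGNIFICANT_BYTES]
    return hashlib.sha256(significant).hexdigest()

first = truncating_hash("passwordOne")
second = truncating_hash("passwordTwo")

# Both inputs share the same first eight bytes ("password"), so they collide.
print(first == second)  # True
```

Any system that still behaves this way accepts every password that merely starts with the right eight characters.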
The only remotely plausible reason to continue keeping such a requirement anymore may be something to the effect of, "Why must users generate 40-character passwords consisting of obfuscated letters and numbers?" But this presumes a common assumption of password styling that we have yet to sincerely question: Is a complex mess of obfuscation and easily forgettable gibberish the only type of password ideology that is reasonable or acceptable?
Let us examine a hypothetical, but common situation: You are sitting at your computer, registering your new account on the latest social media trend, Trendr.cu ... or something, whatever. Anyway… You set your username, your email, that picture of you at the Halloween party last year that you still cannot remember too well, and everything else looks great.
But now you are stuck on the dreaded and much-hated password requirements part of the registration. Minimum 8 characters, must have special characters, on and on. You struggle to search your mind for a secure password, while not using that typical password you use practically everywhere else. After a few attempts that do not meet the minimum requirements, you finally settle on something, enter it twice, and complete the registration. Now it is just three days later and you cannot remember if that was a letter O or a zero. Did you put the exclamation point in this password, or was that only when you were trying to meet the minimum requirements? And now your account is locked and needs to be reset. Wonderful…
Although the reality of this occurrence is not really measured at large, it is probably quite reasonable to assume this happens to perhaps thousands of new users each week, and that is probably just on Facebook alone. The reality of this statistic applied to all websites would likely be so staggeringly large it may warrant a Congressional hearing. But this need not be so. We have grown so used to the archaic password requirements of ages past that we quite often do not give even a moment's consideration to the usage of passphrases, such as partial or complete sentences.
We default to the difficult-to-remember combinations of alphanumeric sequences that produce less human-legible content than a palm slapped upon a keyboard randomly. Perhaps one reason for this is maximum length restrictions. But even so, a passphrase can easily fit into an unreasonably short maximum password length requirement. For example, the sentence "The door is open." is 17 characters, including the punctuation at the end. A large number of password requirements arbitrarily cut off at 20 characters, so even a short sentence such as that may fit. No, the reason no one really uses passphrases is that, simply put, no one really thinks to do so.
Really, when you think about it, how many websites prompt you for a passphrase in lieu of a password? Probably none that you ever use. Sure, some may offer unique and interesting new methods, such as two-way SSL, key fob tokens, or other cool goodies – We will focus more on these in a later Passwords installment – but none ever really correct the issue of passwords versus passphrases, nor even suggest to a user that they apply a different approach.
For an end user, this can often make them default to focusing exclusively on difficult passwords that are complex and near impossible to memorize, as previously discussed. Psychology calls this many things: subconscious persuasion, herd mentality, or any number of other conditioned response names. We do this because our response to a password requirement is triggered from previous experience, and we know no different.
In fact, this really boils down to us being almost subconsciously trained, for some absurd reason or another, to never use one very important single character: a space. It is simply because of this forbidden keyboard character – as well as being subliminally coaxed by the word ‘word’ in password – that we just never even give a sentence as a password a moment’s consideration.
Surprisingly, a considerable number of password-based authentication systems actually forbid spaces in passwords. Much like the laughed-at maximum password length phenomenon, no one seemingly has a good explanation as to why spaces are still banned, either. An exceedingly small portion of old authentication systems may have a legitimate reason for this – an archaic authentication string tokenizer that uses spaces as separators, perhaps – but this is 2014, and it is way past time for those to be changed.
Websites can easily parse any other user input that contains spaces, so there is practically no reason why any system that exists today cannot parse a password field with spaces. However, we can even ignore the whole concept of using spaces entirely and still achieve the desired result, simply by implementing progress: make the concept of passwords extinct, and teach about and encourage the use of passphrases only. But how?
Content providers, website hosts, or whatever title befits them, all have the ability to be the change necessary to properly educate users on a modern approach to password-based authentication, as well as to influence other organizations to follow suit. And it really is quite simple to fix this long-standing problem, too. It only takes three simple steps:
Prompt users for passphrases and not passwords
As we mentioned in the previous section, this problem is largely due to psychological conditioning of website users at large. This is something that can be very easily cured, but it requires content providers to be willing to participate and push for this change. If a content provider stops prompting for passwords and starts prompting for passphrases, it will open a bit of dialogue with that user so they understand the key differences and change their approach to password-based authentication. In fact, simply prompting differently may help many users achieve an “Aha!” moment of epiphany, ushering in a different mode of thinking even when using other websites that do not yet participate in passphrase prompting. The fix can be something almost as easy as performing a sed or other similar find-and-replace command on your website content to replace all instances of “password” with “passphrase”. Obviously, a little more descriptive work would be required, but if nothing else, appropriate prompting is likely the most important change a content provider should make.
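As a rough sketch of that find-and-replace idea, here is a case-preserving substitution written in Python rather than sed. The function names are ours, and the sample string stands in for real site content.

```python
import re

_PASSWORD = re.compile(r"password", re.IGNORECASE)

def _match_case(match: re.Match) -> str:
    """Return 'passphrase' in the same capitalization as the matched word."""
    word = match.group(0)
    if word.isupper():
        return "PASSPHRASE"
    if word[0].isupper():
        return "Passphrase"
    return "passphrase"

def reword(text: str) -> str:
    """Replace every occurrence of 'password' with 'passphrase'."""
    return _PASSWORD.sub(_match_case, text)

print(reword("Password: enter your password (PASSWORD is case-sensitive)."))
# Passphrase: enter your passphrase (PASSPHRASE is case-sensitive).
```

A blind substitution like this also rewrites form field names and identifiers that happen to contain the word, so in practice it should be limited to user-visible text and reviewed before publishing.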
Provide detailed but brief literature on the differences and why a passphrase is better
Of course, a content provider really should not just stop at changing their prompting or phrasing without explaining what a passphrase is, or why it is important to use a passphrase in lieu of a typical password. This literature could simply add to or replace any password popups a registration or authentication user interface already presents. However, it is important that this passphrase information does not err on the side of brevity, but rather that it gives a detailed though simple explanation of the concept and how to produce a good passphrase. This is important because it is necessary to reeducate both your end users and the public at large on good passphrase practices and safety precautions. Remember, competitor and non-competitor content providers alike will notice your changes, too, and this will hopefully influence them to implement a similar approach.
Eliminate all maximum length restrictions on passphrases
Notice we did not mention “… or other absurd requirements” in that title. When you consider the content of a passphrase, requirements that often seem absurd for passwords are actually not that absurd at all for a passphrase. Putting upper- and lower-case letters, numbers, and special characters in a passphrase becomes incredibly simple when you change the focus from a single word to a phrase or sentence. “The 12 angry men.”, though probably an easily guessable passphrase, satisfies an absurd password requirement with no difficulty, all because we thought of those requirements in the form of a sentence instead of a single word. Certainly, however, we do not want to limit ourselves to such a short and guessable passphrase as “The 12 angry men.”, so of course one absurd requirement must still go: maximum length restrictions. This may require some additional software changes, such as eliminating any maximum string length conditional checks, removing any potential truncation (hopefully no website or other code still does this, and if so, for shame!), ensuring spaces are allowed, and utilizing strong and secure one-way password hashing mechanisms that will support this (this is also a topic we will discuss more in a later Passwords installment). All in all, this fix really should not require much effort for any content provider. And if it does, perhaps it may be time to consider redesigning the authentication system.
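The software changes listed above can be sketched in a few lines: a check that enforces only a minimum, no truncation anywhere, spaces treated like any other character, and a salted one-way hash whose output size does not depend on input length. The minimum length, iteration count, and function names are assumptions for illustration. PBKDF2 is our choice for the sketch, not something prescribed here.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

MIN_LENGTH = 12        # assumed policy value
ITERATIONS = 200_000   # assumed work factor

def is_acceptable(passphrase: str) -> bool:
    """Enforce a minimum only: no maximum, no truncation, spaces allowed."""
    return len(passphrase) >= MIN_LENGTH

def hash_passphrase(passphrase: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); the whole passphrase is hashed, however long."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify(passphrase: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_passphrase(passphrase, salt)
    return hmac.compare_digest(digest, expected)

phrase = "The 12 angry men. left the door open on a Tuesday"
salt, stored = hash_passphrase(phrase)
print(is_acceptable(phrase))              # True
print(verify(phrase, salt, stored))       # True
print(verify(phrase[:20], salt, stored))  # False: every character counts
```

The stored digest is 32 bytes regardless of passphrase length, so nothing in the storage layer argues for a maximum.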
Earlier, we looked at the evolution of a password: what the concept of a password is; where a password comes from through the progression of authentication, both analog and digital; and why a password is now an archaic notion. Every day, we punish ourselves by trying to memorize nonsensical jumbles of letters; some of us succeed at it, usually with insecure passwords, and many of us forget.
Even 14 years ago in 2000, the University of Cambridge found memorization to be measurably difficult, and yet we continue on with passwords, expecting them to one day work out for us. Indeed, as the old adage goes, “The definition of insanity is doing something over and over, and expecting new results,” so are we insane for thinking a mess of keyboard presses will one day be secure and memorable? Well, no, of course not! We are, quite simply, merely unenlightened to the alternative, glorious path of passphrases, a concept that has yet to become mainstream. But you – content provider and user alike – can change that path, simply by progressing your promoted ideology from passwords to passphrases.