This whitepaper is part of a three-part series covering a wide breadth of topics on passwords, security, next-generation authentication, and plenty more. In this installment, we look at the modern two-fold weaknesses of passwords, and easy methods to fix these problems. In the next and last installment we look at innovation and evolution of passwords and authentication mechanisms.
In the previous article Passwords vs. Pass Phrases - An Ideological Divide we looked at several factors that weaken password-based authentication security, namely on the side of the end-user. The concept of a password in and of itself is inherently flawed, and many of the surrounding security or enforcement strategies are equally flawed and antiquated. We – being both the content providers and end users alike – operate on a password ideology that is decades old and utilizes some principles no one in the security industry can reasonably justify anymore. (Maximum password length, anyone?) But given the progressive nature of the Internet and the unforgiving speed at which everything changes therein, this is something we can easily change.
Password-based authentication need not be such an archaic pillar of security any longer. Indeed, as we previously went over, content providers must inspire an ideology of passphrases, and end users must deeply understand and implement this concept. However, a good security engineer worth his weight in firewall appliances knows that a proper and functional security posture of a well-built and maintained system requires multiple layers of security. A modernized approach to password ideology is only one of several necessary steps for a highly-secured system. Next, content providers must ensure that the underlying technology can survive a data breach when – not if – it happens.
Authentication – Why it is Important to Protect Every Bit of Data
Unless you are a developer or a systems administrator, not a whole lot of thought goes into a common password-based authentication mechanism. You enter your username or email, your password often masked by black dots, a button click starts some magical voodoo that happens behind the curtain, and voila! You are now logged in and your session is validated for some predetermined length of time. But as you, our well-learned reader, certainly will know, far more goes into an authentication system.
When an authentication occurs, the user's supplied data is submitted as-is, most often in plaintext form. This presents an extremely important, yet often overlooked challenge to developers and administrators: securing the user's data before it even makes it to you. More often than not, you will find many large-scale organizations that use plaintext, unprotected authentication systems. A simple glance at the address bar shows many do not even use SSL – or TLS, as it should properly be called – on their authentication systems. In fact, some smaller self-hosted online stores may know such security is both wise and required for PCI compliance – a topic we have previously discussed at length – but they will for some inexplicable reason completely gloss over end user authentication. Logically, they should ask themselves: Why would a hacker desire only credit card details, but never any login information?
The First Layer: A Secured Line
Consider you run a website that caters to students attending a higher-education institution, such as a university. As has become commonplace among many businesses and organizations, college campuses often provide free WiFi internet access, usually restricted to their students and staff only. Oftentimes these WiFi connections are provided in two varieties: encrypted and unencrypted. The encrypted option usually requires some level of setup on your local computer before it works properly, which is why many universities also offer the open, unencrypted option for those who either cannot make the encrypted one work or still need to obtain the setup instructions. Unfortunately, once students have connected to the unencrypted internet access, many of them are going to mindlessly forgo the encrypted path and just keep surfing.
Now say some black hat hacker in the student lobby of whatever university is your largest source of visitors has a wireless network auditing tool such as the WiFi Pineapple. This device allows him to mimic the WiFi signal the university provides, inspect the traffic, and pass it along, potentially compromised. He quietly sits and sniffs all the unencrypted traffic in that lobby and passes it along, those students being blissfully unaware that all of their unencrypted traffic is plaintext treasure for this hacker. He accomplishes this using a tool such as Firesheep, a Firefox plugin that allowed hackers to capture other users' WiFi network traffic from within Firefox (prompting Facebook, Gmail, and many others to employ SSL-only connections). If your hypothetical website for students is running an authentication mechanism unencrypted – no end-to-end TLS certificate handshake of any sort – you have now exposed your end users at that campus to potential sniffing attacks made as simple as a dongle and a browser addon. It may not seem that important, but what if you had the next Facebook (which started on college campuses) and were taken down by poor authentication measures? That billion-dollar dream is now gone.
It is critically important that the security of password-based authentication start right at the moment an end user arrives at your website. Before they even begin to provide any data to your servers, the channel between you and them must be secure. This is almost always done using an end-to-end TLS certificate (these are commonly called "SSL" certificates, but that is actually a misnomer). Trusted TLS certificates can be easily obtained for free and are accepted by almost all browsers and other systems that honor such security certificates. However, while end-to-end traffic security is critical, there are still more layers to a reasonably secured infrastructure.
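As a rough illustration of securing the channel before any credentials are accepted, the sketch below wraps a Python WSGI application so that plain-HTTP requests are redirected to HTTPS. The name require_https is hypothetical, the host handling assumes the site is served on the default ports, and the Strict-Transport-Security header is an extra not discussed in this paper; treat it as one possible approach, not a complete TLS deployment.

```python
def require_https(app):
    """Wrap a WSGI app so that plain-HTTP requests are redirected to HTTPS.

    Hypothetical helper for illustration only; a real deployment would
    usually enforce this at the web server or load balancer as well.
    """
    def middleware(environ, start_response):
        if environ.get("wsgi.url_scheme") == "https":
            # Already on a secured line: pass through, and ask the browser
            # to keep using HTTPS for the next year (an added assumption).
            def start_with_hsts(status, headers, exc_info=None):
                headers = list(headers) + [
                    ("Strict-Transport-Security", "max-age=31536000")
                ]
                return start_response(status, headers, exc_info)
            return app(environ, start_with_hsts)

        # Plain HTTP: rebuild the same URL with the https scheme.
        host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
        host = host.split(":")[0]  # drop an explicit port (simplification)
        target = "https://" + host + environ.get("PATH_INFO", "/")
        if environ.get("QUERY_STRING"):
            target += "?" + environ["QUERY_STRING"]
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]

    return middleware
```

With a wrapper like this, the login form is never served, and no password is ever accepted, over an unencrypted connection.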
The Second Layer: Secured Storage
In order to ensure a valid authentication occurs, the system the end-user is authenticating into must compare the challenge password with a previously established comparison. This is often stored in some secured format. (Of course, some organizations choose to store all passwords in plaintext form – not a good idea, obviously – but we will get more into that later.) Typically this storage security is completed using a mathematical algorithm called a cryptographic hash function, a formula that takes in an arbitrary length password and returns a fixed-size string in the form of a password hash. For example, if we take a lesson from the previous article in this series and generate a passphrase – This is a password. – then we are left with a password hash of 07997f833c2d709d2e5fcd7666858d8c.
Commonly, web and web-like password-based authentication mechanisms utilize simple hashing functions, such as MD5 (used in the previous example) or SHA1, even after both have been proven considerably weak for many years. This is likely due to the simplicity of hash creation and comparison with both functions, requiring only hashing the plaintext password supplied by the user and performing a direct string comparison to the stored hash in the user table. In fact, this manner of hash comparison can be, and often mistakenly is, completed within the database query that fetches the stored hash itself. It seems secure enough, and it is incredibly easy to implement in code, so why bother with anything more complex? Indeed, that is apparently the common and acceptable approach to password storage security, but unfortunately it is a dangerously lazy one, too.
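To make the pattern concrete, here is a minimal Python sketch of the simple hash-and-compare approach described above. The function names md5_hex and naive_verify are hypothetical, and the code is shown only to illustrate the weak pattern, not as something to deploy.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the MD5 digest of a password as a hexadecimal string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def naive_verify(supplied: str, stored_hash: str) -> bool:
    """Hash the supplied password once and compare it to the stored hash.

    This mirrors the direct string comparison described in the text:
    one cheap hash, no salt, no iteration.
    """
    return md5_hex(supplied) == stored_hash


# Registration: the hash is computed once and stored in the user table.
stored = md5_hex("This is a password.")

# Whatever the input length, an MD5 digest is always 32 hex characters.
print(len(stored))

# Login: the same single hash is recomputed and compared.
print(naive_verify("This is a password.", stored))
print(naive_verify("This is not the password.", stored))
```

The whole scheme is a handful of lines, which is exactly why it is so widespread, and the single cheap hash is exactly what makes it weak.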
What is Wrong With Simple Hashing?
First, when generating a password hash, you absolutely want each hash to be unique. No two differing passwords should ever generate the same password hash. However, both MD5 and SHA1 have been found to have an uncomforting likelihood of two passwords generating the same password hash – known as a hash collision. MD5 can have no more than 3.4 x 10^38 (that is, 2^128) possible unique password strings before a collision must occur. SHA1 even has a probability formula to determine collision likelihood. The fact that these are known severe mathematical flaws with both cryptographic hash functions should be reason enough to abandon use of them with password hashing. However, the extremely minimal modern cost of both functions is truly the most damning element.
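The paper does not say which probability formula it has in mind. As an assumption, the sketch below uses the standard birthday-bound approximation, which estimates the chance of at least one collision among n hashes drawn uniformly from an output space of 2^bits values. The function name collision_probability is hypothetical, and the estimate covers only random collisions, not deliberate cryptanalytic attacks on a particular function.

```python
import math


def collision_probability(num_hashes: int, output_bits: int) -> float:
    """Birthday-bound estimate: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).

    Assumes hash outputs are uniformly distributed; illustrative only.
    """
    space = 2 ** output_bits
    exponent = num_hashes * (num_hashes - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


# A 128-bit output (MD5) compared with a 160-bit output (SHA1),
# each evaluated at the square root of its output space.
print(collision_probability(2 ** 64, 128))
print(collision_probability(2 ** 80, 160))
```

Under this approximation, the collision chance becomes substantial once the number of hashes approaches the square root of the output space, long before the full 2^128 values are exhausted.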
In the security industry, a password hash's strength is determined by the cost of the cryptographic hash function itself. The term 'cost,' in this sense, implies the direct difficulty or iteration count of an encryption or hashing algorithm. In terms of difficulty, this can be considered to basically be an exponential curve relative to time for each additional character in a password (essentially as steep as f(x) = 2^x – see Figure 1 below). Cost is also used in terms of how much additional mathematical work is applied to a cryptographic hash function. In a SHA256-based password hashing scheme, for example, the default of 5,000 iterations over the hashing formula implies a cost of repeating and applying the SHA256 mathematical formula 5,000 consecutive times – thus the more iterations, the more 'expensive' (and often, more secure) an encryption or hashing scheme is. However, neither MD5 nor SHA1 in typical web system deployments contains the ability to iterate over any formula for additional security; therefore, their cost lies solely in the amount and type of characters the end user types. This is the first among simple hashing functions' many flaws.
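To illustrate iteration count as cost, here is a minimal Python sketch using PBKDF2-HMAC-SHA256 from the standard library. PBKDF2 is a substitute chosen for illustration, not necessarily the scheme this paper has in mind; the per-user random salt is an added assumption, and 5,000 simply mirrors the figure quoted above rather than a recommended setting.

```python
import hashlib
import hmac
import os

# Assumption: mirrors the 5,000 figure in the text. Real deployments
# generally tune this upward as hardware gets faster.
ITERATIONS = 5_000


def hash_password(password: str, iterations: int = ITERATIONS):
    """Return (salt, iterations, digest) for storage alongside the user."""
    salt = os.urandom(16)  # random per-user salt (an added assumption)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int,
                    expected: bytes) -> bool:
    """Recompute the iterated hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected)


salt, iterations, digest = hash_password("This is a password.")
print(verify_password("This is a password.", salt, iterations, digest))
print(verify_password("a wrong guess", salt, iterations, digest))
```

Every guess an attacker makes must pay the same iteration cost, so raising the count raises the price of a brute-force attempt while adding only a small delay to a legitimate login.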
Figure 1 - Example of the exponential-like curve of a password's cost in terms of character length versus time
For quite some time, MD5 and SHA1 were considered reasonably secure, since the technology available for brute-force cracking the hash itself was markedly limited. The cost of an MD5 or SHA1 hash was substantial enough to hinder the common Intel or AMD CPUs of that time from directly deciphering an MD5 or SHA1 hash or brute-force guessing at one. In 2007, however, NVIDIA released a C programming library for their CUDA and Tesla series graphics processors. This led to all sorts of new projects being designed for GPU usage, linear algebra being a considerably large one. It was not until 2010 that the most frightening aspect of GPU technology became serious headline news, when researchers at Georgia Tech published findings that GPU technology was extremely successful at password hash cracking. And not just extremely successful, but so much so that all prior notions of cost have now been rendered wholly obsolete.
Various usages of Hashcat – a password hash cracking utility – have shown CPU to GPU comparisons with Radeon GPUs running upwards of 90 times faster than top-of-the-line Intel or AMD CPUs in comparative MD5 brute force tests. When you put this into terms of the exponential amount of time it takes to brute force a password hash for each additional character, the results are staggering. Where an MD5 password hash may take 450 years with a higher powered multicore CPU, a modern GPU may be able to do it in 5 years at worst. A 20 year wait on a CPU is less than 3 months on a GPU. Today, however, GPUs fare far better than this. Much of the GPU research data available is around three years old, which is centuries in terms of Moore's Law. This has proven to be a potential nightmare for content providers utilizing standard password storage methods.
Consider that the IGHASHGPU password hash brute forcing software projects the ability to attempt 3.7 billion MD5 hashes or 1.4 billion SHA1 hashes per second. If you assume a passphrase akin to the xkcd joke of 44 bits (in their example, "correct horse battery staple", a 28-character password), the 2^44 possibilities could conceivably be exhausted against a SHA1 password hash in roughly three and a half hours at worst, and against an MD5 hash in a little over an hour. Simply put, using simple password hashing is a welcome invitation for mass password compromise. As presumptuous as that statement sounds, it is quite true.
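The arithmetic behind such an estimate can be sketched in a few lines of Python. The function name worst_case_hours is hypothetical; the rates are the IGHASHGPU figures quoted above, and the calculation assumes the attacker must try every one of the 2^44 possibilities.

```python
def worst_case_hours(entropy_bits: int, guesses_per_second: float) -> float:
    """Hours needed to try every candidate in a 2**entropy_bits keyspace."""
    keyspace = 2 ** entropy_bits
    return keyspace / guesses_per_second / 3600


MD5_RATE = 3.7e9   # hashes per second, as quoted for IGHASHGPU
SHA1_RATE = 1.4e9  # hashes per second, as quoted for IGHASHGPU

for name, rate in (("MD5", MD5_RATE), ("SHA1", SHA1_RATE)):
    print(f"{name}: {worst_case_hours(44, rate):.1f} hours")
```

At these rates the sketch works out to about 1.3 hours for MD5 and about 3.5 hours for SHA1, and on average an attacker would succeed in half that time.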
Password Data Mass Compromises – Even the Mighty Can Fall
Over the past five years, several dozen major organizations, corporations, and even government entities have fallen victim to attackers infiltrating their servers and extracting massive password hash dumps. This has become such a common and recurring event that public projects have begun to appear to document the rapid occurrences and provide a database of the dumps. Many of these victims even had advance warning of the impending attacks, and employed highly skilled teams of security engineers, yet they still could not stop their attackers from obtaining password data. Truly, if a hacker (or group thereof) has a strong enough desire to gain entry into your systems, they most likely will eventually find a way in.
In December 2010, Gawker Media – one of the most popular social media blog networks, consisting of a conglomeration of eight different websites – found itself the unfortunate victim of a massive password database compromise. LinkedIn – a very large social media networking website tailored specifically to professional relationships – found itself in the same unfortunate circumstance in June of 2012. The social media gaming giant RockYou found its thirty million users' passwords compromised in December 2009 (this one was unique due to the fact that RockYou stored all of its passwords in plaintext, not cryptographically secured). From January through April 2014 alone, over ten million cryptographic password hashes – possibly more – were released to the public from hacks against enormous media behemoths like Comcast, Yahoo!, and AOL. Now, as of June 2014, even eBay has found itself victim of a mass compromise, reporting a whopping potential 145 million compromised password hashes.
Indeed, if a hacker is persistent and skilled enough, they may eventually gain access to their target at some point or another. Even with hundreds of thousands of dollars of equipment, personnel, monitoring, and everything else watching the front gate – which, inarguably, are efficient and necessary tactics that big players like Comcast and eBay utilize – eventually someone may be able to break through that hole in the fence that no one is looking at – no one except the attacker, that is. So much focus is put on the common entry point of a website that no one thinks to keep layering the security deeper. If not through the actual authentication system itself, then how else are hackers able to gain entry, and why are they able to obtain such large treasure troves?
A Firewall Behind the Firewall: Protect the Data at the Database Itself
When your cryptographic password hash data is stored, it is just as critically important to isolate the hashes as it is to have secure hashes. The purpose of hashing user passwords is indeed to keep an attacker from learning your users' passwords and potentially compromising other accounts they hold elsewhere. But as we have seen with the widespread use and failures of MD5 and SHA1, even hashing your users' passwords is not enough. Looking past the strength of the password hash used, why should an attacker even have access to the password hashes to begin with?
This all starts primarily with poor database security, which can come from any number of bad (in)security habits: unsanitized user input, dangerous or buggy code, non-segregated data, poor access control lists, and many more. Unsanitized user input – the root cause of what is known in the industry as a SQL injection, a topic we have discussed at great length previously – is at the crux of many critical web security failures, especially ones that yield database treasure troves of password hash dumps. Since its inception, the Open Web Application Security Project (OWASP) has assembled a top ten list of web security vulnerabilities. Every year that list has been assembled, SQL injections have made the list. Furthermore, nearly every single compromise of a password hash database in the past several years has been possible at least in part because of SQL injections. We have exhausted this topic before – as have many hundreds of other organizations, corporations, even governments – and yet it still remains consistently the most damaging attack vector.
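To show how unsanitized input turns into a hash dump, the Python sketch below contrasts a query built by string concatenation with a parameterized one, using an in-memory SQLite database. The table layout, the function names lookup_unsafe and lookup_safe, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical user table with placeholder hash values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, password_hash TEXT)"
)
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "hash-for-alice"))
conn.execute("INSERT INTO users VALUES (?, ?)", ("bob", "hash-for-bob"))


def lookup_unsafe(conn, username):
    """Vulnerable: user input is pasted directly into the SQL text."""
    query = (
        "SELECT password_hash FROM users WHERE username = '"
        + username + "'"
    )
    return conn.execute(query).fetchall()


def lookup_safe(conn, username):
    """Parameterized: the input is bound as data, never parsed as SQL."""
    return conn.execute(
        "SELECT password_hash FROM users WHERE username = ?", (username,)
    ).fetchall()


payload = "nobody' OR '1'='1"
print(len(lookup_unsafe(conn, payload)))  # 2: every stored hash is returned
print(len(lookup_safe(conn, payload)))    # 0: no user has that literal name
```

The same input that dumps the whole table through the concatenated query is treated as an ordinary, non-matching username by the parameterized one.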
Before we cover SQL injections much further, we must once again and briefly harken back to our two-part series on PCI compliance – a merchant regulatory security standard organized by the major credit card corporations of the world: Visa, MasterCard, American Express, Discover, and Japan Credit Bureau – to revisit some topics that are incredibly important to every aspect of web security. Whether a content provider's data is as simple as RockYou's, or as critical as multi-million dollar banking, the six categories of PCI compliance are highly applicable to nearly any line of business that has a web-facing authentication portal. Of course, some PCI compliance requirements are potentially inapplicable – not every website can restrict data access at a digital or physical level, depending on their hosting scenario – but the core concept still holds valid: restrict and secure the data with multiple layers of security.
Ideally, securing the code that runs the website should be the only step required. However, it may be impractical or infeasible to completely secure the SQL queries in the code used (by one comparison, Drupal has had over 20,000 lines of code committed, WordPress has had over 60,000 lines, and Joomla! has had over 180,000 lines). The recent Heartbleed bug in the OpenSSL library is an excellent example of software with thousands of lines of code being used without inspection by thousands of users. Or, it may simply be impossible to do so because the code is encoded, such as with SourceGuardian or ZenCrypt. Even with all these impracticalities, a content provider can still potentially shield against many of these attacks by using layers of firewalls.
Typically this might include some adaptive solution that rides on top of iptables or ipfw (depending on whether you are using Linux or a BSD variant, respectively), or perhaps a reactive Host Intrusion Detection System (HIDS) such as OSSEC, although these are often more complicated than desired and not exactly purpose-built for these uses. Instead, a content provider may wish to utilize a Web Application Firewall (WAF), which is designed specifically for these tasks. While there exist several enterprise-level solutions that are both a WAF and database firewall (sitting between your web application and your database), there are many open-source solutions, such as ModSecurity and IronBee, that perform remarkably well.
A Web Application Firewall is not infallible, either, and may still allow a SQL injection or other method of attack to penetrate through. A common theme you may notice in this paper by now is our frequent mention of the word "layers," and for good reason, too. A Web Application Firewall by itself is not enough, nor is just securing the code a website runs, nor just monitoring, and so forth. However, when a content provider combines all of these approaches, frequently performs thorough web security audits and penetration scans, and encourages end-users to practice strong and modern security standards, the probability of a mass compromise drops significantly.
Wrap-up: Making Intrusions Fruitless to Attackers
Of course, no one will ever be able to prevent every single type of attack and have 100% assurance that no one may ever gain unauthorized access into their systems. However, a content provider can implement layers of strong security standards to make any such intrusions fruitless for an attacker. Sure, an attacker may be able to deface a website or mess with some content, but if a content provider employs strong layers of security, the damage may be kept within that scope, or even less. A highly-secured communication pathway to the user, use of very expensive cryptographic password hash functions, layers of firewalls and data integrity/security checks from user to database and every step in between – all of these and more are critical components to ensuring your systems do not meet the same fate and embarrassment that even the largest organizations have unfortunately suffered.