Back when I first started “spending too much time on the computer,” I wanted to be a hacker[1]. The Uploading Virus scene in Independence Day was my main inspiration. I toiled on HackThisSite.org in my early teenage years, but all I learned was that hacking wasn’t as easy as Jeff Goldblum made it look[2].
Now that I write software for a living, I’ve decided to revisit one of the most compelling branches of cyber mischief: password cracking[3]. For this project, I’ll set up a login authentication program and explore what having a strong password really means. Nothing I attempt here will break the law or help anyone else break it, but the world of cyber security is terrifying. If you don’t use long, completely unique passwords (with two-factor authentication) for each website you frequent, you may find it harder to fall asleep at night after reading this post.
The problem with cracking passwords directly through login forms is that most websites will lock you out after a small number of attempts. In addition, even small delays inserted between attempts can drastically slow down a program that needs to try trillions of different combinations. In reality, password cracking is only practical for those who obtain dumps (massive lists) of compromised password hashes or who find other ways to subvert the security measures of online validation. Once the process is taken offline, those safeguards don’t matter much.
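To put those “small delays” in perspective, here is some back-of-the-envelope arithmetic. The one-millisecond delay per attempt and the round figure of one trillion attempts are purely illustrative assumptions:

```latex
t = N \cdot d = 10^{12} \times 10^{-3}\,\text{s} = 10^{9}\,\text{s} \approx \frac{10^{9}\,\text{s}}{3.156 \times 10^{7}\,\text{s/year}} \approx 31.7\ \text{years}
```

A delay too short for a human to notice turns the job into decades.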
I’ve elected to ignore safeguards and, for the sake of demonstration, focus on the most critical component: a fast feedback loop. My method of password authentication will be an oversimplification: a function that checks whether a generated “guess” matches the actual plaintext password. I don’t need to worry about how any one website or application implements password authentication, because it all boils down to one simple string equality comparison.
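Here is a minimal sketch of that oversimplified check in Python. The function name `authenticate` is hypothetical (this is not the code from my original program); the point is only that each guess costs a single string comparison.

```python
def authenticate(guess: str, actual: str) -> bool:
    """Oversimplified login check: plaintext string equality.

    Real systems compare salted hashes rather than plaintext; this sketch
    deliberately ignores that, mirroring the simplification described above.
    """
    return guess == actual


if __name__ == "__main__":
    secret = "kVb117D3iC"
    print(authenticate("password", secret))    # False
    print(authenticate("kVb117D3iC", secret))  # True
```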
I’m running a MacBook Pro with a 2.5 GHz Intel Core i7. The power of a modern consumer-grade laptop may surprise you.
Challenge 1: How easy is it to crack a password?
Let’s say a password consists of just letters and numbers, with no symbols. Let’s also say that we know the length of the password. This reduces complexity, but not by much. The program I wrote tries every possible combination of letters and numbers, running through the characters ‘a’ to ‘z’, ‘A’ to ‘Z’, and ‘0’ to ‘9’, for a total of 62 possible characters. To get an idea of the maximum number of attempts required for each password length, simply raise the size of the character set to the length of the password (62^length). This is the basis of the brute-force approach. I haven’t considered running multiple threads, renting time on a powerful server, or tuning my code for performance in any way. How easy is it to crack the password “kVb117D3iC”?
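A minimal Python sketch of the known-length brute force, assuming the ‘a’ to ‘z’, ‘A’ to ‘Z’, ‘0’ to ‘9’ ordering. The names `brute_force` and `CHARSET` are hypothetical, and pure Python is far slower than the rates reported below, so treat this as an illustration of the search order rather than a benchmark.

```python
import itertools
import string
from typing import Optional, Tuple

# Search order: 'a'..'z', then 'A'..'Z', then '0'..'9' (62 characters total).
CHARSET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def brute_force(actual: str, length: int) -> Tuple[Optional[str], int]:
    """Try every combination of `length` characters from CHARSET, in order.

    Returns the cracked password (or None) and the number of attempts made.
    The worst case is len(CHARSET) ** length attempts.
    """
    attempts = 0
    for combo in itertools.product(CHARSET, repeat=length):
        attempts += 1
        guess = "".join(combo)
        if guess == actual:  # the plaintext equality check
            return guess, attempts
    return None, attempts


if __name__ == "__main__":
    # A short password keeps the demo fast; "kVb117D3iC" (length 10)
    # could take up to 62**10 attempts.
    found, tries = brute_force("aZ9", 3)
    print(found, tries)  # aZ9 3224
```

Generating candidates with `itertools.product` keeps memory use constant, since combinations are produced lazily instead of being stored.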
| Password | Length | Actual Attempts | Maximum Attempts | Time (nanoseconds) | Time | Attempts/Second |
* Colored cells contain calculated rather than observed results.
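The maximum-attempts figures are just 62^length, and a worst-case time follows from dividing by an attempt rate. A small Python sketch of that calculation; the helper names are hypothetical, and the rate of 80 million attempts per second is an illustrative assumption chosen within the range reported in the results, not a measurement.

```python
CHARSET_SIZE = 62  # letters (both cases) and digits


def max_attempts(length: int, charset_size: int = CHARSET_SIZE) -> int:
    """Worst-case number of guesses for a known length: charset_size ** length."""
    return charset_size ** length


def worst_case_seconds(length: int, rate: float) -> float:
    """Time to exhaust the whole keyspace at `rate` attempts per second."""
    return max_attempts(length) / rate


if __name__ == "__main__":
    ASSUMED_RATE = 80_000_000  # assumption: illustrative attempts per second
    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    for n in range(1, 11):
        secs = worst_case_seconds(n, ASSUMED_RATE)
        print(f"{n:>2}  {max_attempts(n):>22,}  {secs:>10.3g} s  {secs / SECONDS_PER_YEAR:>10.3g} yr")
```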
Most shocking about these results is that my MacBook can perform 60-100 million attempts per second[4]. Aside from the cracks finishing a bit faster than one might predict, the time it takes to crack a password increases reliably with password length. If the length isn’t known, which it never is, cracking a password of a given length also takes the cumulative time of all shorter lengths. Even so, the added work is well under an order of magnitude.
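To see why not knowing the length adds so little, sum the worst case over every length from 1 up to n. It is a geometric series:

```latex
\sum_{k=1}^{n} 62^{k} = \frac{62\,(62^{n} - 1)}{61} < \frac{62}{61}\cdot 62^{n} \approx 1.016 \times 62^{n}
```

Searching every shorter length first adds less than 2% to the worst case for length n alone.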
The troubling question is this: if that’s what happens on a MacBook Pro, what can a supercomputer do? If I were to parallelize and optimize my rudimentary program to run on a GPU-based Amazon p2.16xlarge instance[5], I suspect I’d see a performance gain of somewhere between two and four orders of magnitude. Suddenly, cracking difficult passwords becomes child’s play.
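Brute force parallelizes well because the keyspace splits into independent chunks that share no state. A minimal sketch, assuming a split by first character (one simple choice among many); `partition_by_prefix` and `search_chunk` are hypothetical names, and the chunks run sequentially here for clarity.

```python
import itertools
import string
from typing import List, Optional

CHARSET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def partition_by_prefix(workers: int) -> List[str]:
    """Split the 62 possible first characters into `workers` roughly equal groups."""
    return [CHARSET[i::workers] for i in range(workers)]


def search_chunk(prefixes: str, actual: str, length: int) -> Optional[str]:
    """Brute-force only the candidates whose first character is in `prefixes`."""
    for first in prefixes:
        for rest in itertools.product(CHARSET, repeat=length - 1):
            guess = first + "".join(rest)
            if guess == actual:
                return guess
    return None


if __name__ == "__main__":
    chunks = partition_by_prefix(4)
    # Sequential here; in practice each chunk could go to its own core,
    # process, or GPU, since the chunks are independent.
    results = [search_chunk(c, "Z9x", 3) for c in chunks]
    print([r for r in results if r is not None])  # ['Z9x']
```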
Challenge 2: What if a cyber criminal only gets three guesses?
So, every once in a while there will be a major data breach, it will spawn a news frenzy, you’ll change your password on the affected site, and then you’ll be safe, right? Not exactly. With each new breach, cyber criminals get access to millions of new email/password combinations, which they can then mine for insights into how average people approach password security. Here’s my speculation about the methods a sophisticated criminal would try if he only got three guesses.
A Common Password Backed By Data Science
Lists of the most common passwords exist. And while they’re not constantly updated to account for each password dump[6], some are available to the public, such as 10k Most Common Passwords. By my estimation, a wannabe hacker with access to this list would have a single-digit-percent chance of accessing an arbitrary account on any website, assuming the website didn’t force users to include numbers or symbols or to exceed a certain length.
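Estimating those odds is simple once you have a frequency-ranked list and a sample of passwords to test against. A minimal sketch with hypothetical function names; both lists below are toy data invented for illustration, not drawn from any real dump or from the 10k list, and the toy numbers say nothing about real hit rates.

```python
from typing import Iterable, List


def three_guesses(ranked_list: List[str]) -> List[str]:
    """With a frequency-ranked list, the best blind guesses are the top three."""
    return ranked_list[:3]


def hit_rate(sample: Iterable[str], guesses: List[str]) -> float:
    """Fraction of accounts in `sample` whose password is one of `guesses`."""
    passwords = list(sample)
    guess_set = set(guesses)
    hits = sum(1 for pw in passwords if pw in guess_set)
    return hits / len(passwords) if passwords else 0.0


if __name__ == "__main__":
    # Hypothetical toy data; a real list would be read from a file,
    # one password per line, most common first.
    ranked = ["123456", "password", "12345678", "qwerty"]
    sample = ["hunter2", "123456", "kVb117D3iC", "Stonecold"]
    guesses = three_guesses(ranked)
    print(guesses, hit_rate(sample, guesses))  # ['123456', 'password', '12345678'] 0.25
```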
While most services do enforce such basic restrictions on new passwords, those that don’t are vulnerable. This approach can be tuned using simple data sources like social media profile information. If you have access to a password list that includes email addresses, nothing stops you from looking up a Facebook or LinkedIn account by email and then determining that WWE fans are more likely to use “Stonecold” (#5193 on the frequency list) as their password. Or maybe men over the age of 62 more commonly use “verizon” (#5501) or “comcast” (#8893). You could mine this data endlessly, and the end result is that you could probably push the success rate of a “shot in the dark” attempt toward the high single digits with three guesses on the right website.
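That tuning amounts to ranking passwords by frequency within a segment of users instead of across everyone. A minimal sketch, assuming breach records have already been joined to some profile attribute; the segment labels and records are hypothetical toy data echoing the WWE example, and `top_guesses_by_segment` is an illustrative name.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (profile segment, password)


def top_guesses_by_segment(records: List[Record], k: int = 3) -> Dict[str, List[str]]:
    """Rank passwords by frequency within each segment and keep the top k."""
    counts: Dict[str, Counter] = {}
    for segment, password in records:
        counts.setdefault(segment, Counter())[password] += 1
    return {seg: [pw for pw, _ in c.most_common(k)] for seg, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy records, for illustration only.
    records = [
        ("wwe_fan", "Stonecold"), ("wwe_fan", "Stonecold"), ("wwe_fan", "123456"),
        ("other", "123456"), ("other", "password"), ("other", "123456"),
    ]
    print(top_guesses_by_segment(records, k=2))
    # {'wwe_fan': ['Stonecold', '123456'], 'other': ['123456', 'password']}
```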
You don’t see any good-guy data science projects on this topic, because disseminating these insights would be considered unethical. You can be certain, though, that your data is being used against you in ways you don’t want to think about.
Grandfathered-In Weak Passwords
One exploitable but rarely mentioned fact is that, when many users registered for websites still in their infancy, today’s stricter password rules weren’t enforced. For example, the first iteration of PayPal might have allowed a five-character password with no numbers; plenty of sites did. As time went on, standards became more rigid, but in most cases early users were never forced to change their passwords. If a hacker can obtain a website’s historical password standards along with a user’s signup date (often proudly displayed), that account becomes low-hanging fruit. If you think you fit into this category, change your password immediately.
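Some quick arithmetic shows how exposed such an account is once its password ends up in an offline dump. Assuming the hypothetical five-character rule allowed only upper- and lowercase letters (52 characters), and using the 60-100 million attempts per second from Challenge 1:

```latex
52^{5} = 380{,}204{,}032 \ \text{attempts}, \qquad
\frac{380{,}204{,}032}{10^{8}\ \text{s}^{-1}} \approx 3.8\ \text{s}, \qquad
\frac{380{,}204{,}032}{6 \times 10^{7}\ \text{s}^{-1}} \approx 6.3\ \text{s}
```

That is an exhaustive search in seconds on a laptop.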
Using the Same Password for Other Accounts
You are probably guilty of using the same password for multiple accounts. After a breach, you’ll be encouraged to change all your matching passwords, and hopefully you will. However, you’ll probably forget about the Pinterest account that you stopped accessing in 2012 or that the Lowe’s website stored your credit card information when you remodeled your kitchen last fall. The larger the website, the greater the chances that someone who has your username and password, however old, will try to log in.
The risk to your older accounts is magnified by the fact that many websites, from social media to forums, make it easy to determine when you were last active. The older the account, the greater the chance that its password is (1) weak and (2) correlated with other old accounts and compromised passwords from that time period.
Account Linking, User Profiling, and More Password Trends
There have been almost two billion accounts compromised in major data breaches, according to haveibeenpwned.com. As more and more data trickles down from the dark web and into the hands of data scientists, the chances of your secure accounts being compromised are increasing. First, as websites become more social and interconnected, it becomes easier for someone with bad intentions to outline your internet browsing profile. Whenever email addresses are revealed, hackers can start to compile lists of secondary and tertiary email addresses linked to one identity. With enough work, the password associated with an old, seemingly useless AIM screen name of LinkinPark4Evr@aim.com[7] can become associated with Tim@tjohearn.com.
Then, once you’ve been pwned by multiple data breaches (for example, MySpace, Tumblr, Adobe, and LinkedIn), hackers can begin to understand not only your conventions for choosing passwords but also trends across entire swaths of users. When forced to change a password, how many people just increment the last number in their password by one or add an arbitrary character to the end? How many people’s passwords have revolved around the same sports team for the last decade?
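Those two habits are trivial to turn into guesses. A minimal sketch of such mutation rules, assuming an old password from a previous breach is already known; `mutate`, the sample password, and the appended-character set are all hypothetical choices. Cracking tools such as John the Ripper and hashcat support rule-based mutations in this spirit.

```python
import re
import string
from typing import List


def mutate(old_password: str) -> List[str]:
    """Generate likely successors of an old password under two habits:
    incrementing a trailing number, or appending one extra character."""
    candidates = []
    match = re.search(r"(\d+)$", old_password)
    if match:
        digits = match.group(1)
        # Keep leading zeros, e.g. "07" -> "08".
        bumped = str(int(digits) + 1).zfill(len(digits))
        candidates.append(old_password[: match.start()] + bumped)
    # Append a single character; this character set is an arbitrary assumption.
    for ch in string.digits + "!@#$":
        candidates.append(old_password + ch)
    return candidates


if __name__ == "__main__":
    print(mutate("Yankees2016")[:4])
    # ['Yankees2017', 'Yankees20160', 'Yankees20161', 'Yankees20162']
```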
As the implications of data being used against you become clearer, the three-guess safeguard turns from a guarantee of safety into an invitation for self-doubt and nail biting. We have no idea how many calamities have been avoided only because hackers haven’t used their data well enough.
Tim O'Hearn just changed all his passwords. You can reach him at the email address listed above that doesn't include the name of his favorite nu-metal band. Subscribe and be part of the conversation.
1. Having just read The Cathedral and the Bazaar, I understand that this is a misnomer, and using “hacker” is inconsiderate to those involved in the culture, but I don’t think it would be worth explaining this difference in detail or spending the entire post dancing around what will seem like silly semantics to the majority of readers.
2. I had no idea what I was doing, and apparently neither did anyone else, because the site has one of the lowest active user bases, by percentage, of anywhere on the internet.
3. I am blurring the line here a bit: I’m ignoring that most password lists cracked offline are actually hashed, and that, depending on the hashing scheme, there is another degree of variability and overall user protection involved. In the absence of safeguards, this approach is entirely viable. I’m also ignoring phishing, social engineering, and keylogging, the first two of which are much more important concerns than general password strength.
4. Note that I held the calculation rate constant and calculated, rather than measured, the results for the last few lengths. The lower attempts-per-second figures for shorter passwords come from overhead being more apparent, since relatively less time is spent in the massive for loop. Also note that my computer was running several other programs at the same time, so this shouldn’t be considered a good benchmark for MacBook performance.
5. This will absolutely be the focus of a future project.
6. For good reason. Constantly updating these public lists would be unethical and of little public benefit, and obtaining dumped password lists is a legal gray area.
7. This wasn’t my actual screen name, but I would probably look back on those years more fondly if it had been.