The evolution of password hashing
Password hashing, as the name suggests, is the process of calculating a hash value from a password. The hash value is typically stored in a database, and during the login (sign-in) process, the hash value of the password entered by the user is calculated and compared with the hash value stored in the database. If they match, the user is authenticated.
Before we dive into the evolution of password hashing algorithms, it's important to understand why it's necessary.
Plain text passwords: A major security risk
Imagine being a user of a website where you've registered an account. One day, the website gets hacked, and the database is leaked. If the website stores passwords in plaintext, the hacker can directly access your password. Since many people reuse passwords across multiple websites, the hacker can use this password to gain unauthorized access to your other accounts. The situation becomes even worse if you use the same or a similar password for your email account, as the hacker can reset your password and take over all your associated accounts.
Even without a data breach, in large teams, anyone with database access can see passwords. Compared to other information, passwords are highly sensitive, and you definitely don't want anyone to have access to them.
Storing passwords without hashing is an amateur mistake. Unfortunately, if you search for "password leak plaintext", you'll find that major corporations like Facebook, DailyQuiz, and GoDaddy have all experienced password leaks in plaintext. It's likely that many other companies have made the same error.
Encoding v.s. encryption v.s. hashing
These three terms are often confused, but they are distinct concepts.
Encoding is the first thing to exclude for password storage. For example, Base64 is an encoding algorithm that converts binary data into a string of characters:
const data = 'Hello, world!'; const encoded = Buffer.from(data).toString('base64'); console.log(encoded); // SGVsbG8sIHdvcmxkIQ==
Knowing the encoding algorithm allows anyone to decode the encoded string and retrieve the original data:
const encoded = 'SGVsbG8sIHdvcmxkIQ=='; const data = Buffer.from(encoded, 'base64').toString(); console.log(data); // Hello, world!
To hackers, most encoding algorithms are equivalent to plaintext.
Before hashing gained popularity, encryption was used to store passwords, such as with AES. Encryption involves using a key (or a pair of keys) to encrypt and decrypt data.
The problem with encryption is evident in the term "decrypt". Encryption is reversible, meaning that if a hacker obtains the key, they can decrypt the password and retrieve the plaintext password.
The primary difference between hashing, encoding, and encryption is that hashing is irreversible. Once a password is hashed, it cannot be decrypted back to its original form.
As a website owner, you don't actually need to know the password itself, as long as the user can log in with the correct password. The registration process can be simplified as follows:
- User enters the password.
- The service uses a hashing algorithm to calculate the hash value of the password.
- The service stores the hash value in the database.
When the user logs in, the process is:
- User enters the password.
- The service uses the same hashing algorithm to calculate the hash value of the password.
- The service compares the hash value with the hash value stored in the database.
- If the hash values match, the user is authenticated.
Both processes avoid storing passwords in plaintext, and since hashing is irreversible, even if the database is compromised, the hacker can only obtain hash values that appear as random strings.
Hashing algorithms starter pack
Hashing may seem like the perfect solution for password storage, but it's not that simple. To understand why, let's explore the evolution of password hashing algorithms.
In 1992, Ron Rivest designed the MD5 algorithm, a message-digest algorithm that can calculate a 128-bit hash value from any data. MD5 has been widely used in various fields, including password hashing. For example, the MD5 hash value of "123456" is:
As mentioned earlier, the hash value appears as a random string and is irreversible. Moreover, MD5 is fast and easy to implement, making it the most popular password hashing algorithm.
However, MD5's advantages are also its weaknesses in password hashing. Its speed makes it vulnerable to brute-force attacks. If a hacker possesses a list of common passwords and your personal information, they can calculate the MD5 hash value of each combination and compare them with the hash values in the database. For example, they might combine your birthday with your name or your pet's name.
In the present day, computers are significantly more powerful than before, making it easy to brute force MD5 password hashes.
So, why not use a different algorithm that generates longer hash values? The SHA family seems like a good choice. SHA-1 is a hashing algorithm that generates 160-bit hash values, and SHA-2 is a family of hashing algorithms that generate hash values of 224-bit, 256-bit, 384-bit, and 512-bit lengths. Let's see the SHA-256 hash value of "123456":
The SHA-256 hash value is much longer than MD5, and it is also irreversible. However, there's another issue: if you already know the hash value, like the one above, and you see the exact hash value in the database, you know the password is "123456". A hacker can create a list of common passwords and their corresponding hash values, and compare them against the hash values in the database. This list is known as a rainbow table.
To mitigate rainbow table attacks, the concept of salt was introduced. Salt is a random string that is added to the password before hashing. For example, if the salt is "salt", and you want to use SHA-256 to hash the password "123456" with the salt, instead of simply doing:
You would do:
sha256('salt123456'); // 9898410d7f5045bc673db80c1a49b74f088fd7440037d8ce25c7d272a505bce5
As you can see, the result is completely different from hashing without salt. Typically, each user is assigned a random salt during registration, which is stored in the database alongside the hash value. During the login process, the salt is used to calculate the hash value of the entered password, which is then compared to the stored hash value.
Despite the addition of salt, the hash value is still susceptible to brute-force attacks as hardware becomes more powerful. To make it harder, iteration (i.e., running the hashing algorithm multiple times) can be introduced. For example, instead of using:
You could use:
sha256('salt' + sha256('salt123456'));
Increasing the number of iterations makes brute-forcing more difficult. However, this also affects the login process, as it becomes slower. Therefore, a balance between security and performance needs to be struck.
Let's take a break and summarize the characteristics of a good password hashing algorithm:
- Irreversible (preimage resistance)
- Difficult to brute force
- Resistant to rainbow table attacks
As you may have noticed, salt and iteration are necessary to satisfy all these requirements. The issue is that both MD5 and the SHA family were not specifically designed for password hashing; they are widely used for integrity checks (or "message-digest"). As a result, each website may have its own implementation of salt and iteration, making standardization and migration challenging.
Password hashing algorithms
To address this problem, several hashing algorithms have been specifically designed for password hashing. Let's take a look at some of them.
bcrypt is a password hashing algorithm designed by Niels Provos and David Mazières. It is widely used in many programming languages. Here's an example of bcrypt hash value:
Although it appears as another random string, it contains additional information. Let's break it down:
- The first section
$2yindicates the algorithm, which is
- The second section
$12indicates the number of iterations, which is
12. This means the hashing algorithm will be run 212=4096 times (iterations).
- The third section
wNt7lt/xf8wRJgPU7kK2juis the salt.
- The last section
GrirhHK4gdb0NiCRdsSoAxqQoNbiluuis the hash value.
bcrypt has some limitations:
- The maximum length of the password is 72 bytes.
- The salt is limited to 16 bytes.
- The hash value is limited to 184 bits.
Given the debates and limitations of existing password hashing algorithms, a password hashing competition was held in 2015. Skipping the details, let's focus on the winner: Argon2.
Argon2 is a password hashing algorithm designed by Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich. It introduces several new concepts:
- Memory-hard: The algorithm is designed to be hard to parallelize, making brute-forcing with GPUs challenging.
- Time-hard: The algorithm is designed to be hard to optimize, making brute-forcing with ASICs (Application-specific integrated circuits) difficult.
- Side-channel resistant: The algorithm is designed to be resistant to side-channel attacks, such as timing attacks.
There are two main versions of Argon2, Argon2i and Argon2d. Argon2i is the safest against side-channel attacks, while Argon2d provides the highest resistance against GPU cracking attacks.
Here's an example of an Argon2 hash value:
Let's break it down:
- The first section
$argon2iindicates the algorithm, which is
- The second section
$v=19indicates the version, which is
- The third section
$m=16,t=2,p=1indicates the memory cost, time cost, and parallelism degree, where are
- The fourth section
$YTZ5ZnpXRWN5SlpjMHBDRQis the salt.
- The last section
$12oUmJ6xV5bIadzZHkuLTgis the hash value.
In Argon2, the maximum length of the password is 232-1 bytes, the salt is limited to 232-1 bytes, and the hash value is limited to 232-1 bytes. This should suffice for most scenarios.
Over the years, password hashing algorithms have undergone significant evolution. We owe a debt of gratitude to the security community for their decades of effort in making the internet a safer place. Thanks to their contributions, developers can pay more attention to building better services without worrying about the security of password hashing. While achieving 100% security in a system may be unattainable, we can employ diverse strategies to minimize the associated risks.
If you'd like to avoid the hassle of implementing authentication and authorization, feel free to try Logto for free. We provide secure (we use Argon2!), reliable, and scalable solutions, enabling you to focus on building your product.