What Is A Hash Function? (3 Key Things To Know)


Hash functions are used to store and retrieve data from tables and in cryptography to keep information secure.  However, it helps to know what a hash function is before you use one.

So, what is a hash function?  A hash function is a specialized function used for data storage, retrieval, & security. A hash function takes an input (data or a message) and returns an output (hash value), usually as a string of bits. A good hash function is fast and easy to compute, difficult to reverse, and collision-resistant.

Of course, there are lots of different hash functions, some of which are approved by the U.S. government for data security.

In this article, we’ll talk about hash functions and how they work.  We’ll also talk about what hash functions are used for in cryptography and cryptocurrency.

Let’s get started.

What Is A Hash Function?

A hash function is a specialized function used to make data storage and retrieval more efficient.  A hash function takes an input (data or a message) and maps it to an output (a hash value).

A hash function takes an input (like a name or other data) and gives an output (hash value) that is usually an integer.

This hash value acts as the index for the corresponding message so that we can find it in a table or database.  A good hash function minimizes collisions (a hash collision occurs when two messages map to the same hash value).

As a result, we say that a hash function

If our hash function is f, our input is the message mi, and our output is the hash value hi, then we have:

  • f(m1) = h1
  • f(m2) = h2
  • f(m3) = h3

Instead of trying to match the message m1 to the proper record in the table that corresponds to m1, we can calculate f(m1) = h1 and look up the value of h1 in the table.  This will be more efficient than searching by matching the message m1.
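To make the lookup idea concrete, here is a minimal sketch in Python.  The function toy_hash, the table size of 16, and the phone numbers are all made up for illustration (they are not from any real library), and the sketch assumes that no two names share a slot.

```python
# Hypothetical toy hash function, for illustration only.
TABLE_SIZE = 16

def toy_hash(message: str) -> int:
    """Map a string to a slot index in the range 0..TABLE_SIZE-1."""
    return sum(message.encode("utf-8")) % TABLE_SIZE

# Build the table: each record goes in the slot given by its hash value.
table = [None] * TABLE_SIZE
records = [("Alan", "555-0101"), ("Bob", "555-0102"), ("Carl", "555-0103")]
for name, phone in records:
    table[toy_hash(name)] = (name, phone)

def lookup(name: str):
    """Jump straight to one slot instead of scanning every record."""
    record = table[toy_hash(name)]
    if record is not None and record[0] == name:
        return record[1]
    return None

print(lookup("Alan"))
```

The key point is that lookup computes one hash value and checks one slot, no matter how many records the table holds.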

When used in cryptography or data security, a hash function should be a one-way function.  This means two things:

  1. It is easy to find the output of the hash function, given any input we choose (the function is easy to compute).
  2. It is hard to find the input necessary to yield a given output (the function is hard to invert).

How Does A Hash Function Work?

A hash function takes a message (data) as the input and returns a hash value as the output.  The hash values are limited to a range of integer values.

For example, we could use a hash function to get hash values for a list of names:

  • f(Alan) = 12  [the input is Alan, and the output is 12]
  • f(Bob) = 7  [the input is Bob, and the output is 7]
  • f(Carl) = 1  [the input is Carl, and the output is 1]

As mentioned before, the ideal hash function minimizes (or eliminates) collisions.  A collision occurs when a hash function maps two inputs to the same output.

Here is an example of a collision:

  • f(Alan) = 12  [the input is Alan, and the output is 12]
  • f(Bob) = 7  [the input is Bob, and the output is 7]
  • f(Carl) = 1  [the input is Carl, and the output is 1]
  • f(Dave) = 12  [the input is Dave, and the output is 12]

Here, we have a collision between the names Alan and Dave (both inputs map to a hash value of 12).
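We can see a collision happen with a deliberately weak hash function.  The sketch below uses a made-up function, toy_hash, that adds up the byte values of the name and takes the remainder after dividing by the table size (this is an assumption for illustration, not the function f from the example above).

```python
def toy_hash(message: str, table_size: int = 16) -> int:
    """Weak toy hash: sum of the byte values, modulo the table size."""
    return sum(message.encode("utf-8")) % table_size

def find_collisions(messages, table_size: int = 16):
    """Group messages by hash value and keep the groups with 2+ members."""
    buckets = {}
    for message in messages:
        buckets.setdefault(toy_hash(message, table_size), []).append(message)
    return {h: group for h, group in buckets.items() if len(group) > 1}

collisions = find_collisions(["Alan", "Bob", "Carl", "Amy", "May"])
print(collisions)
```

Here the names Amy and May collide, because they add up to the same total.  A better hash function would also take the order of the characters into account.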

When hash function collisions occur, there are ways to resolve them, such as separate chaining or open addressing (you can read more about that here).

Collisions can occur with hash functions, but they can be minimized, and ones that occur can be resolved.

Can Hash Functions Be Reversed?

In general, hash functions cannot be reversed, and there are two main reasons for this:

  1. A hash function is relatively easy to do, but relatively hard to undo.  This is helpful for security because, for example, it makes it difficult to find a password even if you know the hash of the password.  To give an example that is easier to understand, consider chocolate milk.  It is easy to mix chocolate and milk together to make chocolate milk.  However, it is much more difficult to “undo” this combination and separate the mixture into chocolate and milk.
  2. If collisions are possible, then two (or more) inputs could lead to the same output.  Even if you find an input that leads to a given output, you don’t know if the original input was a different one.  In our example from earlier, you don’t know whether a name input of Alan or Dave was used to get the output of 12 for the hash value.

It is easy to combine chocolate and milk to make chocolate milk. It is not so easy to reverse the process to separate the mixture into milk and chocolate. A hash function is similar: it is easy to go one way, but difficult to reverse the operation.

A hash function that cannot be reversed is sometimes called a one-way function.  It is easy to go one way, but hard to go the other way.  (This is not the same as a trapdoor function, which is hard to reverse unless you know a secret piece of information.)

Why Is A Hash Function Irreversible?

A hash function is irreversible because it is computationally infeasible (very difficult, time-consuming, and/or computing-power intensive) to find one input that leads to a given output (or an input that has the same output as another input).

This is why you cannot use a hash function alone for encryption.  It would be computationally difficult or impossible to decrypt.

In addition, there might be two or more messages that lead to the same hash!  This would lead to ambiguity – how would you know which message was the original one that the sender intended for you?
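To get a feel for what “computationally infeasible” means, here is a small sketch using SHA-256 from Python’s hashlib module.  There is no formula that turns a hash back into its input, so the only general approach is to guess inputs and check each one.  The helper names and the 4-digit PIN are assumptions chosen so that the search finishes quickly.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def brute_force_pin(target_hash: str):
    """Guess every 4-digit PIN until one hashes to target_hash."""
    for candidate in range(10_000):
        pin = f"{candidate:04d}"
        if sha256_hex(pin) == target_hash:
            return pin
    return None  # the original input was not a 4-digit PIN

stored = sha256_hex("4821")          # all that an attacker gets to see
recovered = brute_force_pin(stored)  # found only by trying candidates
print(recovered)
```

This works only because there are just 10,000 possible PINs.  For a long, random input, the number of guesses needed grows far beyond what any computer could try.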

How To Use A Hash Function

Once you have chosen a hash function (more on specific ones later), you need to find an existing implementation (or implement it yourself).

To use a hash function, there are three basic steps:

  1. Evaluate the function (with the message as input) to calculate the hash value.
  2. Store the hash value in the hash table.
  3. Deal with hash collisions in the table, if necessary (this will happen rarely with collision-resistant hash functions).
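The three steps above can be sketched as a tiny hash table in Python.  This is one possible design, not the only one: the class name ChainedHashTable is made up, it relies on Python’s built-in hash(), and it resolves collisions with separate chaining (each slot holds a list of entries).

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by separate chaining."""

    def __init__(self, size: int = 8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key) -> int:
        # Step 1: evaluate the hash function and reduce it to a slot index.
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # same key: overwrite the old value
                return
        # Steps 2 and 3: store the entry; colliding keys share one list.
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

table = ChainedHashTable(size=2)  # tiny on purpose, to force collisions
for name, number in [("Alan", 12), ("Bob", 7), ("Carl", 1), ("Dave", 12)]:
    table.put(name, number)
print(table.get("Dave"))
```

With only two slots, several names must land in the same slot, yet every lookup still returns the right value because the full key is compared inside the slot.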

What Can Hash Functions Be Used For?

Hash functions can be used for various data storage, retrieval, and security purposes, including:

  • Check Digits
  • Checksums
  • Ciphers & Cryptography
  • Error Correcting Codes
  • Fingerprints
  • Lossy Compression
  • Passwords
  • Randomization Functions

For example, hash functions can help to improve security for websites with usernames and passwords for logins.  Instead of storing a user’s password as plaintext in a database, the website can store the hash of the password.

When the user enters the password, the hash of the password is computed.  If the hashed value from this session matches the stored value in the table for that username, access is granted.

Even if a hacker gains access to a hashed password for a user in the table, the hash of that password won’t work on the website.  Here is the reason:

  • When the legitimate user enters “password”, the hash f(password) is computed, and this matches what is in the table for that username.
  • When the hacker enters “f(password)”, the hash f(f(password)) is computed, and this will not match what is in the table for that username (essentially, the password is hashed twice, not once).

The hacker will have great difficulty in reversing the hash function to find the password (not to mention the fact that more than one password could lead to the same hash value).
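Here is a rough sketch of this login check in Python.  Note one deliberate change from the description above: instead of a single plain hash, the sketch uses PBKDF2 with a random salt (hashlib.pbkdf2_hmac), which is a common way to make stored password hashes harder to attack.  The function names and the iteration count are assumptions for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; real systems tune this value

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash of the password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(password: str):
    """Return the (salt, hash) pair that the website would store."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

def check_login(attempt: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)

salt, stored = register("correct horse")
print(check_login("correct horse", salt, stored))  # the real password
print(check_login(stored.hex(), salt, stored))     # the stolen hash as a password
```

The second attempt fails for the reason given above: the stolen hash gets hashed again, so it no longer matches the stored value.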

Why Use Hash Functions In Cryptography?

Hash functions are used in cryptography to calculate a checksum.  This helps to ensure that nobody has tampered with a message from the time it was signed and sent to the time it is received and read.

Here’s how it works, in summary:

  1. The sender composes the message.
  2. The sender encrypts the message (e.g. using the receiver’s public key) and uses a digital signature to sign the message (e.g. applying his private key in a public key cryptosystem).
  3. The sender evaluates the hash function (with the encrypted and signed message as input) to calculate the checksum.
  4. The sender sends the encrypted and signed message, along with the checksum, to the receiver.
  5. The receiver evaluates the hash function (with the encrypted and signed message as input) and verifies that it is the same value as the checksum received from the sender.
  6. The receiver “unsigns” the signed message (e.g. applying the sender’s public key in a public key cryptosystem).
  7. The receiver can now decrypt the message (using his private key) and read the message.
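The checksum steps (3 and 5) can be sketched in a few lines of Python.  The encryption and signing steps are left out, so the message below is just a placeholder for the encrypted and signed bytes, and the choice of SHA-256 is an assumption.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Step 3: the sender hashes the bytes that will be transmitted."""
    return hashlib.sha256(data).hexdigest()

def verify(received: bytes, received_checksum: str) -> bool:
    """Step 5: the receiver recomputes the hash and compares."""
    return checksum(received) == received_checksum

sent = b"(placeholder for the encrypted and signed message)"
sent_checksum = checksum(sent)

print(verify(sent, sent_checksum))         # message arrived unchanged
print(verify(sent + b"!", sent_checksum))  # message was altered in transit
```

Even a one-byte change to the message gives a completely different hash, so the comparison fails.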

You can learn more about how hash functions are used in cryptography from this article on the SSL Store.

How To Use The Hash Function In Python

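To use the built-in hash function in Python, use the syntax “hash(input)”, where input is the value that you want to find the hash of (it must be a hashable object, such as a string, number, or tuple).  This function returns an integer value.  Note that, by default, the hash of a string changes from one run of Python to the next.

Classes can customize the result of hash() by defining a __hash__() method.  Just remember that objects that compare as equal must have the same hash value.  The built-in hash() is designed for fast lookups in dictionaries and sets, not for security.  For a cryptographic hash function (one that is difficult to reverse and collision-resistant), use the hashlib module instead.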
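As a short sketch, here is a made-up Point class with a custom hash.  The class and its fields are only an example; the important part is that __hash__() uses the same fields that __eq__() compares.

```python
class Point:
    """Example class whose instances can be used in sets and as dict keys."""

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __eq__(self, other) -> bool:
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Equal points must hash equally, so hash the fields __eq__ compares.
        return hash((self.x, self.y))

seen = {Point(1, 2)}
print(Point(1, 2) in seen)        # an equal point is found in the set
print(isinstance(hash("hello"), int))
```

Because two equal points have the same hash, Python can find Point(1, 2) in the set even though it is a different object from the one that was added.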

You can learn more about the hash() function in Python here.

Hash Functions In Cryptocurrency

Cryptocurrencies (such as bitcoin) that use a blockchain ledger rely on hash functions to verify transactions.

For example, bitcoin uses hash functions to:

  • Track the chain of ownership of a bitcoin (by verification of digital signatures).
  • Implement a Timestamp Server (to prove that data existed when the hash was calculated).
  • Implement Proof of Work (find an input for a hash function that yields an output that begins with a given number of zero bits).
  • Reclaim Disk Space (hash transactions in a Merkle tree so that old transactions can be discarded to save disk space).

You can learn more about how hash functions are used in the bitcoin whitepaper by Satoshi Nakamoto.

Hash functions are used in blockchain to implement proof of work.

What Hash Function Does Bitcoin Use?

Bitcoin uses the SHA-256 hash function.  This is a hash function in the SHA-2 family, developed by the United States National Security Agency (NSA).

This specific hash function returns outputs that are 256 bits long (the others in the SHA-2 family return hash outputs that are 224, 384, or 512 bits long).
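You can check these output sizes yourself with Python’s hashlib module.  This is just a quick sketch; the message b"hello" is an arbitrary choice.

```python
import hashlib

message = b"hello"
digest_bits = {}
for name in ("sha224", "sha256", "sha384", "sha512"):
    digest = hashlib.new(name, message).digest()
    digest_bits[name] = len(digest) * 8  # each byte is 8 bits
    print(name, digest_bits[name], "bits")
```

The output size stays the same no matter how long the input message is.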

Each block in the bitcoin blockchain contains the SHA-256 hash of the previous block.  This hashing is what links one block in the chain to the next, giving the blockchain its name.
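The linking idea can be sketched with a toy chain in Python.  This is a simplified model, not Bitcoin’s actual block format (real blocks have a structured header and are hashed twice with SHA-256); the function names and the dictionary layout are assumptions.

```python
import hashlib

def block_hash(previous_hash: str, data: str) -> str:
    """Hash a block's contents, including the previous block's hash."""
    return hashlib.sha256((previous_hash + data).encode("utf-8")).hexdigest()

def build_chain(items):
    """Create blocks that each store the hash of the block before them."""
    chain = []
    previous = "0" * 64  # placeholder value for the very first block
    for data in items:
        chain.append({"previous_hash": previous, "data": data})
        previous = block_hash(previous, data)
    return chain

def is_valid(chain) -> bool:
    """Check that every stored previous_hash matches the block before it."""
    for earlier, later in zip(chain, chain[1:]):
        expected = block_hash(earlier["previous_hash"], earlier["data"])
        if later["previous_hash"] != expected:
            return False
    return True

chain = build_chain(["Alan pays Bob", "Bob pays Carl", "Carl pays Dave"])
print(is_valid(chain))
chain[0]["data"] = "Alan pays Mallory"  # tamper with an early block
print(is_valid(chain))
```

Changing an early block changes its hash, which no longer matches the value stored in the next block, so the tampering is detected.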

The SHA-256 function is also at the heart of the difficulty of generating (validating) a block in the blockchain.  Since the total computing power devoted to mining changes over time, the network adjusts the difficulty every 2,016 blocks (roughly every two weeks) so that a new block is found about every 10 minutes on average.

To generate a block, miners try to find a hash input that generates an output with a certain number of leading zeros (the more zeros required, the higher the difficulty).  The first miner to do this “wins” the contest and earns the bitcoin reward for validating the block and adding it to the blockchain.
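Here is a toy version of this search in Python.  It is a sketch under simplifying assumptions: it uses a single SHA-256 hash of some made-up block data plus a counter (called a nonce), whereas real Bitcoin mining hashes a structured block header and compares the result against a target value.

```python
import hashlib

def has_leading_zero_bits(digest: bytes, zero_bits: int) -> bool:
    """True if the first zero_bits bits of the digest are all zero."""
    total_bits = len(digest) * 8
    return (int.from_bytes(digest, "big") >> (total_bits - zero_bits)) == 0

def toy_hash(block_data: bytes, nonce: int) -> bytes:
    return hashlib.sha256(block_data + str(nonce).encode("ascii")).digest()

def mine(block_data: bytes, zero_bits: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash has enough leading zero bits."""
    nonce = 0
    while not has_leading_zero_bits(toy_hash(block_data, nonce), zero_bits):
        nonce += 1
    return nonce

nonce = mine(b"toy block", zero_bits=12)
print(nonce, toy_hash(b"toy block", nonce).hex())
```

Each extra zero bit doubles the expected number of attempts, which is how the difficulty can be tuned.  Checking a winning nonce, on the other hand, takes just one hash.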

You can learn more about hash functions in Bitcoin mining here.

What Hash Function Does Ethereum Use?

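Ethereum uses the Keccak-256 hash function.  Keccak is the algorithm that NIST (National Institute of Standards and Technology) standardized as SHA-3, released on August 5, 2015.  However, Ethereum’s Keccak-256 uses the original Keccak padding, so its outputs differ from those of the standardized SHA3-256.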

Does Excel Have A Hash Function?

Excel does not have a built-in hash function.  However, you can implement your own hash function in Excel by using a custom (user-defined) function in VBA.

Is AES A Hash Function?

AES (Advanced Encryption Standard, based on the Rijndael cipher) is not a hash function.  It is a block cipher: an encryption algorithm that uses a secret key of 128, 192, or 256 bits to encrypt data in blocks of 128 bits.  Unlike a hash, the output of AES can be reversed (decrypted) by anyone who has the key.

Like a good hash function, a small change in the input to AES will result in a large change to the output.

AES was standardized by NIST in 2001.  It supersedes DES (Data Encryption Standard) from 1977.

Is CRC A Hash Function?

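CRC (Cyclic Redundancy Check) is a non-cryptographic hash function whose output (checksum) is used to detect accidental changes to data.  It is not designed to protect against deliberate tampering, since it is easy to construct different data with the same CRC.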

CRC is used to calculate a hash for a block of data.  When the data is retrieved, the CRC hash is calculated again to ensure that no error in the data is present.
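Python’s zlib module includes a 32-bit CRC, so we can sketch this check directly.  The sample data and the helper name is_intact are made up for illustration.

```python
import zlib

data = b"The quick brown fox"
stored_crc = zlib.crc32(data)  # saved or sent alongside the data

def is_intact(received: bytes, expected_crc: int) -> bool:
    """Recompute the CRC and compare it with the stored value."""
    return zlib.crc32(received) == expected_crc

print(is_intact(data, stored_crc))
print(is_intact(b"The quick brown fax", stored_crc))  # one character changed
```

A single changed character gives a different CRC, so the error is detected.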

Is SHA-1 A Hash Function?

SHA-1 is a hash function that takes an input and produces a hash output of 160 bits.  It was designed by the NSA, but has been superseded by SHA-2.

NIST deprecated the use of SHA-1 in 2011 and disallowed it for generating digital signatures after 2013.

Conclusion

Now you know what a hash function is and how hash functions are used in data storage, retrieval, and security.  You also know a bit about how hash functions are used in cryptocurrency.

I hope you found this article helpful.  If so, please share it with someone who can use the information.

Don’t forget to subscribe to my YouTube channel & get updates on new math videos!

~Jonathon
