Hashes and Integrity

Dr. Greg Bernstein

October 12th, 2021

Cryptographic Hashes


NCyTE Learning Objectives

From NCyTE

  • Students will be able to explain the use of hash functions in securing information.
  • Students will be able to describe the basic requirements for a cryptographic hash function.
  • Students will be able to identify commonly used algorithms for hashing.
  • Students will be able to illustrate how hashing works using online tools.

Additional Learning Objectives

  • Understand the difference between cryptographic and “regular” hash functions
  • Understand HMACs and HOTPs
  • Understand hash functions use key derivation functions and the appropriate types of different algorithms.
  • Understand the difference between the various types of random numbers

Attacks on Integrity

From Stallings, Cryptography and Network Security, 2003

  • Content modifications: changes to contents including insertions, deletion, transposition, modification

  • Sequence Modification: changes to a sequence of messages including reordering, insertion and deletion.

  • Timing Modification: delay or replay of messages.

Cryptographic Context

From NCyTE

  • Confidentiality: Encryption algorithms
  • Integrity: (Cryptographic) Hash functions
  • Authenticity: Digital signatures, digital certificates
  • Non-repudiation: Digital signatures, digital certificates

Recall Symmetric Encryption

From NCyTE

Symmetric Crytography

Recall Public Key Encryption

From NCyTE

Asymmetric Cryptography

Alternative Approach

Why not use encryption (symmetric or asymmetric)?

  • If I encrypt my message and someone tries to modify it, it won’t decrypt properly!
  • To prevent sequence modification put a sequence number in the plaintext
  • To prevent replay (timing) based attacks put a timestamp in the plaintext

Issues with Encrypt Everything

  • Encryption can be costly in terms of computational power, particularly public key based encryption

  • Not all data is “confidential”, for example open source software, but integrity is critical, i.e., executable files!!!

  • Some data doesn’t have sufficient structure to tell if decrypted data is “incorrect”

Encryption, Integrity, and Authentication Combined

However if you need to encrypt there are special block cipher modes

  • Authenticated encryption with additional data (AEAD) modes “A number of modes of operation have been designed to combine secrecy and authentication in a single cryptographic primitive.”

  • Galois/counter mode (GCM) combines the well-known counter mode of encryption with the new Galois mode of authentication.

  • Counter with cipher block chaining message authentication code (CCM) is an authenticated encryption algorithm.

Both these modes are used in TLS 1.3

Hash Functions

From NCyTE

  • A (cryptographic) hash function maps digital data of arbitrary size to digital data of fixed size. The hash is sometimes called a message digest.

  • A cryptographic hash function is a hash function that is considered practically impossible to invert (one-wayness) or find collisions (i.e. two messages with the same hash value).

Hash Diagram

From NCyTE

Hash concept

Crypto Hash Requirements 1

From NCyTE

  • Variable input/ Fixed output size: Can be applied to data of practically any size but the output is always a fixed number of bits.

  • Pre-image resistant: Computationally infeasible to find the input value for a given hash value (one-wayness).

Crypto Hash Requirements 2

From NCyTE

  • Collision resistant: Computationally infeasible to find two inputs that have the same hash value.

  • Efficiency: Easy (fast) to compute so practical on hardware and software

  • Pseudorandomness: The outputs pass tests designed to detect pseudorandomness.

Uses of Hash Function 1

From NCyTE

  • Digital Signatures: When you sign messages digitally, the hash value of the message is encrypted instead of the message itself. This allows messages of arbitrary lengths to be signed.

  • Password Files: Hashes of passwords are stored, not the passwords themselves. Because no one can see the plain password, this provides an extra layer of security in case the password file is stolen.

Uses of Hash Function 2

From NCyTE

  • Intrusion/Virus Detection: A change in the hash value of a file may indicate an intrusion or a virus.

  • Construction of a pseudorandom number generator: One of the required properties of a cryptographic function is that the output has to pass pseudorandomness tests.

  • File synchronization: Whether to upload a file or not for synchronization (for example with the cloud storage) can be determined by checking the hash value of the file has changed or not since the last update

Uses of Hash Functions 3

  • Is an important application that you use missing from the list?

  • Can you name a file synchronization program that uses (optionally) hashes?

Cryptographic Hash Function Examples 1

From NCyTE

  • SHA-1 – Designed by NSA
    • Published by NIST in 1993 as Federal Info. Processing Standard
    • Theoretical attacks developed for SHA1 in 2005 suggested the algorithm may not be secure enough for ongoing use.
  • SHA-2 – Also designed by NSA
    • Published by NIST in 2001 as Federal Info. Processing Standard
    • Includes six hash functions with digests that are 224, 256, 384 or 512 bits.

Cryptographic Hash Function Examples 2

From NCyTE

  • SHA-3 – Designed by Bertoni, Daemen,Peeters, Van Assche
    • Published by NIST in 2015 as the new standard
    • Not meant to replace SHA-2 as SHA-2 has not been broken
  • MD4 – Designed by Rivest in 1990
    • 128 bit digests; used in TLS certificates
    • Practical collision attacks were developed against it
  • MD5 – Similar to MD4;
    • security severely compromised, so not suitable for cryptographic use.

Crypto Hash Example

From Wikipedia: Crypto Hash: A cryptographic hash function (specifically SHA-1) at work. A small change in the input (in the word “over”) drastically changes the output (digest).

Hash example

Hands on with Python 1

Small change to a short message

from cryptography.hazmat.primitives import hashes
digest = hashes.Hash(hashes.SHA256())
digest.update(b"This is my message. It's not fancy, but it is only a test.")
digest_bytes1 = digest.finalize()
# d7bd857b3044c451e9492388c09c713fb131c2fcd58541376e220dc913e6d3dc

digest = hashes.Hash(hashes.SHA256())
digest.update(b"This is my message. It's not fancy, but it is only a test!")
digest_bytes2 = digest.finalize()
# 6373b945dc1d5616d1d06193752594a89e4a8b7b020f783b6d8b7c04f9d4571d

Hands on with Python 2

Hash a long book

sherlock = open("sherlockHolmes.txt", "r", encoding="utf8")
book = sherlock.read()
print(f"This book has {len(book)} characters")
digest = hashes.Hash(hashes.SHA256())
digest.update(book.encode("utf-8")) # encode turns the characters into bytes
digest_bytes_book = digest.finalize()
print(f"Hash: {digest_bytes_book.hex()}")

Hands on with Python 3

Small change to a long book

import random
index = random.randint(0, len(book)) # Choose a character to change
mod_book = book[0:index-1] + "x" + book[index:] # change to an "x"
print(f"The modified book has {len(mod_book)} characters")
digest = hashes.Hash(hashes.SHA256())
digest_bytes_book2 = digest.finalize()
print(f"Hash: {digest_bytes_book2.hex()}")

Hands on with Python 4

Python Cryptography supports the following modern hashes


HMAC and HOTP References

Python Notebook Examples

From CryptToy

Python Types/Modules

Node.js/NPM Modules

Hash Based Message Authentication Code

From Wikipedia: HMAC

HMAC (hash-based message authentication code) is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. …it may be used to simultaneously verify both the data integrity and the authenticity of a message. …HMAC can provide message authentication using a shared secret instead of using digital signatures

HMAC Algorithm 1

From Wikipedia: HMAC

  • H is a cryptographic hash function
  • m is the message to be authenticated

HMAC Algorithm 2

From Wikipedia: HMAC

  • K is the secret key
  • K’ is a block-sized key derived from the secret key, K; either by padding to the right with 0s up to the block size, or by hashing down to less than or equal to the block size first and then padding to the right with zeros
  • ‖ denotes concatenation

HMAC Algorithm 3

From Wikipedia: HMAC

  • ⊕ denotes bitwise exclusive or (XOR)
  • opad is the block-sized outer padding, consisting of repeated bytes valued 0x5c (“01011100”)
  • ipad is the block-sized inner padding, consisting of repeated bytes valued 0x36 (“00110110”)

HMAC Algorithm 4

  • You can use any different hash function
  • The hash function gets applied twice (inner and outer)
  • The key gets mixed with the ipad and opad for each use
  • The message gets concatenated with a key mix, then the hash of the message gets concatenated with a key mix.

Multi Factor Authentication 1

Something you have

  • We’ve shared a long, complicated secret key!
  • Prove it to me!
  • Here it is… whoops, now its not secret anymore…

Multi Factor Authentication 2

  • Maybe I should just share a hash of the key, but that doesn’t change…
  • How about I apply an HMAC with the key to a counter? (HOTP)
  • How about I apply an HMAC with the key to the current time? (TOTP)
  • These always change and protocol doesn’t allow “token” to be used again to prevent replay attacks.


From Wikipedia: HMAC-based one-time password (HOTP)

HMAC-based one-time password (HOTP) is a one-time password (OTP) algorithm based on hash-based message authentication codes (HMAC). It is a cornerstone of the Initiative for Open Authentication (OATH). HOTP was published as an informational IETF RFC 4226 in December 2005… The HOTP algorithm is a freely available open standard.

HOTP Parameters

From Wikipedia: HMAC-based one-time password (HOTP)

  • A cryptographic hash method, H (default is SHA-1)
  • A secret key, K, which is an arbitrary byte string, and must remain private
  • A counter, C, which counts the number of iterations
  • A HOTP value length, d (6–10, default is 6, and 6–8 is recommended)*

HOTP Usage

From Wikipedia: HMAC-based one-time password (HOTP)

Both parties compute the HOTP value derived from the secret key K and the counter C. Then the authenticator checks its locally-generated value against the value supplied by the authenticated.

QR Code to share the Secret Key

Contains the URI: otpauth://totp/Secure%20App:alice%40google.com?secret=JBSWY3DPEHPK3PXP&issuer=Secure%20App

QR Code Contains the Secret Key!


From Wikipedia: Time-based One-Time Password

Time-based One-time Password (TOTP) is a computer algorithm that generates a one-time password (OTP) which uses the current time as a source of uniqueness. As an extension of the HMAC-based One-time Password algorithm (HOTP), it has been adopted as Internet Engineering Task Force (IETF) standard RFC 6238.

Open Source Crypto Libraries


From OpenSSL

OpenSSL is a robust, commercial-grade, and full-featured toolkit for the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols. It is also a general-purpose cryptography library.

Documentation: OpenSSL man pagess not very readable.

LibreSSL 1

From LibreSSL

LibreSSL is a version of the TLS/crypto stack forked from OpenSSL in 2014, with goals of modernizing the codebase, improving security, and applying best practice development processes.

LibreSSL 2

From LibreSSL – LibreSSL releases contain several parts:

  • libcrypto: a library of cryptography fundamentals
  • libssl: a TLS library
  • libtls: a new TLS library, designed to make it easier to write foolproof applications
  • Various utilities such as openssl(1), nc(1), and ocspcheck(8).

Questions and Answers

  • Isn’t SSL out of date and superceded by TLS? Yes

  • Why are do we care about OpenSSL and LibreSSL? These are just the names of the projects they both support the latest TLS1.3 standards and modern crypto algorithms.

  • Both include the openssl command which gets us direct access to many cypto algorithms

  • OpenSSH uses LibreSSL so you probably have the openssl command on your system.

Hands On with OpenSSL 1

openssl help

--- A bunch of other info but this is what we are interested in

Message Digest commands (see the `dgst' command for more details)
blake2b512        blake2s256        gost              md4
md5               mdc2              rmd160            sha1
sha224            sha256            sha3-224          sha3-256
sha3-384          sha3-512          sha384            sha512
sha512-224        sha512-256        shake128          shake256

Hands On with OpenSSL 2

Compute SHA3-256 and SHA512 Hashes

greg@DESKTOP-71U86TK MINGW64 /c/Greg/Customers/CSUEB/CS671/Crypto
$ openssl sha3-256 IntroToCrypto.pdf
SHA3-256(IntroToCrypto.pdf)= e63298e99eb4fe0f99009b9ef5d885cf310bd62f9772e29844b3b76d22e9a86c
greg@DESKTOP-71U86TK MINGW64 /c/Greg/Customers/CSUEB/CS671/Crypto
$ openssl sha512 IntroToCrypto.pdf
SHA512(IntroToCrypto.pdf)= c13b1fd7c5072a92cb14295988cdbf08abe6d58f6e80987d5f853b7ef463e04aee5f19337c00a19487b143128515ac55c1dc54fb082b666c05a9aa315a00c59f

Random Numbers

Classes of Random Numbers

  • Pseudo Random Number Generators (PRNGs) – Used for simulation

  • Cryptographically Secure Psuedo Random Numbr Generators (CSPRNGs) –

  • True random numbers

PRNG Definition

From wikipedia: PRNG

A pseudorandom number generator (PRNG) is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG’s seed.

Issues with PRNGs

From wikipedia: PRNG

  • Shorter-than-expected periods for some seed states (such seed states may be called “weak” in this context);
  • Lack of uniformity of distribution for large quantities of generated numbers;
  • Correlation of successive values;
  • Poor dimensional distribution of the output sequence;
  • Distances between where certain values occur are distributed differently from those in a random sequence distribution.

Famous PRNGs

Python PRNG

From Python Random Module

Python Random Module

Crytographically Secure PRNGs

Wikipedia: CSPRNGs

A cryptographically secure pseudorandom number generator (CSPRNG) is a pseudorandom number generator (PRNG) with properties that make it suitable for use in cryptography.

CSPRNG requirements fall into two groups: first, that they pass statistical randomness tests; and secondly, that they hold up well under serious attack, even when part of their initial or running state becomes available to an attacker.

CSPRNG Implementations

Wikipedia: CSPRNGs

True Random Number Generators

From Hardware random number generator

In computing, a hardware random number generator (HRNG) or true random number generator (TRNG) is a device that generates random numbers from a physical process, rather than by means of an algorithm.

Trusted Platform Modules (TPM)

From Wikipedia: TPM

TPM diagram

OS Supplied Random Numbers

These are usually CSPRNGs that are started with some hopefully true random numbers “entropy”

Random Numbers Always a Hot Topic

  • High-speed harvesting of random numbers, Ingo Fischer, Daniel J. Gauthier, Science 26 Feb 2021: Vol. 371, Issue 6532, pp. 889-890

  • G. M. Bernstein and M. A. Lieberman, Secure Random Number Generation Using Chaotic Circuits, IEEE Transactions on Circuits and Systems, Vol. 37, No. 9, September 1990.

  • US Patent 5,007,087: Method and apparatus for generating secure random numbers using chaos, Bernstein; Greg M., Lieberman; Michael A., April 9, 1991.

Password hashing/Key stretching functions

Wikipedia: Key Stretching

In cryptography, key stretching techniques are used to make a possibly weak key, typically a password or passphrase, more secure against a brute-force attack by increasing the resources (time and possibly space) it takes to test each possible key.

Adaptive One-Way Functions

Make password encryption harder

Adaptive one-way functions compute a one-way (irreversible) transform. Each function allows configuration of ‘work factor’. Underlying mechanisms used to achieve irreversibility and govern work factors (such as time, space, and parallelism) vary between functions

Argon2 paper

General purpose Key derivation functions

From Wikipedia: Key derivation function

In cryptography, a key derivation function (KDF) is a cryptographic hash function that derives one or more secret keys from a secret value such as a main key, a password, or a passphrase using a pseudorandom function.

The most common is HKDF. Not for passwords!

// reveal.js plugins