Hashes and Integrity

Dr. Greg Bernstein

October 12th, 2021

Cryptographic Hashes

References

NCyTE Learning Objectives

From NCyTE

Students will be able to explain the use of hash functions in securing information.
Students will be able to describe the basic requirements for a cryptographic hash function.
Students will be able to identify commonly used algorithms for hashing.
Students will be able to illustrate how hashing works using online tools.

Additional Learning Objectives

Understand the difference between cryptographic and “regular” hash functions
Understand HMACs and HOTPs
Understand hash functions use key derivation functions and the appropriate types of different algorithms.
Understand the difference between the various types of random numbers

Attacks on Integrity

From Stallings, Cryptography and Network Security, 2003

Content modifications: changes to contents including insertions, deletion, transposition, modification
Sequence Modification: changes to a sequence of messages including reordering, insertion and deletion.
Timing Modification: delay or replay of messages.

Cryptographic Context

From NCyTE

Confidentiality: Encryption algorithms
Integrity: (Cryptographic) Hash functions
Authenticity: Digital signatures, digital certificates
Non-repudiation: Digital signatures, digital certificates

Recall Symmetric Encryption

From NCyTE

Recall Public Key Encryption

From NCyTE

Alternative Approach

Why not use encryption (symmetric or asymmetric)?

If I encrypt my message and someone tries to modify it, it won’t decrypt properly!
To prevent sequence modification put a sequence number in the plaintext
To prevent replay (timing) based attacks put a timestamp in the plaintext

Issues with Encrypt Everything

Encryption can be costly in terms of computational power, particularly public key based encryption
Not all data is “confidential”, for example open source software, but integrity is critical, i.e., executable files!!!
Some data doesn’t have sufficient structure to tell if decrypted data is “incorrect”

Encryption, Integrity, and Authentication Combined

However if you need to encrypt there are special block cipher modes

Authenticated encryption with additional data (AEAD) modes “A number of modes of operation have been designed to combine secrecy and authentication in a single cryptographic primitive.”
Galois/counter mode (GCM) combines the well-known counter mode of encryption with the new Galois mode of authentication.
Counter with cipher block chaining message authentication code (CCM) is an authenticated encryption algorithm.

Both these modes are used in TLS 1.3

Hash Functions

From NCyTE

A (cryptographic) hash function maps digital data of arbitrary size to digital data of fixed size. The hash is sometimes called a message digest.
A cryptographic hash function is a hash function that is considered practically impossible to invert (one-wayness) or find collisions (i.e. two messages with the same hash value).

Hash Diagram

From NCyTE

Crypto Hash Requirements 1

From NCyTE

Variable input/ Fixed output size: Can be applied to data of practically any size but the output is always a fixed number of bits.
Pre-image resistant: Computationally infeasible to find the input value for a given hash value (one-wayness).

Crypto Hash Requirements 2

From NCyTE

Collision resistant: Computationally infeasible to find two inputs that have the same hash value.
Efficiency: Easy (fast) to compute so practical on hardware and software
Pseudorandomness: The outputs pass tests designed to detect pseudorandomness.

Uses of Hash Function 1

From NCyTE

Digital Signatures: When you sign messages digitally, the hash value of the message is encrypted instead of the message itself. This allows messages of arbitrary lengths to be signed.
Password Files: Hashes of passwords are stored, not the passwords themselves. Because no one can see the plain password, this provides an extra layer of security in case the password file is stolen.

Uses of Hash Function 2

From NCyTE

Intrusion/Virus Detection: A change in the hash value of a file may indicate an intrusion or a virus.
Construction of a pseudorandom number generator: One of the required properties of a cryptographic function is that the output has to pass pseudorandomness tests.
File synchronization: Whether to upload a file or not for synchronization (for example with the cloud storage) can be determined by checking the hash value of the file has changed or not since the last update

Uses of Hash Functions 3

Is an important application that you use missing from the list?
Can you name a file synchronization program that uses (optionally) hashes?

Cryptographic Hash Function Examples 1

From NCyTE

SHA-1 – Designed by NSA
- Published by NIST in 1993 as Federal Info. Processing Standard
- Theoretical attacks developed for SHA1 in 2005 suggested the algorithm may not be secure enough for ongoing use.
SHA-2 – Also designed by NSA
- Published by NIST in 2001 as Federal Info. Processing Standard
- Includes six hash functions with digests that are 224, 256, 384 or 512 bits.

Cryptographic Hash Function Examples 2

From NCyTE

SHA-3 – Designed by Bertoni, Daemen,Peeters, Van Assche
- Published by NIST in 2015 as the new standard
- Not meant to replace SHA-2 as SHA-2 has not been broken
MD4 – Designed by Rivest in 1990
- 128 bit digests; used in TLS certificates
- Practical collision attacks were developed against it
MD5 – Similar to MD4;
- security severely compromised, so not suitable for cryptographic use.

Crypto Hash Example

From Wikipedia: Crypto Hash: A cryptographic hash function (specifically SHA-1) at work. A small change in the input (in the word “over”) drastically changes the output (digest).

Hands on with Python 1

Small change to a short message

from cryptography.hazmat.primitives import hashes
digest = hashes.Hash(hashes.SHA256())
digest.update(b"This is my message. It's not fancy, but it is only a test.")
digest_bytes1 = digest.finalize()
print(digest_bytes1.hex()) 
# d7bd857b3044c451e9492388c09c713fb131c2fcd58541376e220dc913e6d3dc

digest = hashes.Hash(hashes.SHA256())
digest.update(b"This is my message. It's not fancy, but it is only a test!")
digest_bytes2 = digest.finalize()
print(digest_bytes2.hex())
# 6373b945dc1d5616d1d06193752594a89e4a8b7b020f783b6d8b7c04f9d4571d

Hands on with Python 2

Hash a long book

sherlock = open("sherlockHolmes.txt", "r", encoding="utf8")
book = sherlock.read()
print(f"This book has {len(book)} characters")
digest = hashes.Hash(hashes.SHA256())
digest.update(book.encode("utf-8")) # encode turns the characters into bytes
digest_bytes_book = digest.finalize()
print(f"Hash: {digest_bytes_book.hex()}")

Hands on with Python 3

Small change to a long book

import random
index = random.randint(0, len(book)) # Choose a character to change
mod_book = book[0:index-1] + "x" + book[index:] # change to an "x"
print(f"The modified book has {len(mod_book)} characters")
digest = hashes.Hash(hashes.SHA256())
digest.update(mod_book.encode("utf-8"))
digest_bytes_book2 = digest.finalize()
print(f"Hash: {digest_bytes_book2.hex()}")

Hands on with Python 4

Python Cryptography supports the following modern hashes

SHA-2 family
SHA-3 family
BLAKE2 avoids length extension attacks and is used in a number of other algorithms including Argon2.

HMAC and HOTP

HMAC and HOTP References

Python Notebook Examples

From CryptToy

Jupyter Notebook (Python) RFC2104 implementing RFC 2104, HMAC Algorithm.
Jupyter Notebook (Python) RFC4226 implementing RFC 4226, HOTP Algorithm
Jupyter Notebook (Python) RFC6238 implementing RFC 6238, TOTP Algorithm

Python Types/Modules

Node.js/NPM Modules

otplib

Hash Based Message Authentication Code

From Wikipedia: HMAC

HMAC (hash-based message authentication code) is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. …it may be used to simultaneously verify both the data integrity and the authenticity of a message. …HMAC can provide message authentication using a shared secret instead of using digital signatures

HMAC Algorithm 1

From Wikipedia: HMAC

H is a cryptographic hash function
m is the message to be authenticated

HMAC Algorithm 2

From Wikipedia: HMAC

K is the secret key
K’ is a block-sized key derived from the secret key, K; either by padding to the right with 0s up to the block size, or by hashing down to less than or equal to the block size first and then padding to the right with zeros
‖ denotes concatenation

HMAC Algorithm 3

From Wikipedia: HMAC

⊕ denotes bitwise exclusive or (XOR)
opad is the block-sized outer padding, consisting of repeated bytes valued 0x5c (“01011100”)
ipad is the block-sized inner padding, consisting of repeated bytes valued 0x36 (“00110110”)

HMAC Algorithm 4

You can use any different hash function
The hash function gets applied twice (inner and outer)
The key gets mixed with the ipad and opad for each use
The message gets concatenated with a key mix, then the hash of the message gets concatenated with a key mix.

Multi Factor Authentication 1

Something you have

We’ve shared a long, complicated secret key!
Prove it to me!
Here it is… whoops, now its not secret anymore…

Multi Factor Authentication 2

Maybe I should just share a hash of the key, but that doesn’t change…
How about I apply an HMAC with the key to a counter? (HOTP)
How about I apply an HMAC with the key to the current time? (TOTP)
These always change and protocol doesn’t allow “token” to be used again to prevent replay attacks.

HOTP

From Wikipedia: HMAC-based one-time password (HOTP)

HMAC-based one-time password (HOTP) is a one-time password (OTP) algorithm based on hash-based message authentication codes (HMAC). It is a cornerstone of the Initiative for Open Authentication (OATH). HOTP was published as an informational IETF RFC 4226 in December 2005… The HOTP algorithm is a freely available open standard.

HOTP Parameters

From Wikipedia: HMAC-based one-time password (HOTP)

A cryptographic hash method, H (default is SHA-1)
A secret key, K, which is an arbitrary byte string, and must remain private
A counter, C, which counts the number of iterations
A HOTP value length, d (6–10, default is 6, and 6–8 is recommended)*

HOTP Usage

From Wikipedia: HMAC-based one-time password (HOTP)

Both parties compute the HOTP value derived from the secret key K and the counter C. Then the authenticator checks its locally-generated value against the value supplied by the authenticated.

QR Code Contains the Secret Key!

Format Key Uri Format, key is in Base32 encoding
Wikipedia:QR Codes for the special URI
The Base16, Base32, and Base64 Data Encodings

TOTP

From Wikipedia: Time-based One-Time Password

Time-based One-time Password (TOTP) is a computer algorithm that generates a one-time password (OTP) which uses the current time as a source of uniqueness. As an extension of the HMAC-based One-time Password algorithm (HOTP), it has been adopted as Internet Engineering Task Force (IETF) standard RFC 6238.

Open Source Crypto Libraries

OpenSSL

From OpenSSL

OpenSSL is a robust, commercial-grade, and full-featured toolkit for the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols. It is also a general-purpose cryptography library.

Documentation: OpenSSL man pagess not very readable.

LibreSSL 1

From LibreSSL

LibreSSL is a version of the TLS/crypto stack forked from OpenSSL in 2014, with goals of modernizing the codebase, improving security, and applying best practice development processes.

LibreSSL 2

From LibreSSL – LibreSSL releases contain several parts:

libcrypto: a library of cryptography fundamentals
libssl: a TLS library
libtls: a new TLS library, designed to make it easier to write foolproof applications
Various utilities such as openssl(1), nc(1), and ocspcheck(8).

Questions and Answers

Isn’t SSL out of date and superceded by TLS? Yes
Why are do we care about OpenSSL and LibreSSL? These are just the names of the projects they both support the latest TLS1.3 standards and modern crypto algorithms.
Both include the openssl command which gets us direct access to many cypto algorithms
OpenSSH uses LibreSSL so you probably have the openssl command on your system.

Hands On with OpenSSL 1

openssl help

--- A bunch of other info but this is what we are interested in

Message Digest commands (see the `dgst' command for more details)
blake2b512        blake2s256        gost              md4
md5               mdc2              rmd160            sha1
sha224            sha256            sha3-224          sha3-256
sha3-384          sha3-512          sha384            sha512
sha512-224        sha512-256        shake128          shake256
sm3

Hands On with OpenSSL 2

Compute SHA3-256 and SHA512 Hashes

greg@DESKTOP-71U86TK MINGW64 /c/Greg/Customers/CSUEB/CS671/Crypto
$ openssl sha3-256 IntroToCrypto.pdf
SHA3-256(IntroToCrypto.pdf)= e63298e99eb4fe0f99009b9ef5d885cf310bd62f9772e29844b3b76d22e9a86c
(base)
greg@DESKTOP-71U86TK MINGW64 /c/Greg/Customers/CSUEB/CS671/Crypto
$ openssl sha512 IntroToCrypto.pdf
SHA512(IntroToCrypto.pdf)= c13b1fd7c5072a92cb14295988cdbf08abe6d58f6e80987d5f853b7ef463e04aee5f19337c00a19487b143128515ac55c1dc54fb082b666c05a9aa315a00c59f
(base)

Random Numbers

Classes of Random Numbers

Pseudo Random Number Generators (PRNGs) – Used for simulation
Cryptographically Secure Psuedo Random Numbr Generators (CSPRNGs) –
True random numbers

PRNG Definition

From wikipedia: PRNG

A pseudorandom number generator (PRNG) is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG’s seed.

Issues with PRNGs

From wikipedia: PRNG

Shorter-than-expected periods for some seed states (such seed states may be called “weak” in this context);
Lack of uniformity of distribution for large quantities of generated numbers;
Correlation of successive values;
Poor dimensional distribution of the output sequence;
Distances between where certain values occur are distributed differently from those in a random sequence distribution.

Famous PRNGs

Linear Congruential Generators (LCGs) Class of algorithms used extensively in computing, but have issues.
Mersenne Twister 1997 designed to correct many of the flaws in previous generators. Almost all modern scientific software/libraries support this.
List of random number generators Good resource

Python PRNG

From Python Random Module

Crytographically Secure PRNGs

Wikipedia: CSPRNGs

A cryptographically secure pseudorandom number generator (CSPRNG) is a pseudorandom number generator (PRNG) with properties that make it suitable for use in cryptography.

CSPRNG requirements fall into two groups: first, that they pass statistical randomness tests; and secondly, that they hold up well under serious attack, even when part of their initial or running state becomes available to an attacker.

CSPRNG Implementations

Wikipedia: CSPRNGs

“A secure block cipher can be converted into a CSPRNG by running it in counter mode”
NIST: Random Bit Generation
NSA kleptographic backdoor in the Dual_EC_DRBG PRNG NSA got caught trying to insert a backdoor into a security standard.

True Random Number Generators

From Hardware random number generator

In computing, a hardware random number generator (HRNG) or true random number generator (TRNG) is a device that generates random numbers from a physical process, rather than by means of an algorithm.

Trusted Platform Modules (TPM)

From Wikipedia: TPM

OS Supplied Random Numbers

These are usually CSPRNGs that are started with some hopefully true random numbers “entropy”

The Windows 10 random number generation infrastructure
Linux Myths about /dev/urandom Good explanations!
Python secrets, os.urandom

Random Numbers Always a Hot Topic

High-speed harvesting of random numbers, Ingo Fischer, Daniel J. Gauthier, Science 26 Feb 2021: Vol. 371, Issue 6532, pp. 889-890
G. M. Bernstein and M. A. Lieberman, Secure Random Number Generation Using Chaotic Circuits, IEEE Transactions on Circuits and Systems, Vol. 37, No. 9, September 1990.
US Patent 5,007,087: Method and apparatus for generating secure random numbers using chaos, Bernstein; Greg M., Lieberman; Michael A., April 9, 1991.

Password hashing/Key stretching functions

Wikipedia: Key Stretching

In cryptography, key stretching techniques are used to make a possibly weak key, typically a password or passphrase, more secure against a brute-force attack by increasing the resources (time and possibly space) it takes to test each possible key.

Adaptive One-Way Functions

Make password encryption harder

Adaptive one-way functions compute a one-way (irreversible) transform. Each function allows configuration of ‘work factor’. Underlying mechanisms used to achieve irreversibility and govern work factors (such as time, space, and parallelism) vary between functions

Argon2 paper

Current Recommended Password “Stretching” Algorithms

Argon2 is the winner of the password hashing competition and should be considered as your first choice for new applications
PBKDF2 when FIPS certification or enterprise support on many platforms is required;
scrypt where resisting any/all hardware accelerated attacks is necessary but support isn’t.
bcrypt where PBKDF2 or scrypt support is not available.

General purpose Key derivation functions

From Wikipedia: Key derivation function

In cryptography, a key derivation function (KDF) is a cryptographic hash function that derives one or more secret keys from a secret value such as a main key, a password, or a passphrase using a pseudorandom function.

The most common is HKDF. Not for passwords!

Hashes and Integrity

Cryptographic Hashes

References

NCyTE Learning Objectives

Additional Learning Objectives

Attacks on Integrity

Cryptographic Context

Recall Symmetric Encryption

Recall Public Key Encryption

Alternative Approach

Issues with Encrypt Everything

Encryption, Integrity, and Authentication Combined

Hash Functions

Hash Diagram

Crypto Hash Requirements 1

Crypto Hash Requirements 2

Uses of Hash Function 1

Uses of Hash Function 2

Uses of Hash Functions 3

Cryptographic Hash Function Examples 1

Cryptographic Hash Function Examples 2

Crypto Hash Example

Hands on with Python 1

Hands on with Python 2

Hands on with Python 3

Hands on with Python 4

HMAC and HOTP

HMAC and HOTP References

Python Notebook Examples

Python Types/Modules

Node.js/NPM Modules

Hash Based Message Authentication Code

HMAC Algorithm 1

HMAC Algorithm 2

HMAC Algorithm 3

HMAC Algorithm 4

Multi Factor Authentication 1

Multi Factor Authentication 2

HOTP

HOTP Parameters

HOTP Usage

QR Code to share the Secret Key

QR Code Contains the Secret Key!

TOTP

Open Source Crypto Libraries

OpenSSL

LibreSSL 1

LibreSSL 2

Questions and Answers

Hands On with OpenSSL 1

Hands On with OpenSSL 2

Random Numbers

Classes of Random Numbers

PRNG Definition

Issues with PRNGs

Famous PRNGs

Python PRNG

Crytographically Secure PRNGs

CSPRNG Implementations

True Random Number Generators

Trusted Platform Modules (TPM)

OS Supplied Random Numbers

Random Numbers Always a Hot Topic

Closely Related Crypto Processes

Password hashing/Key stretching functions

Adaptive One-Way Functions

Current Recommended Password “Stretching” Algorithms

General purpose Key derivation functions