Decentralized Identifiers and Number representation, DID Cryptography multiformats Python
Decentralized Identifiers (DIDs)
In this notebook I’m reviewing a simple form of Decentralized Identifier (DID). DIDs are intimately related to public key cryptography. Keeping control of your decentralized identity is tied to keeping control of your private key and making your public key known to those who need to know it. To preserve your privacy you can have as many DIDs as you wish, so that you cannot be correlated across websites or other service providers.
Readings and References:
- A Primer for Decentralized Identifiers: An introduction to self-administered identifiers for curious people, November 2021.
- Decentralized Identifiers (DIDs) v1.0 Core architecture, data model, and representations
- DID Specification Registries The interoperability registry for Decentralized Identifiers
- The did:key Method v0.7 A DID Method for Static Cryptographic Keys
- did:web Method Specification
From the Core Architecture document:
Decentralized identifiers (DIDs) are a new type of identifier that enables verifiable, decentralized digital identity. A DID refers to any subject (e.g., a person, organization, thing, data model, abstract entity, etc.) as determined by the controller of the DID. In contrast to typical, federated identifiers, DIDs have been designed so that they may be decoupled from centralized registries, identity providers, and certificate authorities. Specifically, while other parties might be used to help enable the discovery of information related to a DID, the design enables the controller of a DID to prove control over it without requiring permission from any other party. DIDs are URIs that associate a DID subject with a DID document allowing trustable interactions associated with that subject.
We’ll be looking at creating the did:key type of DID and how to turn it into a DID document.
Additional Python packages:
- py-multibase
- base58
- py-multicodec
- cryptography (generally comes with Anaconda and other Python distributions)
Note that there may be other Python packages that can also perform these operations. These were some that were available at the time that I worked on this.
Extra files: DIDsAndNumbers.ipynb
import json
from multibase import encode, decode
import base58
import multicodec
Numbers and Such
When dealing with cryptography, which is the foundation of DIDs and digital signatures, we need to deal with large integers, binary data, and textual representations of binary data. Here we look at the facilities that Python and various standards provide to help with this.
Python Integers
Integers (int)
These represent numbers in an unlimited range, subject to available (virtual) memory only. For the purpose of shift and mask operations, a binary representation is assumed, and negative numbers are represented in a variant of 2’s complement which gives the illusion of an infinite string of sign bits extending to the left.
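To see what the "infinite string of sign bits" means in practice, here is a minimal sketch. The variable names are only illustrative.

```python
# Masking a negative int exposes its two's complement bit pattern.
minus_one_low_byte = -1 & 0xFF     # all eight low bits are set
minus_two_low_byte = -2 & 0xFF     # binary ...11111110 in the low byte

# A right shift keeps the sign: the run of sign bits on the left never runs out.
shifted = -1 >> 1000

print(minus_one_low_byte, minus_two_low_byte, shifted)
```

However far a negative number is shifted to the right, it stays negative, which is exactly what an endless supply of sign bits would give.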
# Example of a ridiculously large number used in Elliptic Curve Cryptography
# It has 135 decimal digits!
prime448 = 2**448 - 2**224 - 1
print(f'A really big integer: {prime448}')
print(f'As a float: {float(prime448):g}')
A really big integer: 726838724295606890549323807888004534353641360687318060281490199180612328166730772686396383698676545930088884461843637361053498018365439
As a float: 7.26839e+134
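Cryptographic code usually moves such integers around as fixed-length byte strings. The sketch below uses the built-in int.to_bytes and int.from_bytes methods to go back and forth. Big-endian byte order is an assumption made here for illustration; some curves and protocols use little-endian instead.

```python
prime448 = 2**448 - 2**224 - 1

# Number of bytes needed to hold the integer (448 bits -> 56 bytes).
n_bytes = (prime448.bit_length() + 7) // 8

# Big-endian: most significant byte first.
prime_bytes = prime448.to_bytes(n_bytes, byteorder='big')
recovered = int.from_bytes(prime_bytes, byteorder='big')

print(n_bytes, len(prime_bytes), recovered == prime448)
```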
Byte Stuff
When dealing with cryptographic information in the “raw”, we generally handle it as a sequence of bytes. Python has a few special types to help us out.
From Binary Sequence Types — bytes, bytearray, memoryview
The core built-in types for manipulating binary data are bytes and bytearray. They are supported by memoryview, which uses the buffer protocol to access the memory of other binary objects without needing to make a copy.
Bytes Objects:
Bytes objects are immutable sequences of single bytes. Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.
Bytearray Objects:
bytearray objects are a mutable counterpart to bytes objects.
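The practical difference between the two types is easiest to see by trying to change an element. This is a minimal sketch with made-up variable names.

```python
frozen = b'DID'
try:
    frozen[0] = ord('d')           # bytes are immutable, so this raises TypeError
    assignment_failed = False
except TypeError:
    assignment_failed = True

editable = bytearray(frozen)       # copies the data into a mutable buffer
editable[0] = ord('d')             # in-place change is fine
editable.extend(b':key')           # and so is growing the buffer

print(assignment_failed, editable, bytes(editable))
```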
# Example of a bytes literal
my_bytes1 = b'Greg B.'
print(len(my_bytes1), type(my_bytes1))
7 <class 'bytes'>
# Look at individual values of the bytes as numbers
for b in my_bytes1:
    print(b, end=", ")
71, 114, 101, 103, 32, 66, 46,
# We can create bytes from hex strings and
# convert bytes to hex strings
my_bytes2 = bytes.fromhex('2Eff ae31')
for b in my_bytes2:
    print(b, end=", ")
print()
print(my_bytes2.hex())
46, 255, 174, 49,
2effae31
Binary to Text Encodings
Although we can represent bytes as hexadecimal strings, this is quite inefficient in terms of message or storage space, since every byte takes two characters. There are many different ways of representing binary information such as images, cryptographic hashes, digital signatures, etc. in textual form.
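To put rough numbers on the space cost, this sketch encodes the same 32 bytes with three encodings from Python's standard library. The sample data is arbitrary; 32 bytes was picked because it matches the size of an Ed25519 public key.

```python
import base64

data = bytes(range(32))            # 32 arbitrary bytes

hex_text = data.hex()
b32_text = base64.b32encode(data).decode('ascii')
b64_text = base64.b64encode(data).decode('ascii')

for name, text in [('hex', hex_text), ('base32', b32_text), ('base64', b64_text)]:
    print(f'{name:>6}: {len(text)} characters')
```

Note that the standard library versions of base32 and base64 include "=" padding characters in these counts.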
References/Readings:
- Wikipedia: Binary to Text Encoding Gives an overview of many of the formats. About Base58 (used for Bitcoin addresses) “Similar to Base64, but modified to avoid both non-alphanumeric characters (+ and /) and letters that might look ambiguous when printed”
- Wikipedia: Base64 “In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits.” Used frequently on the web. JavaScript has native support for one variant.
- Wikipedia: Base32 “Base32 is the base-32 numeral system. It uses a set of 32 digits, each of which can be represented by 5 bits.”
- Multibase “Multibase is a protocol for disambiguating the encoding of base-encoded (e.g., base32, base36, base64, base58, etc.) binary appearing in text.”
- py-multibase
Creation of Sample Binary Data
Here we create some simple binary test data via Python’s bytearray class. This is nice to use since we can modify the elements of the byte array after creation and create data on the fly.
# Let's fill up a bytearray with stuff
# Issue: bytearray and bytes are different things
my_bytesArray3 = bytearray()
for i in range(10):
    my_bytesArray3.append(i % 256)  # byte values run from 0 to 255
print(f'{my_bytesArray3}, type: {type(my_bytesArray3)}')
print(f'In hex: {my_bytesArray3.hex()}')
bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'), type: <class 'bytearray'>
In hex: 00010203040506070809
Warning: bytes versus bytearray
Here’s a hard-to-find bug. The multibase library works with bytes or str. If you give it a bytearray it will be converted to a str (string) and then processed, so the printed representation gets encoded rather than the data itself. This is NOT what you want! You can convert a bytearray to bytes as shown below.
# String representation of a bytearray
print(str(my_bytesArray3))
# Convert a bytearray to bytes:
my_bytes3 = bytes(my_bytesArray3)
print(str(my_bytes3))
bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t')
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'
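One way to guard against this mistake is to normalize inputs before they reach an encoder. The helper below is a hypothetical sketch (to_bytes is not part of any of the libraries used here), and it also shows how different the "stringified" data is from the real data.

```python
def to_bytes(data):
    """Return an immutable bytes copy of a bytes-like object (hypothetical helper)."""
    if not isinstance(data, (bytes, bytearray, memoryview)):
        raise TypeError(f'expected binary data, got {type(data).__name__}')
    return bytes(data)

buffer = bytearray(range(4))
wrong = str(buffer).encode('utf8')   # the text of the repr, not the data
right = to_bytes(buffer)

print(len(wrong), wrong)
print(len(right), right)
```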
Multibase Encodings
With so many different ways to text encode binary data, e.g., base64, base58btc, base32, etc., how can we tell what encoding was used? The Multibase Data Format utilizes a single starting character to differentiate the encoding, e.g., an “m” for base64 encoding, a “z” for base58btc encoding. Below we try out a few of these.
Reference/Readings:
- The Multibase Data Format IETF Draft, August 2022.
# Try out base64 encoding
my_64encode = encode('base64', my_bytes3)
print(my_64encode)
print(f'Length of base64 encoded string: {len(my_64encode)}')
print(f'Length of decoded bytes: {len(decode(my_64encode))}')
b'mAAECAwQFBgcICQ'
Length of base64 encoded string: 15
Length of decoded bytes: 10
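The leading "m" in the output above is the Multibase prefix. As a rough sketch of how a decoder can dispatch on that first character, the function below handles just two prefixes using only the standard library: "m" (base64 without padding) and "f" (lowercase base16). The function name is made up and this is not how the multibase package is implemented.

```python
import base64

def decode_multibase_sketch(text: str) -> bytes:
    """Decode a multibase string for two prefixes only (illustrative)."""
    prefix, body = text[0], text[1:]
    if prefix == 'f':                        # base16, lowercase
        return bytes.fromhex(body)
    if prefix == 'm':                        # base64 without padding
        padding = '=' * (-len(body) % 4)     # restore the padding the stdlib expects
        return base64.b64decode(body + padding)
    raise ValueError(f'unsupported multibase prefix: {prefix!r}')

print(decode_multibase_sketch('mAAECAwQFBgcICQ'))
print(decode_multibase_sketch('f00010203040506070809'))
```

Both calls decode to the same ten test bytes, even though the two text forms look nothing alike.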
Bug in Multibase Library
For base58btc the multibase library has a bug when the bytes start with a zero byte, i.e., leading zero bytes get removed. The base58 library does not have this bug.
my_58encode = encode('base58btc', my_bytes3)
print(my_58encode)
print(f'Length of base58btc encoded string: {len(my_58encode)}')
print(f'Length of decoded string: {len(decode(my_58encode))}')
b'zkA3B2yGe2z4'
Length of base58btc encoded string: 12
Length of decoded string: 9
# Notice missing \x00 byte in string
print(decode(my_58encode))
b'\x01\x02\x03\x04\x05\x06\x07\x08\t'
# the base32 encoder doesn't have this problem
# we get back out what we put in
decode(encode('base32',my_bytes3))
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'
# The base58 library doesn't have this bug
print(base58.b58encode(my_bytes3))
print(base58.b58decode(base58.b58encode(my_bytes3)))
b'1kA3B2yGe2z4'
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'
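Why do leading zero bytes need special care? Base58 treats the data as one big integer, and leading zeros do not change an integer's value, so they have to be carried separately as leading "1" characters. The following is a minimal sketch of that idea using the Bitcoin alphabet. The function names are made up and this is not the code of either library.

```python
B58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

def b58encode_sketch(data: bytes) -> str:
    n_zeros = len(data) - len(data.lstrip(b'\x00'))   # count leading zero bytes
    number = int.from_bytes(data, 'big')
    digits = ''
    while number:
        number, remainder = divmod(number, 58)
        digits = B58_ALPHABET[remainder] + digits
    return '1' * n_zeros + digits                     # one '1' per leading zero byte

def b58decode_sketch(text: str) -> bytes:
    n_ones = len(text) - len(text.lstrip('1'))        # count leading '1' characters
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, 'big')
    return b'\x00' * n_ones + body

sample = bytes(range(10))
encoded = b58encode_sketch(sample)
print(encoded, b58decode_sketch(encoded) == sample)
```

Dropping the two lines that count zeros and ones would reproduce the lost-byte behavior seen above.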
Multiformats and such…
Great! We can now tell how the binary data is encoded into text, but what does the binary data represent? It could be just about anything, e.g., an IPv4 address, a SHA256 hash, an Ed25519 public key… This is where multiformats come in.
Idea: Add one or more bytes to the beginning of a string of binary data to describe what the binary data that follows is. Hence the operations of the Python multicodec library are to add the prefix, remove the prefix, and tell you what the prefix means.
What are Multiformats? The Multiformats Project is a collection of protocols which aim to future-proof systems, today. They do this mainly by enhancing format values with self-description. This allows interoperability, protocol agility, and helps us avoid lock in.
# Try an example using the IPv4 "type"
my_binary_address = bytes([192,168, 1, 220]) # Local IPv4 address of my laptop
print(f"Raw bytes for IPv4 address: {my_binary_address}")
print(f"IPv4 encoding available: {multicodec.is_codec('ip4')}")
my_multi_address = multicodec.add_prefix('ip4', my_binary_address)
print(f"Raw bytes with encoding: {my_multi_address}")
my_base58_multi_address = encode('base58btc', my_multi_address)
print(f"Base58btc encoding of everything: {my_base58_multi_address}")
# Try reversing everything
my_decode_address = decode(my_base58_multi_address)
print(f"What encoding: {multicodec.get_codec(my_decode_address)}")
my_recovered_address = multicodec.remove_prefix(my_decode_address)
for b in my_recovered_address:
    print(b, end=".")
Raw bytes for IPv4 address: b'\xc0\xa8\x01\xdc'
IPv4 encoding available: True
Raw bytes with encoding: b'\x04\xc0\xa8\x01\xdc'
Base58btc encoding of everything: b'zY6kNiw'
What encoding: ip4
192.168.1.220.
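The single \x04 byte in front of the address is the multicodec code for ip4. Multicodec codes are written as unsigned varints: seven bits of the value per byte, least significant group first, with the high bit set on every byte except the last. Small codes such as 0x04 therefore fit in one byte, while the ed25519-pub code 0xed needs two. The functions below are an illustrative sketch with made-up names, not the multicodec package's own code.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)          # high bit clear: last byte
            return bytes(out)

def decode_varint(data: bytes):
    """Return (value, remaining bytes)."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, data[i + 1:]
    raise ValueError('truncated varint')

ip4_prefix = encode_varint(0x04)
ed25519_prefix = encode_varint(0xED)
code, payload = decode_varint(b'\x04\xc0\xa8\x01\xdc')
print(ip4_prefix, ed25519_prefix, code, payload)
```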
DID:KEY
To get started with decentralized identifiers (DIDs) and related technologies such as verifiable credentials, it would be nice to have DIDs that don’t depend on other technologies such as web servers, blockchains, etc. Since the foundation of a DID is public key cryptography, the simplest way to represent a DID is just via a public key. This is the concept behind the did:key “method”:
The format for the did:key method conforms to the DID-CORE specification and is simple. It consists of the did:key prefix, followed by a Multibase base58-btc encoded value that is a concatenation of the Multicodec identifier for the public key type and the raw bytes associated with the public key format.
Alternatively, the encoding rules can also be thought of as the application of a series of transformation functions on the raw public key bytes:
did-key-format := did:key:MULTIBASE(base58-btc, MULTICODEC(public-key-type, raw-public-key-bytes))
Example 1: A simple Ed25519 did:key value
did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK
Extract Key Type and Bytes
# Let's try decoding the example above
ed255_Encoded = b'z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK'
ed255_Multi = decode(ed255_Encoded)
print(f"Key type: {multicodec.get_codec(ed255_Multi)}")
ed255_binary = multicodec.remove_prefix(ed255_Multi)
print(f"Key length: {len(ed255_binary)} bytes")
print(ed255_binary.hex())
Key type: ed25519-pub
Key length: 32 bytes
2e6fcce36701dc791488e0d0b1745cc1e33a4c1c9fcc41c63bd343dbbe0970e6
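To show that nothing magic happens inside the libraries, here is a standard-library-only sketch that undoes the did:key transformations by hand: strip the "did:key:" scheme and the "z" multibase prefix, base58-decode, then check for the two-byte varint prefix of ed25519-pub (0xed 0x01). The function name and error messages are made up, and only Ed25519 keys are handled.

```python
B58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
ED25519_PUB_PREFIX = b'\xed\x01'     # varint form of multicodec code 0xed

def ed25519_key_from_did(did: str) -> bytes:
    scheme = 'did:key:z'             # 'z' is the multibase prefix for base58btc
    if not did.startswith(scheme):
        raise ValueError('not a base58btc did:key')
    text = did[len(scheme):]
    # Base58 decode: treat the text as a base-58 number, then restore leading zeros.
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    n_zeros = len(text) - len(text.lstrip('1'))
    raw = b'\x00' * n_zeros + number.to_bytes((number.bit_length() + 7) // 8, 'big')
    if raw[:2] != ED25519_PUB_PREFIX or len(raw) != 34:
        raise ValueError('not an Ed25519 public key')
    return raw[2:]

key = ed25519_key_from_did('did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK')
print(len(key), key.hex())
```

If the sketch is right, the hex string it prints should match the one produced with the libraries above.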
Create Your Own DIDs
Using the multibase and multicodec packages, along with cryptography for key generation, we can create our own DIDs from a public key with just a line or two of code.
# Try generating and encoding our own
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat, PrivateFormat, NoEncryption
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()
# Danger: in real life you don't flaunt your private key like this!
public_key_bytes = public_key.public_bytes(Encoding.Raw, PublicFormat.Raw)
private_key_bytes = private_key.private_bytes(Encoding.Raw, PrivateFormat.Raw, NoEncryption())
print('Public Key:')
print(public_key_bytes.hex())
print('Private Key:')
print(private_key_bytes.hex()) # For education purposes only!!!
Public Key:
4adf0bc18e092f5d36a86be3ddbfd6e48464089d23e164375d938c4535e45a15
Private Key:
cc0a53b3d0737f59e8d504bab0b01ae68ab6a6cdf4f3fcef7ad2201efe15baba
# Let's encode the public key for use in the did:key method
public_encoded = encode('base58btc', multicodec.add_prefix('ed25519-pub', public_key_bytes))
print(public_encoded)
b'z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU'
DID Document Creation
The did:key specification tells how to create a DID document given a did:key:... value. Below is an example from the specification. Since there are a lot of places where the encoded key value is repeated, this is something to let Python do for us via a function.
# Example from spec
example_doc = {
    "@context": [
        "https://www.w3.org/ns/did/v1",
        "https://w3id.org/security/suites/ed25519-2020/v1",
        "https://w3id.org/security/suites/x25519-2020/v1"
    ],
    "id": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
    "verificationMethod": [{  # Signature verification method
        "id": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK#z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
        "type": "Ed25519VerificationKey2020",
        "controller": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
        "publicKeyMultibase": "z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
    }],
    "authentication": [
        "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK#z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
    ],
    "assertionMethod": [
        "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK#z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
    ],
    "capabilityDelegation": [
        "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK#z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
    ],
    "capabilityInvocation": [
        "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK#z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
    ],
    "keyAgreement": [{
        "id": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK#z6LSj72tK8brWgZja8NLRwPigth2T9QRiG1uH9oKZuKjdh9p",
        "type": "X25519KeyAgreementKey2020",
        "controller": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
        "publicKeyMultibase": "z6LSj72tK8brWgZja8NLRwPigth2T9QRiG1uH9oKZuKjdh9p"
    }]
}
# Function to create a DID document from a multibase encoded public key.
# I'm leaving off the keyAgreement stuff for now since I haven't found
# a Python implementation for the Ed25519 to X25519 key derivation yet.
def create_did_doc(multi_pub):
    doc = {
        "@context": [
            "https://www.w3.org/ns/did/v1",
            "https://w3id.org/security/suites/ed25519-2020/v1"
            # "https://w3id.org/security/suites/x25519-2020/v1"
        ],
        "id": "did:key:" + multi_pub,
        "verificationMethod": [{  # Signature verification method
            "id": "did:key:" + multi_pub + "#" + multi_pub,
            "type": "Ed25519VerificationKey2020",
            "controller": "did:key:" + multi_pub,
            "publicKeyMultibase": multi_pub
        }],
        "authentication": [
            "did:key:" + multi_pub + "#" + multi_pub
        ],
        "assertionMethod": [
            "did:key:" + multi_pub + "#" + multi_pub
        ],
        "capabilityDelegation": [
            "did:key:" + multi_pub + "#" + multi_pub
        ],
        "capabilityInvocation": [
            "did:key:" + multi_pub + "#" + multi_pub
        ],
        # "keyAgreement": [{
        #     "id": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK#z6LSj72tK8brWgZja8NLRwPigth2T9QRiG1uH9oKZuKjdh9p",
        #     "type": "X25519KeyAgreementKey2020",
        #     "controller": "did:key:z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
        #     "publicKeyMultibase": "z6LSj72tK8brWgZja8NLRwPigth2T9QRiG1uH9oKZuKjdh9p"
        # }]
    }
    return doc
did_doc_dict = create_did_doc(public_encoded.decode('utf8'))
print(json.dumps(did_doc_dict, indent=2))
{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    "https://w3id.org/security/suites/ed25519-2020/v1"
  ],
  "id": "did:key:z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU",
  "verificationMethod": [
    {
      "id": "did:key:z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU#z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU",
      "type": "Ed25519VerificationKey2020",
      "controller": "did:key:z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU",
      "publicKeyMultibase": "z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU"
    }
  ],
  "authentication": [
    "did:key:z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU#z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU"
  ],
  "assertionMethod": [
    "did:key:z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU#z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU"
  ],
  "capabilityDelegation": [
    "did:key:z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU#z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU"
  ],
  "capabilityInvocation": [
    "did:key:z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU#z6MkjVXUgSXQJfbNYzqUJMEkU6oxp4N2WvnKCrdPsNPJAeLU"
  ]
}
Checking DID Document
You can copy and paste the above JSON string into the JSON-LD Playground to see if it is valid JSON-LD. The above worked when I last tried it…
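The JSON-LD Playground checks that the document is valid JSON-LD, but it does not know how the pieces of a did:key document should refer to each other. As a complement, here is a small local consistency check. It is only a sketch: check_did_key_doc and its rules are my own simplification (every referenced key id must exist in verificationMethod, and every method's controller must be the DID itself), not something taken from the specification.

```python
RELATIONSHIPS = ('authentication', 'assertionMethod',
                 'capabilityDelegation', 'capabilityInvocation')

def check_did_key_doc(doc):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    did = doc.get('id', '')
    if not did.startswith('did:key:'):
        problems.append('id does not start with did:key:')
    methods = doc.get('verificationMethod', [])
    method_ids = {method.get('id') for method in methods}
    for method in methods:
        if method.get('controller') != did:
            problems.append(f"controller mismatch in {method.get('id')}")
    for relation in RELATIONSHIPS:
        for entry in doc.get(relation, []):
            # An entry is either a reference (str) or an embedded method (dict).
            if isinstance(entry, str) and entry not in method_ids:
                problems.append(f'{relation} references unknown method {entry}')
    return problems

# Small hand-built document using the example key from the specification.
multi_pub = 'z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK'
did = 'did:key:' + multi_pub
key_id = did + '#' + multi_pub
good_doc = {
    'id': did,
    'verificationMethod': [{'id': key_id, 'type': 'Ed25519VerificationKey2020',
                            'controller': did, 'publicKeyMultibase': multi_pub}],
    'authentication': [key_id],
    'assertionMethod': [key_id],
}
bad_doc = dict(good_doc, authentication=[did + '#missing'])

print(check_did_key_doc(good_doc))
print(check_did_key_doc(bad_doc))
```

The same function can be pointed at the dictionary returned by create_did_doc, since it only relies on the keys shown in the documents above.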