A minimal perfect hash function (MPHF) implementation based on the CHD (Compress, Hash, Displace) algorithm, originally described in:
"Hash, Displace, and Compress" by Djamal Belazzougui, Fabiano C. Botelho, and Martin Dietzfelbinger (ESA 2009).
A perfect hash function maps a set of n keys to a set of m integers without collisions. If m = n, it is a minimal perfect hash function (MPHF). CHD generates an MPHF that:
- Uses ~2.0–2.5 bits per key (space-efficient).
- Supports O(1) lookup time.
- Requires O(n) construction time.
CHD operates in three phases:
- Compress – Hash each key into a bucket using a first-level hash function. Keys in the same bucket form a "group".
- Hash – For each bucket, try different hash function seeds (via a second-level hash family) until all keys in the bucket map to distinct positions in a global displacement table.
- Displace – Store the chosen seed (displacement value) for each bucket. During lookup, compute the bucket, retrieve the seed, and compute the final hash.
The result is a small data structure (a packed array of displacement values) that can be used to compute the rank of any key in the original set.