A simple flat array requires the entire expanse's size in memory usage
What if we use more than 1 array to lookup the index?
Divide our original array in half, and decode the index in steps.
Take the first half of the index and lookup a pointer to another array. Take the second half of the index and lookup the final value in the second array.
If the population is sparse we can likely just use null pointers for most of the entries in the first array, saving memory.
This basic data structure of multiple levels with multiple fanout pointers from each is known as a digital tree or "trie".
The order of a digital tree is the number of branches that can be taken at each level.
A low order wastes less memory because fewer entries in each branch are null pointers.
However a low order tree is deeper and requires more indirections to decode the key.
If we could somehow combine a high order with low memory waste...
Judy uses order 256 branches for performance
For a 32 bit expanse this translates to a maximum of 4 lookups
With vanilla tries this would waste a large amount of memory
Judy uses a variety of 'tricks' in order to compress the size of each branch or leaf depending on how 'full' it is
In a real machine main memory access is painfully slow, which is why memory is loaded into the CPU cache
The amount of memory read and cached at a time is typically 16 machine words long
Algorithms that want to be fast must be cache aware and be designed from the ground up to efficiently make use of the CPU cache
Can point to various classes of children nodes:
A JP is always 2 words in size and can be thought of as a two element array
For null, branches or leaves the first word is a standard pointer to the next child, and the second word encodes the pointer type, as well as the bytes decoded so far (sans the first one) and the population count of the subexpanse
This is an optimization when the population of the subexpanse is small
Instead of a further indirection to a leaf node to get the value, the values are 'immediately' packed into the pointer itself.
For JudyL a separate values only leaf is used as well
We have the Array Pointer which points either to a Root Leaf or a JPM
Root leaves allow a small number of items to be accessed directly
Once the array grows larger the root leaf is replaced with JPM
The JPM contains the array meta data and points to the first Judy Pointer
A Judy Pointer is a fat pointer that either points to a branch or leaf, or if there are few enough items, directly encodes the items into the pointer
A branch uses one of 3 possible strategies:
Only 16 words long
Used when population is small, generally up to 7
Along with a count, stores sorted list of populated subexpanses
Also stores list of Judy pointers to the next branch/leaf
Contains 2 tiers: the bitmap and subarrays
Bitmap is always 32 bytes long
Contains 8 bitmap subexpanses and 8 Judy pointers to the subarray
Subarray consists of packed list of Judy pointers, one for each bit set in the parent bitmap
Two tiers are used to keep the size of the bitmap so that it will fit within 16 words for efficient inserts and deletes
The pointers to subarrays are interspersed so that on machines with 8 word cache lines the number of fills is reduced
The uncompressed branch is very simple, it's just a standard array with empty values represented by null pointers
A leaf uses 1 one two possible strategies:
A packed array of indexes stored in sorted order that contain only the minimum number of bytes remaining to be decoded at the leaf's level in the tree
In addition JudyL leaves have a separate value area that is 1 word for each index
At the lowest level of the tree when there is only a single byte left to decode and sufficient population it saves memory to represent each index in the subexpanse as a bit in a 256 bit bitmap
For JudyL value subarrays are interspersed with the bitmap entries