Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Distribution Clustering

Entropy coding used by JPEG XL is adaptive, which means it has multiple distributions with varying symbol probability, and chooses appropriate one depending on the current context (hence adaptive). The number of contexts can be one or two, or as many as thousands. However we can’t provide distribution of every single context when there are thousands of those; it would be too costly.

Fortunately, distributions tend to form a cluster with similar symbol probability. Instead of signalling every single distribution, JPEG XL first reduces the number of distributions using distribution clustering.

// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
// ...

pub let (cluster_map, num_clusters) = read_clustering(num_dist);

// ...

Reading cluster map

We define a function read_clustering which reads a cluster map:

pub fn read_clustering(num_dist: u32) -> (Vec<u32>, u32) {
    if num_dist == 1 {
        // Trivial case; no need to cluster
        return vec![0];
    }

    let is_simple = read!(Bool);
    let cluster = if is_simple {
        read_simple(num_dist)
    } else {
        read_complex(num_dist)
    };
    let num_clusters = validate_cluster(&cluster);
    (cluster, num_clusters)
}

Clustering is represented as a list of cluster indices. You can think it as a function; it maps pre-clustering context [0, num_dist) to cluster index [0, num_cluster), where num_cluster is defined implicitly from the largest element of the list. There’s no “hole” in the mapping, meaning every integer in range 0..num_cluster is present.

If is_simple == true, distribution clustering is in simple encoding.

fn read_simple(num_dist: u32) -> Vec<u32> {
    let mut cluster = Vec::new();
    let nbits = read!(u(2));
    for _ in 0..num_dist {
        cluster.push(read!(u(nbits)));
    }
    cluster
}

If is_simple == false, it’s rather complex:

fn read_complex(num_dist: u32) -> Vec<u32> {
    let mut cluster = Vec::new();

    let use_move_to_front = read!(Bool);

    // Recursive entropy-coded stream
    let lz77_allowed = num_dist > 2;
    let mut decoder = read!(struct EntropyDecoder(1, lz77_allowed));
    decoder.init();
    for _ in 0..num_dist {
        cluster.push(decoder.read_uint(0));
    }
    decoder.validate();

    if use_move_to_front {
        // Move-to-front encoding
        inverse_move_to_front(&mut cluster);
    }
    cluster
}

Note that we’re reading another entropy-coded stream recursively. lz77_allowed is there to prevent additional recursion; if we allowed LZ77 there, the number of distribution would increase to 2, which might trigger another recursive entropy-coded stream.

Move-to-front encoding

Complex cluster encoding may optionally use move-to-front encoding. It keeps the most recently used indices in lower value.

fn inverse_move_to_front(encoded_map: &mut [u32]) {
    let mut mru: Vec<u32> = (0..256).collect();
    for slot in encoded_map {
        let idx = *slot as usize;
        assert!(idx < 256);

        let cluster_idx = mru[idx];
        *slot = cluster_idx;

        if idx > 0 {
            // Move last used index to front
            mru.copy_within(0..idx, 1);
            mru[0] = cluster_idx;
        }
    }
}

This helps encoding cluster maps with runs of same index.

Cluster map validation

Finally, we validate that there’s no hole in the cluster map:

fn validate_cluster(cluster_map: &[u32]) -> u32 {
    let num_clusters = *cluster_map.iter().max().unwrap() + 1;
    let cluster_indices: HashSet<_> = cluster_map.iter().copied().collect();
    assert_eq!(num_clusters, cluster_indices.len() as u32);
    num_clusters
}