Introduction

This book is not an official JPEG XL product, and its development is driven by the community.

License

This documentation is released under CC BY 4.0.

Reading a Bitstream

A bitstream is, well, a stream of bits, though typically bitstreams are understood as series of bytes (that is, chunks of 8 bits).

ff 0a 7a 43 05 3b 01 00 ...

We need to unpack these bytes into bits. Let’s write those in binary form:

11111111 00001010 01111010 01000011
00000101 00111011 00000001 00000000
...

Bitstreams are read in byte order, lowest bits first. So if we read 12 bits from the beginning, we get 1 1 1 1 1 1 1 1 0 1 0 1.

  1111 1111
.<---------- read in this direction
|___________
  0000 1010 |
.<---------- then this
| and so on...

But there’s a twist: if we read multiple bits at once as an integer, bits read first become less significant bits. Which means, in previous example, it’s 0b_1010_1111_1111 as an integer, not 0b_1111_1111_0101. If we were to read three 4-bit integer, we would get 0b1111, 0b1111, and 0b1010.

This sounds a little bit confusing (pun not intended), but it has some advantage on modern CPUs. Think about this: when we have ff 0a, what will happen if we read 16-bit integer? We will get 0b_00001010_11111111, or 0x0aff. Sounds familiar? Yeah, we’ve read bytes in little endian order. So bitstreams can be read like: load an integer in little endian order, do some bitwise operations, and done. libjxl does it in (mostly) branchless way, and it’s quite fast.

Bitstream Operations

Thourghout this book, read!(Operation) means “read bitstream as specified in Operation”. Operation can be one of the following:

`u(N)`

Read N bits as an integer. If N == 0, the result is zero.

`U32(d0, d1, d2, d3)`

Returns unsigned 32-bit integer. Read an integer using the following pseudocode:

match read!(u(2)) {
    0 => read_u32!(d0),
    1 => read_u32!(d1),
    2 => read_u32!(d2),
    3 => read_u32!(d3),
}

Where read_u32! is defined as:

read_u32!(M)        := M as u32
read_u32!(u(N))     := read!(N) as u32
read_u32!(M + u(N)) := (M + read!(N)) as u32

`Bool`

Returns boolean value. Equals to read!(u(1)) != 0.

`U64`

Returns unsigned 64-bit integer. Read an integer using the following pseudocode:

match read!(u(2)) {
    0 => 0u64,
    1 => 1 + read!(u(4)) as u64,
    2 => 17 + read!(u(8)) as u64,
    3 => {
        let mut value = read!(u(12));
        let mut shift = 12;
        while read!(Bool) {
            if shift == 60 {
                value |= read!(u(4)) << shift;
                break;
            }
            value |= read!(u(8)) << shift;
            shift += 8;
        }
        value as u64
    }
}

For example, 0 is represented as two 0s, and u64::MAX corresponds to 73 consecutive 1s.

`F16`

Equals to f16::from_bits(read!(u(16))). Assert that the value is not NaN or infinities.

Note that you might need to convert bits to single precision float first, since most programming languages don’t have half precision float type.

`struct Ty(value, ...)`

Read a struct, optionally with contexts; see Structs and Enums.

`enum Ty`

Read an enum; see Structs and Enums.

`ZeroPadToByte`

Read bits until it reaches byte boundary. Assert that bits are all zeroes.

Structs and Enums

Structs

Struct is a collection of fields, defined with how to parse them from a bitstream. Struct definitions look like this:

struct ImageHeaders {
    let signature = read!(u(16));
    assert!(signature == 0x0aff);

    pub let size: ImageSize = read!(struct ImageSize);
    pub let metadata: ImageMetadata = read!(struct ImageMetadata);
}

let variable makes a temporary variable used within struct definition.
pub let field: Type defines a public field of struct.
assert!(condition) checks whether the given condition holds true.

Parse context

Structs can optionally receive context used during parsing:

struct FrameHeader(image_headers: ImageHeaders) {
    // ...
    pub let do_ycbcr: bool = if image_headers.metadata.xyb_encoded {
        false
    } else {
        read!(Bool)
    };
    // ...
}

read!(struct FrameHeader(headers));

Default implementation

Structs typically have default implementation. They are marked with default struct and invoked with Ty::default(context, ...).

struct ImageSize {
    // ...
    pub let height: u32 = /* ... */;
    pub let width: u32 = /* ... */;
    // ...
}

default struct ImageSize {
    pub let height: u32 = 0;
    pub let width: u32 = 0;
}

default struct ImageMetadata {
    // ...
    pub let intrinsic_size: ImageSize = ImageSize::default();
    // ...
}

Default implementation of a struct defines the same set of fields as in normal struct definition and doesn’t have any bitstream operation.

Enums

Enum is defined with a set of named integer – it’s a C-style enum.

enum FrameType {
    RegularFrame = 0,
    LfFrame = 1,
    ReferenceOnly = 2,
    SkipProgressive = 3,
}

To read an enum value, first read an integer using operation U32(0, 1, 2 + u(4), 18 + u(6)). Assert that the resulting integer is less than 64. Then, find the matching enum value in the definition; it’s an invalid value if there’s no matching one.

Reading Extensions

JPEG XL defines a special struct called Extensions in order to provide forward compatibility. Extension may appear as a field in some struct definitions.

struct Extensions {
    // This implies that there are at most 64 extensions.
    let active_extension_bitset = read!(U64);

    let mut extension_bit_length = Vec::new();
    for extension_id in 0..64 {
        let mask = 1 << extension_id;
        if active_extension_bitset & mask != 0 {
            // Extension with the ID `extension_id` is active.
            let bit_length = read!(U64);
            extension_bit_length.push((extension_id, bit_length));
        }
    }

    // Read data for each active extension.
    pub let mut extension_data = HashMap::new();
    for (extension_id, bit_length) in extension_bit_length {
        let bits = read_bits_to_bitstream!(bit_length);
        extension_data.insert(extension_id, bit_length);
    }
}

In this definition, read_bits_to_bitstream!(N) reads N bits from the bitstream and saves those bits as another bitstream.

Image Header

Image header is at the very beginning of a JPEG XL codestream. It contains the so-called magic signature, dimension (width and height) of the image, what color space the image is in, how many channels are in it, the (preferred or actual) bit depth, and so on. Basically it describes global attributes that apply to the whole image.

Decoder reads ImageHeaders at the beginning of bitstream. It contains 16-bit signature, ImageSize, and ImageMetadata. Following subchapters will explain those in detail.

struct `ImageHeaders`

struct ImageHeaders {
    let signature = read!(u(16));
    assert!(signature, 0x0aff); // ff 0a

    pub let size = read!(struct ImageSize);
    pub let metadata = read!(struct ImageMetadata);

    // Optional ICC profile, if the color encoding requires it
    pub let embedded_icc = if metadata.color_encoding.has_embedded_icc {
        Some(read_icc_profile())
    } else {
        None
    };
}

ImageHeaders is quite straightforward. It reads magic signature (ff 0a), dimension, metadata, and optional ICC profile. Note that ICC profile is read only when required.

struct `ImageSize`

struct ImageSize {
    let is_multiple_of_8 = read!(Bool);
    pub let height = if is_multiple_of_8 {
        let height_div8 = 1 + read!(u(5));
        height_div8 * 8
    } else {
        1 + read!(U32(u(9), u(13), u(18), u(30)))
    };

    // Aspect ratio
    let ratio = read!(u(3));
    pub let width = match ratio {
        0 => {
            if is_multiple_of_8 {
                let width_div8 = 1 + read!(u(5));
                width_div8 * 8
            } else {
                1 + read!(U32(u(9), u(13), u(18), u(30)))
            }
        }
        1 => height,
        2 => height * 6 / 5,
        3 => height * 4 / 3,
        4 => height * 3 / 2,
        5 => height * 16 / 9,
        6 => height * 5 / 4,
        7 => height * 2,
    };
}

ImageSize uses some technique to encode common image dimensions more densely.

Most images has some kind of well-known aspect ratio. JPEG XL skips encoding the width if aspect ratio is in the set of well-known ratio: 1:1, 6:5, 4:3, 3:2, 16:9, 5:4, 2:1.
For small images up to 256x256, it can use more compact encoding if both sides are multiple of 8. Exact conditions may differ if aspect ratio is taken into account.

For example, encoding 3840x2160 takes 19 bits:

Height is multiple of 8, but it’s not small enough. is_multiple_of_8 = false. Takes 1 bit.
For encoding height, 13-bit variant is used since encoding 2160 requires at least 12 bits. Takes 2 + 13 = 15 bits.
Aspect ratio is 16:9, which is well-known. Takes 3 bits.

Note that width or height cannot be zero, and width is rounded down if the aspect ratio calculation results in a fractional value.

Reading Entropy Decoder Configuration

This chapter describes how to read struct EntropyDecoder from a bitstream.

read!(struct EntropyDecoder(num_dist, lz77_allowed)) reads entropy decoder configuration, in the following order:

(Optional) LZ77 stream configuration.
Distribution clustering.
Which entropy coding type to use (prefix code or ANS).
Hybrid integer configuration for each post-clustered distribution.
Post-clustered symbol probability distributions.

Subchapters will explain each topic in detail.

Also, EntropyDecoder has a helper method parse:

/// Read an entropy decoder configuration with LZ77 allowed.
pub fn EntropyDecoder::parse(num_dist: u32) -> EntropyDecoder {
    read!(struct EntropyDecoder(num_dist, true))
}

LZ77 Stream

Entropy-coded streams can optionally be marked as LZ77-enabled.

// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {

let lz77_enabled = read!(Bool);
pub let lz77_config = if lz77_enabled {
    // Prevent recursion going too deep
    assert!(lz77_allowed);

    let config = read!(struct Lz77Params(num_dist));
    // NOTE: This mutates `num_dist`
    num_dist += 1;
    Some(config)
} else {
    None
};

// ...

Where Lz77Params is defined as:

struct Lz77Params(dist_ctx: u32) {
    pub let min_symbol = read!(U32(224, 512, 4096, 8 + u(15)));
    pub let min_length = read!(U32(3, 4, 5 + u(2), 9 + u(8)));
    pub let dist_ctx = dist_ctx;
    pub let len_conf = read!(struct HybridIntegerConfig(8));
}

… and HybridIntegerConfig is defined in the corresponding chapter.

Now lz77_config contains whether LZ77 is enabled, and if so, parameters for handling LZ77 symbols. Note that num_dist is increased by one if LZ77 is enabled. The last context is used to read LZ77 distance when LZ77 length symbol is encountered during decoding.

lz77_allowed acts as a safeguard to prevent unbounded recursion; more on this in the next chapter.

Distribution Clustering

Entropy coding used by JPEG XL is adaptive, which means it has multiple distributions with varying symbol probability, and chooses appropriate one depending on the current context (hence adaptive). The number of contexts can be one or two, or as many as thousands. However we can’t provide distribution of every single context when there are thousands of those; it would be too costly.

Fortunately, distributions tend to form a cluster with similar symbol probability. Instead of signalling every single distribution, JPEG XL first reduces the number of distributions using distribution clustering.

// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
// ...

pub let (cluster_map, num_clusters) = read_clustering(num_dist);

// ...

Reading cluster map

We define a function read_clustering which reads a cluster map:

pub fn read_clustering(num_dist: u32) -> (Vec<u32>, u32) {
    if num_dist == 1 {
        // Trivial case; no need to cluster
        return vec![0];
    }

    let is_simple = read!(Bool);
    let cluster = if is_simple {
        read_simple(num_dist)
    } else {
        read_complex(num_dist)
    };
    let num_clusters = validate_cluster(&cluster);
    (cluster, num_clusters)
}

Clustering is represented as a list of cluster indices. You can think it as a function; it maps pre-clustering context [0, num_dist) to cluster index [0, num_cluster), where num_cluster is defined implicitly from the largest element of the list. There’s no “hole” in the mapping, meaning every integer in range 0..num_cluster is present.

If is_simple == true, distribution clustering is in simple encoding.

fn read_simple(num_dist: u32) -> Vec<u32> {
    let mut cluster = Vec::new();
    let nbits = read!(u(2));
    for _ in 0..num_dist {
        cluster.push(read!(u(nbits)));
    }
    cluster
}

If is_simple == false, it’s rather complex:

fn read_complex(num_dist: u32) -> Vec<u32> {
    let mut cluster = Vec::new();

    let use_move_to_front = read!(Bool);

    // Recursive entropy-coded stream
    let lz77_allowed = num_dist > 2;
    let mut decoder = read!(struct EntropyDecoder(1, lz77_allowed));
    decoder.init();
    for _ in 0..num_dist {
        cluster.push(decoder.read_uint(0));
    }
    decoder.validate();

    if use_move_to_front {
        // Move-to-front encoding
        inverse_move_to_front(&mut cluster);
    }
    cluster
}

Note that we’re reading another entropy-coded stream recursively. lz77_allowed is there to prevent additional recursion; if we allowed LZ77 there, the number of distribution would increase to 2, which might trigger another recursive entropy-coded stream.

Move-to-front encoding

Complex cluster encoding may optionally use move-to-front encoding. It keeps the most recently used indices in lower value.

fn inverse_move_to_front(encoded_map: &mut [u32]) {
    let mut mru: Vec<u32> = (0..256).collect();
    for slot in encoded_map {
        let idx = *slot as usize;
        assert!(idx < 256);

        let cluster_idx = mru[idx];
        *slot = cluster_idx;

        if idx > 0 {
            // Move last used index to front
            mru.copy_within(0..idx, 1);
            mru[0] = cluster_idx;
        }
    }
}

This helps encoding cluster maps with runs of same index.

Cluster map validation

Finally, we validate that there’s no hole in the cluster map:

fn validate_cluster(cluster_map: &[u32]) -> u32 {
    let num_clusters = *cluster_map.iter().max().unwrap() + 1;
    let cluster_indices: HashSet<_> = cluster_map.iter().copied().collect();
    assert_eq!(num_clusters, cluster_indices.len() as u32);
    num_clusters
}

Entropy Coding Types

JPEG XL supports two types of entropy coding methods: prefix code, and asymmetric numeral systems.

// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
// ...

let use_prefix_code = read!(Bool);
pub let log_alphabet_size = if use_prefix_code {
    15
} else {
    read!(u(2)) + 5
}

// ...

Prefix Code

If use_prefix_code == true, prefix code is used. Prefix code reduces the amount of encoded bits by assigning shorter code to more frequent symbols. Each code can be as long as 15 bits. Distributions are parsed in the exact same way as Brotli does; see Prefix Code chapter for details.

Asymmetric Numeral Systems

If use_prefix_code == false, asymmetric numeral systems (ANS) is used. ANS achieves compression by encoding symbols into an integer using, well, asymmetric numeral systems. Think of decimal numeral system we all know. In this system, we have 10 symbols, 0 to 9, and each appended symbol increases the magnitude of value equally – it’s symmetric. On the other hand, each symbol in ANS can affect the encoded value completely asymmetrically; for example, in an extreme case where there are frequent symbol 0 and rare symbol 1, symbol 0 could only increase the value by 0.02%, but the value would grow about 4096 times larger if symbol 1 is appended.

JPEG XL uses streaming rANS variant with alias mapping; see Asymmetric Numeral Systems chapter for details.

Hybrid Integer

Hybrid integer is a mechanism to insert raw, non-entropy-coded bits into entropy decoded symbol. It can be useful when there are some bits that can’t be encoded efficiently using entropy coding, e.g. the signal is noisy and have almost equal probability in a range of value.

// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
// ...

pub let mut uint_configs = vec![];
for _ in 0..num_clusters {
    uint_configs.push(read!(struct HybridIntegerConfig(log_alphabet_size)));
}

// ...

Hybrid integer config

struct HybridIntegerConfig(log_alphabet_size: u32) {
    let exp_bits = add_one_log2_ceil(log_alphabet_size);
    pub let split_exponent = read!(u(exp_bits));

    pub let (msb_in_token, lsb_in_token) = if split_exponent == log_alphabet_size {
        // Special case: no raw bits.
        (0, 0)
    } else {
        let msb_bits = add_one_log2_ceil(split_exponent);
        let msb_in_token = read!(u(msb_bits));
        assert!(msb_in_token <= split_exponent);

        let lsb_bits = add_one_log2_ceil(split_exponent - msb_in_token);
        let lsb_in_token = read!(u(lsb_bits));
        assert!(msb_in_token + lsb_in_token <= split_exponent);

        (msb_in_token, lsb_in_token)
    };
}

/// The number of bits needed to read a value up to `x`.
fn add_one_log2_ceil(x: u32) -> u32 {
    // NOTE: This can be done using integer types only.
    (x + 1).log2().ceil()
}

Hybrid integer config splits symbol (“token”) from entropy coded stream into three parts if the symbol is larger than or equal to 1 << split_exponent.

+-----------------------+
|         token         |
+---------+-------+-----+
| raw_exp |  msb  | lsb |
+---------+-------+-----+
     ^- split_exponent -^

Then it reads raw_bits according to raw_exp value, and constructs new symbol value.

+-+-------+------------+-----+
|1|  msb  |  raw_bits  | lsb |
+-+-------+------------+-----+

See [dedicated chapter] for details.

Symbol Probability Distributions

// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
// ...

pub let mut dists = Vec::new();
for _ in 0..num_clusters {
    let dist = if use_prefix_code {
        read!(struct PrefixDist(log_alphabet_size))
    } else {
        read!(struct AnsDist(log_alphabet_size))
    };

    dists.push(dist);
}

// ...

The decoder reads a set of prefix code distribution or ANS distribution, depending on the flag use_prefix_code read previously.

Prefix Code

Asymmetric Numeral Systems

Keyboard shortcuts

JPEG XL: From Bits to Pixels