Introduction
This book is not an official JPEG XL product, and its development is driven by the community.
License
This documentation is released under CC BY 4.0.
Reading a Bitstream
A bitstream is, well, a stream of bits, though typically bitstreams are understood as series of bytes (that is, chunks of 8 bits).
ff 0a 7a 43 05 3b 01 00 ...
We need to unpack these bytes into bits. Let’s write those in binary form:
11111111 00001010 01111010 01000011
00000101 00111011 00000001 00000000
...
Bitstreams are read in byte order, lowest bits first. So if we read 12 bits from the beginning, we
get 1 1 1 1
1 1 1 1
0 1 0 1
.
1111 1111
.<---------- read in this direction
|___________
0000 1010 |
.<---------- then this
| and so on...
But there’s a twist: if we read multiple bits at once as an integer, bits read first become less
significant bits. Which means, in previous example, it’s 0b_1010_1111_1111
as an integer, not
0b_1111_1111_0101
. If we were to read three 4-bit integer, we would get 0b1111
, 0b1111
, and
0b1010
.
This sounds a little bit confusing (pun not intended), but it has some advantage on modern CPUs.
Think about this: when we have ff 0a
, what will happen if we read 16-bit integer? We will get
0b_00001010_11111111
, or 0x0aff
. Sounds familiar? Yeah, we’ve read bytes in little endian order.
So bitstreams can be read like: load an integer in little endian order, do some bitwise operations,
and done. libjxl does it in (mostly) branchless way, and it’s quite fast.
Bitstream Operations
Thourghout this book, read!(Operation)
means “read bitstream as specified in Operation
”.
Operation
can be one of the following:
u(N)
Read N
bits as an integer. If N == 0
, the result is zero.
U32(d0, d1, d2, d3)
Returns unsigned 32-bit integer. Read an integer using the following pseudocode:
match read!(u(2)) {
0 => read_u32!(d0),
1 => read_u32!(d1),
2 => read_u32!(d2),
3 => read_u32!(d3),
}
Where read_u32!
is defined as:
read_u32!(M) := M as u32
read_u32!(u(N)) := read!(N) as u32
read_u32!(M + u(N)) := (M + read!(N)) as u32
Bool
Returns boolean value. Equals to read!(u(1)) != 0
.
U64
Returns unsigned 64-bit integer. Read an integer using the following pseudocode:
match read!(u(2)) {
0 => 0u64,
1 => 1 + read!(u(4)) as u64,
2 => 17 + read!(u(8)) as u64,
3 => {
let mut value = read!(u(12));
let mut shift = 12;
while read!(Bool) {
if shift == 60 {
value |= read!(u(4)) << shift;
break;
}
value |= read!(u(8)) << shift;
shift += 8;
}
value as u64
}
}
For example, 0 is represented as two 0
s, and u64::MAX
corresponds to 73 consecutive 1
s.
F16
Equals to f16::from_bits(read!(u(16)))
. Assert that the value is not NaN
or infinities.
Note that you might need to convert bits to single precision float first, since most programming languages don’t have half precision float type.
struct Ty(value, ...)
Read a struct, optionally with contexts; see Structs and Enums.
enum Ty
Read an enum; see Structs and Enums.
ZeroPadToByte
Read bits until it reaches byte boundary. Assert that bits are all zeroes.
Structs and Enums
Structs
Struct is a collection of fields, defined with how to parse them from a bitstream. Struct definitions look like this:
struct ImageHeaders {
let signature = read!(u(16));
assert!(signature == 0x0aff);
pub let size: ImageSize = read!(struct ImageSize);
pub let metadata: ImageMetadata = read!(struct ImageMetadata);
}
let variable
makes a temporary variable used within struct definition.pub let field: Type
defines a public field of struct.assert!(condition)
checks whether the given condition holds true.
Parse context
Structs can optionally receive context used during parsing:
struct FrameHeader(image_headers: ImageHeaders) {
// ...
pub let do_ycbcr: bool = if image_headers.metadata.xyb_encoded {
false
} else {
read!(Bool)
};
// ...
}
read!(struct FrameHeader(headers));
Default implementation
Structs typically have default implementation. They are marked with default struct
and invoked
with Ty::default(context, ...)
.
struct ImageSize {
// ...
pub let height: u32 = /* ... */;
pub let width: u32 = /* ... */;
// ...
}
default struct ImageSize {
pub let height: u32 = 0;
pub let width: u32 = 0;
}
default struct ImageMetadata {
// ...
pub let intrinsic_size: ImageSize = ImageSize::default();
// ...
}
Default implementation of a struct defines the same set of fields as in normal struct definition and doesn’t have any bitstream operation.
Enums
Enum is defined with a set of named integer – it’s a C-style enum.
enum FrameType {
RegularFrame = 0,
LfFrame = 1,
ReferenceOnly = 2,
SkipProgressive = 3,
}
To read an enum value, first read an integer using operation U32(0, 1, 2 + u(4), 18 + u(6))
.
Assert that the resulting integer is less than 64. Then, find the matching enum value in the
definition; it’s an invalid value if there’s no matching one.
Reading Extensions
JPEG XL defines a special struct called Extensions
in order to provide forward compatibility.
Extension may appear as a field in some struct definitions.
struct Extensions {
// This implies that there are at most 64 extensions.
let active_extension_bitset = read!(U64);
let mut extension_bit_length = Vec::new();
for extension_id in 0..64 {
let mask = 1 << extension_id;
if active_extension_bitset & mask != 0 {
// Extension with the ID `extension_id` is active.
let bit_length = read!(U64);
extension_bit_length.push((extension_id, bit_length));
}
}
// Read data for each active extension.
pub let mut extension_data = HashMap::new();
for (extension_id, bit_length) in extension_bit_length {
let bits = read_bits_to_bitstream!(bit_length);
extension_data.insert(extension_id, bit_length);
}
}
In this definition, read_bits_to_bitstream!(N)
reads N
bits from the bitstream and saves those
bits as another bitstream.
Image Header
Image header is at the very beginning of a JPEG XL codestream. It contains the so-called magic signature, dimension (width and height) of the image, what color space the image is in, how many channels are in it, the (preferred or actual) bit depth, and so on. Basically it describes global attributes that apply to the whole image.
Decoder reads ImageHeaders
at the beginning of bitstream. It contains 16-bit signature,
ImageSize
, and ImageMetadata
. Following subchapters will explain those in detail.
struct ImageHeaders
struct ImageHeaders {
let signature = read!(u(16));
assert!(signature, 0x0aff); // ff 0a
pub let size = read!(struct ImageSize);
pub let metadata = read!(struct ImageMetadata);
// Optional ICC profile, if the color encoding requires it
pub let embedded_icc = if metadata.color_encoding.has_embedded_icc {
Some(read_icc_profile())
} else {
None
};
}
ImageHeaders
is quite straightforward. It reads magic signature (ff 0a
), dimension, metadata,
and optional ICC profile. Note that ICC profile is read only when required.
struct ImageSize
struct ImageSize {
let is_multiple_of_8 = read!(Bool);
pub let height = if is_multiple_of_8 {
let height_div8 = 1 + read!(u(5));
height_div8 * 8
} else {
1 + read!(U32(u(9), u(13), u(18), u(30)))
};
// Aspect ratio
let ratio = read!(u(3));
pub let width = match ratio {
0 => {
if is_multiple_of_8 {
let width_div8 = 1 + read!(u(5));
width_div8 * 8
} else {
1 + read!(U32(u(9), u(13), u(18), u(30)))
}
}
1 => height,
2 => height * 6 / 5,
3 => height * 4 / 3,
4 => height * 3 / 2,
5 => height * 16 / 9,
6 => height * 5 / 4,
7 => height * 2,
};
}
ImageSize
uses some technique to encode common image dimensions more densely.
- Most images has some kind of well-known aspect ratio. JPEG XL skips encoding the width if aspect ratio is in the set of well-known ratio: 1:1, 6:5, 4:3, 3:2, 16:9, 5:4, 2:1.
- For small images up to 256x256, it can use more compact encoding if both sides are multiple of 8. Exact conditions may differ if aspect ratio is taken into account.
For example, encoding 3840x2160
takes 19 bits:
- Height is multiple of 8, but it’s not small enough.
is_multiple_of_8 = false
. Takes 1 bit. - For encoding height, 13-bit variant is used since encoding
2160
requires at least 12 bits. Takes 2 + 13 = 15 bits. - Aspect ratio is 16:9, which is well-known. Takes 3 bits.
Note that width or height cannot be zero, and width is rounded down if the aspect ratio calculation results in a fractional value.
Reading Entropy Decoder Configuration
This chapter describes how to read struct EntropyDecoder
from a bitstream.
read!(struct EntropyDecoder(num_dist, lz77_allowed))
reads entropy decoder configuration, in the
following order:
- (Optional) LZ77 stream configuration.
- Distribution clustering.
- Which entropy coding type to use (prefix code or ANS).
- Hybrid integer configuration for each post-clustered distribution.
- Post-clustered symbol probability distributions.
Subchapters will explain each topic in detail.
Also, EntropyDecoder
has a helper method parse
:
/// Read an entropy decoder configuration with LZ77 allowed.
pub fn EntropyDecoder::parse(num_dist: u32) -> EntropyDecoder {
read!(struct EntropyDecoder(num_dist, true))
}
LZ77 Stream
Entropy-coded streams can optionally be marked as LZ77-enabled.
// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
let lz77_enabled = read!(Bool);
pub let lz77_config = if lz77_enabled {
// Prevent recursion going too deep
assert!(lz77_allowed);
let config = read!(struct Lz77Params(num_dist));
// NOTE: This mutates `num_dist`
num_dist += 1;
Some(config)
} else {
None
};
// ...
Where Lz77Params
is defined as:
struct Lz77Params(dist_ctx: u32) {
pub let min_symbol = read!(U32(224, 512, 4096, 8 + u(15)));
pub let min_length = read!(U32(3, 4, 5 + u(2), 9 + u(8)));
pub let dist_ctx = dist_ctx;
pub let len_conf = read!(struct HybridIntegerConfig(8));
}
… and HybridIntegerConfig
is defined in the corresponding chapter.
Now lz77_config
contains whether LZ77 is enabled, and if so, parameters for handling LZ77 symbols.
Note that num_dist
is increased by one if LZ77 is enabled. The last context is used to read LZ77
distance when LZ77 length symbol is encountered during decoding.
lz77_allowed
acts as a safeguard to prevent unbounded recursion; more on this in the next chapter.
Distribution Clustering
Entropy coding used by JPEG XL is adaptive, which means it has multiple distributions with varying symbol probability, and chooses appropriate one depending on the current context (hence adaptive). The number of contexts can be one or two, or as many as thousands. However we can’t provide distribution of every single context when there are thousands of those; it would be too costly.
Fortunately, distributions tend to form a cluster with similar symbol probability. Instead of signalling every single distribution, JPEG XL first reduces the number of distributions using distribution clustering.
// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
// ...
pub let (cluster_map, num_clusters) = read_clustering(num_dist);
// ...
Reading cluster map
We define a function read_clustering
which reads a cluster map:
pub fn read_clustering(num_dist: u32) -> (Vec<u32>, u32) {
if num_dist == 1 {
// Trivial case; no need to cluster
return vec![0];
}
let is_simple = read!(Bool);
let cluster = if is_simple {
read_simple(num_dist)
} else {
read_complex(num_dist)
};
let num_clusters = validate_cluster(&cluster);
(cluster, num_clusters)
}
Clustering is represented as a list of cluster indices. You can think it as a function; it maps
pre-clustering context [0, num_dist)
to cluster index [0, num_cluster)
, where num_cluster
is
defined implicitly from the largest element of the list. There’s no “hole” in the mapping, meaning
every integer in range 0..num_cluster
is present.
If is_simple == true
, distribution clustering is in simple encoding.
fn read_simple(num_dist: u32) -> Vec<u32> {
let mut cluster = Vec::new();
let nbits = read!(u(2));
for _ in 0..num_dist {
cluster.push(read!(u(nbits)));
}
cluster
}
If is_simple == false
, it’s rather complex:
fn read_complex(num_dist: u32) -> Vec<u32> {
let mut cluster = Vec::new();
let use_move_to_front = read!(Bool);
// Recursive entropy-coded stream
let lz77_allowed = num_dist > 2;
let mut decoder = read!(struct EntropyDecoder(1, lz77_allowed));
decoder.init();
for _ in 0..num_dist {
cluster.push(decoder.read_uint(0));
}
decoder.validate();
if use_move_to_front {
// Move-to-front encoding
inverse_move_to_front(&mut cluster);
}
cluster
}
Note that we’re reading another entropy-coded stream recursively. lz77_allowed
is there to prevent
additional recursion; if we allowed LZ77 there, the number of distribution would increase to 2,
which might trigger another recursive entropy-coded stream.
Move-to-front encoding
Complex cluster encoding may optionally use move-to-front encoding. It keeps the most recently used indices in lower value.
fn inverse_move_to_front(encoded_map: &mut [u32]) {
let mut mru: Vec<u32> = (0..256).collect();
for slot in encoded_map {
let idx = *slot as usize;
assert!(idx < 256);
let cluster_idx = mru[idx];
*slot = cluster_idx;
if idx > 0 {
// Move last used index to front
mru.copy_within(0..idx, 1);
mru[0] = cluster_idx;
}
}
}
This helps encoding cluster maps with runs of same index.
Cluster map validation
Finally, we validate that there’s no hole in the cluster map:
fn validate_cluster(cluster_map: &[u32]) -> u32 {
let num_clusters = *cluster_map.iter().max().unwrap() + 1;
let cluster_indices: HashSet<_> = cluster_map.iter().copied().collect();
assert_eq!(num_clusters, cluster_indices.len() as u32);
num_clusters
}
Entropy Coding Types
JPEG XL supports two types of entropy coding methods: prefix code, and asymmetric numeral systems.
// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
// ...
let use_prefix_code = read!(Bool);
pub let log_alphabet_size = if use_prefix_code {
15
} else {
read!(u(2)) + 5
}
// ...
Prefix Code
If use_prefix_code == true
, prefix code is used. Prefix code reduces the amount of encoded bits by
assigning shorter code to more frequent symbols. Each code can be as long as 15 bits. Distributions
are parsed in the exact same way as Brotli does; see Prefix Code chapter for details.
Asymmetric Numeral Systems
If use_prefix_code == false
, asymmetric numeral systems (ANS) is used. ANS achieves compression by
encoding symbols into an integer using, well, asymmetric numeral systems. Think of decimal numeral
system we all know. In this system, we have 10 symbols, 0 to 9, and each appended symbol increases
the magnitude of value equally – it’s symmetric. On the other hand, each symbol in ANS can affect
the encoded value completely asymmetrically; for example, in an extreme case where there are
frequent symbol 0 and rare symbol 1, symbol 0 could only increase the value by 0.02%, but the value
would grow about 4096 times larger if symbol 1 is appended.
JPEG XL uses streaming rANS variant with alias mapping; see Asymmetric Numeral Systems chapter for details.
Hybrid Integer
Hybrid integer is a mechanism to insert raw, non-entropy-coded bits into entropy decoded symbol. It can be useful when there are some bits that can’t be encoded efficiently using entropy coding, e.g. the signal is noisy and have almost equal probability in a range of value.
// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
// ...
pub let mut uint_configs = vec![];
for _ in 0..num_clusters {
uint_configs.push(read!(struct HybridIntegerConfig(log_alphabet_size)));
}
// ...
Hybrid integer config
struct HybridIntegerConfig(log_alphabet_size: u32) {
let exp_bits = add_one_log2_ceil(log_alphabet_size);
pub let split_exponent = read!(u(exp_bits));
pub let (msb_in_token, lsb_in_token) = if split_exponent == log_alphabet_size {
// Special case: no raw bits.
(0, 0)
} else {
let msb_bits = add_one_log2_ceil(split_exponent);
let msb_in_token = read!(u(msb_bits));
assert!(msb_in_token <= split_exponent);
let lsb_bits = add_one_log2_ceil(split_exponent - msb_in_token);
let lsb_in_token = read!(u(lsb_bits));
assert!(msb_in_token + lsb_in_token <= split_exponent);
(msb_in_token, lsb_in_token)
};
}
/// The number of bits needed to read a value up to `x`.
fn add_one_log2_ceil(x: u32) -> u32 {
// NOTE: This can be done using integer types only.
(x + 1).log2().ceil()
}
Hybrid integer config splits symbol (“token”) from entropy coded stream into three parts if the
symbol is larger than or equal to 1 << split_exponent
.
+-----------------------+
| token |
+---------+-------+-----+
| raw_exp | msb | lsb |
+---------+-------+-----+
^- split_exponent -^
Then it reads raw_bits
according to raw_exp
value, and constructs new symbol value.
+-+-------+------------+-----+
|1| msb | raw_bits | lsb |
+-+-------+------------+-----+
See [dedicated chapter] for details.
Symbol Probability Distributions
// struct EntropyDecoder(num_dist: u32, lz77_allowed: bool) {
// ...
pub let mut dists = Vec::new();
for _ in 0..num_clusters {
let dist = if use_prefix_code {
read!(struct PrefixDist(log_alphabet_size))
} else {
read!(struct AnsDist(log_alphabet_size))
};
dists.push(dist);
}
// ...
The decoder reads a set of prefix code distribution or ANS distribution, depending on the flag
use_prefix_code
read previously.