From f06b18b3ff009ef7dc90294fca674658ddf139bf Mon Sep 17 00:00:00 2001
From: elasota <1137273+elasota@users.noreply.github.com>
Date: Sun, 19 Nov 2023 15:33:37 -0500
Subject: [PATCH] Specify offset 0 as invalid

---
 doc/decompressor_accepted_invalid_data.md | 14 ++++++++++++++
 doc/zstd_compression_format.md            |  5 ++++-
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 doc/decompressor_accepted_invalid_data.md

diff --git a/doc/decompressor_accepted_invalid_data.md b/doc/decompressor_accepted_invalid_data.md
new file mode 100644
index 00000000000..f08f963d93e
--- /dev/null
+++ b/doc/decompressor_accepted_invalid_data.md
@@ -0,0 +1,14 @@
+Decompressor Accepted Invalid Data
+==================================
+
+This document describes the behavior of the reference decompressor in cases
+where it accepts an invalid frame instead of reporting an error.
+
+Zero offsets converted to 1
+---------------------------
+If a sequence is decoded with `literals_length = 0` and `offset_value = 3`
+while `Repeated_Offset_1 = 1`, the computed offset will be `0`, which is
+invalid.
+
+The reference decompressor will process this case as if the computed
+offset was `1`, including inserting `1` into the repeated offset list.
\ No newline at end of file
diff --git a/doc/zstd_compression_format.md b/doc/zstd_compression_format.md
index 4a8d338b7a6..a8d4a0f35f5 100644
--- a/doc/zstd_compression_format.md
+++ b/doc/zstd_compression_format.md
@@ -929,7 +929,10 @@ There is an exception though, when current sequence's `literals_length = 0`.
 In this case, repeated offsets are shifted by one,
 so an `offset_value` of 1 means `Repeated_Offset2`,
 an `offset_value` of 2 means `Repeated_Offset3`,
-and an `offset_value` of 3 means `Repeated_Offset1 - 1_byte`.
+and an `offset_value` of 3 means `Repeated_Offset1 - 1`.
+
+In the final case, if `Repeated_Offset1 - 1` evaluates to 0, then the
+data is considered corrupted.
 
 For the first block, the starting offset history is populated with following values :
 `Repeated_Offset1`=1, `Repeated_Offset2`=4, `Repeated_Offset3`=8,
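
Reviewer note: the rule this patch documents can be sketched in a few lines of Python. This is an illustrative sketch only, not the reference decoder's code. The names `resolve_offset`, `CorruptedDataError`, and the `lenient` flag are hypothetical, and the history-update behavior (no change, swap, or insert at the front) is assumed from the surrounding text of `zstd_compression_format.md`. With `lenient=False` the sketch follows the amended format text and rejects a computed offset of 0. With `lenient=True` it mimics the behavior described in `decompressor_accepted_invalid_data.md`, substituting 1 and inserting 1 into the history.

```python
class CorruptedDataError(ValueError):
    """Hypothetical error type: a sequence resolved to the invalid offset 0."""


def resolve_offset(offset_value, literals_length, history, lenient=False):
    """Return (offset, new_history) for one decoded sequence.

    history is (Repeated_Offset1, Repeated_Offset2, Repeated_Offset3).
    lenient=True is an assumption-labeled switch that imitates the
    reference decompressor's documented handling of a zero offset.
    """
    rep1, rep2, rep3 = history
    if offset_value < 1:
        raise ValueError("offset_value must be at least 1")

    # Values above 3 encode a new offset, which is pushed onto the history.
    if offset_value > 3:
        offset = offset_value - 3
        return offset, (offset, rep1, rep2)

    # Values 1..3 are repeat codes; literals_length == 0 shifts them by one.
    index = offset_value - 1 + (1 if literals_length == 0 else 0)
    if index == 0:
        return rep1, (rep1, rep2, rep3)   # Repeated_Offset1: history unchanged
    if index == 1:
        return rep2, (rep2, rep1, rep3)   # Repeated_Offset2: swap with first
    if index == 2:
        offset = rep3                     # Repeated_Offset3
    else:
        offset = rep1 - 1                 # shifted code 3: Repeated_Offset1 - 1
        if offset == 0:
            if not lenient:
                raise CorruptedDataError("Repeated_Offset1 - 1 evaluates to 0")
            offset = 1                    # documented reference behavior
    return offset, (offset, rep1, rep2)


if __name__ == "__main__":
    start = (1, 4, 8)  # starting offset history for the first block
    print(resolve_offset(3, 0, start, lenient=True))
    try:
        resolve_offset(3, 0, start)
    except CorruptedDataError as err:
        print("rejected:", err)
```

Note that the starting history for the first block has `Repeated_Offset1` = 1, so a first sequence with `literals_length = 0` and `offset_value = 3` reaches the zero-offset case immediately.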