
[zstd] remove src minimal size limit on decompressSizeHint #118

Closed
wants to merge 3 commits

Conversation

XiaochenCui

In some circumstances we want to force the zstd-decompressed data into a buffer we have pre-allocated, regardless of whether the compressed input is smaller than `zstdFrameHeaderSizeMax`.

For example, in our OLAP system kernel we have a unified buffer pool that manages all bytes allocated for queries. An accidentally created large array has a fatal impact on the correctness and performance of the system. (In this case, any compressed data smaller than 18 bytes results in a newly allocated byte slice of 1 MB, which is intolerable.) A minimal sketch of the pattern we are after follows the list below.

I also considered other workarounds:

- provide a global option to control how decompression buffers are allocated; the pro is that it keeps our conditional branches simpler, but the cons are that it makes the library harder to maintain and gets tricky when different components of a system want different strategies.
- provide a new decompression API, which would break the consistency of the library.
- provide a new argument in the old API, with the same downside as the previous point.
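
For illustration, here is a minimal sketch of the pattern we are after (the 64-byte pool capacity is only an example value; `Compress` and `Decompress` with a caller-supplied destination slice are this library's existing API):

package main

import (
	"fmt"

	"github.com/DataDog/zstd"
)

func main() {
	// A slice handed out by our (hypothetical) buffer pool, sized from
	// metadata we already track for the column.
	pooled := make([]byte, 0, 64)

	compressed, err := zstd.Compress(nil, make([]byte, 52))
	if err != nil {
		panic(err)
	}

	// We want the decompressed bytes to land in `pooled` whenever its
	// capacity is sufficient, instead of in a freshly allocated 1 MB slice.
	out, err := zstd.Decompress(pooled, compressed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decompressed %d bytes, cap of result: %d\n", len(out), cap(out))
}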

@XiaochenCui
Author

Closed because the tests did not pass.

Restricted the minimum hint size to 1 to prevent an `index out of range` panic on slice access.
@XiaochenCui
Author

Bug fixed

@XiaochenCui XiaochenCui reopened this May 23, 2022
@XiaochenCui
Author

@Viq111
I would sincerely appreciate a review, or a better solution to this issue.

@Viq111
Collaborator

Viq111 commented May 26, 2022

Hello @XiaochenCui, thanks for the PR.
Would you have an example payload that is smaller than 18 bytes and still valid zstd?
Indeed, the 18 bytes for the frame header is the maximum; we could change the check to the minimum, which is 2 bytes.

@XiaochenCui
Author

Hello @XiaochenCui, thanks for the PR. Would you have an example payload that is smaller than 18 bytes and still valid zstd? Indeed, the 18 bytes for the frame header is the maximum; we could change the check to the minimum, which is 2 bytes.

Yes, we have a compressed column in our OLAP system with a length of 16 bytes. It's generated from a buffer of 52 zero bytes.

Here is the code:

originalData := make([]byte, 52)
logging.Printf("len of original data: %d", len(originalData))

compressedData := Compress(CompressZSTD, originalData)
logging.Printf("len of compressed data: %d, content: %v", len(compressedData), compressedData)

The `Compress` function is just a wrapper around `zstd2.CompressLevel(...)`.

And here is the output:

2022-05-29 16:56:50 | INFO  | compress_test.go:90 > len of original data: 52
2022-05-29 16:56:50 | INFO  | compress_test.go:93 > len of compress data: 16, content: [40 181 47 253 32 52 61 0 0 8 0 1 0 224 116 66]
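
And the decompression step where we hit the large allocation looks roughly like this, continuing the snippet above (the 64-byte pre-allocated capacity is just an example of a pool-provided buffer):

dst := make([]byte, 0, 64)
decompressed, err := zstd2.Decompress(dst, compressedData)
if err != nil {
	panic(err)
}
// Because the size hint only inspects the frame header when the input is at
// least zstdFrameHeaderSizeMax (18) bytes long, this 16-byte input falls back
// to the ~1 MB default hint and `decompressed` comes back as a newly
// allocated slice instead of reusing `dst`.
logging.Printf("len of decompressed data: %d, cap: %d", len(decompressed), cap(decompressed))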

@XiaochenCui
Copy link
Author

With zstd 1.5.0, the library always used the buffer we supplied for the decompressed data.

However, with zstd 1.5.2 it allocates a huge slice of about 1 MB whenever the compressed data is smaller than 18 bytes, which is a critical problem for both performance and consistency. We downgraded the library to 1.5.0 to work around this, but I really want to solve it at the root. (I believe the root cause is `decompressSizeHint`; tell me if it isn't.)

@XiaochenCui
Author

Here is the code in a form that is more convenient for you:

originalData := make([]byte, 52)
logging.Printf("len of original data: %d", len(originalData))

// compressedData = Compress(CompressZSTD, originalData)
compressedData, err := zstd2.CompressLevel(make([]byte, 0, len(originalData)), originalData, zstd2.DefaultCompression)
if err != nil {
	panic(err)
}
logging.Printf("len of compressed data: %d, content: %v", len(compressedData), compressedData)

@XiaochenCui
Author

@Viq111
I have updated the minimum hint size to `zstdFrameHeaderSizeMin`, which is 2.
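
For clarity, the intent is only to relax the lower bound checked before reading the frame header; roughly like this (a hypothetical sketch, not the actual diff, and every name except `zstdFrameHeaderSizeMin` is made up for illustration):

const (
	zstdFrameHeaderSizeMin = 2       // minimum zstd frame header size, as discussed above
	defaultDecompressHint  = 1 << 20 // the ~1 MB fallback hint mentioned earlier
)

// sizeHintSketch falls back to the large default only when the input cannot
// even contain a minimal frame header, instead of requiring the maximal
// 18-byte header as before.
func sizeHintSketch(src []byte, frameContentSize func([]byte) int) int {
	if len(src) < zstdFrameHeaderSizeMin {
		return defaultDecompressHint
	}
	if hint := frameContentSize(src); hint > 0 {
		return hint
	}
	return defaultDecompressHint
}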

By the way, why do we need to "prevent DOS from maliciously-created payloads"? I'm not well versed in this area of application security.

Viq111 added a commit that referenced this pull request Jun 2, 2022
Fix #118
With the introduction of #115 to prevent zip bombs, the check was too strict and used too large a size boundary. This fixes it and adds a test.
@Viq111
Collaborator

Viq111 commented Jun 2, 2022

Hello!

Ah, I was working on the code in parallel yesterday; here is the PR with a test, #120, which should fix your issue (thanks for reporting!)

By the way, why do we need to "prevent DOS from maliciously-created payloads"? I'm not well versed in this area of application security.
See the original PR that introduced the change: #115 (and related issue #60) for the explanation.

If that's OK, I will merge #120 instead of this one, as it has the added test?

@XiaochenCui
Author

Hello!

Ah, I was working on the code in parallel yesterday; here is the PR with a test, #120, which should fix your issue (thanks for reporting!)

By the way, why do we need to "prevent DOS from maliciously-created payloads"? I'm not well versed in this area of application security.
See the original PR that introduced the change: #115 (and related issue #60) for the explanation.

If that's OK, I will merge #120 instead of this one, as it has the added test?

Sure, I've reviewed that PR and it's great! Thanks for tracking down this issue.

@XiaochenCui XiaochenCui closed this Jun 4, 2022
kodiakhq bot referenced this pull request in cloudquery/cloudquery Jan 1, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/DataDog/zstd](https://github.com/DataDog/zstd) | indirect | patch | `v1.5.0` -> `v1.5.2` |

---

### Release Notes

<details>
<summary>DataDog/zstd</summary>

### [`v1.5.2`](https://github.com/DataDog/zstd/releases/tag/v1.5.2): zstd 1.5.2

[Compare Source](https://github.com/DataDog/zstd/compare/v1.5.2...v1.5.2)

This release updates the upstream zstd version to [1.5.2](https://github.com/facebook/zstd/releases/tag/v1.5.2) ([https://github.com/DataDog/zstd/pull/116](https://github.com/DataDog/zstd/pull/116))

The update `1.5.0` -> `1.5.2` overall has a similar performance profile. Please note that depending on the workload, performance could vary by -10% / +10%

### [`v1.5.2+patch1`](https://github.com/DataDog/zstd/releases/tag/v1.5.2%2Bpatch1): zstd 1.5.2 - wrapper patches 1

[Compare Source](https://github.com/DataDog/zstd/compare/v1.5.0...v1.5.2)

#### What's Changed

-   Fix unnecessarily allocated large decompression buffer by [@&#8203;XiaochenCui](https://github.com/XiaochenCui) ([#&#8203;118](https://github.com/DataDog/zstd/issues/118)) & [@&#8203;Viq111](https://github.com/Viq111) in [https://github.com/DataDog/zstd/pull/120](https://github.com/DataDog/zstd/pull/120)
-   Add SetNbWorkers api to the writer code (see [#&#8203;108](https://github.com/DataDog/zstd/issues/108)) by [@&#8203;bsergean](https://github.com/bsergean) in [https://github.com/DataDog/zstd/pull/117](https://github.com/DataDog/zstd/pull/117)
    -   For large workloads, the performance can be improved by 3-6x (see [https://github.com/DataDog/zstd/pull/117#issuecomment-1147812767](https://github.com/DataDog/zstd/pull/117#issuecomment-1147812767))
    -   `Write()` becomes async with workers > 1; make sure you read the method documentation before using it

#### New Contributors

-   [@&#8203;bsergean](https://github.com/bsergean) made their first contribution in [https://github.com/DataDog/zstd/pull/117](https://github.com/DataDog/zstd/pull/117)
-   [@&#8203;XiaochenCui](https://github.com/XiaochenCui) for his work on [https://github.com/DataDog/zstd/pull/118](https://github.com/DataDog/zstd/pull/118) that led to [#&#8203;120](https://github.com/DataDog/zstd/issues/120)

**Full Changelog**: DataDog/zstd@v1.5.2...v1.5.2+patch1

</details>
