[zstd] remove src minimal size limit on decompressSizeHint
#118
Conversation
We want to force zstd to decompress into a buffer we pre-allocate in some circumstances, regardless of whether that buffer is smaller than `zstdFrameHeaderSizeMax`. For example, in our OLAP system kernel we have a unified buffer pool that manages all bytes allocated for queries, and an accidentally created large array has a fatal impact on the correctness and performance of the system. (In this case, any compressed data smaller than 18 bytes results in a newly constructed byte slice of 1 MB, which is intolerable; a sketch of this behavior follows the list below.)

I also considered other workarounds:

- Provide a global option to control the allocation strategy. The pros: it keeps our condition branches simpler. The cons: it makes the library harder to use, and it gets tricky when different components of a system want different strategies.
- Provide a new decompression API, which would break the consistency of the library.
- Provide a new argument in the old API, with the same drawback as the previous point.
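To make the failure mode concrete, here is a minimal sketch of the size-hint logic being described. The names are modeled on the library's `decompressSizeHint` and `zstdFrameHeaderSizeMax`, but the body is an illustration of the reported behavior, not the library's actual source:

```go
// Illustrative sketch only: models the behavior described above,
// not the real implementation in DataDog/zstd.
const (
	zstdFrameHeaderSizeMax    = 18          // max size of a zstd frame header
	decompressSizeBufferLimit = 1000 * 1000 // ~1 MB fallback allocation
)

func decompressSizeHint(src []byte) int {
	if len(src) < zstdFrameHeaderSizeMax {
		// Inputs shorter than a full frame header skip the content-size
		// lookup and fall back to the large default, so a 16-byte payload
		// yields a ~1 MB destination buffer.
		return decompressSizeBufferLimit
	}
	// Otherwise the hint would come from the frame's declared content size,
	// capped so a maliciously-crafted header cannot force a huge allocation.
	return decompressSizeBufferLimit
}
```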
Closed it because the tests did not pass.
Restrict the minimal hint size to 1 to prevent an `index out of range` panic on slice access.
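A minimal sketch of the guard described here, assuming the hint is clamped before it is used to size the destination slice (names are illustrative, not the patch itself):

```go
// Illustrative: clamp the size hint to at least 1 so that indexing the
// destination slice cannot panic with "index out of range" when the
// hint would otherwise be 0.
func clampSizeHint(hint int) int {
	if hint < 1 {
		return 1
	}
	return hint
}
```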
Bug fixed.
@Viq111
Hello @XiaochenCui, thanks for the PR.
Yeah, we have a compressed column in our OLAP system with a length of 16. It's generated from a binary of 52 zero bytes. Here is the code:

```go
originalData = make([]byte, 52)
logging.Printf("len of original data: %d", len(originalData))
compressedData = Compress(CompressZSTD, originalData)
logging.Printf("len of compressed data: %d, content: %v", len(compressedData), compressedData)
```

And here is the output, showing a compressed length of 16.
In zstd 1.5.0, this worked as expected. However, in zstd 1.5.2 it generates a huge buffer of 1 MB whenever the compressed data is smaller than 18 bytes, which is a critical problem for both performance and consistency. We downgraded the library to 1.5.0 to get rid of this trouble, but I really want to solve it at the root. (I think the root cause is the minimal source-size limit in `decompressSizeHint`.)
Here is code that is more convenient for you:

```go
// Assumes zstd2 is an import alias for github.com/DataDog/zstd and
// logging is our project's logger (log.Printf behaves the same way).
originalData := make([]byte, 52)
logging.Printf("len of original data: %d", len(originalData))
// compressedData = Compress(CompressZSTD, originalData)
compressedData, err := zstd2.CompressLevel(make([]byte, 0, len(originalData)), originalData, zstd2.DefaultCompression)
if err != nil {
	panic(err)
}
logging.Printf("len of compressed data: %d, content: %v", len(compressedData), compressedData)
```
@Viq111 Btw, why do we have to "prevent DOS from maliciously-created payloads"? (Asking because I'm not well versed in application cryptography.)
Hello! Ah, I've been working on the code in parallel yesterday; here is the PR with a test: #120, which should fix your issue (thanks for reporting!).
If that's OK, may I merge #120 instead of this one, as it has the added test?
Sure, I've reviewed that PR and it's great! Thanks for tracking this issue.
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/DataDog/zstd](https://github.com/DataDog/zstd) | indirect | patch | `v1.5.0` -> `v1.5.2` |

---

### Release Notes

<details>
<summary>DataDog/zstd</summary>

### [`v1.5.2`](https://github.com/DataDog/zstd/releases/tag/v1.5.2): zstd 1.5.2

[Compare Source](https://github.com/DataDog/zstd/compare/v1.5.2...v1.5.2)

This release updates the upstream zstd version to [1.5.2](https://github.com/facebook/zstd/releases/tag/v1.5.2) ([https://github.com/DataDog/zstd/pull/116](https://github.com/DataDog/zstd/pull/116)).

The update `1.5.0` -> `1.5.2` has a similar overall performance profile. Please note that depending on the workload, performance could vary by -10% / +10%.

### [`v1.5.2+patch1`](https://github.com/DataDog/zstd/releases/tag/v1.5.2%2Bpatch1): zstd 1.5.2 - wrapper patches 1

[Compare Source](https://github.com/DataDog/zstd/compare/v1.5.0...v1.5.2)

#### What's Changed

- Fix unneededly allocated large decompression buffer by [@XiaochenCui](https://github.com/XiaochenCui) ([#118](https://github.com/DataDog/zstd/issues/118)) & [@Viq111](https://github.com/Viq111) in [https://github.com/DataDog/zstd/pull/120](https://github.com/DataDog/zstd/pull/120)
- Add the SetNbWorkers API to the writer code (see [#108](https://github.com/DataDog/zstd/issues/108)) by [@bsergean](https://github.com/bsergean) in [https://github.com/DataDog/zstd/pull/117](https://github.com/DataDog/zstd/pull/117)
  - For large workloads, performance can be improved by 3-6x (see [https://github.com/DataDog/zstd/pull/117#issuecomment-1147812767](https://github.com/DataDog/zstd/pull/117#issuecomment-1147812767))
  - `Write()` becomes async with workers > 1; make sure you read the method documentation before using it

#### New Contributors

- [@bsergean](https://github.com/bsergean) made their first contribution in [https://github.com/DataDog/zstd/pull/117](https://github.com/DataDog/zstd/pull/117)
- [@XiaochenCui](https://github.com/XiaochenCui) for his work on [https://github.com/DataDog/zstd/pull/118](https://github.com/DataDog/zstd/pull/118) that led to [#120](https://github.com/DataDog/zstd/issues/120)

**Full Changelog**: DataDog/zstd@v1.5.2...v1.5.2+patch1

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 3am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
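The release notes above mention the new `SetNbWorkers` writer API from #117. Below is a hedged usage sketch: the method name comes from the notes, but the exact signature (assumed here to be `SetNbWorkers(n int) error` on `*zstd.Writer`) should be checked against the method documentation, especially since `Write()` becomes async with more than one worker:

```go
package main

import (
	"bytes"
	"log"

	"github.com/DataDog/zstd"
)

func main() {
	var buf bytes.Buffer
	w := zstd.NewWriter(&buf)

	// Assumed signature; enables multi-threaded compression in libzstd.
	if err := w.SetNbWorkers(4); err != nil {
		log.Fatal(err)
	}

	// With workers > 1, Write is asynchronous: an error from this Write
	// may only surface on a later Write, Flush, or Close.
	if _, err := w.Write(make([]byte, 1<<20)); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
	log.Printf("compressed %d bytes into %d", 1<<20, buf.Len())
}
```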