Pass pointers to C as unsafe.Pointer, not uintptr (memory corruption) #91
The unsafe package says: "Conversion of a uintptr back to Pointer is not valid in general." This code was violating this rule by passing uintptr values to C, which would then interpret them as pointers. This causes memory corruption: #90

This change replaces all uses of uintptr with unsafe.Pointer to avoid this memory corruption. This has the disadvantage of marking every argument as escaping to heap, which means the buffers used to call the zstd functions must be allocated on the heap. I suspect this should not be a huge problem, since I would expect that high performance code should already be managing its zstd buffers carefully.

The bug is as follows:

* In zstd_stream_test.go: `payload := []byte("Hello World!")` is marked as "does not escape to heap" (from `go test -gcflags -m`). Therefore, it is allocated on the stack.
* The test calls Writer.Write, which converts the argument to uintptr: `srcPtr = uintptr(unsafe.Pointer(&srcData[0]))`
* Writer.Write then calls Cgo: `C.ZSTD_compressStream2_wrapper(...)`
* The Go runtime decides the stack needs to be larger, so it copies it to a new location.
* (Another thread): The Go runtime decides to reuse the old stack location, so it replaces the "Hello World!" bytes with new data.
* (Original thread): Calls zstd, which reads the wrong bytes.

This change adds a test which nearly always crashes for me. While investigating the other uses of uintptr, I was also able to trigger a similar crash when calling CompressLevel, but only with: `GODEBUG=efence=1 go test .`

I also added tests for `Ctx.CompressLevel` because it was not obviously being tested. I did not reproduce this problem with that function, but I suspect the same bug exists, since it uses the same pattern.

For a minimal reproduction of this bug, see: https://github.com/evanj/cgouintptrbug
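The stack-copy step in the description above can be illustrated without cgo. The following is a minimal sketch, not the code from this PR: `growStack`, `demo`, and `result` are hypothetical names, and the recursion depth is an arbitrary choice meant to force the runtime to enlarge a fresh goroutine's stack. It keeps both a real `unsafe.Pointer` and a `uintptr` copy of the same address. The runtime updates the pointer when it copies the stack, but the integer is never updated, so it may end up referring to the old location. Whether the buffer really lives on the stack, and whether its address changes, depends on the compiler and runtime version.

```go
package main

import (
	"fmt"
	"unsafe"
)

// growStack recurses with a padded frame so that the runtime is likely to
// enlarge (and therefore copy) the calling goroutine's stack.
//
//go:noinline
func growStack(depth int) byte {
	var pad [1024]byte
	pad[depth%len(pad)] = byte(depth)
	if depth == 0 {
		return pad[0]
	}
	return growStack(depth-1) + pad[depth%len(pad)]
}

// result is a hypothetical helper type for this sketch.
type result struct {
	staleAddr uintptr // integer captured before growth; the runtime never updates it
	liveAddr  uintptr // address behind the real pointer after growth
	data      string  // bytes read through the real pointer after growth
}

// demo runs in a fresh goroutine so that it starts with a small stack.
func demo() result {
	done := make(chan result)
	go func() {
		payload := []byte("Hello World!")
		p := unsafe.Pointer(&payload[0]) // real pointer: adjusted if the stack is copied
		stale := uintptr(p)              // plain integer: not adjusted

		growStack(4096) // use a few MB of stack to trigger stack growth

		done <- result{
			staleAddr: stale,
			liveAddr:  uintptr(p),
			data:      string((*[12]byte)(p)[:]),
		}
	}()
	return <-done
}

func main() {
	r := demo()
	fmt.Printf("data via unsafe.Pointer: %q\n", r.data)
	fmt.Printf("address moved: %v\n", r.staleAddr != r.liveAddr)
}
```

Reading through the `unsafe.Pointer` stays valid because the runtime tracks it. Handing `staleAddr` to C after the stack has moved is the pattern this PR removes.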
Based on this test, we should probably change the CircleCI tests to run the tests with |
See #91. A bug was found that can only be consistently reproduced with efence=1, so enable it in the tests to make sure we catch it.
See #91. We need to add a test with efence enabled. I tried to add it directly to the current suite, but unfortunately efence uses a lot of memory, so we can only test it with a smaller payload. Take the license as an example.
Thanks for the PR!
Rats you are right. I swear this was crashing reliably for me! Let me investigate a bit more...
My test for The TestStreamCompressionDecompressionParallel test does not fail with |
This bug is very sensitive to stack sizes. Calling Decompress caused that function to copy the stack, which then avoided the bug.
Okay, I have the other test for stream compression crashing in CircleCI as well: I needed to use more goroutines for some reason. https://app.circleci.com/pipelines/github/DataDog/zstd/67/workflows/c164d97f-9db4-46d5-83cf-2c9fa37b447f/jobs/235 This bug interestingly seems to go away with In summary: this bug is very tricky, and may not happen in some cases! |
Thanks for finding this!
It's a bit unfortunate that we had to walk back some of the optimizations, but it definitely makes sense given the potential memory corruption.
Let me know if you are able to get a test that consistently reproduces the bug on CircleCI. Ideally we'd want that before merging this PR, and if needed we can clean up tests like stackDepth if it turns out they cannot reproduce the bug.
I ran the benchmark against the same payload we use in the tests (i.e. mr from http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia), a fairly large file of about 9 MB, on a MacBook Pro 2019, and there is no notable change in the benchmark. I'm guessing it will likely have a slight impact on very small payloads (when you need to do a lot of calls into cgo):
I believe the tests that I added in commit 88ff5e1 do reproduce consistently on CircleCI. I needed to increase the number of goroutines for the parallel test, then use the recursive function stack depth trick to get it to crash with https://app.circleci.com/pipelines/github/DataDog/zstd/64/workflows/e8bc7fb9-a336-4f30-b077-42c6da4206b4/jobs/225 |
Thanks! I ran the tests also on my branch and can confirm! https://app.circleci.com/pipelines/github/DataDog/zstd/79/workflows/a3fd4930-b663-4d4c-a57a-2a9c00740a70
See #91. We caught a nasty bug caused by misuse of uintptr in Go. It was fixed in that PR, but add a test case to the CircleCI matrix to make sure we catch such bugs if they happen again.
Unfortunately linux32 limits us to 3GB of memory on 32-bit systems, so we will need to disable the memory-hungry tests there. We also only need 4GB for the docker instance.
@evanj I think I fixed the remaining test problems with the last 4 commits.
It is very annoying that this bug depends on many implementation-specific details, so is hard to replicate. These test fixes look good to me. Thanks for checking and being so persistent with this!