forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zstd: support external sequence producer API #4
Open
embg
wants to merge
4
commits into
terrelln:zstd-next
Choose a base branch
from
embg:quickassist_july_2024
base: zstd-next
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Import upstream zstd v1.5.5 to expose upstream's QAT integration. Import from upstream commit 58b3ef79 [0]. This is one commit before the tag v1.5.5-kernel [1], which is signed with upstream's signing key. The next patch in the series imports from v1.5.5-kernel, and is included in the series, rather than just importing directly from v1.5.5-kernel, because it is a non-trivial patch applied to improve the kernel's decompression speed. This commit contains 3 backported patches on top of v1.5.5: Two from the Linux copy of zstd, and one from upstream's `dev` branch. In addition to keeping the kernel's copy of zstd up to date, this update was requested by Intel to expose upstream zstd's external match provider API to the kernel, which allows QAT to accelerate the LZ match finding stage. This commit was generated by: export ZSTD=/path/to/repo/zstd/ export LINUX=/path/to/repo/linux/ cd "$ZSTD/contrib/linux-kernel" git checkout v1.5.5-kernel~ make import LINUX="$LINUX" I tested and benchmarked this commit on x86-64 with gcc-13.2.1 on an Intel i9-9900K by running my benchmark scripts that benchmark zstd's performance in btrfs and squashfs compressed filesystems. This commit improves compression speed, especially for higher compression levels, and regresses decompression speed. But the decompression speed regression is addressed by the next patch in the series. Component, Level, C. time delta, size delta, D. time delta Btrfs , 1, -1.9%, +0.0%, +9.5% Btrfs , 3, -5.6%, +0.0%, +7.4% Btrfs , 5, -4.9%, +0.0%, +5.0% Btrfs , 7, -5.7%, +0.0%, +5.2% Btrfs , 9, -5.7%, +0.0%, +4.0% Squashfs , 1, N/A, 0.0%, +11.6% I also boot tested with a zstd compressed kernel on i386 and aarch64. Link: facebook/zstd@58b3ef7 Link: https://github.com/facebook/zstd/tree/v1.5.5-kernel Signed-off-by: Nick Terrell <terrelln@fb.com>
Backport upstream commit c7269ad [0] to improve zstd decoding speed. Updating the kernel to zstd v1.5.5 earlier in this patch series regressed zstd decoding speed. This turned out to be because gcc was not unrolling the inner loops of the Huffman decoder which are executed a constant number of times [1]. This really hurts performance, as we expect this loop to be completely branch-free. This commit fixes the issue by unrolling the loop manually [2]. The commit fixes one more minor issue, which is to mask a variable shift by 0x3F. The shift was guaranteed to be less than 64, but gcc couldn't prove that, and emitted suboptimal code. Finally, the upstream commit added a build macro `HUF_DISABLE_FAST_DECODE` which is not used in the kernel, but is maintained to keep a clean import from upstream. This commit was generated from upstream signed tag v1.5.5-kernel [3] by: export ZSTD=/path/to/repo/zstd/ export LINUX=/path/to/repo/linux/ cd "$ZSTD/contrib/linux-kernel" git checkout v1.5.5-kernel make import LINUX="$LINUX" I ran my benchmark & test suite before and after this commit to measure the overall decompression speed benefit. It benchmarks zstd at several compression levels. These benchmarks measure the total time it takes to read data from the compressed filesystem. Component, Level, Read time delta Btrfs , 1, -7.0% Btrfs , 3, -3.9% Btrfs , 5, -4.7% Btrfs , 7, -5.5% Btrfs , 9, -2.4% Squashfs , 1, -9.1% Link: facebook/zstd@c7269ad Link: https://gist.github.com/terrelln/2e14ff1fb197102a08d7823d8044978d Link: https://gist.github.com/terrelln/a70bde22a2abc800691fb65c21eabc2a Link: https://github.com/facebook/zstd/tree/v1.5.5-kernel Signed-off-by: Nick Terrell <terrelln@fb.com>
The g_debuglevel variable in debug.c is only used when DEBUGLEVEL is defined to be above 2. This means by default there's no actual definition of this in the headers, so sparse is giving the following warning: lib/zstd/common/debug.c:24:5: warning: symbol 'g_debuglevel' was not declared. Should it be static? We can use the same check as in the header to remove this if it isn't going to be used, silencing the warning and removing a small bit of unused data. Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Nick Terrell <terrelln@fb.com>
Cherry-picks support for using zstd's external sequence producer API in the kernel. This unblocks the use of QuickAssist hardware acceleration for zstd in applications such as BTRFS. Three upstream PRs are included in this cherry-pick: * PR #3839: move ESP params into `ZSTD_CCtx_params` * PR #3854: add `ZSTD_CCtxParams_registerSequenceProducer` * PR #4064: expose relevant functions so they can be used in Linux To build this commit, I first cherry-picked the relevant upstream commits onto the upstream v1.5.5-kernel tag: ``` cd ~/repos/zstd git checkout tags/v1.5.5-kernel git cherry-pick -m 1 126ec2669c927b24acd38ea903a211c1b5416588 git cherry-pick c6cabf94417d84ebb5da62e05d8b8a9623763585 git cherry-pick 3242ac598e6f17d8008f6110337a3b4c1205842b ``` I then ran "make import" to copy the changes into my fork of Linux: ``` cd ~/repos/zstd/contrib/linux-kernel/ make import ``` Finally, I reverted an unrelated change which was pulled in by the import: ``` cd ~/repos/linux git commit git checkout HEAD^1 -- lib/zstd/common/debug.c git commit --amend ``` This unrelated change occurred due to a patch in zstd-next which was slightly modified when it landed upstream (post-1.5.5 release). I verified that the debug.c change is the only patch in zstd-next which is not a direct cherrypick from upstream. Signed-off-by: Elliot Gorokhovsky <embg@meta.com>
terrelln
pushed a commit
that referenced
this pull request
Mar 4, 2025
We have several places across the kernel where we want to access another task's syscall arguments, such as ptrace(2), seccomp(2), etc., by making a call to syscall_get_arguments(). This works for register arguments right away by accessing the task's `regs' member of `struct pt_regs', however for stack arguments seen with 32-bit/o32 kernels things are more complicated. Technically they ought to be obtained from the user stack with calls to an access_remote_vm(), but we have an easier way available already. So as to be able to access syscall stack arguments as regular function arguments following the MIPS calling convention we copy them over from the user stack to the kernel stack in arch/mips/kernel/scall32-o32.S, in handle_sys(), to the current stack frame's outgoing argument space at the top of the stack, which is where the handler called expects to see its incoming arguments. This area is also pointed at by the `pt_regs' pointer obtained by task_pt_regs(). Make the o32 stack argument space a proper member of `struct pt_regs' then, by renaming the existing member from `pad0' to `args' and using generated offsets to access the space. No functional change though. With the change in place the o32 kernel stack frame layout at the entry to a syscall handler invoked by handle_sys() is therefore as follows: $sp + 68 -> | ... | <- pt_regs.regs[9] +---------------------+ $sp + 64 -> | $t0 | <- pt_regs.regs[8] +---------------------+ $sp + 60 -> | $a3/argument #4 | <- pt_regs.regs[7] +---------------------+ $sp + 56 -> | $a2/argument #3 | <- pt_regs.regs[6] +---------------------+ $sp + 52 -> | $a1/argument #2 | <- pt_regs.regs[5] +---------------------+ $sp + 48 -> | $a0/argument #1 | <- pt_regs.regs[4] +---------------------+ $sp + 44 -> | $v1 | <- pt_regs.regs[3] +---------------------+ $sp + 40 -> | $v0 | <- pt_regs.regs[2] +---------------------+ $sp + 36 -> | $at | <- pt_regs.regs[1] +---------------------+ $sp + 32 -> | $zero | <- pt_regs.regs[0] +---------------------+ $sp + 28 -> | stack argument torvalds#8 | <- pt_regs.args[7] +---------------------+ $sp + 24 -> | stack argument torvalds#7 | <- pt_regs.args[6] +---------------------+ $sp + 20 -> | stack argument torvalds#6 | <- pt_regs.args[5] +---------------------+ $sp + 16 -> | stack argument #5 | <- pt_regs.args[4] +---------------------+ $sp + 12 -> | psABI space for $a3 | <- pt_regs.args[3] +---------------------+ $sp + 8 -> | psABI space for $a2 | <- pt_regs.args[2] +---------------------+ $sp + 4 -> | psABI space for $a1 | <- pt_regs.args[1] +---------------------+ $sp + 0 -> | psABI space for $a0 | <- pt_regs.args[0] +---------------------+ holding user data received and with the first 4 frame slots reserved by the psABI for the compiler to spill the incoming arguments from $a0-$a3 registers (which it sometimes does according to its needs) and the next 4 frame slots designated by the psABI for any stack function arguments that follow. This data is also available for other tasks to peek/poke at as reqired and where permitted. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See commit message