Optimize cubic hermite algorithm in AudioStreamPlaybackResampled #83536
Quality is unchanged except for floating point rounding error. Here's a test with an 11 kHz Ogg audio file. Before: 2023-10-18_00-48-25.mp4 After: 2023-10-18_00-50-43.mp4 |
So far in my own tests I haven't noticed an audible difference. |
Hi @wareya! Thanks for your first PR! We don't actually have an audio maintainer/active contributor currently, which explains why nobody took a look at your PR earlier. As @DeeJayLSP said, I don't hear anything different either, so this is really good. But from what I understand, it's the same computation, just with fewer multiplications, right? Thank you again! |
Yep, that's right! The only differences are floating point precision differences. |
Yeah I've been inactive but I still get emails. There are a few well-known implementations of Hermite resampling, and last I checked ours wasn't optimal. There's some info here. There have been a few issues over the years with our resampling, so I'd double-check this PR against the audio samples given in those GH issues, especially the very low-frequency kick drum sample. Not sure exactly where it is. I link to the issues in this answer. I haven't looked at the code (again, I'm pretty inactive) but as long as we don't switch away from Hermite resampling this is probably fine (i.e. whatever we change it to should be algebraically equivalent). Hermite resampling isn't numerically unstable afaik, so it's fine. I'm also semi-unconvinced this will do much for perf because resampling is an extremely cache-friendly operation so I imagine it's already pretty fast. |
Thanks @ellenhp for the links! And from the code I saw, it doesn't look like it switched away from Hermite resampling. |
One thing that could be fun is conditionally using SIMD in the resampling code. I don't think it would be too difficult because the l/r channels are already interleaved so you could just use an f32x2 intrinsic for SSE or NEON or whatever. Once that code is optimal it should basically never change again, so it's a good candidate for SIMD intrinsics. Strongly doubt the speed improvement is gonna be worth it, but it would likely be a power-efficiency win. That would probably need a proposal though, and I think Juan and maybe also Pedro would want to review. Not a change that's likely to get through easily. Audio changes in general are already pretty slow to get reviews on. Kinda my bad but I'm prioritizing maps work in my spare time. |
For a first-time contributor, I would merge as is, then optimize it afterwards. |
I've actually contributed before, but it was on the Godot 3 branch, so I didn't get a Contributor badge. (I think contributed commits have to be on the main branch to count? Not sure.) I'll get this rebased and updated, it doesn't seem to have any conflicts. Test compiling might take about an hour, though, on my desktop. |
@wareya Are you familiar then with SIMD? If not, no problem, as I said, we could merge this PR as is. |
Nope, not in the slightest! |
Strongly recommend not trying to add SIMD stuff to this PR. It will need a proposal and is much more complicated and more likely to get stalled. |
Rebased and updated. |
This can also reuse However, I remember So it would be the best way to just make this PR use |
I ran wareya's test script with some modifications. A new mode was added like this:
I also applied the optimization to Results:
It should be noted that the first approach has been in use for a long time; I believe this was done on purpose for optimization, unlike some of the old audio code such as what #89071 improved. Edit: probably because in this implementation h11, z, h01 and h10 are calculated only once for both left and right channels, while in the test above there's one calculation per channel. I have doubts about precision if this gets implemented on |
Addressed review comment, rebased, and pushed. |
Thanks! And congrats for your first merged Godot contribution 🎉 |
Replace the algorithm with one that performs fewer multiplications. The variable names use the nomenclature from the Wikipedia page.
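As a rough illustration of "same result, fewer multiplications", the sketch below compares two algebraically equivalent ways of evaluating a Catmull-Rom style cubic Hermite interpolant between `y1` and `y2`. Both function names are hypothetical, and the first form is an assumed direct polynomial evaluation for comparison, not necessarily the code this PR replaces.

```cpp
#include <cmath>

// Direct polynomial form (assumed baseline): builds four coefficients per sample.
inline float cubic_direct(float y0, float y1, float y2, float y3, float mu) {
    float mu2 = mu * mu;
    float a0 = 3.0f * y1 - 3.0f * y2 + y3 - y0;
    float a1 = 2.0f * y0 - 5.0f * y1 + 4.0f * y2 - y3;
    float a2 = y2 - y0;
    float a3 = 2.0f * y1;
    return (a0 * mu * mu2 + a1 * mu2 + a2 * mu + a3) / 2.0f;
}

// Hermite basis form: derives the basis values from each other with
// additions and subtractions, then blends the samples.
inline float cubic_hermite_basis(float y0, float y1, float y2, float y3, float mu) {
    float mu2 = mu * mu;
    float h11 = mu2 * (mu - 1.0f); // mu^3 - mu^2
    float z = mu2 - h11;           // helper term: 2*mu^2 - mu^3
    float h01 = z - h11;           // 3*mu^2 - 2*mu^3
    float h10 = mu - z;            // mu^3 - 2*mu^2 + mu
    // h00 = 1 - h01 is folded into "y1 + (y2 - y1) * h01".
    // Tangents are central differences: (y2 - y0) / 2 and (y3 - y1) / 2.
    return y1 + (y2 - y1) * h01 + ((y2 - y0) * h10 + (y3 - y1) * h11) * 0.5f;
}
```

Both functions return `y1` at `mu = 0` and `y2` at `mu = 1`; in between they should differ only by floating point rounding.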
Benchmarked with this test file: https://gist.github.com/wareya/2b6183176fa95d391399722f09fd75cb
Compiled something like:
g++ fptest.cpp -Wall -O3 -msse2 -I ../godot/godot4/ -I ../godot/godot4/platform/windows/ -o resample_new_cubic.exe -D MODE_NEW_CUBIC
Benchmark results (lower is better, zero order hold and linear shown for reference):