Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize cubic hermite algorithm in AudioStreamPlaybackResampled #83536

Merged
merged 1 commit into from
Sep 8, 2024

Conversation

wareya
Copy link
Contributor

@wareya wareya commented Oct 18, 2023

Replace the algorithm with one that performs fewer multiplications. The variable names use the nomenclature from the wikipedia page.

Benchmarked with this test file: https://gist.github.com/wareya/2b6183176fa95d391399722f09fd75cb

Compiled something like: g++ fptest.cpp -Wall -O3 -msse2 -I ../godot/godot4/ -I ../godot/godot4/platform/windows/ -o resample_new_cubic.exe -D MODE_NEW_CUBIC

Benchmark results (lower is better, zero order hold and linear shown for reference):

wareya@Toriaezu MSYS /c/users/wareya/dev/scrap
$ ./resample_zoh.exe
0.503047

wareya@Toriaezu MSYS /c/users/wareya/dev/scrap
$ ./resample_linear.exe
0.692028

wareya@Toriaezu MSYS /c/users/wareya/dev/scrap
$ ./resample_new_cubic.exe
1.262155

wareya@Toriaezu MSYS /c/users/wareya/dev/scrap
$ ./resample_old_cubic.exe
1.420782

@wareya wareya requested a review from a team as a code owner October 18, 2023 04:59
@wareya
Copy link
Contributor Author

wareya commented Oct 18, 2023

Quality is unchanged except for floating point rounding error. Here's a test with a 11khz ogg audio file.

Before:

2023-10-18_00-48-25.mp4

After:

2023-10-18_00-50-43.mp4

@DeeJayLSP
Copy link
Contributor

So far in my own tests I haven't noticed an audible difference.

@adamscott
Copy link
Member

adamscott commented Mar 12, 2024

Hi @wareya! Thanks for your first PR!

We don't actually have an audio maintainer/active contributor currently, that explains why nobody took a look at your PR earlier.

As @DeeJayLSP said, I don't either hear anything different, so this is really good. But from what I understand, it's the same computation, just with less multiplications, right?

Thank you again!

@wareya
Copy link
Contributor Author

wareya commented Mar 12, 2024

Yep, that's right! The only differences are floating point precision differences.

@ellenhp
Copy link
Contributor

ellenhp commented Mar 12, 2024

We don't actually have an audio maintainer/active contributor currently, that explains why nobody took a look at your PR earlier.

Yeah I've been inactive but I still get emails. There are a few well-known implementations of hermite resampling, and last I checked ours wasn't optimal. There's some info here. There have been a few issues over the years with our resampling, so I'd double-check this PR against the audio samples given in those GH issues, especially the very low-frequency kick drum sample. Not sure exactly where it is. I link to the issues in this answer.

I haven't looked at the code (again, I'm pretty inactive) but as long as we don't switch away from hermite resampling this is probably fine. (i.e. whatever we change it to should be algebraically equivalent) hermite resampling isn't numerically unstable afaik, so it's fine. I'm also semi-unconvinced this will do much for perf because resampling is an extremely cache-friendly operation so I imagine it's already pretty fast.

@adamscott
Copy link
Member

Thanks @ellenhp for the links!

And from the code I saw, it doesn't look like it did switch away from hermite resampling.

@ellenhp
Copy link
Contributor

ellenhp commented Mar 12, 2024

One thing that could be fun is conditionally using SIMD in the resampling code. I don't think it would be too difficult because the l/r channels are already interleaved so you could just use an f32x2 intrinsic for SSE or NEON or whatever. Once that code is optimal it should basically never change again, so it's a good candidate for SIMD intrinsics. Strongly doubt the speed improvement is gonna be worth it, but it would likely be a power-efficiency win. That would probably need a proposal though, and I think Juan and maybe also Pedro would want to review. Not a change that's likely to get through easily. Audio changes in general are already pretty slow to get reviews on. Kinda my bad but I'm prioritizing maps work in my spare time.

@adamscott
Copy link
Member

One thing that could be fun is conditionally using SIMD in the resampling code. I don't think it would be too difficult because the l/r channels are already interleaved so you could just use an f32x2 intrinsic for SSE or NEON or whatever.

For a first-time contributor, I would merge as is, then optimize it afterwards.

On another subject, @wareya, could you rebase your branch?

@wareya
Copy link
Contributor Author

wareya commented Mar 12, 2024

I've actually contributed before, but it was on the Godot 3 branch, so I didn't get a Contributor badge. (I think contributed commits have to be on the main branch to count? Not sure.)

I'll get this rebased and updated, it doesn't seem to have any conflicts. Test compiling might take about an hour, though, on my desktop.

@adamscott
Copy link
Member

@wareya Are you familiar then with SIMD? If not, no problem, as I said, we could merge this PR as is.

@wareya
Copy link
Contributor Author

wareya commented Mar 12, 2024

Nope, not in the slightest!

@ellenhp
Copy link
Contributor

ellenhp commented Mar 12, 2024

Strongly recommend not trying to add SIMD stuff to this PR. It will need a proposal and is much more complicated and more likely to get stalled.

@wareya
Copy link
Contributor Author

wareya commented Mar 12, 2024

Rebased and updated.

@TokageItLab
Copy link
Member

TokageItLab commented Mar 16, 2024

This can also reuse Math::cubic_interpolate() in the same way as #89071.

However, I remember Math::cubic_interpolate_in_time() is optimized (with Barry and Goldman's pyramidal formulation which uses the difference between each point without constants as in this PR), but Math::cubic_interpolate() is not.

So it would be the best way to just make this PR use Math::cubic_interpolate() and send another PR that optimizes Math::cubic_interpolate().

@DeeJayLSP
Copy link
Contributor

DeeJayLSP commented Mar 16, 2024

I ran wareya's test script with some modifications. A new mode was added like this:

#ifdef MODE_MATH_CUBIC
    p_buffer[i] = AudioFrame(Math::cubic_interpolate(y1[0], y2[0], y0[0], y3[0], mu), Math::cubic_interpolate(y1[1], y2[1], y0[1], y3[1], mu));
#endif

I also applied the optimization to Math::cubic_interpolate to test Tokage's suggestion.

Results:

Mode Duration (s)
Old cubic 0.920070
New cubic 0.788788
Original cubic_interpolate 0.899504
Optimized cubic_interpolate 0.896148

cubic_interpolate could be used on #89071 because it was iterating on float. We are working with the AudioFrame struct here, so in order to use it you either write one with AudioFrames or use two interpolates.

It should be noted that the first approach was already being used for a long time, I believe this was done on purpose for optimization unlike some of the old audio code like the one #89071 improved.

Edit: probably because in this implementation, h11, z, h01 and h10 are calculated only once for both left and right channels, while in the test above there's one calculation per channel.

I have doubts about precision if this gets implemented on Math::cubic_interpolate().

@wareya
Copy link
Contributor Author

wareya commented Aug 22, 2024

Addressed review comment, rebased, and pushed.

@akien-mga akien-mga modified the milestones: 4.x, 4.4 Sep 7, 2024
@akien-mga akien-mga merged commit 73a0f6e into godotengine:master Sep 8, 2024
18 checks passed
@akien-mga
Copy link
Member

Thanks! And congrats for your first merged Godot contribution 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants