Optimize cubic hermite algorithm in AudioStreamPlaybackResampled #83536

wareya · 2023-10-18T04:59:01Z

Replace the algorithm with one that performs fewer multiplications. The variable names use the nomenclature from the Wikipedia page on cubic Hermite splines.
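For readers who want to see the idea in isolation, here is a minimal single-channel sketch. It assumes Catmull-Rom tangents (half the difference between the neighbouring samples) and is not copied from the engine source. The function names are made up for illustration. The first function is a direct polynomial-coefficient form for comparison, and the second uses the Hermite basis names (h01, h10, h11) with h00 folded in as 1 - h01.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: polynomial-coefficient form of a Catmull-Rom cubic.
// y0..y3 are four consecutive samples; mu in [0, 1] is the position between y1 and y2.
inline float cubic_coefficients(float y0, float y1, float y2, float y3, float mu) {
    assert(mu >= 0.0f && mu <= 1.0f);
    float mu2 = mu * mu;
    float a0 = 3.0f * y1 - 3.0f * y2 + y3 - y0;
    float a1 = 2.0f * y0 - 5.0f * y1 + 4.0f * y2 - y3;
    float a2 = y2 - y0;
    float a3 = 2.0f * y1;
    return (a0 * mu * mu2 + a1 * mu2 + a2 * mu + a3) * 0.5f;
}

// Hypothetical helper: the same curve written with Hermite basis functions.
inline float cubic_hermite_basis(float y0, float y1, float y2, float y3, float mu) {
    assert(mu >= 0.0f && mu <= 1.0f);
    float mu2 = mu * mu;
    float h11 = mu2 * (mu - 1.0f); // mu^3 - mu^2
    float z = mu2 - h11;           // 2*mu^2 - mu^3 (shared intermediate)
    float h01 = z - h11;           // 3*mu^2 - 2*mu^3
    float h10 = mu - z;            // mu^3 - 2*mu^2 + mu
    // h00 = 1 - h01, so h00 * y1 + h01 * y2 collapses to y1 + (y2 - y1) * h01.
    return y1 + (y2 - y1) * h01 + ((y2 - y0) * h10 + (y3 - y1) * h11) * 0.5f;
}
```

Both functions should agree up to floating point rounding for any mu in [0, 1].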

Benchmarked with this test file: https://gist.github.com/wareya/2b6183176fa95d391399722f09fd75cb

Compiled with something like: g++ fptest.cpp -Wall -O3 -msse2 -I ../godot/godot4/ -I ../godot/godot4/platform/windows/ -o resample_new_cubic.exe -D MODE_NEW_CUBIC

Benchmark results (lower is better, zero order hold and linear shown for reference):

wareya@Toriaezu MSYS /c/users/wareya/dev/scrap
$ ./resample_zoh.exe
0.503047

wareya@Toriaezu MSYS /c/users/wareya/dev/scrap
$ ./resample_linear.exe
0.692028

wareya@Toriaezu MSYS /c/users/wareya/dev/scrap
$ ./resample_new_cubic.exe
1.262155

wareya@Toriaezu MSYS /c/users/wareya/dev/scrap
$ ./resample_old_cubic.exe
1.420782

wareya · 2023-10-18T05:00:13Z

Quality is unchanged except for floating point rounding error. Here's a test with an 11 kHz Ogg audio file.

Before:

2023-10-18_00-48-25.mp4

After:

2023-10-18_00-50-43.mp4

DeeJayLSP · 2024-03-12T18:23:29Z

So far in my own tests I haven't noticed an audible difference.

adamscott · 2024-03-12T19:03:57Z

Hi @wareya! Thanks for your first PR!

We don't actually have an audio maintainer/active contributor currently, which explains why nobody took a look at your PR earlier.

As @DeeJayLSP said, I don't hear anything different either, so this is really good. But from what I understand, it's the same computation, just with fewer multiplications, right?

Thank you again!

wareya · 2024-03-12T19:49:43Z

Yep, that's right! The only differences are floating point precision differences.

ellenhp · 2024-03-12T19:58:35Z

> We don't actually have an audio maintainer/active contributor currently, which explains why nobody took a look at your PR earlier.

Yeah I've been inactive but I still get emails. There are a few well-known implementations of hermite resampling, and last I checked ours wasn't optimal. There's some info here. There have been a few issues over the years with our resampling, so I'd double-check this PR against the audio samples given in those GH issues, especially the very low-frequency kick drum sample. Not sure exactly where it is. I link to the issues in this answer.

I haven't looked at the code (again, I'm pretty inactive), but as long as we don't switch away from hermite resampling this is probably fine (i.e. whatever we change it to should be algebraically equivalent). Hermite resampling isn't numerically unstable afaik, so it's fine. I'm also semi-unconvinced this will do much for perf, because resampling is an extremely cache-friendly operation, so I imagine it's already pretty fast.
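To make "algebraically equivalent" concrete, here is a short derivation. It assumes the tangents are the usual Catmull-Rom central differences, which the thread does not state explicitly. The samples are $y_0, \dots, y_3$ and $t \in [0, 1]$ is the position between $y_1$ and $y_2$.

```latex
\begin{align*}
p(t) &= h_{00}(t)\,y_1 + h_{10}(t)\,m_1 + h_{01}(t)\,y_2 + h_{11}(t)\,m_2,
  \qquad m_1 = \tfrac{1}{2}(y_2 - y_0),\quad m_2 = \tfrac{1}{2}(y_3 - y_1) \\
h_{00}(t) &= 2t^3 - 3t^2 + 1, \quad
h_{10}(t) = t^3 - 2t^2 + t, \quad
h_{01}(t) = -2t^3 + 3t^2, \quad
h_{11}(t) = t^3 - t^2 \\
\intertext{Since $h_{00} = 1 - h_{01}$, the basis form simplifies to}
p(t) &= y_1 + h_{01}(t)\,(y_2 - y_1)
  + \tfrac{1}{2}\bigl[\,h_{10}(t)\,(y_2 - y_0) + h_{11}(t)\,(y_3 - y_1)\bigr] \\
\intertext{and collecting powers of $t$ gives the polynomial-coefficient form}
p(t) &= \tfrac{1}{2}\bigl[\,2y_1 + (y_2 - y_0)\,t
  + (2y_0 - 5y_1 + 4y_2 - y_3)\,t^2
  + (-y_0 + 3y_1 - 3y_2 + y_3)\,t^3\bigr]
\end{align*}
```

Under that assumption the two forms describe the same polynomial, so any difference in output comes only from the order of floating point operations.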

adamscott · 2024-03-12T20:04:20Z

Thanks @ellenhp for the links!

And from the code I saw, it doesn't look like it switched away from hermite resampling.

ellenhp · 2024-03-12T20:12:22Z

One thing that could be fun is conditionally using SIMD in the resampling code. I don't think it would be too difficult because the l/r channels are already interleaved so you could just use an f32x2 intrinsic for SSE or NEON or whatever. Once that code is optimal it should basically never change again, so it's a good candidate for SIMD intrinsics. Strongly doubt the speed improvement is gonna be worth it, but it would likely be a power-efficiency win. That would probably need a proposal though, and I think Juan and maybe also Pedro would want to review. Not a change that's likely to get through easily. Audio changes in general are already pretty slow to get reviews on. Kinda my bad but I'm prioritizing maps work in my spare time.
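As a rough illustration of that idea, here is a sketch that evaluates both channels of one interleaved stereo frame in the low two lanes of an SSE register, with a scalar fallback when SSE is not available. `StereoFrame` and `hermite_stereo` are hypothetical names, the Catmull-Rom tangent choice is an assumption, and nothing here has been benchmarked; it only shows the shape such code might take.

```cpp
#include <cassert>
#include <cmath>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical stand-in for an interleaved stereo frame (left, right).
struct StereoFrame {
    float l, r;
};

// Evaluates the Hermite basis form for both channels at once.
inline StereoFrame hermite_stereo(const StereoFrame &y0, const StereoFrame &y1,
        const StereoFrame &y2, const StereoFrame &y3, float mu) {
    // The basis weights depend only on mu, so they are computed once.
    float mu2 = mu * mu;
    float h11 = mu2 * (mu - 1.0f);
    float z = mu2 - h11;
    float h01 = z - h11;
    float h10 = mu - z;
#ifdef SKETCH_HAS_SSE
    // Left/right go into lanes 0 and 1; lanes 2 and 3 are zero and ignored.
    __m128 v0 = _mm_setr_ps(y0.l, y0.r, 0.0f, 0.0f);
    __m128 v1 = _mm_setr_ps(y1.l, y1.r, 0.0f, 0.0f);
    __m128 v2 = _mm_setr_ps(y2.l, y2.r, 0.0f, 0.0f);
    __m128 v3 = _mm_setr_ps(y3.l, y3.r, 0.0f, 0.0f);
    __m128 tangents = _mm_add_ps(
            _mm_mul_ps(_mm_sub_ps(v2, v0), _mm_set1_ps(h10)),
            _mm_mul_ps(_mm_sub_ps(v3, v1), _mm_set1_ps(h11)));
    __m128 result = _mm_add_ps(v1, _mm_add_ps(
            _mm_mul_ps(_mm_sub_ps(v2, v1), _mm_set1_ps(h01)),
            _mm_mul_ps(tangents, _mm_set1_ps(0.5f))));
    float out[4];
    _mm_storeu_ps(out, result);
    return StereoFrame{ out[0], out[1] };
#else
    // Scalar fallback with the same arithmetic per channel.
    return StereoFrame{
        y1.l + (y2.l - y1.l) * h01 + ((y2.l - y0.l) * h10 + (y3.l - y1.l) * h11) * 0.5f,
        y1.r + (y2.r - y1.r) * h01 + ((y2.r - y0.r) * h10 + (y3.r - y1.r) * h11) * 0.5f
    };
#endif
}
```

A 4-wide register is only half used here; processing two output frames per iteration would fill it, at the cost of more bookkeeping.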

adamscott · 2024-03-12T20:18:35Z

> One thing that could be fun is conditionally using SIMD in the resampling code. I don't think it would be too difficult because the l/r channels are already interleaved so you could just use an f32x2 intrinsic for SSE or NEON or whatever.

For a first-time contributor, I would merge as is, then optimize it afterwards.

On another subject, @wareya, could you rebase your branch?

wareya · 2024-03-12T20:22:40Z

I've actually contributed before, but it was on the Godot 3 branch, so I didn't get a Contributor badge. (I think contributed commits have to be on the main branch to count? Not sure.)

I'll get this rebased and updated; it doesn't seem to have any conflicts. Test compiling might take about an hour on my desktop, though.

adamscott · 2024-03-12T20:23:36Z

@wareya Are you familiar with SIMD, then? If not, no problem; as I said, we could merge this PR as is.

wareya · 2024-03-12T20:23:51Z

Nope, not in the slightest!

ellenhp · 2024-03-12T20:26:30Z

Strongly recommend not trying to add SIMD stuff to this PR. It will need a proposal and is much more complicated and more likely to get stalled.

wareya · 2024-03-12T22:19:32Z

Rebased and updated.

TokageItLab · 2024-03-16T07:16:22Z

This can also reuse Math::cubic_interpolate() in the same way as #89071.

However, I remember Math::cubic_interpolate_in_time() is optimized (with Barry and Goldman's pyramidal formulation which uses the difference between each point without constants as in this PR), but Math::cubic_interpolate() is not.

So it would be the best way to just make this PR use Math::cubic_interpolate() and send another PR that optimizes Math::cubic_interpolate().
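For context, here is a sketch of what a Barry and Goldman style pyramid looks like in the simplest case of uniformly spaced points (knots assumed at -1, 0, 1, 2). It is not the body of `Math::cubic_interpolate_in_time()`, which also handles non-uniform times, and the function names are hypothetical. It is built from nested linear interpolations only.

```cpp
#include <cassert>
#include <cmath>

// Plain linear interpolation; t may lie outside [0, 1] (extrapolation).
inline float lerp_sketch(float a, float b, float t) {
    return a + (b - a) * t;
}

// Hypothetical helper: Barry-Goldman pyramid for uniformly spaced points.
// y0..y3 sit at knots -1, 0, 1, 2; t in [0, 1] is the position between y1 and y2.
inline float catmull_rom_pyramid(float y0, float y1, float y2, float y3, float t) {
    // First level: interpolate (or extrapolate) along each pair of neighbours.
    float a1 = lerp_sketch(y0, y1, t + 1.0f);
    float a2 = lerp_sketch(y1, y2, t);
    float a3 = lerp_sketch(y2, y3, t - 1.0f);
    // Second level: blend across knot spans of width 2.
    float b1 = lerp_sketch(a1, a2, (t + 1.0f) * 0.5f);
    float b2 = lerp_sketch(a2, a3, t * 0.5f);
    // Final level: blend across the central span.
    return lerp_sketch(b1, b2, t);
}
```

With uniform spacing this should produce the same curve as the Catmull-Rom form of cubic Hermite interpolation, up to rounding.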

DeeJayLSP · 2024-03-16T14:19:21Z

I ran wareya's test script with some modifications. A new mode was added like this:

```cpp
#ifdef MODE_MATH_CUBIC
    // One scalar cubic_interpolate() call per channel.
    p_buffer[i] = AudioFrame(
            Math::cubic_interpolate(y1[0], y2[0], y0[0], y3[0], mu),
            Math::cubic_interpolate(y1[1], y2[1], y0[1], y3[1], mu));
#endif
```

I also applied the optimization to Math::cubic_interpolate to test Tokage's suggestion.

Results:

| Mode | Duration (s) |
| --- | --- |
| Old cubic | 0.920070 |
| New cubic | 0.788788 |
| Original `cubic_interpolate` | 0.899504 |
| Optimized `cubic_interpolate` | 0.896148 |

cubic_interpolate could be used in #89071 because that code was iterating over floats. Here we are working with the AudioFrame struct, so in order to use it you either write a version that takes AudioFrames or call the scalar one twice.

It should be noted that the first approach had already been in use for a long time. I believe this was done on purpose for optimization, unlike some of the old audio code such as the part #89071 improved.

Edit: probably because in this implementation, h11, z, h01 and h10 are calculated only once for both left and right channels, while in the test above there's one calculation per channel.
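A small sketch of that difference, assuming Catmull-Rom tangents: the basis weights depend only on mu, so they can be computed once per output frame and applied to both channels. `Frame`, `HermiteWeights` and the function names below are hypothetical stand-ins, not engine types.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for AudioFrame.
struct Frame {
    float l, r;
};

// Basis weights for one value of mu (h00 is implied as 1 - h01).
struct HermiteWeights {
    float h01, h10, h11;
};

inline HermiteWeights make_weights(float mu) {
    float mu2 = mu * mu;
    float h11 = mu2 * (mu - 1.0f);
    float z = mu2 - h11;
    return HermiteWeights{ z - h11, mu - z, h11 };
}

// Applies precomputed weights to one channel.
inline float apply_weights(const HermiteWeights &w, float y0, float y1, float y2, float y3) {
    return y1 + (y2 - y1) * w.h01 + ((y2 - y0) * w.h10 + (y3 - y1) * w.h11) * 0.5f;
}

inline Frame interpolate_frame(const Frame &y0, const Frame &y1,
        const Frame &y2, const Frame &y3, float mu) {
    const HermiteWeights w = make_weights(mu); // computed once, used for both channels
    return Frame{
        apply_weights(w, y0.l, y1.l, y2.l, y3.l),
        apply_weights(w, y0.r, y1.r, y2.r, y3.r)
    };
}
```

Calling a scalar interpolation function once per channel would repeat the weight computation, unless the compiler manages to hoist it after inlining.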

I have doubts about precision if this gets implemented on Math::cubic_interpolate().

servers/audio/audio_stream.cpp

wareya · 2024-08-22T22:40:28Z

Addressed review comment, rebased, and pushed.

akien-mga · 2024-09-08T21:25:05Z

Thanks! And congrats on your first merged Godot contribution 🎉

wareya requested a review from a team as a code owner October 18, 2023 04:59

wareya mentioned this pull request Oct 18, 2023

Make AudioStreamPlaybackWAV inherit from AudioStreamPlaybackResampled #83483

Closed

AThousandShips added enhancement topic:audio labels Oct 18, 2023

AThousandShips added this to the 4.x milestone Oct 18, 2023

Calinou added the performance label Oct 18, 2023

wareya force-pushed the new_cubic branch from 18f34a9 to 772b7d5 Compare March 12, 2024 21:31

DeeJayLSP reviewed Mar 16, 2024

servers/audio/audio_stream.cpp (outdated)

Optimize cubic hermite algorithm in AudioStreamPlaybackResampled

94b31c1

wareya force-pushed the new_cubic branch from 772b7d5 to 94b31c1 Compare August 22, 2024 22:40

reduz approved these changes Sep 7, 2024

akien-mga modified the milestones: 4.x, 4.4 Sep 7, 2024

akien-mga merged commit 73a0f6e into godotengine:master Sep 8, 2024
18 checks passed
