Improve performance of certain physics queries when using Jolt Physics #101071
I'd be curious to hear your take on this @jrouwe, and whether you think it's worthwhile to begin with. I'm personally so used to seeing this type of container be used in larger game-related codebases that I didn't even bother to do any profiling when making this same change to the extension, and I must admit the improvements were not as drastic as I had imagined. But again, that probably depends greatly on the platform/machine as well. The fact that Godot fairly liberally allocates memory elsewhere (through the default system allocator) does make this change seem a bit "against the grain" though.
I suppose it's also worth discussing whether this should even be its own container or instead just a custom allocator. Ideally it would be a custom allocator for However, I did find the custom
I think avoiding memory allocations is very useful and a good way to speed things up! A custom allocator was indeed what I was thinking about too and I thought it would be fun to write one: Turns out it wasn't fun at all, mainly because Jolt can fall back to using Edit: The PR has been merged now, feel free to use this allocator or to use what you have already written.
Yeah, that looks about as fun as I remember it being. 😅 I'm glad you caught both the rebinding and non-stateless gotchas. I appreciate the addition though. I'm not sure I can justify bumping Jolt just for this, but I suppose we could switch to your
You could also only add
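For readers unfamiliar with the allocator-based approach discussed above, here is a rough standard-library analogue. It is not Jolt's allocator and not the code in this PR: it uses C++17's `std::pmr::monotonic_buffer_resource` over a stack buffer, with a hypothetical `CountingResource` as the fallback so that heap traffic can be observed. The names `CountingResource`, `heap_allocations_for` and `kDefaultHits` are made up for the illustration. Unlike a purpose-built local allocator, a monotonic resource never reuses released memory, so treat this only as a sketch of the "stay on the stack until the default capacity is exceeded" idea.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical helper: counts every allocation that falls through to the heap.
class CountingResource : public std::pmr::memory_resource {
public:
	std::size_t allocations = 0;

private:
	void *do_allocate(std::size_t bytes, std::size_t alignment) override {
		++allocations;
		return std::pmr::new_delete_resource()->allocate(bytes, alignment);
	}
	void do_deallocate(void *p, std::size_t bytes, std::size_t alignment) override {
		std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
	}
	bool do_is_equal(const std::pmr::memory_resource &other) const noexcept override {
		return this == &other;
	}
};

// Stores `count` ints in a vector backed by a 32-element stack buffer and
// returns how many allocations had to be served by the heap fallback.
inline std::size_t heap_allocations_for(std::size_t count) {
	constexpr std::size_t kDefaultHits = 32; // assumed default hit limit
	alignas(int) std::byte buffer[kDefaultHits * sizeof(int)];

	CountingResource fallback;
	std::pmr::monotonic_buffer_resource local(buffer, sizeof(buffer), &fallback);

	std::pmr::vector<int> hits(&local);
	hits.reserve(kDefaultHits); // served entirely from the stack buffer

	for (std::size_t i = 0; i < count; ++i) {
		hits.push_back(static_cast<int>(i));
	}

	assert(hits.size() == count);
	return fallback.allocations;
}
```

Calling `heap_allocations_for(32)` should report no fallback allocations, while going one element past the default forces the vector to grow and reach the heap.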
Looks good!
Thanks!
(This addresses the to-do list item in #99895 titled "Get rid of heap allocations in shape queries when requested hits are less or equal to default" and brings the Jolt Physics module in line with the Godot Jolt extension, which also utilizes this optimization, albeit implemented differently.)
Since most physics queries in Godot have a default upper limit to how many hits they can return (e.g. 32 hits for `intersect_point`), there is a ripe opportunity for omitting the memory allocations associated with storing these hits in the cases where we don't in fact exceed this default limit, which is almost certainly the vast majority of cases.

This pull request adds a new container to the Jolt Physics module called `JoltInlineVector`, which acts as a hybrid between a stack-allocated buffer and a heap-allocated one, and switches from the former to the latter once a certain templatized capacity has been exceeded. These containers are sometimes referred to as a "small vector" in other codebases.

EDIT: This PR now instead uses the backported `JPH::STLLocalAllocator<T, N>` from #102614, as discussed below, which allows `JPH::Array` to become an inline/small array, behaving exactly like the `JoltInlineVector` implementation that was here previously.

This is then used in the `JoltQueryCollector*Multi` classes, which themselves take a templatized default capacity appropriate for the callsite, allowing us to omit[^1] the memory allocations associated with storing the hits for the following physics queries:

- `PhysicsDirectSpaceState3D.cast_motion`
- `PhysicsDirectSpaceState3D.collide_shape`
- `PhysicsDirectSpaceState3D.intersect_point`
- `PhysicsDirectSpaceState3D.intersect_shape`
- `PhysicsBody3D.move_and_collide`
- `PhysicsBody3D.test_move`
- `PhysicsServer3D.body_test_motion`
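
To make the "small vector" idea concrete, here is a minimal sketch of such a hybrid container. It is purely illustrative: `InlineVectorSketch` is a hypothetical name, it is neither the original `JoltInlineVector` nor `JPH::STLLocalAllocator`, and it only handles trivially copyable, default-constructible element types to stay short. Elements live in an inline `std::array` until the templated capacity `N` is exceeded, at which point they are copied once into a heap-backed `std::vector`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical sketch of an inline/small vector with inline capacity N.
template <typename T, std::size_t N>
class InlineVectorSketch {
	static_assert(std::is_trivially_copyable<T>::value,
			"sketch only supports trivially copyable types");
	static_assert(std::is_default_constructible<T>::value,
			"sketch only supports default-constructible types");

public:
	void push_back(const T &value) {
		if (!on_heap_ && size_ < N) {
			// Fast path: no memory allocation, the element goes into inline storage.
			inline_[size_] = value;
		} else {
			if (!on_heap_) {
				// First overflow: move the existing elements to the heap once.
				heap_.assign(inline_.begin(), inline_.begin() + size_);
				on_heap_ = true;
			}
			heap_.push_back(value);
		}
		++size_;
	}

	const T &operator[](std::size_t index) const {
		assert(index < size_);
		return on_heap_ ? heap_[index] : inline_[index];
	}

	std::size_t size() const { return size_; }

	// True once the inline capacity has been exceeded.
	bool on_heap() const { return on_heap_; }

private:
	std::array<T, N> inline_{};
	std::vector<T> heap_;
	std::size_t size_ = 0;
	bool on_heap_ = false;
};
```

A query collector could then hold something like `InlineVectorSketch<Hit, 32>` (with `Hit` standing in for whatever result type the callsite needs), so that queries returning at most the default number of hits never touch the heap.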

The actual performance benefits of this will likely vary greatly depending on the platform, but in my measurements on Windows (both in Superluminal and measuring with `QueryPerformanceCounter`) I'm seeing roughly an average reduction of 10-15% CPU time, with certain larger spikes (presumably from occasional page faults) disappearing completely.

Here's a before-and-after profiling of `PhysicsDirectSpaceState3D.cast_motion`, where the motion vector spans most of the level in GDQuest's Robo Blast demo:

[Image: before-and-after profiler capture]

I tried to keep the implementation of this new container as minimal as possible, but I'll admit it turned out to be a few lines longer than I hoped it would be. I do still think this optimization is worthwhile though, since physics queries tend to be plentiful in a lot of games.

Footnotes
[^1]: Note that even with this optimization there are still memory allocations happening when performing physics queries from scripts, as the script interface relies on things like `TypedArray<Dictionary>` for its results, which still allocate plenty of memory, so this by no means removes all allocations from the queries listed here.