Scripts to bake and play hundreds of thousands of GPU vertex animations #10866
I've modified this design in my game project due to the severe overhead of MultiMeshInstance3D.SetInstanceTransform and its Color and CustomData variants. While the design in the initial proposal works reasonably well, it shows its limits at around 145,000 actors, as seen in this demo: the CPU time spikes above 33ms, with around 16ms spent in this chunk of code.
This is admittedly expected, given that setting any data through the MultiMesh API modifies GPU state, which entails not only the overhead of transferring bytes but also that of interrupting the GPU's work. A single one of these calls is trivial, but hundreds of thousands of them quickly hit a limit. The solution is therefore to find a way to upload all MultiMesh data in a single GPU state change, and that is by using ImageTextures and Images alongside MultiMeshes.

The key is that ImageTexture's update method counts as a single GPU state change. That means updating the GPU state of a MultiMesh and its hundreds of thousands of instances has, effectively, only the cost of the CPU-to-GPU transfer itself. On an average machine with 16GB per second of transfer bandwidth, 16MB of GPU state change costs just 1ms. So updating 1 million custom data values per frame (recall that custom data is used to store animation frame data) costs about 1ms, and updating 1 million transforms per frame costs about 4ms.

To illustrate one such application of this, see this diagram. The diagram also shoehorns in an "equipment variation" scheme, but if that's considered out of scope, focus only on a single model. For a single animated model per actor, there are two ImageTextures: one holding transforms, and one holding a single float as the index of the corresponding transform. Use the image format that consumes the fewest bytes. Then, every time an actor wants to render itself and all its "equipment variations", write the corresponding transform data to the next Image pixel, and for each Image representing a model (Body, Sword, Shield, etc.), add a pixel with the index of that transform data.

This setup not only minimizes the number of bytes transferred to the GPU (all the animated models share the actor's transform), but also reduces the overhead of GPU state changes from O(n) to O(1) thanks to the ImageTexture.Update(Image) method.
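The two-texture scheme above could be sketched in GDScript roughly as follows. This is a minimal, hypothetical illustration, not the actual project code: the texture size, the packing of a position plus a 2D rotation into one RGBAF pixel, and all names are assumptions for the example.

```gdscript
# Hypothetical sketch of the two-texture scheme (all names illustrative).
# One image stores actor transforms; each per-model image (Body, Sword, ...)
# stores only a float index into the transform image.
extends Node

const TEX_SIZE := 512  # capacity = 512 * 512 actors, one pixel per actor

var transform_image := Image.create(TEX_SIZE, TEX_SIZE, false, Image.FORMAT_RGBAF)
var transform_texture := ImageTexture.create_from_image(transform_image)

var body_index_image := Image.create(TEX_SIZE, TEX_SIZE, false, Image.FORMAT_RF)
var body_index_texture := ImageTexture.create_from_image(body_index_image)

var _next := 0

func push_actor(pos: Vector3, rot: float) -> void:
	var x := _next % TEX_SIZE
	var y := _next / TEX_SIZE  # integer division: row in the image
	# Pack the minimal transform (position + 2D rotation) into one pixel.
	transform_image.set_pixel(x, y, Color(pos.x, pos.y, pos.z, rot))
	# Each model owned by this actor stores only the transform's index,
	# so equipment variations share the actor's transform bytes.
	body_index_image.set_pixel(x, y, Color(float(_next), 0.0, 0.0))
	_next += 1

func flush() -> void:
	# One GPU state change per texture, regardless of actor count.
	transform_texture.update(transform_image)
	body_index_texture.update(body_index_image)
	_next = 0
```

Note that ImageTexture.update requires the Image to keep the same size and format as the texture was created with, which is why the images are allocated once and rewritten in place each frame.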
GPU transfer time for 288,000 actors in my latest build is 3ms, compared to 16ms for 144,000 before. Once the data is in these textures, accessing it is trivial in whatever shader you implement. Here's my implementation, which relies on a super minimal transform consisting of a Vector3 position and a single float representing 2D rotation.
It's a small shader library that reads from various ImageTexture inputs and converts them to transforms, frame data, or anything else. Note that this means users are no longer limited to Transform, Color, and CustomData as with vanilla MultiMeshes - you can supply any reasonable amount of additional data to a shader. Now, the resources for this feature are:
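Feeding arbitrary per-instance channels beyond the vanilla Transform/Color/CustomData trio comes down to binding extra ImageTextures as shader uniforms. A minimal sketch of the CPU side, with illustrative uniform and texture names:

```gdscript
# Sketch: supplying extra per-instance data to a MultiMesh shader as
# additional ImageTexture uniforms (uniform names are illustrative).
extends MultiMeshInstance3D

@export var transform_texture: ImageTexture
@export var team_color_texture: ImageTexture  # an extra channel vanilla MultiMesh lacks

func _ready() -> void:
	# Assumes the MultiMesh's material is a ShaderMaterial whose shader
	# declares matching sampler2D uniforms.
	var mat := material_override as ShaderMaterial
	mat.set_shader_parameter("transform_tex", transform_texture)
	mat.set_shader_parameter("team_color_tex", team_color_texture)
```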
Describe the project you are working on
I am developing a real-time tactics game, currently capable of supporting up to 30,000 animated 3D soldiers fighting in 1,200 units. However, this was not possible using out-of-the-box Godot functionality, which leads to the next section.
Describe the problem or limitation you are having in your project
The out-of-the-box Godot approach to animating tens of thousands of 3D soldiers would initially be a MeshInstance3D and AnimationPlayer pair for every single soldier. However, this results in massive CPU and GPU time spent both submitting draw calls and computing skeletal mesh animation. The next attempt would be MultiMeshInstance3D, but it lacks direct AnimationPlayer integration and can only render one mesh anyway. Thus, it's not possible to animate a large number of 3D meshes without significant work.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
The feature is based on the GPU vertex animation tooling I've already implemented. It consists of three modules:
This suite of tools and scripts solves the following problems:
Here is a demonstration of the results of using these modules:
https://www.youtube.com/watch?v=OTbQH3k0q6Q
For 5,000 critters, CPU time is 2ms and GPU time is 10ms for a crowded screen.
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
The above section gives an introduction to how the three modules work. Here are some code snippets and images to illustrate them in action.
Here's an example of a resource definition
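The original resource snippet is an image and isn't reproduced here. As a rough, hypothetical illustration of what a bake-settings resource for this kind of tool might look like (every field name below is an assumption, not the actual definition):

```gdscript
# Hypothetical bake-settings resource; all fields are illustrative.
class_name VertexAnimationBakeSettings
extends Resource

@export var source_mesh: Mesh                 # mesh whose animation gets baked
@export var animation_player_path: NodePath   # source of the animations
@export var animation_name: String = "run"
@export var frame_rate: float = 30.0          # baked frames per second
@export var texture_size: Vector2i = Vector2i(1024, 1024)
```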
Arranging the transform and frame number for the CUSTOM_DATA
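Since MultiMesh custom data is a single Color (four floats), the current animation frame can ride along in one of its channels. A minimal sketch of this arrangement, with an assumed layout (frame index in the first channel, the rest unused):

```gdscript
# Sketch: packing per-instance animation state into CUSTOM_DATA.
# Requires multimesh.use_custom_data = true. The channel layout here
# (frame in .x, other channels unused) is an illustrative assumption.
func write_instance(mm: MultiMesh, i: int, xform: Transform3D, frame: int) -> void:
	mm.set_instance_transform(i, xform)
	mm.set_instance_custom_data(i, Color(float(frame), 0.0, 0.0, 0.0))
```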
Uploading transform and custom data to multimesh, and limiting the number of visible instances
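In Godot 4 the whole MultiMesh instance buffer can also be set in one call instead of per instance, and `visible_instance_count` caps how many instances are drawn. A hedged sketch of that step (the 12-floats-per-transform, 4-floats-per-custom-data layout follows the MultiMesh buffer documentation; `active` is an illustrative parameter):

```gdscript
# Sketch: bulk-uploading all instance data and limiting visible instances.
# For a 3D MultiMesh, each instance occupies 12 floats for the transform,
# plus 4 for custom data when use_custom_data is enabled.
func upload(mm: MultiMesh, buffer: PackedFloat32Array, active: int) -> void:
	RenderingServer.multimesh_set_buffer(mm.get_rid(), buffer)
	mm.visible_instance_count = active  # instances beyond this are not drawn
```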
Shader for MultiMeshInstance3D
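The actual shader is shown as an image in the original post. For readers without access to it, here is a minimal, hypothetical sketch of the usual vertex-animation-texture pattern in Godot's shading language: the frame index arrives via CUSTOM_DATA, and the baked position texture is assumed to store one row per frame with the vertex's column carried in UV2 (both layout choices are assumptions, not the author's actual encoding):

```gdshader
// Hypothetical minimal vertex-animation shader. Assumed texture layout:
// one row per animation frame, one column per vertex.
shader_type spatial;

uniform sampler2D position_texture;  // baked vertex positions per frame

void vertex() {
	// Frame index was packed into the first CUSTOM_DATA channel.
	int frame = int(INSTANCE_CUSTOM.x);
	// UV2 is assumed to carry this vertex's column in the baked texture.
	ivec2 texel = ivec2(int(UV2.x), frame);
	VERTEX = texelFetch(position_texture, texel, 0).xyz;
}
```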
If this enhancement will not be used often, can it be worked around with a few lines of script?
This feature requires some trial and error to implement the first time, costing developers many weeks for a capability that is common in games requiring many animated 3D meshes.
Is there a reason why this should be core and not an add-on in the asset library?
This provides a highly requested feature, as seen in these posts:
https://forum.godotengine.org/t/how-to-instance-animations/46857
https://godotforums.org/d/19323-anyone-have-luck-with-implementing-gpu-instancing
https://www.reddit.com/r/godot/comments/8d54yy/anyone_have_luck_with_implementing_gpu_instancing/
https://www.reddit.com/r/godot/comments/11d0iot/15000_zombies_rendered_in_godot_on_my_macbook/
The author of the last one actually managed to implement it, but hasn't shared implementation details yet, leaving curious developers in the dark. So this commonly requested feature, which is tooling to easily manage GPU vertex animations for animating tens of thousands of 3D meshes, doesn't exist at the moment. If it did, developers could save weeks of development time and gain the freedom that comes with being able to animate huge numbers of 3D meshes.
This feature is also a good fit for core because it relies entirely on Godot's existing public API (no internal calls to the engine's C++ code). This makes maintenance quite easy, since nothing fancy is being done.