Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate varying count when compiling shaders #102792

Merged
merged 1 commit into from
Feb 14, 2025

Conversation

clayjohn
Copy link
Member

Fixes: #99587

This avoids crashing on devices when a number of varyings greater than the device limit is used.

NVidia devices were especially prone to this crash because they enforce the varying location that is assigned by the shader. Whereas AMD ignores the location and just counts up the total number of varyings. In the Forward+ scene shader we reserve 14 varyings, but we often use less than that. It is very rare that we use all 14. In the MRP, we only use 3. Therefore, on AMD devices, users are free to use up to 29 varyings, so they didn't crash with the MRP. Further, AMD devices only care about varyings that are actually used. If you define a varying and don't use it, it won't count towards the limit. With NVidia, it doesn't matter if the varying is used or not, it counts towards the limit and you can get a crash when you exceed the limit, even if you only use a couple in practice.

Accordingly, we need to be very restrictive and cut users off once they exceed the limit, whether or not the varyings are used. We also need to count all of our possible internal varyings against the limit, even if they are not used in practice.

Finally, the Vulkan spec states that built ins (like gl_FragCoord, gl_FrontFacing, etc.) count against the limit. However, in practice, that doesn't seem to be the case (although I have only tested on AMD). At some point we may need to also detect the use of built ins and count them against the limit. For now I think the current validation should be sufficient.

This PR accurately prints an error when compiling the shader and points to the offending varying. It does not throw an error inside the shader editor and highlight the line in the editor. I have an idea to fix this limitation and provide a nicer error experience (highlighting the line, popping up the error etc.), but it will require significant refactoring that isn't appropriate for a beta release.

Getting proper shader editor errors

Normally, shader compilation is kicked off in the RenderingServer. When the shader code is updated in the server it calls ShaderCompiler::compile() which calls ShaderLanguage::compile() to parse the shader. ShaderCompiler objects are created in the rendering server with all the information they need to compile. This includes information about how many varyings the internal shader template reserves for its own use.

The problem is that the shader editor doesn't actually compile the shader. It calls `ShaderLanguage::compile() to parse the shader and check for errors. Therefore, it doesn't actually have access to any of the information that we normally need to compile the shader like how many varyings the internal shader template reserves for its own use.

Therefore, when we check for errors in the shader editor, the parser assumes the number of reserved varyings is 0 and allows the user to allocate up to the device limit. With this PR, the shader will fail to compile and an error will be printed to the output, so we are safe, but the editor won't throw an error and point the user to the offending line.

The solution is to create a function in the RenderingServer that takes a shader type and returns the number of varyings reserved by the shader template. Unfortunately, due to the structure of the RD renderers, that information is spread all over the place and it will require refactoring to expose this function. Therefore, I'd like to wait until a subsequent release. I think waiting is fine since this is not an error that users will run into often. The most restrictive shader is the Forward+ scene shader which reserves 14 varyings, leaving 18 for users on most systems.

@clayjohn clayjohn added this to the 4.4 milestone Feb 13, 2025
@clayjohn clayjohn requested a review from a team as a code owner February 13, 2025 00:32
@clayjohn clayjohn force-pushed the varying-crash branch 2 times, most recently from 679d63b to 9ab6282 Compare February 13, 2025 01:35
Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally on macOS 15.2 with Metal and Vulkan (M4 Max), it works as expected. Code looks good to me.

SHADER ERROR: Too many varyings used in shader. 33 used. Maximum supported is 31
          at: (null) (:13)
ERROR: Shader compilation failed.
   at: set_code (servers/rendering/renderer_rd/forward_clustered/scene_shader_forward_clustered.cpp:152)

Prior to this PR, the material from the MRP of #99587 rendered correctly, but it wasn't portable across GPUs. Now, it displays as a fallback material since shader compilation fails.

Note that when using Compatibility, the error appears in the editor with a limit of 16 varyings (instead of 31):

Varyings

No such error is visible when using Forward+/Mobile, but you still get a shader compilation error in terminal output when the material is first rendered.

@clayjohn
Copy link
Member Author

Note that when using Compatibility, the error appears in the editor with a limit of 16 varyings (instead of 31):

That's because that is the max supported on the M4 max when using OpenGL!

Copy link
Contributor

@stuartcarnie stuartcarnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked what MoltenVK used, and apparently it is this section

CleanShot 2025-02-14 at 05 49 15@2x

https://github.com/KhronosGroup/MoltenVK/blob/1263da798724772a5f1fdf8b097f0a4311afc56c/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm#L2892

and

https://github.com/KhronosGroup/MoltenVK/blob/1263da798724772a5f1fdf8b097f0a4311afc56c/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm#L2902

Which is 124 for Apple4+ GPUs. Based on how you calculate it for Vulkan, 124 / 4 == 31. I then read the 5 footnote, which is:

A vector counts as n scalars, where n is the number of components in the vector. The iOS and tvOS feature sets only reach the maximum number of inputs if you don’t exceed the maximum number of
input components. For example, you can have 60 float inputs (components), but you can’t have 60 float4 inputs, which total 240 components.

Seems the safe bet is to assume 4 components and divide by 4, which I presume is what that term is for. Coincidentally, it still comes to 31 😊

@clayjohn
Copy link
Member Author

Seems the safe bet is to assume 4 components and divide by 4, which I presume is what that term is for. Coincidentally, it still comes to 31 😊

That's right. Plus we have the problem of using GLSL which uses defined locations (instead of just exposing registers like D3D and Metal) which assume basically a 4-component vector per location.

This avoids crashing on devices when a number of varyings greater than the device limit is used.

For now this accurately prints an error when compiling the shader, but the error text only pops up in the editor if the number of user varyings is above the limit.
@Repiteo Repiteo merged commit b607110 into godotengine:master Feb 14, 2025
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented Feb 14, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Editor crash on project open with specific shader using varyings - NVIDIA GPU on Windows
4 participants