Spread direct lighting calculation for LightmapGI over several submissions #102257

Merged
merged 1 commit into from
Feb 3, 2025

clayjohn
Member

@clayjohn clayjohn commented Jan 31, 2025

Fixes: #101391
Fixes: #98180

The root of the problem here is TDR (Timeout Detection and Recovery) on Windows, an OS-level interrupt that intentionally crashes GPU contexts that it suspects are frozen. The default TDR limit on Windows is very low. It can be adjusted (and many users choose to do so), but we don't expect all users to dig around in the Windows registry to change it.

Prior to this PR, we calculated all direct lighting in a single command buffer, with one compute dispatch per atlas slice. Contrast that with indirect lighting, where we do one command buffer for each ray, for each region, for each slice. In 4.3 and earlier we didn't have any issues with this simplistic approach because direct lighting was very simple: it cast either one ray for hard shadows or rendering/lightmapping/bake_quality/medium_quality_ray_count rays (the exact setting depends on the quality level) for soft shadows. With the introduction of shadow map antialiasing in #95828, we multiply the shadow rays by 16, and now, with transparency support in #99538, we also multiply the number of rays by rendering/lightmapping/bake_performance/max_transparency_rays.
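
To make the growth concrete, here is a rough back-of-the-envelope sketch in Python. The function name and the sample values (64 soft-shadow rays, 8 transparency rays) are hypothetical and chosen only for illustration; they are not the engine's defaults. The result is an upper bound, since max_transparency_rays is a maximum.

```python
def max_shadow_rays_per_texel(base_rays, aa_multiplier=16, max_transparency_rays=1):
    """Upper bound on shadow rays cast per texel for one light.

    base_rays: 1 for hard shadows, or the soft-shadow ray count for the
               chosen quality level (hypothetical value in the example below).
    aa_multiplier: the x16 factor described above for shadow antialiasing.
    max_transparency_rays: extra multiplier introduced with transparency support.
    """
    if base_rays < 1 or aa_multiplier < 1 or max_transparency_rays < 1:
        raise ValueError("all ray counts must be at least 1")
    return base_rays * aa_multiplier * max_transparency_rays


# Hypothetical soft-shadow setup: 64 base rays.
rays_before = max_shadow_rays_per_texel(64, aa_multiplier=1)  # 4.3-style: no multipliers
rays_after = max_shadow_rays_per_texel(64, aa_multiplier=16, max_transparency_rays=8)

if __name__ == "__main__":
    print(rays_before, rays_after, rays_after // rays_before)
```

With these illustrative numbers, a single dispatch may do up to 128 times more work per texel than before, which is the kind of increase that can push one submission past the TDR limit.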

All that is to say, we do a lot more work in the direct light compute invocations, so we need to spread the work out over multiple command buffers.

This PR does that simply by splitting the work up over regions, like we do for indirect lighting, so direct lighting can also take advantage of rendering/lightmapping/bake_performance/region_size.
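
The idea can be sketched roughly as follows. This is a simplified Python illustration, not the code from the PR: the function names are hypothetical, and it only assumes that each atlas slice is tiled into square regions of at most region_size texels, with one submission per region and a progress percentage reported after each one.

```python
from typing import Iterator, List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height) in texels


def iter_regions(atlas_width: int, atlas_height: int, region_size: int) -> Iterator[Region]:
    """Tile one atlas slice into regions of at most region_size x region_size."""
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    for y in range(0, atlas_height, region_size):
        for x in range(0, atlas_width, region_size):
            # Edge regions are clamped so they never extend past the atlas.
            yield (x, y, min(region_size, atlas_width - x), min(region_size, atlas_height - y))


def plan_direct_light_submissions(
    atlas_width: int, atlas_height: int, slice_count: int, region_size: int
) -> List[Tuple[int, Region, float]]:
    """Return one (slice, region, percent_done) entry per GPU submission."""
    regions = list(iter_regions(atlas_width, atlas_height, region_size))
    total = slice_count * len(regions)
    plan: List[Tuple[int, Region, float]] = []
    for slice_index in range(slice_count):
        for region in regions:
            percent_done = 100.0 * (len(plan) + 1) / total
            plan.append((slice_index, region, percent_done))
    return plan


if __name__ == "__main__":
    for slice_index, region, percent in plan_direct_light_submissions(1024, 1024, 2, 512):
        print(f"slice {slice_index} region {region}: {percent:.1f}%")
```

Each entry stands for a separate, shorter submission, so no single submission has to cover a whole slice.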

@jcostello
Contributor

The root of the problem here is TDR on Windows which is an OS level interrupt that intentionally crashes GPU contexts that it suspects are frozen

Does it fix it for Linux?

@clayjohn
Member Author

Possibly. I wasn't able to reproduce a crash on Linux, even on integrated graphics.

@clayjohn clayjohn force-pushed the lightmapgi-crash branch 2 times, most recently from 26d9cbc to c85b399 Compare January 31, 2025 20:22
@SpockBauru

With so many crashes on the same day, I started to suspect my device, so I tested the hundred trees MRP on the Shadowmask PR (189c8eb). It bakes without issues.
I went back to 4.3 stable and set the directional light to static. It also baked without issues.

Then I tested the Windows artifact (c85b399) and it does not crash \o/. But unfortunately, shadowmasks are not working anymore.
Later, I tried turning Shadowmask off and setting the DirectionalLight to static. Still no luck; it seems like shadows do not work correctly.

Here is a comparison:

[Image: Shadowmask PR (189c8eb) and 4.3 (DirectionalLight "static")]
[Image: This PR (c85b399)]

@GustJc
Contributor

GustJc commented Jan 31, 2025

But unfortunately, shadowmasks are not working anymore.

I've noticed the same behavior. It doesn't crash anymore, but the shadows are weird. Not sure if it is related to this fix or not.

_shadow.mp4

@clayjohn
Member Author

@SpockBauru Thank you for testing and confirming that it no longer crashes for you. I have a theory about why shadowmask broke. I'll take a look using your MRP.

@wojtekpil
Contributor

I also did a test on a relatively large scene. I did a bake with ultra quality and a single directional light with angular distance = 0.5. The bake took almost 2 hours without issues. I can confirm there are no crashes on Windows with this PR for me!

…sions to avoid TDR on Windows devices

Also add percentage progress for direct lighting step
@clayjohn
Member Author

clayjohn commented Feb 1, 2025

@SpockBauru @GustJc Shadows are back! Thanks for catching that. Please test again once the CI finishes building and let me know if you have any problems

@GustJc
Contributor

GustJc commented Feb 1, 2025

All working now! Great work!

@SpockBauru

No crashes and now Shadowmask works! \o/

Tested the Forward+, Mobile and Compatibility renderers. No crashes, and the shadowmask now has even higher quality thanks to the transparency, yay!

Made the size of the terrain lightmap 1.8x higher: no crashes, and now the shadowmask shows even the branches of the trees! This was not possible before! \o/

Baking times doubled, but this is expected due to the high amount of transparent materials in this scene.

Shadowmasks increased by 1.8x with transparency!

Awesome work! No crashes at all! I believe that #101391 can finally be closed. Excellent!

Now the only remaining issues with this MRP are the tunnel regression in #102164 and the wrong colors with static DynamicLight that I just noted in #102203, but these belong to other PRs.

@ricky-daniel13

This seems like a great idea. Not using soft shadows when baking made lightmapping far less crash-prone in previous versions of 4.x, probably because the extra rays consume more memory and time, so this seems to be the correct approach to making the lightmapper more reliable.

@jcostello
Contributor

The crash at the beginning of the bake is fixed. However, in the backyard distro scene with 0.5 texel density or higher I get this error:

VRAM before the crash:

Usage: 4539MiB /  6144MiB
godot.linuxbsd.editor.dev.x86_64 :    4117MiB

Error:

[1] /lib/x86_64-linux-gnu/libc.so.6(+0x42990) [0x734006e42990] (??:0)
[2] RenderingDeviceDriverVulkan::command_queue_execute_and_present(RenderingDeviceDriver::CommandQueueID, VectorView<RenderingDeviceDriver::SemaphoreID>, VectorView<RenderingDeviceDriver::CommandBufferID>, VectorView<RenderingDeviceDriver::SemaphoreID>, RenderingDeviceDriver::FenceID, VectorView<RenderingDeviceDriver::SwapChainID>) (/home/juan/dev/godot/drivers/vulkan/rendering_device_driver_vulkan.cpp:2597 (discriminator 2))
[3] RenderingDevice::execute_chained_cmds(bool, RenderingDeviceDriver::FenceID, RenderingDeviceDriver::SemaphoreID) (/home/juan/dev/godot/servers/rendering/rendering_device.cpp:6459 (discriminator 3))
[4] RenderingDevice::_execute_frame(bool) (/home/juan/dev/godot/servers/rendering/rendering_device.cpp:6486)
[5] RenderingDevice::submit() (/home/juan/dev/godot/servers/rendering/rendering_device.cpp:6232)
[6] LightmapperRD::bake(Lightmapper::BakeQuality, bool, float, int, int, float, float, int, bool, bool, bool, Lightmapper::GenerateProbes, Ref<Image> const&, Basis const&, bool (*)(float, String const&, void*, bool), void*, float, float) (/home/juan/dev/godot/modules/lightmapper_rd/lightmapper_rd.cpp:1889)
[7] LightmapGI::bake(Node*, String, bool (*)(float, String const&, void*, bool), void*) (/home/juan/dev/godot/scene/3d/lightmap_gi.cpp:1269)
[8] LightmapGIEditorPlugin::_bake_select_file(String const&) (/home/juan/dev/godot/editor/plugins/lightmap_gi_editor_plugin.cpp:73 (discriminator 2))
[9] LightmapGIEditorPlugin::_bake() (/home/juan/dev/godot/editor/plugins/lightmap_gi_editor_plugin.cpp:129 (discriminator 2))
[10] void call_with_variant_args_helper<__UnexistingClass>(__UnexistingClass*, void (__UnexistingClass::*)(), Variant const**, Callable::CallError&, IndexSequence<>) (/home/juan/dev/godot/./core/variant/binder_common.h:320)
[11] void call_with_variant_args_dv<__UnexistingClass>(__UnexistingClass*, void (__UnexistingClass::*)(), Variant const**, int, Callable::CallError&, Vector<Variant> const&) (/home/juan/dev/godot/./core/variant/binder_common.h:463)
[12] MethodBindT<>::call(Object*, Variant const**, int, Callable::CallError&) const (/home/juan/dev/godot/./core/object/method_bind.h:345 (discriminator 1))
[13] Object::callp(StringName const&, Variant const**, int, Callable::CallError&) (/home/juan/dev/godot/core/object/object.cpp:849 (discriminator 1))
[14] Callable::callp(Variant const**, int, Variant&, Callable::CallError&) const (/home/juan/dev/godot/core/variant/callable.cpp:69 (discriminator 1))
[15] Object::emit_signalp(StringName const&, Variant const**, int) (/home/juan/dev/godot/core/object/object.cpp:1237)
[16] Node::emit_signalp(StringName const&, Variant const**, int) (/home/juan/dev/godot/scene/main/node.cpp:4021)
[17] Error Object::emit_signal<>(StringName const&) (/home/juan/dev/godot/./core/object/object.h:933)
[18] BaseButton::_pressed() (/home/juan/dev/godot/scene/gui/base_button.cpp:143)
[19] BaseButton::on_action_event(Ref<InputEvent>) (/home/juan/dev/godot/scene/gui/base_button.cpp:186)

@clayjohn
Member Author

clayjohn commented Feb 1, 2025

That is a crash in indirect lighting.

Can you try reducing rendering/lightmapping/bake_performance/max_rays_per_pass and rendering/lightmapping/bake_performance/region_size to see if they fix the crash?
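
As a rough illustration of why lowering both settings helps, the Python sketch below estimates how the same total work is divided across submissions. It assumes, as a simplification of the description earlier in this thread, that indirect lighting issues one submission per ray pass, per region, per slice; the function name and all sample numbers are hypothetical.

```python
import math


def indirect_submission_stats(atlas_size, slice_count, total_rays, region_size, max_rays_per_pass):
    """Return (submission_count, rays_per_submission) for a square atlas.

    Simplified model: every submission traces up to max_rays_per_pass rays
    for each texel of one region_size x region_size region.
    """
    regions_per_axis = math.ceil(atlas_size / region_size)
    ray_passes = math.ceil(total_rays / max_rays_per_pass)
    submissions = slice_count * regions_per_axis ** 2 * ray_passes
    texels_per_region = min(region_size, atlas_size) ** 2
    rays_per_submission = texels_per_region * min(max_rays_per_pass, total_rays)
    return submissions, rays_per_submission


# Hypothetical bake: 2048x2048 atlas, 1 slice, 256 indirect rays per texel.
larger = indirect_submission_stats(2048, 1, 256, region_size=512, max_rays_per_pass=32)
smaller = indirect_submission_stats(2048, 1, 256, region_size=256, max_rays_per_pass=16)

if __name__ == "__main__":
    print("larger settings: ", larger)
    print("smaller settings:", smaller)
```

In this model, halving both settings cuts the work in each submission by a factor of 8 while the number of submissions grows by the same factor, so each one finishes sooner at the cost of more submission overhead.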

@jcostello
Contributor

That fixes the issue and lets me bake. Should I create a new issue to prevent this from happening with the default settings?

@SpockBauru

As soon as this PR gets merged, I think it is better to just include it in #102243, since these settings make baking times longer.

Contributor

@BlueCube3310 BlueCube3310 left a comment

Tested on Windows with the TPS demo; baking lightmaps no longer crashes.

@Repiteo Repiteo merged commit 3d25533 into godotengine:master Feb 3, 2025
19 checks passed
@Repiteo
Contributor

Repiteo commented Feb 3, 2025

Thanks!

@clayjohn clayjohn deleted the lightmapgi-crash branch February 3, 2025 16:24