
Betsy: Add caching and BC1 compression support #95915

Merged 2 commits into godotengine:master from BlueCube3310:betsy-bc1 on Sep 16, 2024

Conversation

BlueCube3310
Contributor

@BlueCube3310 BlueCube3310 commented Aug 21, 2024

Adds BC1 compression to Betsy, which greatly improves the image quality:

[comparison screenshots: etcpak vs. Betsy]

In terms of import time, this is currently slightly slower than etcpak due to creating a RenderingDevice for every compress operation (an additional 100-400 ms on a debug build). The compression itself takes about the same time.

Edit: After implementing caching, it's now ~1.75x faster than etcpak. Layered textures now also import significantly faster.

@BlueCube3310 BlueCube3310 requested review from a team as code owners August 21, 2024 18:46
@BlueCube3310 BlueCube3310 force-pushed the betsy-bc1 branch 2 times, most recently from 4192f74 to 225cd4b Compare August 21, 2024 18:55
@fire
Member

fire commented Aug 21, 2024

I recommend we merge only once it is clearly faster, e.g. after we have a batching system for creating the compute resources.

@clayjohn
Member

For the RD I would probably just make the local RenderingDevice a static variable, then initialize it the first time the function is called (this will need a mutex). And then free it in uninitialize_betsy_module()

The alternative would be to try to run this on the main RenderingDevice. But then you have to ensure that the whole function is called on the rendering thread (and it won't work with the Compatibility renderer anyway)
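The static-device idea above can be sketched roughly as follows. This is a minimal illustration, not Godot's actual API: `LocalDevice` and both function names are hypothetical stand-ins for the real RenderingDevice plumbing.

```cpp
#include <mutex>

// Hypothetical stand-in for a local RenderingDevice; not Godot's API.
struct LocalDevice {
	int id = 0;
};

static LocalDevice *betsy_device = nullptr;
static std::mutex betsy_device_mutex;

// Lazily create the shared device the first time compression runs.
// The mutex guards against two threads racing to create it.
LocalDevice *get_betsy_device() {
	std::lock_guard<std::mutex> lock(betsy_device_mutex);
	if (betsy_device == nullptr) {
		betsy_device = new LocalDevice{ 1 };
	}
	return betsy_device;
}

// Would be called from uninitialize_betsy_module() so the device
// does not outlive the module.
void free_betsy_device() {
	std::lock_guard<std::mutex> lock(betsy_device_mutex);
	delete betsy_device;
	betsy_device = nullptr;
}
```

Every caller then shares one device instead of paying the 100-400 ms creation cost per compress call.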

@BlueCube3310
Contributor Author

BlueCube3310 commented Aug 22, 2024

RenderingDevice, shader and pipeline caching is now implemented. Betsy is now ~1.75x faster than etcpak.

Layered textures (such as lightmaps) now compress even faster:
1024x1024x7 lightmap (with cache):
[benchmark screenshot]

@BlueCube3310 BlueCube3310 changed the title Betsy: Add BC1 compression support Betsy: Add caching and BC1 compression support Aug 22, 2024
@BlueCube3310 BlueCube3310 requested review from a team as code owners August 22, 2024 11:11
@BlueCube3310
Contributor Author

Caching can now be toggled from the project settings

@clayjohn
Member

I think that the entire compression function might need to be behind a mutex, now that I think about it. We will have issues if you try to compress multiple textures from multiple different threads, as there is state that gets changed by submit and sync.

I'm going on holidays, but I would like @DarioSamo to chime in with his thoughts, as this will also be impacted by #90400.

I think the current code works because of the following:

  1. Textures are all compressed from the same calling thread (I think?)
  2. If multiple threads call the function they each have their own device
  3. RenderingDevice has its own mutex that is locked for both submit and sync.

2 is removed by this PR. 3 is removed by the ubershader PR. And 1 might not be true, and even if it is, it might eventually change as we optimize the import process and multi thread more parts of the engine.

I think we probably have 2 robust solutions:

  1. Lazily allocate a mutex/RD for each thread that calls this function. This would mean you could potentially end up with N RDs if you have N threads.
  2. Add a call queue to Betsy so that Betsy is essentially single-threaded and just accepts tasks in a queue.
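Option 2 can be sketched as below, assuming a plain `std::thread` worker rather than Godot's WorkerThreadPool; all names are illustrative, not the PR's actual implementation. Any thread may push work, but only the single worker thread executes it, so compression jobs are serialized.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-threaded task queue for Betsy.
class BetsyQueue {
public:
	BetsyQueue() :
			worker(&BetsyQueue::run, this) {}

	~BetsyQueue() {
		{
			std::lock_guard<std::mutex> lock(mutex);
			exiting = true;
		}
		cond.notify_one();
		worker.join(); // Remaining tasks are drained before join returns.
	}

	// Callers from any thread enqueue work; only the worker runs it.
	void push(std::function<void()> task) {
		{
			std::lock_guard<std::mutex> lock(mutex);
			tasks.push(std::move(task));
		}
		cond.notify_one();
	}

private:
	void run() {
		std::unique_lock<std::mutex> lock(mutex);
		while (true) {
			cond.wait(lock, [this] { return exiting || !tasks.empty(); });
			if (tasks.empty() && exiting) {
				return;
			}
			std::function<void()> task = std::move(tasks.front());
			tasks.pop();
			lock.unlock();
			task(); // One compression job at a time, outside the lock.
			lock.lock();
		}
	}

	std::queue<std::function<void()>> tasks; // Declared before `worker` so
	std::mutex mutex; // all state exists when the thread starts.
	std::condition_variable cond;
	bool exiting = false;
	std::thread worker;
};

// Demo helper: push n jobs, then let the destructor drain and join.
int run_demo(int n) {
	int count = 0;
	{
		BetsyQueue q;
		for (int i = 0; i < n; i++) {
			q.push([&count] { count++; });
		}
	}
	return count;
}
```

The queue gives the same safety as one mutex around the whole compress function, but without blocking the calling threads while the GPU works.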

@clayjohn
Member

I discussed this with @DarioSamo and we agreed that option 2 is probably the way to go. That means we should add a call queue and always call compress_betsy from a dedicated thread. We can use a setup similar to how we do multithreaded rendering for the servers (i.e. create a thread on the WorkerThreadPool).

I know that blows the complexity of this up. Do you want to give that a try, or would you prefer that I add that on top of what you have already done?

@BlueCube3310
Contributor Author

I'm not too familiar with multithreading/synchronization yet, so I'd prefer if someone more experienced than me handled this.

The latest push only rebases onto the master branch.

@clayjohn
Member

clayjohn commented Sep 12, 2024

Here is a little MRP that reproduces the threading issue I was worried about: multithread-compress.zip

When running this I get error spam, then I lose the Vulkan device and it crashes.

@clayjohn
Member

clayjohn commented Sep 12, 2024

Here is a commit adding a CommandQueue to Betsy clayjohn@74a1c85. What this does is route all commands to a dedicated thread. That way we can ensure that only one texture is being compressed at a time. I also took the chance to update the caching a bit (we now cache the sampler and the DXT1 buffer).

Unfortunately, there is a bug in master right now that causes a deadlock when exiting the engine with this commit: #96931

Once https://github.com/godotengine/godot/issues/96931 is resolved, we can add my commit on top of this PR and then merge everything at once.

Edit: #96959 fixes the issue!

@clayjohn
Member

I've pushed an additional commit with the CommandQueue implementation. I've tested it locally and it seems to work fine.

Once #96959 is merged, then this PR should be good as well

@clayjohn
Member

Taking a look at performance, it seems now that the most expensive parts of Betsy are:

  1. The format conversion (for a 4k texture, just going from RGB8 to RGBA8 takes 250-300 ms, which is easily half of the total compression time)
  2. Creating the temporary textures in the mipmap loop

For a 4k texture the timing is roughly as follows:

Format conversion (RGB8 -> RGBA8): 252 ms
Draw call CPU: 85 ms   // (calculate offset and size, create textures, create uniform set, set push constant, dispatch)
Draw call GPU: 37 ms
Copy back to CPU: 3 ms

Since everything runs in lockstep, the total cost is 377ms (That is slightly wrong as I am only including the first mip level for the compression, but I include all mips for the format conversion).

I bet we can get the whole thing under 100 ms. Simple RGB->RGBA conversions we should be able to handle on the GPU (i.e. by reading the RGB format and then writing out to RGBA). This would be almost free if we did it in the same dispatch as the compression. Then, for the per-draw-call costs we have two choices:

  1. We can overlap the GPU and the CPU work by delaying calling sync() and doing the readback until right before we do the next dispatch.
  2. We can modify Betsy to compress all mip levels at once. So we only have to setup the GPU texture once, then compress all levels. This should allow the GPU to be fully saturated which will help with some of the overhead.

In either case we will only really benefit when using mipmaps; single-mip textures won't be impacted. Given that the format conversion is the most significant part, I suggest we tackle that before even considering optimizing the loop.

Also, all of this is for a future PR. These potential optimizations shouldn't block this one.
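As a rough back-of-envelope model of option 1 (my own sketch, not code from the PR): in lockstep, each iteration pays CPU and GPU time serially, while with one dispatch kept in flight each middle iteration costs roughly max(CPU, GPU), because the CPU setup for the next mip hides behind the GPU work for the current one.

```cpp
#include <algorithm>

// Simplistic pipeline model; real timings vary per mip level.
int lockstep_ms(int cpu_ms, int gpu_ms, int iters) {
	// submit(); sync(); readback(); every iteration: costs add up.
	return (cpu_ms + gpu_ms) * iters;
}

int overlapped_ms(int cpu_ms, int gpu_ms, int iters) {
	// Delay sync()/readback until just before the next dispatch:
	// the first CPU setup and the last GPU job are still exposed.
	return cpu_ms + std::max(cpu_ms, gpu_ms) * (iters - 1) + gpu_ms;
}
```

Plugging in the 85 ms CPU / 37 ms GPU figures quoted above with, say, 8 dispatches gives 976 ms lockstep vs. 717 ms overlapped; the model deliberately ignores the 252 ms format conversion, which would shrink separately if moved into the compression dispatch.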

@BlueCube3310
Contributor Author

Thank you for the help! I've tested it locally and can confirm it works correctly.

@clayjohn clayjohn modified the milestones: 4.x, 4.4 Sep 15, 2024
@clayjohn
Member

Note for other maintainers: this PR runs Betsy on a long-running background thread using the same technique as multithreaded physics/rendering (i.e. it uses a dedicated thread, but yields to the WTP). This means it will be a good test case for our WTP, as multithreaded servers are currently all disabled by default. The flipside is that if we accidentally break the WTP, we have code running in production that will break, so big changes to the WTP and MT are slightly more risky.

@akien-mga akien-mga merged commit 67c9708 into godotengine:master Sep 16, 2024
20 checks passed
@akien-mga
Member

Thanks!
